All posts
DEEP DIVE6 min read

Traces, spans, and generations: the LLM data model

The Currai team, EngineeringMay 6, 2026

Most confusion about instrumenting an LLM app comes from not knowing which noun to reach for. Currai's data model is deliberately small — a trace that contains spans and generations — and once the three click, deciding what to wrap becomes obvious.

The trace is the root

A trace is one logical unit of work from your product's point of view: answering a question, running an agent turn, completing a chat exchange. It has a name, an optional user and session, tags, and metadata. Everything else nests inside it.

trace = currai.trace(
    name="support-answer",
    user_id="user-1",
    session_id="sess-42",
    tags=["support", "production"],
)

Rule of thumb: a trace is the thing you'd want to find later when a user says "this specific request went wrong."

A generation is a single model call

A generation is exactly one call to a model. It records the input messages, the output completion, the model name and parameters, the timing, and the token usage. If your code calls an LLM, that call is a generation — no more, no less.

gen = trace.generation(
    name="openai.chat",
    model="gpt-4o-mini",
    model_parameters={"temperature": 0.2, "max_tokens": 512},
    input=messages,
)
gen.end(output=reply, usage={"input": 312, "output": 88, "unit": "TOKENS"})

Splitting generations finely pays off: two model calls in one trace should be two generations, so you can see which one was slow or expensive.

A span is everything else

Retrieval, tool calls, parsing, a guardrail check — any non-model step worth timing is a span. Spans and generations are siblings; both nest under the trace, and both can nest inside each other for deep agent loops.

retrieval = trace.span(name="retrieve-docs", input={"query": question})
docs = vector_store.search(question, k=4)
retrieval.end(output={"doc_ids": [d.id for d in docs]})

The shape mirrors your code

The reason this model works is that it maps onto how the code already reads. A function that orchestrates a request is a trace; a function that calls a model is a generation; a function that does I/O is a span. You're not inventing a parallel structure — you're annotating the one you have.

Why a small model is the point

You could imagine a dozen specialized event types, but they'd just be spans with extra names. Keeping the vocabulary to three means every tool, dashboard, and query in the system speaks the same language — and your instrumentation stays legible to the next engineer who has to read it at 2am.