
Traces, spans, and generations: the LLM data model

The Currai team, Engineering — May 6, 2026

Most confusion about instrumenting an LLM app comes from not knowing which noun to reach for. Currai's data model is deliberately small — a trace that contains spans and generations — and once the three click, deciding what to wrap becomes obvious.

The trace is the root

A trace is one logical unit of work from your product's point of view: answering a question, running an agent turn, completing a chat exchange. It has a name, an optional user and session, tags, and metadata. Everything else nests inside it.

trace = currai.trace(
    name="support-answer",
    user_id="user-1",
    session_id="sess-42",
    tags=["support", "production"],
)

Rule of thumb: a trace is the thing you'd want to find later when a user says "this specific request went wrong."
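To make the rule of thumb concrete, here is a minimal in-memory sketch. It is not Currai's API. The `TraceRecord` dataclass and the `find_traces` helper are hypothetical: the record carries the same fields as the `currai.trace(...)` call above, and the helper answers the "this specific request went wrong" question by filtering on user, session, and tag.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TraceRecord:
    # Hypothetical stand-in for a trace; field names mirror the currai.trace(...) call.
    name: str
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    tags: list[str] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


def find_traces(
    traces: list[TraceRecord],
    user_id: Optional[str] = None,
    session_id: Optional[str] = None,
    tag: Optional[str] = None,
) -> list[TraceRecord]:
    """Return the traces that match every filter that was supplied."""
    return [
        t
        for t in traces
        if (user_id is None or t.user_id == user_id)
        and (session_id is None or t.session_id == session_id)
        and (tag is None or tag in t.tags)
    ]


# Illustrative data: one trace per user-facing request.
traces = [
    TraceRecord("support-answer", "user-1", "sess-42", ["support", "production"]),
    TraceRecord("support-answer", "user-2", "sess-7", ["support", "production"]),
    TraceRecord("support-answer", "user-1", "sess-43", ["support", "staging"]),
]

matches = find_traces(traces, user_id="user-1", session_id="sess-42")
print([t.session_id for t in matches])  # ['sess-42']
```

Because each trace corresponds to one request, a user complaint narrows down to a single record. You don't have to reassemble the request from a pile of loose events.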

A generation is a single model call

A generation is exactly one call to a model. It records the input messages, the output completion, the model name and parameters, the timing, and the token usage. If your code calls an LLM, that call is a generation — no more, no less.

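# messages: the chat messages you are about to send to the model
gen = trace.generation(
    name="openai.chat",
    model="gpt-4o-mini",
    model_parameters={"temperature": 0.2, "max_tokens": 512},
    input=messages,
)
# ... make the model call here; `reply` is its completion ...
gen.end(output=reply, usage={"input": 312, "output": 88, "unit": "TOKENS"})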

Splitting generations finely pays off: two model calls in one trace should be two generations, so you can see which one was slow or expensive.
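Here is a rough sketch of that payoff. It uses made-up numbers and a hypothetical `GenerationRecord` type rather than anything Currai ships, and it treats token count as a stand-in for cost. Once each model call has its own record, finding the slow call and the expensive call is a one-liner each, and here they turn out to be different calls.

```python
from dataclasses import dataclass


@dataclass
class GenerationRecord:
    # Hypothetical stand-in for one model call inside a trace.
    name: str
    model: str
    latency_ms: float
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


# Two model calls in the same trace, recorded as two generations (illustrative values).
generations = [
    GenerationRecord("summarize-history", "gpt-4o-mini", latency_ms=410.0,
                     input_tokens=900, output_tokens=60),
    GenerationRecord("openai.chat", "gpt-4o-mini", latency_ms=1830.0,
                     input_tokens=312, output_tokens=88),
]

slowest = max(generations, key=lambda g: g.latency_ms)
priciest = max(generations, key=lambda g: g.total_tokens)
print(slowest.name, priciest.name)  # openai.chat summarize-history
```

If both calls were folded into one generation, you would only see 2240 ms and 1360 tokens in total. You could not tell that one call dominates latency while the other dominates token spend.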

A span is everything else

Retrieval, tool calls, parsing, a guardrail check: any non-model step worth timing is a span. Spans and generations sit at the same level of the model. Both nest under the trace, and they can also nest inside one another to represent deep agent loops.

retrieval = trace.span(name="retrieve-docs", input={"query": question})
docs = vector_store.search(question, k=4)
retrieval.end(output={"doc_ids": [d.id for d in docs]})
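The span above is timed by the pair of calls that opens it and then calls `end()`. Below is a hedged sketch of the same idea without the SDK. A hypothetical `timed_span` context manager records input, output, and elapsed time for any non-model step. The `search` function is a placeholder standing in for a real vector store; it is not part of any library.

```python
import time
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def timed_span(name: str, input_data: Any, sink: list[dict]) -> Iterator[dict]:
    """Hypothetical span: times the wrapped block and appends a record to `sink`."""
    record: dict[str, Any] = {"name": name, "input": input_data, "output": None}
    start = time.perf_counter()
    try:
        yield record  # the caller fills in record["output"]
    finally:
        # Record the duration even if the block raises, so failed steps still show up.
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        sink.append(record)


def search(query: str, k: int) -> list[str]:
    # Placeholder for a real retrieval call.
    return [f"doc-{i}" for i in range(k)]


spans: list[dict] = []
question = "How do I reset my password?"
with timed_span("retrieve-docs", {"query": question}, spans) as span:
    docs = search(question, k=4)
    span["output"] = {"doc_ids": docs}

print(spans[0]["name"], spans[0]["output"])
```

The `try`/`finally` means a step that throws still produces a span with its duration. Those failing steps are often the ones you most want to see.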

The shape mirrors your code

The reason this model works is that it maps onto how the code already reads. A function that orchestrates a request is a trace; a function that calls a model is a generation; a function that does I/O is a span. You're not inventing a parallel structure — you're annotating the one you have.
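One way to picture "annotating the one you have" is a decorator that turns each function call into a node, nested according to the call stack. This is a self-contained sketch. The `observe` decorator and the toy functions are hypothetical, and the sketch makes no claim about how Currai's SDK is implemented or named.

```python
import functools
from typing import Any, Callable, Optional

# Stack of currently open nodes; the top is the parent of the next call.
_stack: list[dict] = []
roots: list[dict] = []


def observe(kind: str) -> Callable:
    """Hypothetical decorator: record each call as a node of the given kind."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            node = {"kind": kind, "name": fn.__name__, "children": []}
            parent: Optional[dict] = _stack[-1] if _stack else None
            (parent["children"] if parent is not None else roots).append(node)
            _stack.append(node)
            try:
                return fn(*args, **kwargs)
            finally:
                _stack.pop()
        return wrapper
    return decorator


@observe("span")
def retrieve_docs(question: str) -> list[str]:
    return ["doc-1", "doc-2"]  # stand-in for real I/O


@observe("generation")
def call_model(question: str, docs: list[str]) -> str:
    return f"answer using {len(docs)} docs"  # stand-in for a model call


@observe("trace")
def support_answer(question: str) -> str:
    docs = retrieve_docs(question)
    return call_model(question, docs)


def render(node: dict, depth: int = 0) -> None:
    print("  " * depth + f"{node['kind']}: {node['name']}")
    for child in node["children"]:
        render(child, depth + 1)


support_answer("How do I reset my password?")
render(roots[0])
# trace: support_answer
#   span: retrieve_docs
#   generation: call_model
```

The module-level stack keeps the sketch short, but it is not safe across threads or async tasks. A production version would more likely keep the current parent in a `contextvars.ContextVar`.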

Why a small model is the point

You could imagine a dozen specialized event types, but they'd just be spans with extra names. Keeping the vocabulary to three means every tool, dashboard, and query in the system speaks the same language — and your instrumentation stays legible to the next engineer who has to read it at 2am.
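Here is a small illustration of the "same language" point, again using hypothetical in-memory records rather than Currai's storage. Every node is a trace, span, or generation with the same basic shape. That means one recursive function can total generation tokens for any trace, however deeply the agent loop nests. The `usage` keys follow the `{"input": ..., "output": ...}` shape passed to `gen.end` earlier.

```python
from typing import Any


def total_tokens(node: dict[str, Any]) -> int:
    """Sum token usage over every generation in a tree of trace/span/generation nodes."""
    own = 0
    if node["kind"] == "generation":
        usage = node.get("usage", {})
        own = usage.get("input", 0) + usage.get("output", 0)
    return own + sum(total_tokens(child) for child in node.get("children", []))


# Illustrative agent trace: a span containing a nested generation, plus a top-level one.
trace = {
    "kind": "trace",
    "name": "support-answer",
    "children": [
        {
            "kind": "span",
            "name": "agent-step",
            "children": [
                {"kind": "span", "name": "retrieve-docs"},
                {"kind": "generation", "name": "plan", "usage": {"input": 200, "output": 40}},
            ],
        },
        {"kind": "generation", "name": "openai.chat", "usage": {"input": 312, "output": 88}},
    ],
}

print(total_tokens(trace))  # 640
```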