BEST PRACTICES5 min read

Sampling and PII redaction for production tracing

The Currai team, Engineering — Mar 24, 2026

Tracing in development is free and safe — low volume, fake data. Tracing in production introduces two constraints that change the calculus: prompts now contain real user data you may not be allowed to store, and traffic volume makes "capture everything" expensive. Sampling and redaction are how you keep tracing useful under both.

Redact before the trace leaves your process

The golden rule: sensitive data should never reach the backend in the first place. Scrub it at the source, in the same code that builds the trace, so what's stored is already safe.

def redact(text: str) -> str:
    text = EMAIL_RE.sub("[email]", text)
    text = PHONE_RE.sub("[phone]", text)
    return text

gen = trace.generation(
    name="answer",
    model="gpt-4o-mini",
    input=[redact(m) for m in messages],
)
gen.end(output=redact(reply))

Redact inputs and outputs — models happily echo back a phone number a user typed, so the completion is as sensitive as the prompt.

Keep the signal you actually debug with

Aggressive redaction can leave you with traces too scrubbed to be useful. The fix is to redact the values, not the structure: replace the email with [email], but keep the shape of the prompt, the token counts, the model, and the timing. You can debug almost everything from structure and metadata without ever storing the raw PII.

Sample for volume, but sample smart

At full traffic you rarely need every trace. Head sampling — keep 10% at random — cuts cost linearly but throws away rare failures. The better pattern is to bias the sample:

Always keep errors and slow requests. They're rare and they're the whole point of tracing.
Always keep a slice tagged for evals, so your quality dataset stays fresh.
Down-sample the boring successful middle, where one trace looks much like the next.

Make the policy auditable

Whatever you choose, write it down and tag traces with the policy that captured them. When a privacy review asks "what user data do you store and for how long?" the answer should be a documented redaction rule plus a plan-based retention window — not an engineer's memory. Observability that can't survive an audit isn't observability you can keep running.