Sampling and PII redaction for production tracing
The Currai team, Engineering — Mar 24, 2026
Tracing in development is free and safe — low volume, fake data. Tracing in production introduces two constraints that change the calculus: prompts now contain real user data you may not be allowed to store, and traffic volume makes "capture everything" expensive. Sampling and redaction are how you keep tracing useful under both.
Redact before the trace leaves your process
The golden rule: sensitive data should never reach the backend in the first place. Scrub it at the source, in the same code that builds the trace, so what's stored is already safe.
Redact inputs and outputs — models happily echo back a phone number a user typed, so the completion is as sensitive as the prompt.
Keep the signal you actually debug with
Aggressive redaction can leave you with traces too scrubbed to be useful. The fix
is to redact the values, not the structure: replace the email with [email], but
keep the shape of the prompt, the token counts, the model, and the timing. You can
debug almost everything from structure and metadata without ever storing the raw
PII.
Sample for volume, but sample smart
At full traffic you rarely need every trace. Head sampling — keep 10% at random — cuts cost linearly but throws away rare failures. The better pattern is to bias the sample:
- Always keep errors and slow requests. They're rare and they're the whole point of tracing.
- Always keep a slice tagged for evals, so your quality dataset stays fresh.
- Down-sample the boring successful middle, where one trace looks much like the next.
Make the policy auditable
Whatever you choose, write it down and tag traces with the policy that captured them. When a privacy review asks "what user data do you store and for how long?" the answer should be a documented redaction rule plus a plan-based retention window — not an engineer's memory. Observability that can't survive an audit isn't observability you can keep running.
currai