Every AI Call
Can Teach You Something.

Trace LLM calls. Test prompts. Ship better AI.

Get Started

Currai has a 7-day free trial.

Traces

Every trace-create event lands here in real time.

Timestamp	Name	Level	Env	User	Session	Input	Tags	Latency	Obs	Tokens	Output	Cost
Jun 23, 03:18 PM	chat-turn	OK	production	user-184	sess-7b2	summarize refund policy	support, prod	842 ms	2	1.3k	The refund window is 30 days...	$0.0018
Jun 23, 03:16 PM	rag-answer	WARNING	production	user-092	sess-a19	compare enterprise plans	rag, pricing	2.4 s	4	3.8k	The Pro tier includes...	$0.0069
Jun 23, 03:14 PM	openai.chat.completions	OK	staging	qa-12	sess-41f	draft onboarding reply	email	690 ms	1	884	Welcome aboard. Here are...	$0.0009
Jun 23, 03:11 PM	support-agent	ERROR	production	user-318	sess-c03	change my billing email	agent, tool	5.8 s	5 (1 err)	5.6k	tool timeout: crm.updateUser	$0.0112
Jun 23, 03:08 PM	eval-run	OK	production	system	eval-22	score answer relevance	evals	1.1 s	3	2.1k	score: 0.91	$0.0034

Ship quality AI at scale

Observability

Capture every LLM call, tool execution, and retrieval step in hierarchical traces. Filter by user, session, latency, cost, or custom metadata.
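As a rough illustration of what a hierarchical trace could look like, here is a minimal sketch in plain Python. The `Span` class, its fields, and `filter_spans` are hypothetical names made up for this example; they are not the Currai SDK or its data model.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Span:
    """One step in a trace: an LLM call, a tool execution, or a retrieval (hypothetical model)."""
    name: str
    kind: str                      # e.g. "trace", "generation", "tool", "retrieval"
    latency_ms: float
    cost_usd: float = 0.0
    metadata: dict = field(default_factory=dict)
    children: list["Span"] = field(default_factory=list)

    def walk(self) -> Iterator["Span"]:
        """Yield this span and every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def filter_spans(root: Span, *, min_latency_ms: float = 0.0, **metadata) -> list[Span]:
    """Return spans at least as slow as min_latency_ms whose metadata matches every key given."""
    return [
        s for s in root.walk()
        if s.latency_ms >= min_latency_ms
        and all(s.metadata.get(k) == v for k, v in metadata.items())
    ]


# Illustrative data only, loosely modeled on the "rag-answer" row in the table above.
trace = Span("rag-answer", "trace", 2400, metadata={"user": "user-092"}, children=[
    Span("vector-search", "retrieval", 1500, metadata={"user": "user-092"}),
    Span("answer", "generation", 850, cost_usd=0.0069, metadata={"user": "user-092"}),
])

slow = filter_spans(trace, min_latency_ms=1000, user="user-092")
print([s.name for s in slow])  # ['rag-answer', 'vector-search']
```

The nesting is what lets a filter on latency or metadata point at a specific step rather than at the request as a whole.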

Evaluation

Evaluate outputs with LLM judges, custom heuristics, or human review. Run evaluations on production traffic or prompt experiments.

Prompt Management

Manage prompts outside your codebase with one-click deployments and rollbacks. Collaborate on prompt improvements with your entire team.

Everything You Need to Improve Your AI

From production traces to prompt experiments, Currai gives your team one place to measure, test, and improve every AI interaction.

When you need to

Understand why an AI response failed

Without Currai

“Search disconnected logs and try to reconstruct what happened.”

1. Opening the complete production trace

2. Following retrieval and tool calls

3. Inspecting the prompt, model output, latency, and error

The retrieval step timed out after 4.2s, leaving the model without context. The failing span and its inputs are ready to inspect.
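The idea of walking a trace to the failing span can be sketched as follows. This is only an illustration under assumed conventions: the span dictionaries, the `status` field, and `first_failure` are hypothetical, and the data is invented apart from the 4.2 s retrieval timeout mentioned above.

```python
# Hypothetical flat export of one trace: each span points at its parent by id.
spans = [
    {"id": "s1", "parent": None, "name": "support-agent", "status": "ok", "latency_s": 5.8},
    {"id": "s2", "parent": "s1", "name": "retrieval", "status": "timeout", "latency_s": 4.2},
    {"id": "s3", "parent": "s1", "name": "generation", "status": "ok", "latency_s": 1.4},
]


def first_failure(spans):
    """Return (path, span) for the first span whose status is not 'ok', or None if all passed."""
    by_id = {s["id"]: s for s in spans}
    for span in spans:
        if span["status"] != "ok":
            # Climb parent links to rebuild the path from the root to the failing span.
            path, node = [], span
            while node is not None:
                path.append(node["name"])
                node = by_id.get(node["parent"])
            return " > ".join(reversed(path)), span
    return None


path, failing = first_failure(spans)
print(path, failing["status"], failing["latency_s"])  # support-agent > retrieval timeout 4.2
```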

When you need to

Measure the quality of production responses

Without Currai

“Manually review a small sample and rely on intuition.”

1. Selecting recent production traces

2. Running LLM judges and heuristic evaluators

3. Grouping low-scoring responses by failure reason

Evaluation complete: 91% passed. The 23 low-scoring responses are grouped and ready for review.
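To make the "score, then group by failure reason" flow concrete, here is a small sketch using a toy heuristic evaluator. The rules, thresholds, and function names are assumptions chosen for illustration; a real setup would plug in its own evaluators or an LLM judge.

```python
from collections import defaultdict


def heuristic_eval(output: str) -> tuple[bool, str]:
    """Toy heuristic evaluator (hypothetical rules): return (passed, reason)."""
    if not output.strip():
        return False, "empty output"
    if len(output) > 200:
        return False, "too long"
    if "i don't know" in output.lower():
        return False, "refusal"
    return True, "ok"


def run_eval(outputs: dict[str, str]) -> tuple[float, dict[str, list[str]]]:
    """Score every traced output and group the failing trace ids by reason."""
    failures = defaultdict(list)
    passed = 0
    for trace_id, output in outputs.items():
        ok, reason = heuristic_eval(output)
        if ok:
            passed += 1
        else:
            failures[reason].append(trace_id)
    pass_rate = passed / len(outputs) if outputs else 0.0
    return pass_rate, dict(failures)


# Illustrative outputs keyed by a made-up trace id.
outputs = {
    "t1": "The refund window is 30 days.",
    "t2": "",
    "t3": "I don't know.",
    "t4": "Welcome aboard.",
}
rate, groups = run_eval(outputs)
print(rate, groups)  # 0.5 {'empty output': ['t2'], 'refusal': ['t3']}
```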

When you need to

Know which prompt performs better

Without Currai

“Deploy a new prompt and hope the results improve.”

1. Splitting production traffic between prompt versions

2. Measuring quality, latency, tokens, and cost

3. Comparing results on real user requests

Version B improved quality by 18% and reduced token usage by 12%. It is ready to promote.
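One common way to split traffic between prompt versions is deterministic hashing, so a given user always sees the same version. The sketch below assumes that approach; the source does not say how Currai assigns traffic, and `assign_version`, `relative_change`, the salt, and the sample metrics are all hypothetical.

```python
import hashlib


def assign_version(user_id: str, versions=("A", "B"), salt: str = "prompt-exp-1") -> str:
    """Map a user to a prompt version deterministically, so repeat requests stay in one arm."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(versions)
    return versions[bucket]


def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate versus baseline (0.18 means +18%)."""
    return (candidate - baseline) / baseline


# The same user lands in the same arm on every call.
assert assign_version("user-184") == assign_version("user-184")

# Made-up metrics: mean quality score and mean tokens per request for each version.
quality_a, quality_b = 0.50, 0.59
tokens_a, tokens_b = 1000, 880
print(round(relative_change(quality_a, quality_b), 2))  # 0.18
print(round(relative_change(tokens_a, tokens_b), 2))    # -0.12
```

Changing the salt reshuffles assignments, which is useful when starting a new experiment on the same user base.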

When you need to

Update a prompt without redeploying your app

Without Currai

“Edit hard-coded prompts, open a pull request, and redeploy.”

1. Creating a new version in the prompt registry

2. Previewing the compiled prompt with variables

3. Publishing with a complete version history

Version 12 is live. Previous versions remain available for an instant rollback.

When you need to

Test a prompt before sending it to production

Without Currai

“Copy inputs between scripts and compare outputs manually.”

1. Replaying representative production inputs

2. Comparing prompts and models side by side

3. Scoring every output with the same evaluators

The strongest prompt and model combination is identified using real inputs and consistent scores.

When you need to

Find what is making your AI slow or expensive

Without Currai

“See one total duration and a monthly provider bill.”

1. Breaking down latency across generations and spans

2. Comparing token usage and cost by model and prompt

3. Filtering expensive traces by user, session, and environment

Repeated tool calls account for 31% of cost, while retrieval adds 5.1s to the slowest requests.
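A breakdown like this comes from summing cost and latency per span kind instead of per request. The sketch below shows that aggregation on invented numbers, chosen so tool calls come out at 31% of cost; the span fields and the `breakdown` function are hypothetical.

```python
from collections import defaultdict

# Made-up spans from one slow trace; cost is in millionths of a dollar to keep the sums exact.
spans = [
    {"kind": "tool", "cost_microusd": 1500, "latency_ms": 300},
    {"kind": "tool", "cost_microusd": 1600, "latency_ms": 350},
    {"kind": "generation", "cost_microusd": 6000, "latency_ms": 900},
    {"kind": "retrieval", "cost_microusd": 900, "latency_ms": 5100},
]


def breakdown(spans):
    """Return, per span kind, its share of total cost and its summed latency."""
    cost = defaultdict(int)
    latency = defaultdict(int)
    for span in spans:
        cost[span["kind"]] += span["cost_microusd"]
        latency[span["kind"]] += span["latency_ms"]
    total = sum(cost.values())
    return {
        kind: {
            "cost_share": cost[kind] / total if total else 0.0,
            "latency_ms": latency[kind],
        }
        for kind in cost
    }


report = breakdown(spans)
print(report["tool"]["cost_share"])       # 0.31
print(report["retrieval"]["latency_ms"])  # 5100
```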

Launch, observe, improve — repeat.

(and better!)

Runner seamlessly integrates with the tools you already rely on, streamlining your workflow and ensuring that tasks are completed efficiently and effectively. It takes care of the details so you can focus on what truly matters.

Concurrent Runners

Run multiple tasks in parallel. Draft follow-ups while pulling analytics while updating your CRM. Receipts and timestamps for everything.

Local + Cloud

Works across your local machine and cloud services. Manages files, apps, and workflows wherever they live. Your data stays yours.

Memory Across Sessions

Runner remembers what matters across sessions: your contacts, your preferences, your unfinished work. Context that compounds over time.

Ship AI. Improve continuously

(and better!)

AI drifts and regresses silently. With patterns surfaced automatically, the best teams can evaluate against expectations and iterate continuously.

Connects to the stack you already use

Model Providers: OpenAI, Mistral, Anthropic, Gemini, Perplexity, Qwen

Developer Tools: Cursor, GitHub Copilot, OpenCode, VS Code

No Code: Lovable, Replit, Bolt

Frameworks: Vercel AI SDK

Native: TypeScript, Python, OpenTelemetry

Other: Exa

CURRAI GUIDES

Learn what Currai makes possible

Start with practical posts on tracing, evals, prompts, cost, and agent workflows so you know what to instrument first.

TUTORIAL / 7 min read

The easiest way to add LLM observability to your AI app in 2026

The fastest useful LLM observability setup is one trace, one generation, and one flush. Currai gets you there without running collectors or rebuilding your app.

GUIDE / 6 min read

What is LLM observability?

LLM observability captures every prompt, completion, token, and tool call so you can explain what your model did and debug it faster.

TUTORIAL / 6 min read

Debug a slow RAG pipeline with nested traces

A RAG answer that takes four seconds could be slow retrieval, a fat prompt, or the model itself. Nested traces tell you which one — here's how to find the bottleneck.

GUIDE / 6 min read

Run LLM evals on production traces

Offline test sets go stale fast. Currai runs LLM-as-judge evals on real traced outputs, so you can compare prompt quality on live traffic.

GUIDE / 6 min read

Why you should A/B test your LLM prompts

A one-word prompt change can shift cost, latency, and quality. A/B testing prompt versions shows which wording wins in production.

Pricing that tracks real volume

Starter

Free to get started.

$0/mo

50 MB included

3-day retention

Drop-in Python & TypeScript SDKs
Full traces, tokens & cost in one view
Langfuse & OpenTelemetry compatible

Pro

For teams shipping to production.

$8/mo

2 GB included

14-day retention

Everything in Starter
Run evals and A/B test prompt versions in production
Sessions & users roll-ups
Cost, token & latency dashboards

Business

Higher volume, longer history.

$20/mo

4 GB included

30-day retention

Everything in Pro
Hosted ingestion, storage & dashboards
Priority support

Need a custom plan?
Higher volume, longer retention, or specific terms — tell us about your usage and we'll tailor a plan to fit.

Questions, answered

What is Currai?

Currai is an LLM observability and evaluation platform. It traces every prompt, token, and tool call your app makes so you can debug, measure, and ship with confidence — full traces, token usage, and cost in one view. It also supports LLM evals, prompt A/B testing, and OpenTelemetry/Langfuse-compatible ingestion.

Is Currai an LLM evaluation platform?

Yes. Beyond LLM observability, Currai is an AI evaluation platform: run LLM-as-a-judge and heuristic evals with your own evaluation metrics, A/B test and regression-test prompts, and evaluate agents and multi-turn conversations on real production traces. Build golden datasets from those traces and wire evals into CI/CD to catch hallucinations and quality regressions before they ship.

What can Currai monitor in production?

Currai is an AI observability and LLM monitoring platform for production. You get LLM tracing across every prompt, token, and tool call, plus AI quality monitoring, hallucination detection, prompt drift detection, and latency and cost tracking — rolled up per trace, model, session, and user so you can watch quality and spend on live traffic.

How long does it take to get my first trace?

About five minutes. Install the SDK with pip install currai or npm i currai, paste your public / secret key pair, and wrap a single LLM call. There's no agent to deploy and no collector to run — the first request you make shows up in the dashboard right away.

Which languages and frameworks do you support?

Currai ships first-class Python and TypeScript SDKs, and it's byte-compatible with the Langfuse SDKs and OpenTelemetry. If you're already instrumented, you migrate by changing a single host line — your existing trace code, spans, and exporters keep working.

Can Currai replace Langfuse, Braintrust, or DeepEval?

Currai is an option for teams comparing Langfuse alternatives, Braintrust alternatives, or DeepEval alternatives for hosted LLM observability, production traces, evals, prompt A/B tests, token cost tracking, and OpenTelemetry-compatible ingestion. Test your exact workflow before switching.

Will tracing slow down my app?

No. Traces are batched and flushed in the background, so instrumentation never blocks a request. In short-lived processes you call flush() (or flush_async()) before exit to make sure everything is sent.
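The general pattern of batching in the background and flushing before exit can be sketched with the standard library. This is not the Currai SDK: `BatchingExporter`, its parameters, and the `send` callback are hypothetical, and only the idea of calling a flush method before a short-lived process exits comes from the answer above.

```python
import queue
import threading


class BatchingExporter:
    """Sketch of a background exporter: record() only enqueues, a worker thread sends batches."""

    def __init__(self, send, batch_size: int = 10, interval_s: float = 0.5) -> None:
        self._send = send                  # callable that receives a list of events
        self._batch_size = batch_size
        self._interval_s = interval_s
        self._queue: queue.Queue = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, event) -> None:
        """Enqueue an event; the caller never waits on the network."""
        self._queue.put(event)

    def _drain(self) -> int:
        """Send one batch of up to batch_size queued events and return how many were sent."""
        batch = []
        while len(batch) < self._batch_size:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._send(batch)
        return len(batch)

    def _run(self) -> None:
        # Wake up every interval_s seconds until flush() sets the stop event.
        while not self._stop.wait(self._interval_s):
            while self._drain():
                pass

    def flush(self) -> None:
        """Stop the worker, then send whatever is still queued. Call once, before exit."""
        self._stop.set()
        self._worker.join()
        while self._drain():
            pass


sent = []  # stands in for the network: each item is one batch that was "sent"
exporter = BatchingExporter(sent.append, batch_size=2, interval_s=60)
for i in range(5):
    exporter.record(i)
exporter.flush()
print(sent)  # [[0, 1], [2, 3], [4]]
```

In this sketch the worker thread is a daemon, so without the final `flush()` a short-lived script could exit with events still in the queue.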

How is pricing calculated?

Usage is billed on the data Currai actually processes — measured in processed bytes — not on a flat per-trace fee. Large traces and tiny traces are priced for what they are, so your cost tracks real volume instead of row counts.
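For a rough feel of byte-based usage, the sketch below measures a trace payload as compact UTF-8 JSON and estimates how many such traces fit in the Starter plan's 50 MB. Both the measurement method and the 1 MB = 1,000,000 bytes convention are assumptions; the source does not say exactly how processed bytes are counted.

```python
import json

# Assumption: 1 MB = 1,000,000 bytes. The page does not state which convention applies.
STARTER_INCLUDED_BYTES = 50 * 1000 * 1000


def payload_bytes(trace: dict) -> int:
    """Size of a trace serialized as compact UTF-8 JSON (an assumed measurement method)."""
    text = json.dumps(trace, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def traces_within_allowance(avg_bytes: int, included: int = STARTER_INCLUDED_BYTES) -> int:
    """How many traces of a given average size fit in the included volume."""
    return included // avg_bytes


print(payload_bytes({"a": "é"}))      # 10
print(traces_within_allowance(2000))  # 25000
```

Under these assumptions, a trace with a long retrieved context counts for more than a short chat turn, which is the behavior the answer above describes.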

How long is my data retained?

Retention follows your plan, so you keep traces for as long as your plan's window allows. You can export or delete your data on demand at any time.

Do I have to run any infrastructure?

None. Currai hosts ingestion, columnar storage, and dashboards for you — there's no ClickHouse to babysit. It scales with your traffic so you watch data, not infrastructure.
