Tracing and Debugging
Connect agent runs, tool calls, retrieval evidence, product logs, and eval scores.
Tracing gives you the evidence needed to debug an agent harness: prompt runs, model generations, tool calls, errors, usage, retrieval evidence, trace metadata, and eval scores.
Use traces for runtime debugging. Use product logs and audit records for product accountability.
Scenario
A support answer was wrong. You need to know which prompt ran, which user and conversation it belonged to, which tools were available, which retrieval documents were injected, what the model returned, and whether the same case fails in evals.
When to Use It
Use this pattern for every production harness that handles real users, private data, side effects, or recurring quality checks.
Architecture Shape
| Layer | Responsibility |
|---|---|
| observer | records runs, tool calls, model calls, errors, and usage |
| .withTrace(...) | attaches workflow name, user id, session id, tags, version, and safe metadata |
| product logs | store trace ids next to product events |
| Studio | inspect local runs, tools, MCPs, context, approvals, and traces |
| external tracing | long-term search, metrics, dashboards, eval scores |
| shutdown | flush and close buffered telemetry |
Code Example
import { AgentBuilder } from "@anvia/core";
import { langfuse } from "@anvia/langfuse";

const tracing = langfuse.create({
  publicKey,
  secretKey,
  baseUrl,
});

const agent = new AgentBuilder("support", model)
  .instructions("Answer support questions clearly.")
  .observe(tracing)
  .defaultMaxTurns(3)
  .build();

const response = await agent
  .prompt(message)
  .withTrace({
    name: "support-chat",
    userId: user.id,
    sessionId: conversationId,
    tags: ["support", channel],
    version: "2026-05-11",
    metadata: {
      tenantId: user.tenantId,
      conversationId,
      plan: user.plan,
    },
  })
  .send();

logger.info({
  traceId: response.trace?.traceId,
  observationId: response.trace?.observationId,
  conversationId,
}, "support agent completed");
Trace Metadata Rules
| Include | Avoid |
|---|---|
| stable workflow name | raw secrets or API keys |
| user id when safe | full prompts in app logs |
| tenant id | large records |
| conversation, ticket, or job id | private document bodies in metadata |
| model or prompt version | unbounded objects |
| channel, route, or feature flag | values your policy forbids storing |
Trace metadata should connect systems. It should not become a second database.
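One way to enforce these rules is to pass metadata through an allowlist before it reaches the trace. The sketch below is a minimal illustration, not part of the library: `safeTraceMetadata`, `ALLOWED_KEYS`, and the 128-character limit are all hypothetical names and values chosen for the example. It keeps only known keys with small primitive values and drops everything else.

```typescript
// Hypothetical helper for illustration; not a library API.
type SafeValue = string | number | boolean;

// Assumed allowlist of correlation fields. Adjust to your own policy.
const ALLOWED_KEYS = [
  "tenantId",
  "conversationId",
  "ticketId",
  "jobId",
  "plan",
  "channel",
] as const;

// Assumed upper bound for string values; ids should be far shorter.
const MAX_STRING_LENGTH = 128;

function safeTraceMetadata(
  input: Record<string, unknown>,
): Record<string, SafeValue> {
  const output: Record<string, SafeValue> = {};
  for (const key of ALLOWED_KEYS) {
    const value = input[key];
    if (typeof value === "string") {
      // Oversized strings are dropped rather than truncated,
      // because a truncated id no longer correlates with anything.
      if (value.length <= MAX_STRING_LENGTH) output[key] = value;
    } else if (typeof value === "number" || typeof value === "boolean") {
      output[key] = value;
    }
    // Objects, arrays, null, and undefined are dropped.
  }
  return output;
}
```

Keys outside the allowlist, such as an API key or a document body, never reach the trace, so a new field has to be added deliberately rather than leaking in through an object spread.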
Debugging Flow
- Find the product event, ticket, conversation, or eval case.
- Open the linked trace id.
- Check prompt input, instructions, retrieved context, and available tools.
- Check tool calls, tool outputs, errors, and turn count.
- Reproduce with the same case in Studio or an eval suite.
- Fix the deterministic boundary first: tool, retrieval, runner, or prompt.
- Add an eval case when the issue is model-dependent.
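The first two steps depend on product logs carrying trace ids. As a rough sketch, assuming log entries are available as plain objects, the lookup can be as small as the function below. `ProductLogEntry` and `traceIdsForConversation` are hypothetical names for this example; a real system would run the equivalent query against its log store.

```typescript
// Hypothetical log entry shape; real product logs will carry more fields.
interface ProductLogEntry {
  message: string;
  conversationId: string;
  traceId?: string;
}

// Collect the distinct trace ids logged for one conversation,
// in the order they first appeared.
function traceIdsForConversation(
  entries: ProductLogEntry[],
  conversationId: string,
): string[] {
  const seen = new Set<string>();
  for (const entry of entries) {
    if (entry.conversationId === conversationId && entry.traceId !== undefined) {
      seen.add(entry.traceId);
    }
  }
  return Array.from(seen);
}
```

An empty result for a conversation that clearly ran the agent is itself a finding: the product event was logged without its trace id.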
Local Studio vs External Tracing
| Use Studio | Use external tracing |
|---|---|
| local development | production traffic |
| inspect agent registration | long-term trace search |
| exercise approvals and questions | aggregate cost and usage |
| debug context and MCP visibility | dashboards and alerting |
| iterate before product UI exists | eval score reporting |
Studio and tracing are complementary. Studio shortens local iteration; tracing preserves production evidence.
Flush and Shutdown
await tracing.flush?.();
await tracing.shutdown?.();
Call flush() before process exit when pending events matter. Call shutdown() during application shutdown for long-lived integrations.
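A slow telemetry backend should not be able to block process exit indefinitely. The sketch below wraps the two calls in a timeout. `BufferedObserver` and `closeTelemetry` are hypothetical names, the interface only assumes the optional `flush` and `shutdown` methods used above, and the 5000 ms default is an arbitrary choice for illustration.

```typescript
// Assumed minimal shape of an observer; both methods are optional,
// matching the optional calls shown above.
interface BufferedObserver {
  flush?: () => Promise<void>;
  shutdown?: () => Promise<void>;
}

// Flush, then shut down, but give up after timeoutMs.
async function closeTelemetry(
  observer: BufferedObserver,
  timeoutMs = 5000,
): Promise<"closed" | "timed-out"> {
  const close = (async (): Promise<"closed"> => {
    await observer.flush?.();
    await observer.shutdown?.();
    return "closed";
  })();

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), timeoutMs);
  });

  try {
    return await Promise.race([close, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

In a long-lived service this could be awaited from the application's existing shutdown hook, for example a SIGTERM handler, before the process exits. A "timed-out" result is worth logging, since it means some buffered traces may be missing.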
Failure Modes
| Failure | Fix |
|---|---|
| traces cannot be tied to product events | log traceId and product ids together |
| trace metadata leaks sensitive data | restrict metadata to ids and small safe fields |
| eval scores are disconnected | report evals with trace ids or case metadata |
| tool failures are invisible | attach observer before building production agents |
| buffered traces are missing | flush or shutdown the observer on exit |
Test Checklist
- Assert runners attach stable trace names.
- Assert safe metadata includes product correlation ids.
- Verify response.trace is logged or returned where needed.
- Inspect one local Studio run for context, tools, and trace evidence.
- Verify external observer flushes during shutdown.
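The first two checks are easier to assert when trace options are built in one place rather than inline at each call site. The sketch below assumes that approach. `SupportUser`, `TraceOptions`, `buildSupportTrace`, and `traceProblems` are hypothetical local names that mirror the fields used in the Code Example section; they are not the library's exported types.

```typescript
// Local types for illustration only.
interface SupportUser {
  id: string;
  tenantId: string;
  plan: string;
}

interface TraceOptions {
  name: string;
  userId: string;
  sessionId: string;
  tags: string[];
  version: string;
  metadata: Record<string, string>;
}

const SUPPORT_TRACE_NAME = "support-chat";
const PROMPT_VERSION = "2026-05-11";

// Single place where the support runner's trace options are assembled.
function buildSupportTrace(
  user: SupportUser,
  conversationId: string,
  channel: string,
): TraceOptions {
  return {
    name: SUPPORT_TRACE_NAME,
    userId: user.id,
    sessionId: conversationId,
    tags: ["support", channel],
    version: PROMPT_VERSION,
    metadata: { tenantId: user.tenantId, conversationId, plan: user.plan },
  };
}

// Returns a list of problems; an empty list means the trace passes.
function traceProblems(trace: TraceOptions): string[] {
  const problems: string[] = [];
  if (trace.name !== SUPPORT_TRACE_NAME) problems.push("unstable trace name");
  for (const key of ["tenantId", "conversationId"]) {
    if (!trace.metadata[key]) problems.push(`missing correlation id: ${key}`);
  }
  return problems;
}
```

A unit test can then call `buildSupportTrace` with a fixture user and assert that `traceProblems` returns an empty list, without running a model or an agent.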
