Quality and Observability

Tracing and Debugging

Connect agent runs, tool calls, retrieval evidence, product logs, and eval scores.

Tracing gives you the evidence needed to debug an agent harness: prompt runs, model generations, tool calls, errors, usage, retrieval evidence, trace metadata, and eval scores.

Use traces for runtime debugging. Use product logs and audit records for product accountability.

Scenario

A support answer was wrong. You need to know which prompt ran, which user and conversation it belonged to, which tools were available, which retrieval documents were injected, what the model returned, and whether the same case fails in evals.

When to Use It

Use this pattern for every production harness that handles real users, private data, side effects, or recurring quality checks.

Architecture Shape

| Layer | Responsibility |
| --- | --- |
| observer | records runs, tool calls, model calls, errors, and usage |
| .withTrace(...) | attaches workflow name, user id, session id, tags, version, and safe metadata |
| product logs | store trace ids next to product events |
| Studio | inspect local runs, tools, MCPs, context, approvals, and traces |
| external tracing | long-term search, metrics, dashboards, eval scores |
| shutdown | flush and close buffered telemetry |

Code Example

import { AgentBuilder } from "@anvia/core";
import { langfuse } from "@anvia/langfuse";

// publicKey, secretKey, and baseUrl are assumed to come from your own configuration.
const tracing = langfuse.create({
  publicKey,
  secretKey,
  baseUrl,
});

const agent = new AgentBuilder("support", model)
  .instructions("Answer support questions clearly.")
  .observe(tracing)
  .defaultMaxTurns(3)
  .build();

const response = await agent
  .prompt(message)
  .withTrace({
    name: "support-chat",
    userId: user.id,
    sessionId: conversationId,
    tags: ["support", channel],
    version: "2026-05-11",
    metadata: {
      tenantId: user.tenantId,
      conversationId,
      plan: user.plan,
    },
  })
  .send();

// Log the trace ids next to product ids so a product event can be joined to its trace.
logger.info({
  traceId: response.trace?.traceId,
  observationId: response.trace?.observationId,
  conversationId,
}, "support agent completed");

Trace Metadata Rules

| Include | Avoid |
| --- | --- |
| stable workflow name | raw secrets or API keys |
| user id when safe | full prompts in app logs |
| tenant id | large records |
| conversation, ticket, or job id | private document bodies in metadata |
| model or prompt version | unbounded objects |
| channel, route, or feature flag | values your policy forbids storing |

Trace metadata should connect systems. It should not become a second database.
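One way to enforce these rules is to pass metadata through an allowlist before it reaches the trace. The sketch below is illustrative only: `safeTraceMetadata`, `ALLOWED_KEYS`, and the 128-character limit are assumptions made for this example, not part of the library.

```typescript
// Hypothetical helper: these names and limits are assumptions, not library API.
type MetadataValue = string | number | boolean;

// Only correlation ids and small routing fields are allowed through.
const ALLOWED_KEYS = [
  "tenantId",
  "conversationId",
  "ticketId",
  "jobId",
  "plan",
  "channel",
  "promptVersion",
] as const;

// Assumed upper bound that keeps document bodies and prompts out of metadata.
const MAX_VALUE_LENGTH = 128;

function safeTraceMetadata(
  input: Record<string, unknown>,
): Record<string, MetadataValue> {
  const output: Record<string, MetadataValue> = {};
  for (const key of ALLOWED_KEYS) {
    const value = input[key];
    if (typeof value === "string") {
      // Oversized strings are dropped rather than truncated.
      if (value.length <= MAX_VALUE_LENGTH) output[key] = value;
    } else if (typeof value === "number") {
      if (Number.isFinite(value)) output[key] = value;
    } else if (typeof value === "boolean") {
      output[key] = value;
    }
    // Objects, arrays, null, and undefined never pass through.
  }
  return output;
}
```

Keys outside the allowlist, such as an API key or a document body, are silently dropped, so a caller can hand over a wide context object without leaking it into the trace.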

Debugging Flow

  1. Find the product event, ticket, conversation, or eval case.
  2. Open the linked trace id.
  3. Check prompt input, instructions, retrieved context, and available tools.
  4. Check tool calls, tool outputs, errors, and turn count.
  5. Reproduce with the same case in Studio or an eval suite.
  6. Fix the deterministic boundary first: tool, retrieval, runner, or prompt.
  7. Add an eval case when the issue is model-dependent.
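Steps 5 and 7 are easier when a failing trace can be turned into a regression case that keeps its trace id. The following is a minimal sketch under assumed shapes: `TraceSnapshot`, `EvalCase`, and `evalCaseFromTrace` are hypothetical names, and your tracing backend and eval runner will have their own formats.

```typescript
// Hypothetical shapes; adapt them to your tracing backend and eval runner.
interface TraceSnapshot {
  traceId: string;
  workflowName: string;
  input: string;
  output: string;
}

interface EvalCase {
  id: string;
  input: string;
  knownBadOutput: string;
  expectation: string;
  metadata: { sourceTraceId: string; workflowName: string };
}

// Build a regression case that stays linked to the trace it came from.
function evalCaseFromTrace(trace: TraceSnapshot, expectation: string): EvalCase {
  return {
    id: `regression-${trace.traceId}`,
    input: trace.input,
    knownBadOutput: trace.output,
    expectation,
    metadata: {
      sourceTraceId: trace.traceId,
      workflowName: trace.workflowName,
    },
  };
}
```

Keeping `sourceTraceId` on the case means a later eval failure can be traced back to the original production evidence.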

Local Studio vs External Tracing

| Use Studio | Use external tracing |
| --- | --- |
| local development | production traffic |
| inspect agent registration | long-term trace search |
| exercise approvals and questions | aggregate cost and usage |
| debug context and MCP visibility | dashboards and alerting |
| iterate before product UI exists | eval score reporting |

Studio and tracing are complementary. Studio shortens local iteration; tracing preserves production evidence.

Flush and Shutdown

await tracing.flush?.();
await tracing.shutdown?.();

Call flush() before process exit when pending events matter. Call shutdown() during application shutdown for long-lived integrations.
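A hung telemetry backend should not block process exit, so it can help to bound the flush with a timeout. This is a sketch, not library behavior: `ClosableObserver` and `closeObserver` are hypothetical names, and the interface only mirrors the optional `flush()` and `shutdown()` calls shown above.

```typescript
// Assumed shape: both methods are optional, as in the calls above.
interface ClosableObserver {
  flush?: () => Promise<void>;
  shutdown?: () => Promise<void>;
}

// Flush, then shut down, but give up after timeoutMs so exit is never blocked.
async function closeObserver(
  observer: ClosableObserver,
  timeoutMs = 5000,
): Promise<"closed" | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  const close = (async (): Promise<"closed"> => {
    await observer.flush?.();
    await observer.shutdown?.();
    return "closed";
  })();
  try {
    // Errors thrown by flush() or shutdown() propagate to the caller.
    return await Promise.race([close, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Possible wiring in a Node.js service (illustrative):
// process.once("SIGTERM", async () => {
//   await closeObserver(tracing);
//   process.exit(0);
// });
```

The return value lets the caller log whether telemetry was fully drained or abandoned at the deadline.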

Failure Modes

| Failure | Fix |
| --- | --- |
| traces cannot be tied to product events | log traceId and product ids together |
| trace metadata leaks sensitive data | restrict metadata to ids and small safe fields |
| eval scores are disconnected | report evals with trace ids or case metadata |
| tool failures are invisible | attach observer before building production agents |
| buffered traces are missing | flush or shutdown the observer on exit |

Test Checklist

  • Assert runners attach stable trace names.
  • Assert safe metadata includes product correlation ids.
  • Verify response.trace is logged or returned where needed.
  • Inspect one local Studio run for context, tools, and trace evidence.
  • Verify external observer flushes during shutdown.
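
The first two checks can be automated if trace options are built by a plain function instead of inline at each call site. The sketch below assumes a hypothetical `buildSupportTrace` builder and `traceProblems` checker; the fields mirror the `.withTrace(...)` call in the code example, but the types are declared here only for illustration.

```typescript
// Hypothetical types that mirror the fields passed to .withTrace(...) above.
interface SupportContext {
  userId: string;
  tenantId: string;
  plan: string;
  conversationId: string;
  channel: string;
}

interface TraceOptions {
  name: string;
  userId: string;
  sessionId: string;
  tags: string[];
  version: string;
  metadata: Record<string, string>;
}

const PROMPT_VERSION = "2026-05-11"; // same value as the code example

function buildSupportTrace(ctx: SupportContext): TraceOptions {
  return {
    name: "support-chat",
    userId: ctx.userId,
    sessionId: ctx.conversationId,
    tags: ["support", ctx.channel],
    version: PROMPT_VERSION,
    metadata: {
      tenantId: ctx.tenantId,
      conversationId: ctx.conversationId,
      plan: ctx.plan,
    },
  };
}

// Returns a list of problems; an empty list means the checks passed.
function traceProblems(trace: TraceOptions): string[] {
  const problems: string[] = [];
  // A stable name is a fixed kebab-case identifier, not free text.
  if (!/^[a-z][a-z0-9-]*$/.test(trace.name)) {
    problems.push("name must be a stable kebab-case identifier");
  }
  // Product correlation ids must be present in metadata.
  for (const key of ["tenantId", "conversationId"]) {
    if (!trace.metadata[key]) problems.push(`missing metadata.${key}`);
  }
  return problems;
}
```

In a test suite, asserting that `traceProblems(buildSupportTrace(ctx))` is empty covers the trace name and correlation id items without needing a live model or tracing backend.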