@anvia/langfuse

Langfuse tracing and scoring for Anvia agents.

@anvia/langfuse sends agent lifecycle events (runs, generations, tool calls, usage, errors) to Langfuse. It also supports scoring traces and reporting eval outcomes.

Install

pnpm add @anvia/langfuse

The package uses OpenTelemetry under the hood via @langfuse/otel and @opentelemetry/sdk-node.

Quick Start

import { AgentBuilder } from "@anvia/core";
import { OpenAIClient } from "@anvia/openai";
import { langfuse } from "@anvia/langfuse";

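// Credentials are read from environment variables here; the variable names are a convention, not a requirement.
const tracing = langfuse.create({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  secretKey: process.env.LANGFUSE_SECRET_KEY,
  baseUrl: process.env.LANGFUSE_BASE_URL, // optional, defaults to https://cloud.langfuse.com
  environment: process.env.NODE_ENV,      // optional
  release: "1.0.0",                       // optional, e.g. your app version
});

const client = new OpenAIClient({ apiKey: process.env.OPENAI_API_KEY });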
const agent = new AgentBuilder("support", client.completionModel())
  .instructions("Answer support questions.")
  .observe(tracing)
  .build();

const response = await agent
  .prompt("How do I reset my password?")
  .withTrace({
    name: "support-question",
    userId: "user_123",
    sessionId: "session_456",
    metadata: { surface: "docs" },
    tags: ["support"],
  })
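  .send();

// In a short-lived script, flush buffered spans before the process exits.
await tracing.flush();

Once the spans are exported, the run, each model generation, each tool call, usage, and any errors are visible in Langfuse.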

Configuration

type LangfuseTracingOptions = {
  publicKey?: string;    // Langfuse public key
  secretKey?: string;    // Langfuse secret key
  baseUrl?: string;      // Langfuse API URL (default: https://cloud.langfuse.com)
  environment?: string;  // deployment environment label
  release?: string;      // application release version
};
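To keep credentials out of source code, you can build the options object from environment variables. The sketch below is a hypothetical helper, not part of @anvia/langfuse: it redeclares LangfuseTracingOptions locally so it compiles on its own, and the variable names LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_BASE_URL, LANGFUSE_ENVIRONMENT, and APP_RELEASE are assumptions, not names the package is documented to read.

```typescript
// Local copy of the documented options type, so this sketch is self-contained.
type LangfuseTracingOptions = {
  publicKey?: string;
  secretKey?: string;
  baseUrl?: string;
  environment?: string;
  release?: string;
};

type Env = Record<string, string | undefined>;

// Hypothetical helper: reads assumed variable names and applies the
// documented default base URL when none is set.
function optionsFromEnv(env: Env): LangfuseTracingOptions {
  const publicKey = env["LANGFUSE_PUBLIC_KEY"];
  const secretKey = env["LANGFUSE_SECRET_KEY"];
  // This sketch treats both keys as required and fails fast when either is missing.
  if (!publicKey || !secretKey) {
    throw new Error("LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY must be set");
  }
  return {
    publicKey,
    secretKey,
    baseUrl: env["LANGFUSE_BASE_URL"] ?? "https://cloud.langfuse.com",
    environment: env["LANGFUSE_ENVIRONMENT"],
    release: env["APP_RELEASE"],
  };
}
```

In Node you would pass process.env, for example langfuse.create(optionsFromEnv(process.env)). Treating the keys as required is a design choice; the documented type marks every field optional.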

Trace Metadata

Attach trace attributes (name, user, session, metadata, tags, version) to an individual request with .withTrace(...):

const response = await agent
  .prompt("Summarize ticket TICKET-1001.")
  .withTrace({
    name: "support-ticket-summary",
    userId: "user_123",
    sessionId: "session_456",
    metadata: { ticketId: "TICKET-1001" },
    tags: ["support", "anvia"],
    version: "1.0.0",
  })
  .send();

console.log(response.trace?.traceId);

The traceId and observationId are available on response.trace after the run completes.
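Because response.trace is optional (the examples read it with ?.), code that depends on the IDs can check for them in one place. A minimal sketch; TraceInfo, ResponseWithTrace, and requireTraceId are hypothetical stand-ins inferred from the fields named above, not types or functions exported by @anvia/langfuse.

```typescript
// Hypothetical shapes inferred from the docs: response.trace may be absent,
// and when present it carries traceId and observationId.
type TraceInfo = { traceId: string; observationId: string };
type ResponseWithTrace = { trace?: TraceInfo };

// Returns the trace ID or throws a descriptive error, so callers never
// pass undefined on to scoring or logging.
function requireTraceId(response: ResponseWithTrace): string {
  const traceId = response.trace?.traceId;
  if (!traceId) {
    throw new Error("No trace on this response; is an observer attached with .observe(...)?");
  }
  return traceId;
}
```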

Scoring Traces

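// response.trace is optional, so only score when a trace ID exists.
const traceId = response.trace?.traceId;
if (traceId) {
  await tracing.score({
    traceId,
    name: "quality",
    value: 1,
    comment: "Good answer",
    metadata: { source: "human-review" },
  });
}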

Scoring requires publicKey, secretKey, and a trace ID.
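A common source of scores is human review. The sketch below maps a reviewer's verdict onto the same fields used in the tracing.score call above. ScoreInput, Review, and reviewToScore are hypothetical, and encoding "good" as 1 and "bad" as 0 is an assumption about how you might want to score, not a package convention.

```typescript
// Hypothetical payload type mirroring the fields in the tracing.score example.
type ScoreInput = {
  traceId: string;
  name: string;
  value: number;
  comment?: string;
  metadata?: Record<string, unknown>;
};

// A reviewer's verdict on one answer (hypothetical shape).
type Review = { traceId?: string; verdict: "good" | "bad"; note?: string };

// Converts a review into a numeric "quality" score (1 = good, 0 = bad).
// Returns null when there is no trace ID to attach the score to.
function reviewToScore(review: Review): ScoreInput | null {
  if (!review.traceId) return null;
  return {
    traceId: review.traceId,
    name: "quality",
    value: review.verdict === "good" ? 1 : 0,
    comment: review.note,
    metadata: { source: "human-review" },
  };
}
```

Usage: const input = reviewToScore(review); if (input) await tracing.score(input);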

Eval Reporting

Use createLangfuseEvalReporter(...) to publish eval outcomes as Langfuse scores:

import { agentEvalTarget, contains, runEvalSuite } from "@anvia/core/evals";
import { createLangfuseEvalReporter } from "@anvia/langfuse";

await runEvalSuite({
  name: "support-agent-regression",
  cases: [
    { id: "refund-window", input: "How long are refunds available?", expected: "30 days" },
  ],
  target: agentEvalTarget(agent),
  metrics: [contains()],
  reporters: [createLangfuseEvalReporter(tracing)],
});

By default, the reporter scores the trace recorded for each case's prompt response, when one is available. You can also set traceId and observationId in a case's metadata.
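One reading of the rule above is: prefer the trace from the case's prompt response, and fall back to IDs in the case metadata. The sketch below encodes that precedence as an assumption; resolveScoreTarget and the EvalCaseResult shape are hypothetical and are not the reporter's actual code.

```typescript
type TraceIds = { traceId: string; observationId?: string };

// Hypothetical view of one finished eval case.
type EvalCaseResult = {
  id: string;
  responseTrace?: TraceIds; // trace from the prompt response, if any
  metadata?: { traceId?: string; observationId?: string };
};

// Assumed precedence: response trace first, then case metadata,
// otherwise there is nothing to attach a score to.
function resolveScoreTarget(result: EvalCaseResult): TraceIds | undefined {
  if (result.responseTrace) return result.responseTrace;
  const traceId = result.metadata?.traceId;
  if (traceId) {
    return { traceId, observationId: result.metadata?.observationId };
  }
  return undefined;
}
```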

Lifecycle Events

The observer captures:

Event                   Data
Run start               agent name, prompt, history, max turns, trace metadata
Generation start/end    turn number, model, tools, input/output, usage, latency
Tool start/end          tool name, arguments, result, skipped status
Run end                 output, total usage, all messages
Run error               error message, partial usage
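To make the table concrete, here is one way to model these events as a TypeScript discriminated union. The type names and field names are hypothetical and illustrative only; they are not the observer's real event types. The totalUsage helper shows one plausible way a run-level usage total could be derived from per-generation usage.

```typescript
// Hypothetical event union modeled on the table above.
type Usage = { inputTokens: number; outputTokens: number };

type LifecycleEvent =
  | { type: "run_start"; agent: string; prompt: string; maxTurns: number }
  | { type: "generation_start"; turn: number; model: string }
  | { type: "generation_end"; turn: number; model: string; usage: Usage; latencyMs: number }
  | { type: "tool_start"; tool: string; args: unknown }
  | { type: "tool_end"; tool: string; result: unknown; skipped: boolean }
  | { type: "run_end"; output: string; usage: Usage }
  | { type: "run_error"; message: string; usage: Usage };

// Sums token usage across generation_end events.
function totalUsage(events: LifecycleEvent[]): Usage {
  let inputTokens = 0;
  let outputTokens = 0;
  for (const e of events) {
    if (e.type === "generation_end") {
      inputTokens += e.usage.inputTokens;
      outputTokens += e.usage.outputTokens;
    }
  }
  return { inputTokens, outputTokens };
}
```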

Flushing and Shutdown

await tracing.flush();
await tracing.shutdown();

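Call flush() at the end of short-lived jobs (scripts, CI runs) so buffered spans are exported before the process exits. Call shutdown() when the process is exiting. In long-running servers, the OpenTelemetry SDK batches and exports spans in the background, so explicit flushing is usually unnecessary.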
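For scripts and CI jobs, a try/finally wrapper guarantees that spans are exported even when the job throws. A minimal sketch: Flushable declares only the flush() and shutdown() methods documented above, and runJob is a hypothetical helper, not part of the package. Calling flush() before shutdown() follows the guidance above; it may be redundant if shutdown() also flushes.

```typescript
// Minimal interface covering the two documented lifecycle methods.
interface Flushable {
  flush(): Promise<void>;
  shutdown(): Promise<void>;
}

// Hypothetical wrapper: runs the job, then always flushes and shuts down
// tracing, whether the job resolves or rejects.
async function runJob<T>(tracing: Flushable, job: () => Promise<T>): Promise<T> {
  try {
    return await job();
  } finally {
    await tracing.flush();
    await tracing.shutdown();
  }
}
```

Usage: await runJob(tracing, () => agent.prompt("...").send());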

Multi-Agent Tracing

When an agent uses another agent as a tool, the observer nests the child automatically: the child agent's run appears as a sub-span under the parent's tool call.
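The parent/child relationship can be pictured as a span tree. This sketch is illustrative only: the Span shape, the span names, and renderTree are hypothetical and do not reflect how the observer stores spans. It links spans through parentId and prints them indented, so a child agent's run shows up beneath the parent's tool call.

```typescript
// Hypothetical span record: parentId links a child to the span that contains it.
type Span = { id: string; parentId?: string; name: string };

// Renders spans as an indented tree (two spaces per level), preserving input order among siblings.
function renderTree(spans: Span[]): string {
  const children = new Map<string | undefined, Span[]>();
  for (const s of spans) {
    const list = children.get(s.parentId) ?? [];
    list.push(s);
    children.set(s.parentId, list);
  }
  const lines: string[] = [];
  const walk = (parentId: string | undefined, depth: number): void => {
    for (const s of children.get(parentId) ?? []) {
      lines.push("  ".repeat(depth) + s.name);
      walk(s.id, depth + 1);
    }
  };
  walk(undefined, 0); // root spans have no parentId
  return lines.join("\n");
}
```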