
Production Readiness Checklist

Check an agent harness before it handles real users, data, and side effects.

Use this checklist before deploying an agent harness to production or expanding it from read-only behavior into product actions.

Runtime Boundaries

| Check | Ready when |
| --- | --- |
| stable agent id | traces, sessions, Studio, and logs use a stable id |
| runner boundary | route or job calls a named runner instead of embedding all prompt logic inline |
| scoped tools | request-local user, tenant, services, and transactions are passed explicitly |
| turn limits | every agent sets .defaultMaxTurns(...), and high-risk runs override it with a lower limit |
| timeouts | HTTP routes, jobs, and approval waits have an app-owned timeout policy |
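
One way to make the timeout check concrete is a small wrapper that the application owns and applies at each boundary. The sketch below is only an illustration under stated assumptions: `withTimeout` and its `label` parameter are hypothetical names, not part of any agent library, and the wrapped work is assumed to honor the `AbortSignal` it receives.

```typescript
// Hypothetical helper; withTimeout is not an API of any agent library.
async function withTimeout<T>(
  label: string,
  ms: number,
  work: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // ask the wrapped work to stop
      const err = new Error(`${label} timed out after ${ms} ms`);
      err.name = "TimeoutError";
      reject(err);
    }, ms);
  });

  try {
    // Whichever settles first wins: the work or the deadline.
    return await Promise.race([work(controller.signal), deadline]);
  } finally {
    clearTimeout(timer); // never leave a timer running after the race
  }
}
```

A route, a job, and an approval wait could each call this with their own limit, so the policy lives in application code rather than in provider defaults.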

Tools

| Check | Ready when |
| --- | --- |
| input schemas | every tool validates model arguments |
| output contracts | important tool results are typed states or validated output schemas |
| permission checks | every data-bearing tool checks actor and tenant scope |
| side effects | writes use service-layer transactions, idempotency, and audit records |
| direct tests | high-risk tools are tested through direct calls or ToolSet.call(...) |
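
The tool checks above can be sketched in a single function. This is a minimal illustration, not a library API: `refundOrder`, `RefundDeps`, and every type here are hypothetical, validation is written by hand instead of with a schema library, and in-memory maps stand in for the service layer and its transactions.

```typescript
// All names below are hypothetical and only illustrate the checklist rows.
type Actor = { userId: string; tenantId: string };
type Order = { id: string; tenantId: string };
type RefundArgs = { orderId: string; amountCents: number; idempotencyKey: string };

// Output contract: callers switch on a typed state instead of parsing text.
type RefundResult =
  | { status: "refunded"; refundId: string }
  | { status: "duplicate"; refundId: string }
  | { status: "forbidden" }
  | { status: "invalid_input"; reason: string };

// Input schema: model arguments arrive as unknown and are validated here.
function parseRefundArgs(raw: unknown): RefundArgs | string {
  if (typeof raw !== "object" || raw === null) return "arguments must be an object";
  const { orderId, amountCents, idempotencyKey } = raw as Record<string, unknown>;
  if (typeof orderId !== "string" || orderId === "") return "orderId must be a non-empty string";
  if (typeof amountCents !== "number" || !Number.isInteger(amountCents) || amountCents <= 0) {
    return "amountCents must be a positive integer";
  }
  if (typeof idempotencyKey !== "string" || idempotencyKey === "") {
    return "idempotencyKey must be a non-empty string";
  }
  return { orderId, amountCents, idempotencyKey };
}

type RefundDeps = {
  orders: Map<string, Order>;
  refundsByKey: Map<string, string>; // idempotency key -> refund id
  audit: string[];
};

function refundOrder(deps: RefundDeps, actor: Actor, rawArgs: unknown): RefundResult {
  const args = parseRefundArgs(rawArgs);
  if (typeof args === "string") return { status: "invalid_input", reason: args };

  // Permission check: the order must belong to the actor's tenant.
  const order = deps.orders.get(args.orderId);
  if (order === undefined || order.tenantId !== actor.tenantId) return { status: "forbidden" };

  // Idempotency: a repeated key returns the earlier refund instead of writing again.
  const existing = deps.refundsByKey.get(args.idempotencyKey);
  if (existing !== undefined) return { status: "duplicate", refundId: existing };

  const refundId = `refund_${deps.refundsByKey.size + 1}`;
  deps.refundsByKey.set(args.idempotencyKey, refundId);
  deps.audit.push(`${actor.userId} refunded ${args.amountCents} cents on ${order.id}`);
  return { status: "refunded", refundId };
}
```

Because the function takes its dependencies and actor explicitly, a direct test can call it without a model in the loop and assert on the returned state.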

Dynamic Tools

| Check | Ready when |
| --- | --- |
| catalog ownership | a ToolSet owns the full catalog |
| index lifecycle | tool index is built at startup, deploy time, or another explicit boundary |
| search tests | representative prompts select expected tool ids |
| critical tools | required tools are either static or reliably selected |
| trace review | traces show which dynamic tools were available during runs |
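
A search test can be as simple as a table of representative prompts and the tool id each one must surface. The sketch below assumes a toy keyword search: `searchTools`, `catalog`, and the tool ids are hypothetical, and a real tool index would likely rank differently, but the shape of the regression table carries over.

```typescript
// Hypothetical catalog and search function, for illustration only.
type ToolEntry = { id: string; description: string };

const catalog: ToolEntry[] = [
  { id: "orders.lookup", description: "Look up an order by id and show its status" },
  { id: "billing.refund", description: "Refund a payment for an order" },
  { id: "account.reset_password", description: "Send a password reset email" },
];

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Scores each tool by how many distinct query words appear in its description.
function searchTools(query: string, tools: ToolEntry[], limit: number): string[] {
  const queryWords = tokenize(query).filter((word, i, all) => all.indexOf(word) === i);
  return tools
    .map((tool) => {
      const words = new Set(tokenize(tool.description));
      return { id: tool.id, score: queryWords.filter((w) => words.has(w)).length };
    })
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id))
    .slice(0, limit)
    .map((hit) => hit.id);
}

// Regression table: representative prompts and the tool id each one must surface.
const searchCases: Array<{ prompt: string; expected: string }> = [
  { prompt: "Where is my order? What is its status?", expected: "orders.lookup" },
  { prompt: "I need a refund for a payment", expected: "billing.refund" },
  { prompt: "reset my password", expected: "account.reset_password" },
];

// Returns the prompts whose expected tool is missing from the top two results.
function failingSearchCases(): string[] {
  return searchCases
    .filter((c) => !searchTools(c.prompt, catalog, 2).includes(c.expected))
    .map((c) => c.prompt);
}
```

Running `failingSearchCases()` in CI after every catalog or index change gives an early signal when a required tool stops being selected.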

MCP

| Check | Ready when |
| --- | --- |
| lifecycle | required servers fail fast, optional servers degrade gracefully |
| tool inspection | required MCP tools are validated after connection |
| filtering | agents receive only the MCP tools they should use |
| reconnect | reconnect closes the previous server and updates future agents |
| cleanup | long-lived servers close during shutdown, scoped servers close in finally |
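
The lifecycle and cleanup checks can be illustrated with a startup routine that treats required and optional servers differently. This is a sketch under assumptions: `McpServerLike`, `connectServers`, and `FakeServer` are hypothetical stand-ins, and a real MCP client exposes its own connect and close calls.

```typescript
// Hypothetical interface; real MCP clients expose their own connect and close calls.
interface McpServerLike {
  readonly label: string;
  connect(): Promise<void>;
  close(): Promise<void>;
}

type ServerSpec = { server: McpServerLike; required: boolean };

async function connectServers(
  specs: ServerSpec[],
): Promise<{ connected: McpServerLike[]; skipped: string[] }> {
  const connected: McpServerLike[] = [];
  const skipped: string[] = [];
  for (const spec of specs) {
    try {
      await spec.server.connect();
      connected.push(spec.server);
    } catch {
      if (!spec.required) {
        skipped.push(spec.server.label); // optional: degrade and keep going
        continue;
      }
      // Required: fail fast, but close whatever already opened.
      await Promise.all(connected.map((server) => server.close()));
      throw new Error(`required MCP server "${spec.server.label}" failed to connect`);
    }
  }
  return { connected, skipped };
}

// Test double that records whether close() was called.
class FakeServer implements McpServerLike {
  closed = false;
  constructor(readonly label: string, private readonly fails: boolean) {}
  async connect(): Promise<void> {
    if (this.fails) throw new Error(`${this.label} is unavailable`);
  }
  async close(): Promise<void> {
    this.closed = true;
  }
}
```

The same fake makes the "missing" and "unavailable" MCP test cases cheap to write, since no real server process is needed.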

Context and Memory

| Check | Ready when |
| --- | --- |
| instructions | durable behavior is in instructions |
| request facts | current user, tenant, plan, locale, or flags are loaded by the runner |
| retrieval filters | indexes or searches enforce tenant and access boundaries |
| history | conversation history is stored and replayed deliberately |
| session policy | workflows use explicit history or sessions deliberately, not casually mixed |
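
The retrieval filter check means the search itself applies the tenant and access scope, rather than trusting the model to ignore what it should not see. A minimal sketch, assuming an in-memory index and hypothetical `Doc`, `Scope`, and `searchDocs` names:

```typescript
// Hypothetical in-memory index; a real one would apply the same scope as a query filter.
type Doc = { id: string; tenantId: string; allowedRoles: string[]; text: string };
type Scope = { tenantId: string; roles: string[] };

function searchDocs(index: Doc[], query: string, scope: Scope): string[] {
  const needle = query.toLowerCase();
  // Tenant and role filters run before text matching, so out-of-scope
  // documents can never appear in the results.
  return index
    .filter((doc) => doc.tenantId === scope.tenantId)
    .filter((doc) => doc.allowedRoles.some((role) => scope.roles.includes(role)))
    .filter((doc) => doc.text.toLowerCase().includes(needle))
    .map((doc) => doc.id);
}

const docIndex: Doc[] = [
  { id: "d1", tenantId: "t1", allowedRoles: ["agent"], text: "Refund policy for enterprise plans" },
  { id: "d2", tenantId: "t2", allowedRoles: ["agent"], text: "Refund policy for starter plans" },
  { id: "d3", tenantId: "t1", allowedRoles: ["admin"], text: "Internal refund override procedure" },
];
```

The scope comes from the runner's request facts, not from model arguments, so a prompt cannot widen it.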

Observability

| Check | Ready when |
| --- | --- |
| trace names | every workflow uses stable names such as support-chat |
| metadata | traces include safe ids such as tenant, conversation, ticket, or channel |
| usage | usage is available for cost and regression checks |
| external telemetry | observers or integrations are attached where needed |
| Studio | local Studio can inspect tools, MCPs, context, sessions, approvals, and traces |
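
One way to keep trace metadata limited to safe ids is an allow-list applied before anything is attached to a trace. The helper below is a hypothetical sketch: `buildTraceMetadata` and the key names are assumptions, not fields defined by any tracing library.

```typescript
// Hypothetical helper: only allow-listed id fields reach trace metadata.
const SAFE_TRACE_KEYS = ["tenantId", "conversationId", "ticketId", "channel"] as const;
type SafeTraceKey = (typeof SAFE_TRACE_KEYS)[number];
type TraceMetadata = Partial<Record<SafeTraceKey, string>>;

function buildTraceMetadata(requestFacts: Record<string, unknown>): TraceMetadata {
  const metadata: TraceMetadata = {};
  for (const key of SAFE_TRACE_KEYS) {
    const value = requestFacts[key];
    // Copy only allow-listed keys, and only when the value is a string.
    if (typeof value === "string") metadata[key] = value;
  }
  return metadata;
}
```

Keys that are not on the list, such as an email address or an access token, never reach the trace even if a caller passes them in by mistake.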

Testing and Evals

| Check | Ready when |
| --- | --- |
| unit tests | tools, services, runners, filters, and known errors are tested with fakes |
| integration tests | narrow provider-backed tests cover model-dependent behavior |
| evals | known prompts or regression cases have repeatable evals |
| approval tests | approval accepted, rejected, and timeout paths are covered |
| MCP tests | missing, unavailable, and filtered MCP tools are covered |
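
Approval paths are easier to cover when the decision logic is a pure function of the stored request and the current time. The sketch below assumes a hypothetical `ApprovalRequest` shape and `resolveApproval` function; a real approval store will differ, but the accepted, rejected, and timeout branches map directly to test cases.

```typescript
// Hypothetical approval record; names and shape are assumptions for illustration.
type ApprovalRequest = {
  requestedAt: number; // epoch milliseconds
  timeoutMs: number;
  decision?: { approved: boolean; decidedAt: number };
};
type ApprovalOutcome = "accepted" | "rejected" | "timed_out" | "pending";

function resolveApproval(request: ApprovalRequest, now: number): ApprovalOutcome {
  const deadline = request.requestedAt + request.timeoutMs;
  const decision = request.decision;
  // A decision only counts if it arrived on or before the deadline.
  if (decision !== undefined && decision.decidedAt <= deadline) {
    return decision.approved ? "accepted" : "rejected";
  }
  return now > deadline ? "timed_out" : "pending";
}
```

Passing `now` as a parameter keeps the timeout path testable without sleeping or mocking the clock.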

Deployment

| Check | Ready when |
| --- | --- |
| secrets | provider and MCP credentials are server-only |
| storage | history, memory, traces, audit, approval, and idempotency stores are durable |
| retries | job retries are safe for side effects or disabled for non-idempotent runs |
| streaming | proxies and platforms do not buffer streams that need live events |
| rollback | agents can be rebuilt with a previous prompt, tool catalog, or MCP config |
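
The retries check can be enforced in the job wrapper itself, so a non-idempotent run is never attempted twice regardless of queue settings. A minimal synchronous sketch, with `JobPolicy` and `runWithRetries` as hypothetical names:

```typescript
// Hypothetical job policy: retries are allowed only when the run is idempotent.
type JobPolicy = { idempotent: boolean; maxAttempts: number };

function runWithRetries<T>(policy: JobPolicy, run: (attempt: number) => T): T {
  // Non-idempotent runs get exactly one attempt, whatever maxAttempts says.
  const attempts = policy.idempotent ? Math.max(1, policy.maxAttempts) : 1;
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return run(attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again if attempts remain
    }
  }
  throw lastError;
}
```

Marking a run as idempotent is then a deliberate decision in code review, backed by the idempotency store, rather than a default inherited from the job platform.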

Smoke Test Shape

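```typescript
// Smoke test: run one turn through the real runner, wired to smoke fixtures.
// smokeAuth, smokeConversations, and smokeServices come from the test setup.
const result = await runSupportTurn({
  conversationId: "smoke_test",
  message: "Say hello and do not call tools.",
  auth: smokeAuth,
  conversations: smokeConversations,
  services: smokeServices,
});

// The turn should complete successfully.
expect(result.ok).toBe(true);
```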

For tool workflows, add a smoke test that calls one read-only tool and verifies trace metadata. For side-effect workflows, run against sandbox services with explicit idempotency records.