Tool Patterns

Tool Validation and Contracts

Design tool schemas, outputs, and test boundaries that hold up in production.

Tool contracts are the strongest deterministic boundary in an agent harness. Use schemas to validate arguments and outputs, return typed product states for expected outcomes, and test tools directly before model runs.

Scenario

A model can choose when to call a tool, but the tool owns the contract. The model should not be able to pass arbitrary unvalidated data into your service layer, and downstream code should not need to parse vague natural-language tool results.

When to Use It

Use this pattern for every tool that reads product state, writes product state, or returns values used by downstream code.

Architecture Shape

Layer               Responsibility
Zod input schema    validate model arguments before execution
Zod output schema   validate tool result before it is serialized
tool execute        call product services and return typed states
runner              map expected states and thrown errors to product responses
tests               call tools directly with valid and invalid arguments

Code Example

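import { createTool } from "@anvia/core";
import { z } from "zod";

// Scope shape inferred from how it is used in this example; adapt it to your own service layer.
type OrderToolScope = {
  userId: string;
  tenantId: string;
  orders: {
    canRead(args: { userId: string; tenantId: string; orderId: string }): Promise<boolean>;
    find(orderId: string): Promise<{ fulfillmentStatus: "processing" | "shipped" | "delivered" } | null>;
  };
};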

const lookupOrderOutput = z.discriminatedUnion("status", [
  z.object({
    status: z.literal("found"),
    orderId: z.string(),
    fulfillmentStatus: z.enum(["processing", "shipped", "delivered"]),
  }),
  z.object({
    status: z.literal("not_found"),
    orderId: z.string(),
  }),
  z.object({
    status: z.literal("blocked"),
    reason: z.literal("access_denied"),
  }),
]);

export function createLookupOrderTool(scope: OrderToolScope) {
  return createTool({
    name: "lookup_order",
    description: "Look up one order owned by the current customer.",
    input: z.object({
      orderId: z.string().min(1),
    }),
    output: lookupOrderOutput,
    async execute({ orderId }) {
      const allowed = await scope.orders.canRead({
        userId: scope.userId,
        tenantId: scope.tenantId,
        orderId,
      });

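      // Denied access is an expected state: return it instead of throwing, and keep the reason generic.
      if (!allowed) {
        return { status: "blocked" as const, reason: "access_denied" as const };
      }

      const order = await scope.orders.find(orderId);

      // A missing record is also an expected state the model can act on.
      if (!order) {
        return { status: "not_found" as const, orderId };
      }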

      return {
        status: "found" as const,
        orderId,
        fulfillmentStatus: order.fulfillmentStatus,
      };
    },
  });
}
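
Because every result carries a status discriminant, downstream code can branch on it instead of parsing prose. The sketch below is a minimal, dependency-free illustration of that idea. LookupOrderResult is written out by hand to mirror the output schema above (in a real project it could likely be derived with z.infer<typeof lookupOrderOutput>), and describeLookup is a hypothetical helper, not part of any library.

```typescript
// Hand-written mirror of the tool's output states (assumption: kept in sync with the Zod schema).
type LookupOrderResult =
  | { status: "found"; orderId: string; fulfillmentStatus: "processing" | "shipped" | "delivered" }
  | { status: "not_found"; orderId: string }
  | { status: "blocked"; reason: "access_denied" };

// Hypothetical downstream consumer: branches on the discriminant, never on free text.
function describeLookup(result: LookupOrderResult): string {
  switch (result.status) {
    case "found":
      return `Order ${result.orderId} is ${result.fulfillmentStatus}.`;
    case "not_found":
      return `No order matches ${result.orderId}.`;
    case "blocked":
      return "You do not have access to that order.";
    default: {
      // Exhaustiveness check: adding a new state without a case stops this from type-checking.
      const unreachable: never = result;
      throw new Error(`Unhandled state: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The never assignment in the default branch is a design choice: it turns a forgotten state into a compile-time error rather than a silent fall-through.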

Expected States vs Errors

Situation                                             Return state         Throw
record not found                                      yes                  no
user lacks access and the model can continue safely   yes                  no
malformed model arguments                             schema handles it    no
database unavailable                                  no                   yes
invariant violated                                    no                   yes
downstream service timeout                            no                   yes

Expected states are useful model input. Unexpected failures belong to the runner, logs, retries, or product error boundary.
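
The split between returned states and thrown errors can be sketched without any framework. In the example below, OrderStore, Order, and lookupOrder are hypothetical names chosen for illustration. The function returns a state for a missing record, throws on an invariant violation, and deliberately has no try/catch, so storage failures propagate to the caller.

```typescript
type FulfillmentStatus = "processing" | "shipped" | "delivered";
type Order = { orderId: string; fulfillmentStatus: FulfillmentStatus };

// Hypothetical storage interface; find may reject when the database is unavailable.
interface OrderStore {
  find(orderId: string): Promise<Order | null>;
}

type LookupState =
  | { status: "found"; orderId: string; fulfillmentStatus: FulfillmentStatus }
  | { status: "not_found"; orderId: string };

async function lookupOrder(store: OrderStore, orderId: string): Promise<LookupState> {
  // No try/catch: an unavailable store rejects here and the error reaches the runner.
  const order = await store.find(orderId);

  // Expected outcome: return a state the model can read and act on.
  if (!order) {
    return { status: "not_found", orderId };
  }

  // Invariant violation: the store returned a different record than requested.
  if (order.orderId !== orderId) {
    throw new Error(`Invariant violated: asked for ${orderId}, got ${order.orderId}`);
  }

  return { status: "found", orderId, fulfillmentStatus: order.fulfillmentStatus };
}
```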

Direct Tool Tests

const tool = createLookupOrderTool({
  userId: "user_123",
  tenantId: "tenant_123",
  orders: fakeOrders,
});

const result = await tool.call({ orderId: "A-100" });

expect(result).toEqual({
  status: "found",
  orderId: "A-100",
  fulfillmentStatus: "shipped",
});

Use ToolSet.call(...) when you want to exercise JSON parsing and serialized output.

const tools = ToolSet.fromTools([tool]);

await expect(
  tools.call("lookup_order", JSON.stringify({ orderId: "" })),
).rejects.toThrow();
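
The fakeOrders object used in the direct test is not defined in this section. One possible in-memory fake is sketched below. The Order fields ownerId and tenantId, and the createFakeOrders helper, are assumptions made for illustration. Note one further assumption: canRead returns true for unknown order IDs so that the tool can reach its not_found branch; a real service may choose to deny instead.

```typescript
type FulfillmentStatus = "processing" | "shipped" | "delivered";

// Hypothetical record shape; only fulfillmentStatus is read by the tool itself.
type Order = {
  orderId: string;
  tenantId: string;
  ownerId: string;
  fulfillmentStatus: FulfillmentStatus;
};

// Builds an in-memory stand-in exposing the two methods the tool calls.
function createFakeOrders(seed: Order[]) {
  const byId = new Map(seed.map((order) => [order.orderId, order] as const));

  return {
    async canRead(args: { userId: string; tenantId: string; orderId: string }): Promise<boolean> {
      const order = byId.get(args.orderId);
      // Assumption: unknown IDs are readable so the tool can report not_found.
      if (!order) return true;
      return order.ownerId === args.userId && order.tenantId === args.tenantId;
    },
    async find(orderId: string): Promise<Order | null> {
      return byId.get(orderId) ?? null;
    },
  };
}

// Seeded so that a lookup of "A-100" by user_123 in tenant_123 resolves to a shipped order.
const fakeOrders = createFakeOrders([
  { orderId: "A-100", tenantId: "tenant_123", ownerId: "user_123", fulfillmentStatus: "shipped" },
]);
```

Keeping the fake behind a factory makes it cheap to seed a different data set per test, which helps cover both the allowed and denied permission paths.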

Runner Error Mapping

The runner should decide which failures become user-facing product errors.

try {
  const response = await agent.prompt(message).send();
  return { ok: true as const, output: response.output };
} catch (error) {
  if (isTemporaryStorageError(error)) {
    return { ok: false as const, error: "temporarily_unavailable" };
  }

  throw error;
}
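
The isTemporaryStorageError guard is referenced above but not shown. A minimal sketch follows, assuming the storage layer tags its transient failures with a code property. TemporaryStorageError, its code value, and runWithErrorMapping are hypothetical names; the run callback stands in for the agent call so the mapping can be tested without a model provider.

```typescript
// Hypothetical error thrown by the storage layer for transient failures.
class TemporaryStorageError extends Error {
  readonly code = "STORAGE_TEMPORARILY_UNAVAILABLE";
}

// Type guard that checks the tag rather than the class, so it also matches
// errors that lost their prototype (for example after crossing a serialization boundary).
function isTemporaryStorageError(error: unknown): error is { code: "STORAGE_TEMPORARILY_UNAVAILABLE" } {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { code?: unknown }).code === "STORAGE_TEMPORARILY_UNAVAILABLE"
  );
}

type RunResult =
  | { ok: true; output: string }
  | { ok: false; error: "temporarily_unavailable" };

// Maps known transient failures to a product response and rethrows everything else.
async function runWithErrorMapping(run: () => Promise<string>): Promise<RunResult> {
  try {
    return { ok: true, output: await run() };
  } catch (error) {
    if (isTemporaryStorageError(error)) {
      return { ok: false, error: "temporarily_unavailable" };
    }
    // Unknown failures stay loud so logs and alerts can pick them up.
    throw error;
  }
}
```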

Failure Modes

Failure                                  Fix
model keeps passing invalid arguments    tighten description, schema descriptions, or ask for missing data first
downstream code parses prose             return structured tool output or agent output schema
permission errors leak details           return a compact blocked state or generic product error
tests only cover provider runs           add direct tool and runner tests with fakes

Test Checklist

  • Test valid inputs, invalid inputs, and missing required fields.
  • Test permission allowed and denied paths.
  • Test expected states such as not_found and blocked.
  • Test unexpected service failures at the runner boundary.
  • Inspect tool result text in traces for readability.