Coding Agent

A codebase assistant harness with repo search, file tools, command guardrails, patches, and evals.

A coding agent is a high-risk harness because it can inspect source code, run commands, and propose or apply changes. Treat it as an application over a workspace: your app owns the repository boundary, allowed paths, command policy, patch approval, git behavior, audit records, and trace metadata.

This pattern does not require new SDK primitives. It composes agents, tools, MCP, approvals, tracing, and evals around a codebase workflow.

Scenario

A user asks, "Find why the checkout test fails and propose a fix." The agent should search files, read relevant code, optionally run allowed test commands, propose a patch, and wait for approval before any write.

When to Use It

Use this pattern when:

  • an agent assists with code review, debugging, test triage, migration, or docs changes
  • workspace access must be scoped to allowed repositories and paths
  • command execution must be allow-listed
  • writes require preview, approval, idempotency, and git diff inspection
  • behavior should be checked with coding-task evals

Architecture Shape

Layer | Responsibility
runner | resolve user, repo, branch, task, allowed paths, trace metadata
read tools | search files, read files, inspect git diff, list tests
command tool | run only allow-listed commands with timeouts and sandbox policy
patch tool | propose or apply patches behind approval
MCP tools | optional filesystem or git server tools, filtered before registration
audit | record command runs, patch proposals, approvals, and applied changes
evals | regression tasks for search, diagnosis, patch proposal, and no-write behavior
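
The audit layer in the table above can start as an append-only log of typed events. This is a minimal sketch that assumes an in-memory store. The event names and fields (`AuditEvent`, `operationId`, `seq`) are hypothetical, and a real harness would persist records durably.

```typescript
// Hypothetical event shapes for the four things the harness records.
export type AuditEvent =
  | { kind: "command_run"; actorId: string; command: string; exitCode: number }
  | { kind: "patch_proposed"; actorId: string; operationId: string; files: string[] }
  | { kind: "approval_decision"; actorId: string; toolName: string; approved: boolean }
  | { kind: "patch_applied"; actorId: string; operationId: string };

export type AuditRecord = AuditEvent & { seq: number; at: string };

// Append-only: once a record is written, it is never edited or removed.
export class InMemoryAuditLog {
  private readonly records: AuditRecord[] = [];

  record(event: AuditEvent, now: Date = new Date()): AuditRecord {
    const entry: AuditRecord = {
      ...event,
      seq: this.records.length + 1,
      at: now.toISOString(),
    };
    this.records.push(entry);
    return entry;
  }

  // Returns shallow copies so callers cannot rewrite history.
  list(kind?: AuditEvent["kind"]): AuditRecord[] {
    return this.records
      .filter((r) => kind === undefined || r.kind === kind)
      .map((r) => ({ ...r }));
  }
}
```

Sequence numbers make it easy to check ordering, for example that an `approval_decision` precedes every `patch_applied` with the same actor.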

Code Example

import { AgentBuilder, createHook } from "@anvia/core";
import { model } from "./model";
import { createCodebaseTools } from "./tools";
// App-defined input type (hypothetical module path; not shown here).
import type { CodingAgentInput } from "./types";

export async function runCodingAgent(input: CodingAgentInput) {
  // Resolve the caller and open a workspace scoped to this repo and user.
  const user = await input.auth.requireUser();
  const workspace = await input.workspaces.open({
    repoId: input.repoId,
    userId: user.id,
  });

  // Read tools run immediately; command and patch tools wait for a human decision.
  // This name-based check also gates preview-mode apply_patch calls.
  const approvalHook = createHook({
    async onToolCall({ toolName, tool }) {
      if (!["apply_patch", "run_command"].includes(toolName)) {
        return tool.run();
      }

      const approved = await input.approvals.waitForDecision({
        actorId: user.id,
        repoId: input.repoId,
        toolName,
        reason: "Codebase mutation or command execution requires approval.",
      });

      return approved ? tool.run() : tool.cancel("Operation was not approved.");
    },
  });

  const agent = new AgentBuilder("coding", model)
    .instructions(`
Help with codebase tasks.
Search and read files before proposing changes.
Prefer minimal patches.
Do not run commands unless a tool allows them.
Do not claim a patch was applied unless the tool confirms it.
    `)
    .tools(
      createCodebaseTools({
        workspace,
        allowedPaths: input.allowedPaths,
        // Exact command strings; the command tool rejects anything else.
        allowedCommands: ["pnpm test", "pnpm lint", "pnpm typecheck"],
        audit: input.audit,
      }),
    )
    .hook(approvalHook)
    // Cap the tool loop so a stuck diagnosis cannot run indefinitely.
    .defaultMaxTurns(8)
    .build();

  const response = await agent
    .prompt(input.task)
    .withTrace({
      name: "coding-agent-task",
      userId: user.id,
      // Ids only: keep source code and diffs out of trace metadata.
      metadata: {
        repoId: input.repoId,
        branch: workspace.branch,
        taskId: input.taskId,
      },
    })
    .send();

  return {
    output: response.output,
    trace: response.trace,
  };
}

Tool Boundaries

Keep read tools separate from mutation tools so that approval, audit, and evals can target exactly the tools that change state or execute code.

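export function createCodebaseTools(scope: CodebaseToolScope) {
  // Read tools: inspect the workspace without changing it.
  const readTools = [
    createSearchFilesTool(scope),
    createReadFileTool(scope),
    createGitDiffTool(scope),
  ];

  // Mutation and execution tools: gated by approval and audited.
  const mutationTools = [
    createRunCommandTool(scope),
    createApplyPatchTool(scope),
  ];

  return [...readTools, ...mutationTools];
}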
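
One way to make that separation enforceable is to tag every tool with a kind and derive the approval decision from the tag instead of a hand-maintained name list. This is a minimal sketch. `ToolKind`, `ToolSpec`, `approvalFor`, and the tool names are hypothetical and are not part of `@anvia/core`. Unlike the name-based hook in the runner, which also gates preview calls, this variant lets a preview-mode patch call through without approval.

```typescript
export type ToolKind = "read" | "command" | "write";

export interface ToolSpec {
  name: string;
  kind: ToolKind;
}

export type ApprovalDecision = "allow" | "needs_approval";

// Hypothetical registry mirroring createCodebaseTools.
export const codebaseTools: ToolSpec[] = [
  { name: "search_files", kind: "read" },
  { name: "read_file", kind: "read" },
  { name: "git_diff", kind: "read" },
  { name: "run_command", kind: "command" },
  { name: "apply_patch", kind: "write" },
];

// Decide from the tool's kind; unknown tools fail closed.
export function approvalFor(
  toolName: string,
  args: Record<string, unknown>,
  registry: ToolSpec[] = codebaseTools,
): ApprovalDecision {
  const spec = registry.find((t) => t.name === toolName);
  if (!spec) return "needs_approval";
  if (spec.kind === "read") return "allow";
  // A patch preview does not mutate the workspace, so it may skip approval.
  if (spec.kind === "write" && args.mode === "preview") return "allow";
  return "needs_approval";
}
```

Because unknown names fall through to `needs_approval`, adding a new tool without classifying it cannot silently bypass review.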

Read tools should enforce allowed paths.

async execute({ path }: { path: string }) {
  // Reject paths outside the allowed list before touching the filesystem.
  scope.workspace.requireAllowedPath(path, scope.allowedPaths);
  return scope.workspace.readFile(path);
}
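
The original snippet calls `requireAllowedPath` without showing it. Here is a minimal sketch of what such a check could do with Node's `node:path` module: resolve the requested path against the workspace root, then require it to fall inside one of the allowed directories. The standalone function name and signature are assumptions. Comparing resolved paths, rather than string prefixes of the raw input, is what blocks `../` traversal and stops a sibling such as `src-secrets` from matching `src`. The sketch does not follow symlinks; a stricter version would also compare `fs.realpathSync` results.

```typescript
import * as path from "node:path";

// Hypothetical standalone version of workspace.requireAllowedPath.
// Returns the absolute path, or throws if it resolves outside every allowed directory.
export function requireAllowedPath(
  root: string,
  requested: string,
  allowedPaths: string[],
): string {
  const absolute = path.resolve(root, requested);
  const inside = allowedPaths.some((allowed) => {
    const base = path.resolve(root, allowed);
    const rel = path.relative(base, absolute);
    // "" means the allowed directory itself; ".." or "../x" means the path escaped it.
    const escapes = rel === ".." || rel.startsWith(`..${path.sep}`);
    return !escapes && !path.isAbsolute(rel);
  });
  if (!inside) {
    throw new Error(`path_not_allowed: ${requested}`);
  }
  return absolute;
}
```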

Command tools should enforce exact allow lists, timeouts, and working directory policy.

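async execute({ command }: { command: string }) {
  // Exact string match: "pnpm test --watch" and "pnpm test; rm -rf ." are both blocked.
  if (!scope.allowedCommands.includes(command)) {
    return { status: "blocked" as const, reason: "command_not_allowed" };
  }

  // Bounded run; the workspace is expected to pin the working directory to its root.
  return scope.workspace.run(command, {
    timeoutMs: 60_000,
    audit: scope.audit,
  });
}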
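
Exact matching matters because a prefix or substring check would also accept `pnpm test; rm -rf .` or `pnpm test --watch`. This is a minimal sketch that assumes commands are split on whitespace and run with Node's `execFile` without a shell, so operators such as `;` or `&&` are never interpreted. `checkCommand` and `runAllowedCommand` are hypothetical helpers. The `cwd`, `timeout`, `shell`, and `encoding` options are real `node:child_process` options.

```typescript
import { execFile } from "node:child_process";

export type CommandCheck =
  | { status: "allowed"; file: string; args: string[] }
  | { status: "blocked"; reason: "command_not_allowed" };

// Compare whole token lists, never prefixes or substrings.
export function checkCommand(command: string, allowed: string[]): CommandCheck {
  const tokens = command.trim().split(/\s+/).filter(Boolean);
  const match = allowed.some((entry) => {
    const want = entry.trim().split(/\s+/);
    return want.length === tokens.length && want.every((t, i) => t === tokens[i]);
  });
  if (!match || tokens.length === 0) {
    return { status: "blocked", reason: "command_not_allowed" };
  }
  const [file, ...args] = tokens;
  return { status: "allowed", file, args };
}

// Run without a shell, pinned to the workspace root, with a hard timeout.
export function runAllowedCommand(
  check: Extract<CommandCheck, { status: "allowed" }>,
  cwd: string,
  timeoutMs = 60_000,
): Promise<{ code: number; stdout: string; stderr: string }> {
  return new Promise((resolve) => {
    execFile(
      check.file,
      check.args,
      { cwd, timeout: timeoutMs, shell: false, encoding: "utf8" },
      (error, stdout, stderr) => {
        // Non-zero exits and timeouts both arrive through `error`; a killed process has no numeric code.
        const code = error ? (typeof error.code === "number" ? error.code : 1) : 0;
        resolve({ code, stdout, stderr });
      },
    );
  });
}
```

Splitting the allow-list check from execution keeps the policy pure and easy to unit test, while the runner only ever receives an already-approved argv.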

Patch tools should support preview-first behavior.

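async execute({ patch, mode }: { patch: string; mode: "preview" | "apply" }) {
  // Preview unless the call explicitly asks to apply, so an unexpected mode never writes.
  if (mode !== "apply") {
    return scope.workspace.previewPatch(patch);
  }

  // Same workspace + same patch gives the same operationId, so a retry can be detected.
  // hashPatch is an app-defined content hash (not shown).
  return scope.workspace.applyPatch({
    patch,
    operationId: `patch:${scope.workspace.id}:${hashPatch(patch)}`,
  });
}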
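
The `hashPatch` helper is not shown. This is a minimal sketch that assumes a SHA-256 over the patch text with line endings normalized, plus a hypothetical in-memory ledger that returns the first result for a repeated `operationId` instead of applying the patch again. A production ledger would live in durable storage and would also track in-flight operations.

```typescript
import { createHash } from "node:crypto";

// Normalize CRLF so the same patch from different clients hashes identically.
export function hashPatch(patch: string): string {
  return createHash("sha256").update(patch.replace(/\r\n/g, "\n"), "utf8").digest("hex");
}

export type ApplyResult = { status: "applied"; operationId: string; appliedAt: string };

// Hypothetical idempotency ledger keyed by operationId.
export class PatchLedger {
  private readonly applied = new Map<string, ApplyResult>();

  apply(
    workspaceId: string,
    patch: string,
    doApply: (patch: string) => void,
  ): ApplyResult & { replayed: boolean } {
    const operationId = `patch:${workspaceId}:${hashPatch(patch)}`;
    const previous = this.applied.get(operationId);
    if (previous) {
      // Same patch again: report the earlier result instead of re-applying.
      return { ...previous, replayed: true };
    }
    // If doApply throws, nothing is recorded and the caller may retry.
    doApply(patch);
    const result: ApplyResult = {
      status: "applied",
      operationId,
      appliedAt: new Date().toISOString(),
    };
    this.applied.set(operationId, result);
    return { ...result, replayed: false };
  }
}
```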

MCP Filesystem Tools

If a filesystem MCP server is used, filter or wrap its tools before the coding agent sees them. Prefer local wrapper tools when you need path allow lists, audit records, command policy, or patch approval.

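const filesystem = await connectMcp(
  mcp.stdio({
    name: "filesystem",
    command: "npx",
    // The server only serves paths under the directories passed as arguments.
    args: ["-y", "@modelcontextprotocol/server-filesystem", workspace.root],
  }),
);

// Keep read-only tools; drop write_file, edit_file, move_file, and the rest.
const readOnlyTools = filesystem.tools.filter((tool) =>
  ["read_file", "list_directory"].includes(tool.name),
);
// Register readOnlyTools with the agent, never filesystem.tools.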
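
The same filter can be written as a reusable helper that fails closed and reports what it dropped, so the omission shows up in audit records. This is a minimal sketch. The `McpToolLike` shape is an assumption about the minimal fields needed and is not the `@anvia/core` type.

```typescript
// Assumed minimal shape; real MCP tool objects carry more fields.
export interface McpToolLike {
  name: string;
}

// Keep only explicitly allowed tools; everything else is dropped and reported.
export function selectMcpTools<T extends McpToolLike>(
  tools: T[],
  allowedNames: readonly string[],
): { kept: T[]; dropped: string[] } {
  const allowed = new Set(allowedNames);
  const kept: T[] = [];
  const dropped: string[] = [];
  for (const tool of tools) {
    if (allowed.has(tool.name)) kept.push(tool);
    else dropped.push(tool.name);
  }
  return { kept, dropped };
}
```

The server's directory argument bounds what it can reach, but it cannot express a per-task path allow list or approval, which is why local wrapper tools remain the better fit for writes.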

Failure Modes

Failure | Fix
agent reads outside repo | enforce allowed paths in every file tool
command is too broad | exact allow list and timeout in command tool
patch applies twice | idempotency key from patch hash
write happens without approval | approval hook or approval metadata on mutation tools
traces leak source code | limit trace metadata to ids, paths, and summaries
evals only check final prose | add task cases for tool choice, no-write mode, and patch preview
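
For the trace-leak row, here is a minimal sketch of a metadata filter. It assumes trace metadata is a flat string map, keeps only allow-listed keys, and caps value length so a file body or diff cannot ride along in a summary field. The key list and the 200-character limit are illustrative assumptions.

```typescript
// Illustrative allow list: identifiers, paths, and short summaries only.
const TRACE_KEYS = new Set(["repoId", "branch", "taskId", "paths", "summary"]);
const MAX_VALUE_LENGTH = 200;

export function sanitizeTraceMetadata(
  metadata: Record<string, string>,
): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const [key, value] of Object.entries(metadata)) {
    // Drop any key not on the allow list, e.g. fileContents or diff.
    if (!TRACE_KEYS.has(key)) continue;
    clean[key] =
      value.length > MAX_VALUE_LENGTH
        ? `${value.slice(0, MAX_VALUE_LENGTH)}...[truncated]`
        : value;
  }
  return clean;
}
```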

Test Checklist

  • Test file reads inside and outside allowed paths.
  • Test blocked commands and approved commands.
  • Test patch preview without mutation.
  • Test approved and rejected patch application.
  • Test git diff inspection after a patch.
  • Add eval cases for diagnosis, minimal patch proposal, and refusal to run disallowed commands.
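
Evals that only grade final prose miss harness regressions. This is a minimal sketch of a tool-behavior check that assumes the trace can be reduced to an ordered list of tool calls. `EvalCase`, `ToolCallRecord`, and the tool names are hypothetical. The check covers required tools, forbidden tools, and whether a patch was applied before any preview.

```typescript
export interface ToolCallRecord {
  toolName: string;
  args: Record<string, unknown>;
}

export interface EvalCase {
  name: string;
  mustCall: string[];
  mustNotCall: string[];
  requirePreviewBeforeApply: boolean;
}

// Returns a list of failures; an empty list means the case passed.
export function checkToolCalls(calls: ToolCallRecord[], testCase: EvalCase): string[] {
  const failures: string[] = [];
  const called = new Set(calls.map((c) => c.toolName));
  for (const tool of testCase.mustCall) {
    if (!called.has(tool)) failures.push(`missing call: ${tool}`);
  }
  for (const tool of testCase.mustNotCall) {
    if (called.has(tool)) failures.push(`forbidden call: ${tool}`);
  }
  if (testCase.requirePreviewBeforeApply) {
    // Any earlier preview counts; a stricter check would compare patch hashes.
    let previewed = false;
    for (const call of calls) {
      if (call.toolName !== "apply_patch") continue;
      if (call.args.mode === "preview") previewed = true;
      else if (!previewed) failures.push("patch applied without preview");
    }
  }
  return failures;
}
```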