Sandbox Best Practices

Run agent file and command workflows inside a bounded workspace.

Use @anvia/sandbox when an agent needs to execute commands, write files, inspect generated output, or perform multi-step file workflows without touching the host project directly.

A sandbox is a boundary, not a permission system by itself. Your app still owns which files are staged, which tools are exposed, which commands are allowed, and how outputs are returned to users.

Default Shape

Boundary | Practice
workspace | create one short-lived session per user task
files | stage only the files needed for the task
commands | pass structured command and args; avoid shell strings
network | keep network disabled unless the workflow requires it
limits | set timeout, memory, CPU, and output limits
cleanup | always destroy sessions in finally
audit | trace command names, exit codes, and selected files

Create Sessions Per Task

Keep sandbox lifetime scoped to one request, job, or approval window.

import { DockerSandbox } from "@anvia/sandbox";

export async function runCodeCheck(source: string) {
  const sandbox = new DockerSandbox({
    image: "node:22-bookworm",
    limits: {
      timeoutMs: 30_000,
      maxOutputBytes: 64_000,
      memoryMb: 512,
      cpus: 1,
    },
  });

  const session = await sandbox.createSession({
    manifest: {
      files: {
        "index.js": source,
      },
    },
  });

  try {
    return await session.exec({
      command: "node",
      args: ["index.js"],
    });
  } finally {
    await session.destroy();
  }
}

Avoid long-lived shared sessions unless your product explicitly needs a persistent workspace model. Shared sessions make cleanup, permissions, and auditing harder.

Stage Inputs Explicitly

Do not bind mount the whole host repository into a sandbox for untrusted work. Copy in the minimum task files.

const session = await sandbox.createSession({
  manifest: {
    files: {
      "package.json": JSON.stringify(packageJson, null, 2),
      "src/task.ts": taskSource,
      "README.md": readme,
    },
  },
});

This keeps host secrets, unrelated source files, local credentials, and build artifacts out of the container.
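One way to make staging explicit is to build the manifest's files map from an allowlist, so nothing reaches the container unless it was named on purpose. The sketch below is a minimal illustration, not part of @anvia/sandbox: selectStagedFiles and isSafeRelativePath are hypothetical helper names, and the path rules are an assumption about what "safe" means for your workspace.

```typescript
// Hypothetical helpers, not part of @anvia/sandbox.
// They build the `files` map for a session manifest from an explicit allowlist.
type ManifestFiles = Record<string, string>;

function isSafeRelativePath(path: string): boolean {
  if (path.length === 0) return false;
  // Reject absolute paths (POSIX and Windows drive letters) and backslashes.
  if (path.startsWith("/") || /^[A-Za-z]:/.test(path) || path.includes("\\")) {
    return false;
  }
  // Reject empty, ".", and ".." segments so nothing escapes the workspace.
  return path
    .split("/")
    .every((segment) => segment !== "" && segment !== "." && segment !== "..");
}

function selectStagedFiles(
  available: ManifestFiles,
  allowlist: readonly string[],
): ManifestFiles {
  const staged: ManifestFiles = {};
  for (const path of allowlist) {
    if (!isSafeRelativePath(path)) {
      throw new Error(`Refusing to stage unsafe path: ${path}`);
    }
    // Only own properties count as available files.
    if (!Object.prototype.hasOwnProperty.call(available, path)) {
      throw new Error(`Allowlisted file is missing: ${path}`);
    }
    staged[path] = available[path];
  }
  return staged;
}
```

The result can be passed as manifest.files when creating a session. Files that are available but not allowlisted, such as a local .env, are never copied.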

Expose Narrow Tools

If an agent only needs file operations, do not expose command execution.

import { createSandboxTools } from "@anvia/sandbox";

const tools = createSandboxTools(session, {
  include: ["read_file", "write_file", "list_files"],
});

Expose exec_command only when command execution is part of the product workflow. For higher-risk flows, put command execution behind approval, or wrap it in your own tool that validates each command against an allowlist.
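A wrapper of that kind could look roughly like the sketch below. It assumes a session object with an exec method that takes command and args, as in the examples on this page. The ExecCapable interface, the ALLOWED_COMMANDS policy, and the guardedExec name are all hypothetical and only illustrate the idea of checking a request before it reaches the sandbox.

```typescript
// Minimal request shape, mirroring the `command` and `args` fields used on this page.
interface CommandRequest {
  command: string;
  args: string[];
}

// Assumption: a session exposes `exec` roughly like this. The real type may differ.
interface ExecCapable {
  exec(request: CommandRequest): Promise<unknown>;
}

// Hypothetical policy: each allowed command maps to the exact argument lists it may run with.
const ALLOWED_COMMANDS = new Map<string, readonly (readonly string[])[]>([
  ["node", [["index.js"]]],
  ["npm", [["test"], ["test", "--", "--runInBand"]]],
]);

function isAllowedCommand(request: CommandRequest): boolean {
  const allowedArgLists = ALLOWED_COMMANDS.get(request.command);
  if (allowedArgLists === undefined) return false;
  // Require an exact match on both length and every argument.
  return allowedArgLists.some(
    (allowed) =>
      allowed.length === request.args.length &&
      allowed.every((arg, index) => arg === request.args[index]),
  );
}

// Expose this wrapper as the agent's tool instead of raw command execution.
async function guardedExec(
  session: ExecCapable,
  request: CommandRequest,
): Promise<unknown> {
  if (!isAllowedCommand(request)) {
    throw new Error(
      `Command not allowed: ${request.command} ${request.args.join(" ")}`,
    );
  }
  return session.exec(request);
}
```

Exact-match argument lists are deliberately strict. A looser policy, such as allowing any test file path, needs its own validation for each argument.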

Validate Commands

Prefer structured command execution over shell strings:

await session.exec({
  command: "npm",
  args: ["test", "--", "--runInBand"],
  timeoutMs: 60_000,
});

Avoid:

await session.exec({
  command: "sh",
  args: ["-c", userGeneratedCommand],
});

Shell execution is harder to inspect, quote, approve, and audit. Use it only for trusted scripts that your app controls.

Handle Outputs as Data

Treat sandbox command failures as expected product states, not exceptional errors. Inspect the exit code, the timeout state, and whether output was truncated before sending results back to the model or user.

const result = await session.exec({
  command: "npm",
  args: ["test"],
  timeoutMs: 60_000,
});

if (result.timedOut) {
  return { status: "timeout", summary: "The test command exceeded 60 seconds." };
}

if (result.exitCode !== 0) {
  return {
    status: "failed",
    stdout: result.stdout,
    stderr: result.stderr,
  };
}

return { status: "passed", stdout: result.stdout };

Keep large logs out of prompt context. Summarize or truncate before returning them to the model.
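A simple way to truncate is to keep the start and end of the log and replace the middle with a marker, since test runners tend to print setup details first and the failure summary last. The helper below is a minimal sketch under that assumption. truncateForPrompt is a hypothetical name, and it counts JavaScript string length (UTF-16 code units), not bytes.

```typescript
// Hypothetical helper: keeps the head and tail of a log and drops the middle.
// Lengths are measured in UTF-16 code units (string.length), not bytes.
function truncateForPrompt(text: string, maxChars: number): string {
  if (!Number.isInteger(maxChars) || maxChars < 1) {
    throw new Error("maxChars must be a positive integer");
  }
  if (text.length <= maxChars) return text;

  const headLength = Math.floor(maxChars / 2);
  const tailLength = maxChars - headLength;
  const omitted = text.length - maxChars;

  const head = text.slice(0, headLength);
  const tail = text.slice(text.length - tailLength);
  // The marker adds a few characters on top of maxChars.
  return `${head}\n... [${omitted} characters omitted] ...\n${tail}`;
}
```

For example, truncateForPrompt(result.stdout, 4_000) could be applied to stdout and stderr before they are placed in a failed-result payload.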

Testing

Test your app's sandbox workflow by creating a real session, staging representative files, and asserting the command or tool result. See Sandbox Testing for app-level test and GitHub Actions examples.

Checklist

  • Create sessions per task or request.
  • Stage only task-specific files.
  • Keep network disabled by default.
  • Use structured command and args.
  • Set explicit limits for timeout, memory, CPU, and output size.
  • Destroy sessions in finally.
  • Expose only the sandbox tools the agent needs.
  • Trace command names, exit codes, timeout state, and output truncation.
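The tracing item can be sketched as a small record builder. This is an illustration, not an API of @anvia/sandbox: ExecResultLike only assumes the result fields used in the examples above (exitCode, timedOut, stdout, stderr), while CommandTrace and buildCommandTrace are hypothetical names. The caller supplies the truncation flag, because this page does not show how the library reports truncation.

```typescript
// Assumed result shape, based only on the fields used in the examples on this page.
interface ExecResultLike {
  exitCode: number;
  timedOut: boolean;
  stdout: string;
  stderr: string;
}

// Hypothetical audit record: metadata only, never the raw output or argument values.
interface CommandTrace {
  command: string;
  argCount: number;
  exitCode: number;
  timedOut: boolean;
  outputTruncated: boolean;
  stdoutLength: number;
  stderrLength: number;
  stagedFiles: string[];
}

function buildCommandTrace(
  command: string,
  args: readonly string[],
  result: ExecResultLike,
  details: { outputTruncated: boolean; stagedFiles: readonly string[] },
): CommandTrace {
  return {
    command,
    // Record how many arguments were passed, not their values, which may be sensitive.
    argCount: args.length,
    exitCode: result.exitCode,
    timedOut: result.timedOut,
    outputTruncated: details.outputTruncated,
    stdoutLength: result.stdout.length,
    stderrLength: result.stderr.length,
    // Sort a copy so traces are stable regardless of staging order.
    stagedFiles: [...details.stagedFiles].sort(),
  };
}
```

Recording lengths instead of content keeps traces small and avoids copying command output into logs.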