Sandboxing AI agent tool calls is no longer optional. The moment your agent executes a shell command, installs a package, runs a Python script, or fetches a URL, you’re running code with the agent’s privileges. In a multi-tenant context that means tenant A’s prompt can issue code that tries to read tenant B’s data. The defense is a sandbox.

Two options dominate in 2026: bubblewrap and Docker. This article is the trade-off matrix.

What the sandbox needs to do

Whatever you pick, the sandbox must:

  1. Isolate the filesystem. The agent sees only the tenant’s work directory and the system libraries it needs.
  2. Restrict network access. Default-deny; opt-in to specific hostnames + ports.
  3. Drop capabilities. No CAP_SYS_ADMIN, no raw network sockets, no kernel-module loading.
  4. Filter syscalls. Default-deny seccomp profile; allow only what tool execution needs.
  5. Cap resources. Memory limit, CPU quota, wall-clock timeout.
  6. Be cheap. Cold start measured in tens of milliseconds, not seconds — because you’ll spawn one per tool call.

Both bubblewrap and Docker can satisfy all six, though Docker meets the sixth only with a warm pool of pre-spawned containers. The question is which trade-offs each makes.

Bubblewrap

Bubblewrap is the sandboxing primitive that powers Flatpak. It’s a small binary that builds a Linux user-namespace + mount-namespace sandbox using only kernel features. On kernels that allow unprivileged user namespaces it runs without a setuid bit. No daemon, no Docker, no root.

Cold start: ~30 ms on a modern x86 box. That’s per-tool-call cheap — you can spawn a fresh sandbox every time your agent issues a shell command.

Isolation mechanism: user namespaces + mount namespaces + seccomp.

Pros:

  • Lightning fast cold start.
  • No daemon. Bubblewrap is a CLI you invoke; nothing keeps running between calls.
  • Rootless. Doesn’t need elevated privileges to set up.
  • Small audit surface. The bwrap binary is a few thousand lines.
  • Plays nicely with the existing host filesystem.

Cons:

  • Linux only. macOS and Windows have no drop-in equivalent.
  • Relies on user namespaces — a kernel namespace bug breaks isolation.
  • No built-in resource limits beyond what you wire up via cgroups separately.
  • Less battle-tested in adversarial multi-tenant production than Docker.
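The resource-limit gap is usually closed with cgroups, but a coarser stopgap is available from the standard library alone. The sketch below is an assumption-laden illustration, not OpenClawMU's implementation: `Limits` and `run_limited` are hypothetical names, and POSIX rlimits apply per process rather than to the whole sandbox tree, so they are weaker than a cgroup.

```python
from __future__ import annotations

import resource
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    memory_mb: int = 512        # per-process address-space cap
    cpu_seconds: int = 10       # per-process CPU-time cap
    wall_seconds: float = 30.0  # wall-clock timeout, enforced by the parent


def _lower_limit(kind: int, value: int) -> None:
    """Lower soft and hard limit together, never above the inherited hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_limited(argv: list[str], limits: Limits) -> tuple[int | None, bool]:
    """Run argv under the limits; returns (exit_code, timed_out)."""

    def apply_in_child() -> None:
        # Runs in the child between fork and exec, so the parent is unaffected.
        _lower_limit(resource.RLIMIT_AS, limits.memory_mb * 1024 * 1024)
        _lower_limit(resource.RLIMIT_CPU, limits.cpu_seconds)

    try:
        proc = subprocess.run(
            argv,
            preexec_fn=apply_in_child,
            capture_output=True,
            timeout=limits.wall_seconds,
        )
    except subprocess.TimeoutExpired:
        return None, True  # subprocess.run has already killed the child
    return proc.returncode, False


if __name__ == "__main__":
    print(run_limited([sys.executable, "-c", "print('ok')"], Limits()))
```

In practice `argv` would be the full bwrap command line; rlimits survive `exec`, so they carry over into the sandboxed process.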

Use it when: you’re running on Linux, the code you’re executing is trusted-but-isolated (agent tools you control, but you want defense in depth), and cold-start latency matters.
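To make the per-tool-call pattern concrete, here is a minimal sketch of building a bwrap command line in Python. The function name `bwrap_argv`, the `/work` mount point, and the merged-`/usr` filesystem layout are assumptions made for illustration; the flags themselves are standard bwrap options. A seccomp filter (`--seccomp FD`) is omitted for brevity.

```python
from __future__ import annotations

from pathlib import Path


def bwrap_argv(
    workdir: Path, command: list[str], allow_network: bool = False
) -> list[str]:
    """Build a bwrap invocation exposing only system libraries and one work dir."""
    argv = [
        "bwrap",
        "--unshare-all",       # fresh user, mount, pid, ipc, uts, cgroup, net namespaces
        "--die-with-parent",   # kill the sandbox if the gateway process dies
        "--new-session",       # detach from the controlling terminal
        "--clearenv",          # do not leak the gateway's environment
        "--setenv", "PATH", "/usr/bin:/bin",
        "--ro-bind", "/usr", "/usr",          # system libraries, read-only
        "--symlink", "usr/bin", "/bin",       # assumes a merged-/usr distro
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", str(workdir), "/work",      # the tenant's only writable path
        "--chdir", "/work",
    ]
    if allow_network:
        argv.append("--share-net")  # opt back in to the host network namespace
    return argv + list(command)


if __name__ == "__main__":
    print(" ".join(bwrap_argv(Path("/srv/tenants/a"), ["python3", "tool.py"])))
```

The resulting list can be passed straight to `subprocess.run`, giving each tool call a fresh sandbox with nothing left running afterwards.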

Docker

Docker’s runc runtime (or any OCI-compatible runtime) gives you full container isolation: namespaces, cgroups, seccomp, AppArmor / SELinux, capability dropping, and optional GPU passthrough.

Cold start: 200–500 ms for a typical sandbox image, faster if you keep a warm pool of pre-spawned containers.

Isolation mechanism: kernel namespaces + cgroups + seccomp + AppArmor + capability drops.

Pros:

  • Cross-platform. Linux native; macOS and Windows via Docker Desktop / OrbStack.
  • Mature security ecosystem. Default-deny seccomp profile, AppArmor profiles, gVisor runtime option.
  • Battle-tested. Every public container service uses some variant.
  • Easy to plug into your existing container infrastructure.
  • Supports GPU access if your agent needs ML model inference inside the sandbox.

Cons:

  • Slower cold start. Roughly 7–17x bubblewrap’s ~30 ms at the figures above.
  • Daemon required. Adds an operational dependency.
  • Larger audit surface. More features means more potential bugs.
  • Resource overhead. Memory + CPU per container.

Use it when: you’re running code from genuinely untrusted sources (user-submitted scripts, plugins from unverified publishers), you need cross-platform support, or you’re already deeply invested in container infrastructure.
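For comparison, a hardened `docker run` invocation might be assembled as follows. This is a sketch under stated assumptions: `docker_argv` and the `/work` mount point are hypothetical, the pids limit of 128 is an arbitrary illustrative value, and `--runtime runsc` only works on hosts where gVisor is installed and registered with the Docker daemon.

```python
from __future__ import annotations


def docker_argv(
    image: str,
    workdir: str,
    command: list[str],
    memory_mb: int = 512,
    cpus: float = 0.5,
    gvisor: bool = False,
) -> list[str]:
    """Build a locked-down `docker run` command line for one tool call."""
    argv = [
        "docker", "run", "--rm",
        "--network=none",                        # default-deny networking
        "--cap-drop=ALL",                        # drop every capability
        "--security-opt", "no-new-privileges",   # block setuid escalation
        "--read-only",                           # immutable root filesystem
        "--tmpfs", "/tmp",
        "--pids-limit", "128",                   # illustrative fork-bomb guard
        "--memory", f"{memory_mb}m",
        "--cpus", str(cpus),
        "--volume", f"{workdir}:/work",          # the tenant's work directory
        "--workdir", "/work",
    ]
    if gvisor:
        argv += ["--runtime", "runsc"]           # requires gVisor on the host
    return argv + [image, *command]


if __name__ == "__main__":
    print(" ".join(docker_argv(
        "openclaw/sandbox-untrusted:latest", "/srv/tenants/a",
        ["python3", "tool.py"], gvisor=True,
    )))
```

Docker applies its default seccomp profile automatically, so no `--security-opt seccomp=...` flag is needed unless you want a stricter custom profile.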

The hybrid pattern

Real deployments use both. OpenClawMU’s default: bubblewrap for the standard agent tool surface (shell, file_read, file_write, package install), Docker for tools explicitly marked as untrusted (custom plugins from unverified ClawHub publishers, user-submitted code).

The choice is per-tool, configurable in the tenant config:

sandbox:
  default_mode: bwrap
  modes:
    untrusted_code:
      runtime: docker
      image: openclaw/sandbox-untrusted:latest
      memory_limit_mb: 512
      cpu_quota: 0.5
      runtime_class: runsc  # gVisor

Tools annotated @sandbox("untrusted_code") get the heavier isolation. Everything else gets the fast bubblewrap path.
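The article does not show how the annotation is implemented, so the following is only one plausible shape for it: a decorator that tags a function with a mode name, plus a lookup that falls back to the default. The names `sandbox`, `mode_for`, and the `__sandbox_mode__` attribute are assumptions for illustration.

```python
from __future__ import annotations

from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable[..., object])

DEFAULT_MODE = "bwrap"  # mirrors default_mode in the tenant config


def sandbox(mode: str) -> Callable[[F], F]:
    """Tag a tool function with the sandbox mode it must run under."""
    def decorate(fn: F) -> F:
        fn.__sandbox_mode__ = mode  # type: ignore[attr-defined]
        return fn
    return decorate


def mode_for(fn: Callable[..., object]) -> str:
    """Return the tool's sandbox mode, or the default for untagged tools."""
    return getattr(fn, "__sandbox_mode__", DEFAULT_MODE)


@sandbox("untrusted_code")
def run_plugin(source: str) -> str:
    return f"would run {len(source)} bytes in the heavy sandbox"


def file_read(path: str) -> str:
    return f"would read {path} in the fast sandbox"


if __name__ == "__main__":
    print(mode_for(run_plugin), mode_for(file_read))
```

The dispatcher then only needs `mode_for(tool)` to decide which runtime to spawn, which keeps the trust decision next to the tool definition.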

Cold-start cost in production numbers

A typical agent run issues 3–10 tool calls. Bubblewrap × 10 = ~300 ms total sandboxing overhead — negligible against the 2–5 second LLM response time. Docker × 10 = 2–5 seconds, which doubles the perceived latency.

If you can keep a warm Docker pool, the cold-start cost drops to ~50 ms per call. That’s a reasonable tradeoff for the heavier isolation surface.
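The arithmetic behind these figures is simple enough to check directly. The sketch below uses only the per-call numbers quoted in this article (30 ms for bubblewrap, 200–500 ms for cold Docker, ~50 ms for a warm pool); the function name is hypothetical.

```python
from __future__ import annotations


def overhead_ms(calls: int, per_call_ms: int) -> int:
    """Total sandbox start-up overhead for one agent run, in milliseconds."""
    return calls * per_call_ms


# Per-call cold-start figures quoted in the article, as (low, high) in ms.
PER_CALL_MS = {
    "bubblewrap": (30, 30),
    "docker (cold)": (200, 500),
    "docker (warm pool)": (50, 50),
}

if __name__ == "__main__":
    for name, (low, high) in PER_CALL_MS.items():
        best = overhead_ms(3, low)     # shortest run: 3 tool calls
        worst = overhead_ms(10, high)  # longest run: 10 tool calls
        print(f"{name}: {best}-{worst} ms per agent run")
```

Across the full 3–10 call range this gives 90–300 ms for bubblewrap, 600–5000 ms for cold Docker, and 150–500 ms for a warm pool.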

Seccomp profiles

Both bubblewrap and Docker accept seccomp profiles that restrict the syscalls a process can issue. A reasonable default for agent code:

  • Allow: read, write, openat, close, execve, clone, fork, mmap, brk, exit, exit_group, futex, clock_gettime, getpid, getuid, getgid (the boring stuff).
  • Deny: ptrace, mount, umount2, reboot, kexec_load, _sysctl, perf_event_open (anything that touches the kernel or other processes).

OpenClawMU ships a default-deny seccomp profile that allows the syscalls a typical Python / Node / shell tool needs. Custom profiles are configurable per sandbox mode.
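As an illustration of what a default-deny profile looks like, the sketch below emits a profile in Docker's seccomp JSON format using the kernel's actual syscall names (for example `execve` and `exit_group`). It is not OpenClawMU's shipped profile, and the allow-list here is far too small to run a real Python or Node interpreter; treat it as a starting shape only. Bubblewrap does not read this JSON: its `--seccomp` option expects a compiled BPF program on a file descriptor.

```python
from __future__ import annotations

import json

# Illustrative allow-list only; real interpreters need many more syscalls.
ALLOWED = [
    "read", "write", "openat", "close", "execve", "clone", "fork", "mmap",
    "brk", "exit", "exit_group", "futex", "clock_gettime",
    "getpid", "getuid", "getgid",
]

# Syscalls that must never appear in an allow-list for agent code.
FORBIDDEN = {
    "ptrace", "mount", "umount2", "reboot", "kexec_load",
    "_sysctl", "perf_event_open",
}


def build_profile(allowed: list[str]) -> dict:
    """Return a default-deny seccomp profile in Docker's JSON layout."""
    clash = FORBIDDEN.intersection(allowed)
    if clash:
        raise ValueError(f"forbidden syscalls in allow-list: {sorted(clash)}")
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # anything not listed fails
        "syscalls": [
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_profile(ALLOWED), indent=2))
```

The JSON written by this script can be passed to Docker with `--security-opt seccomp=profile.json`.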

Network policy

Default to no network. Tools that need network access opt in with an allow-list:

sandbox:
  network:
    default: deny
    allow:
      - "api.weather.gov:443"
      - "*.anthropic.com:443"

Implementation differs: bubblewrap shares the host network by default and isolates it fully with --unshare-net (an unshare(CLONE_NEWNET) under the hood); Docker uses --network=none for full isolation, or a per-container network namespace with egress filtering if you want allow-listing.
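A network namespace alone cannot enforce hostname rules, so the allow-list has to be checked somewhere, typically in an egress proxy. The sketch below shows one way the matching could work for entries in the `host:port` form used above; `is_allowed` is a hypothetical helper, and placing the check in a proxy is an assumption rather than something this article specifies.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def is_allowed(host: str, port: int, allow: list[str]) -> bool:
    """Default-deny check of host:port against 'pattern:port' allow-list rules."""
    host = host.lower().rstrip(".")  # normalise case and a trailing root dot
    for rule in allow:
        pattern, _, rule_port = rule.rpartition(":")
        if int(rule_port) == port and fnmatchcase(host, pattern.lower()):
            return True
    return False  # nothing matched, so the connection is refused


ALLOW = ["api.weather.gov:443", "*.anthropic.com:443"]

if __name__ == "__main__":
    for target in [("api.anthropic.com", 443), ("evil.example", 443)]:
        print(target, is_allowed(*target, ALLOW))
```

Note that `*` here also matches dots, so `*.anthropic.com` covers nested subdomains but not the bare `anthropic.com`; tighten the pattern syntax if that is not what you want.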

When neither is enough

For truly hostile workloads (security research, user-submitted attack payloads), neither bubblewrap nor Docker-on-runc is sufficient. Step up to:

  • Docker + gVisor (runsc): a user-space kernel intercepts the sandbox’s syscalls, so the workload never talks to the host kernel directly; ~30% syscall overhead. The pragmatic next step.
  • Kata Containers: lightweight VMs as containers. Stronger isolation, heavier cold start.
  • Firecracker: AWS’s microVM monitor. Used by Lambda. Cold start ~125 ms; very strong isolation.

For most multi-tenant AI gateway use cases, Docker + gVisor is the right ceiling. Beyond that you’re paying overhead you don’t need.

Recommendation

  • Linux + trusted-but-isolated workloads: bubblewrap. Fast, simple, well-suited.
  • Cross-platform or moderately untrusted workloads: Docker with default seccomp + cap-drop.
  • Genuinely untrusted (user-submitted code): Docker + gVisor.
  • Hostile workloads: Firecracker.

Pick per workload, not per cluster. The right choice for one tool isn’t the right choice for all of them.
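The recommendation list can be written down as a small decision function, which makes the per-workload choice explicit and testable. This is a sketch: the `Trust` levels and the returned runtime labels are hypothetical names that simply mirror the four bullets above.

```python
from __future__ import annotations

from enum import Enum


class Trust(Enum):
    TRUSTED_ISOLATED = "trusted-but-isolated"
    MODERATE = "moderately untrusted"
    UNTRUSTED = "genuinely untrusted"
    HOSTILE = "hostile"


def choose_runtime(trust: Trust, on_linux: bool = True) -> str:
    """Map one workload's trust level and platform to a sandbox runtime."""
    if trust is Trust.HOSTILE:
        return "firecracker"
    if trust is Trust.UNTRUSTED:
        return "docker+gvisor"
    if trust is Trust.TRUSTED_ISOLATED and on_linux:
        return "bubblewrap"
    # Moderately untrusted code, or any trusted workload off Linux.
    return "docker"


if __name__ == "__main__":
    for level in Trust:
        print(level.value, "->", choose_runtime(level))
```

Calling it once per tool rather than once per cluster keeps the fast path for trusted tools and reserves the heavier runtimes for the workloads that need them.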