Bubblewrap vs Docker: choosing a sandbox for AI agent tool calls

Sandboxing AI-agent tool calls is mandatory. Bubblewrap and Docker are both viable; here's the trade-off matrix on cold start, isolation surface, OS support, and operational complexity — and when each one wins.

By Dipankar Sarkar, May 26, 2026

sandbox
bubblewrap
docker
ai agents
security

Sandboxing AI agent tool calls is no longer optional. The moment your agent executes a shell command, installs a package, runs a Python script, or fetches a URL, you’re running code with the agent’s privileges. In a multi-tenant context that means tenant A’s prompt can trigger code that tries to read tenant B’s data. The defense is a sandbox.

Two options dominate in 2026: bubblewrap and Docker. This article is the trade-off matrix.

What the sandbox needs to do

Whatever you pick, the sandbox must:

Isolate the filesystem. The agent sees only the tenant’s work directory and the system libraries it needs.
Restrict network access. Default-deny; opt-in to specific hostnames + ports.
Drop capabilities. No CAP_SYS_ADMIN, no raw network sockets, no kernel-module loading.
Filter syscalls. Default-deny seccomp profile; allow only what tool execution needs.
Cap resources. Memory limit, CPU quota, wall-clock timeout.
Be cheap. Cold start measured in tens of milliseconds, not seconds — because you’ll spawn one per tool call.

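Both bubblewrap and Docker can satisfy all six, though not equally: bubblewrap needs cgroups wired up separately for resource caps, and Docker needs a warm pool to get cold start down to tens of milliseconds. The question is which trade-offs each makes.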
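To make the six requirements concrete, here is a minimal sketch of a policy record with one group of fields per requirement. The class name, field names, default values, and the `default-deny.json` file name are all hypothetical, chosen for illustration; this is not OpenClawMU's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy record: one field group per requirement."""

    work_dir: str                                  # filesystem: the only writable path
    readonly_paths: tuple = ("/usr",)              # filesystem: system libraries
    network_allow: tuple = ()                      # network: empty means default-deny
    dropped_caps: tuple = ("ALL",)                 # capabilities to drop
    seccomp_profile: str = "default-deny.json"     # syscalls (hypothetical file name)
    memory_limit_mb: int = 512                     # resources
    cpu_quota: float = 0.5
    timeout_s: float = 30.0
    max_cold_start_ms: int = 50                    # cost budget per tool call

    def violations(self):
        """Return a list of problems with this policy (empty if none)."""
        problems = []
        if not self.work_dir.startswith("/"):
            problems.append("work_dir must be an absolute path")
        if self.memory_limit_mb <= 0 or self.cpu_quota <= 0 or self.timeout_s <= 0:
            problems.append("resource limits must be positive")
        if "*" in self.network_allow or "*:*" in self.network_allow:
            problems.append("network allow-list must not be a blanket wildcard")
        return problems
```

Validating the record before every spawn keeps a misconfigured tenant from silently getting a weaker sandbox than intended.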

Bubblewrap

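Bubblewrap is the sandboxing primitive that powers Flatpak. It’s a small binary that builds a sandbox from Linux user and mount namespaces using only kernel features. On hosts that allow unprivileged user namespaces it runs without setuid or root; distributions that disable them have historically shipped bwrap setuid instead. No daemon, no Docker.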

Cold start: ~30 ms on a modern x86 box. That’s per-tool-call cheap — you can spawn a fresh sandbox every time your agent issues a shell command.

Isolation mechanism: user namespaces + mount namespaces + seccomp.

Pros:

Lightning fast cold start.
No daemon. Bubblewrap is a CLI you invoke; nothing keeps running between calls.
Rootless. Doesn’t need elevated privileges to set up.
Small audit surface. The bwrap binary is a few thousand lines.
Plays nicely with the existing host filesystem.

Cons:

Linux only. Bubblewrap does not run on macOS or Windows, and neither has a drop-in equivalent.
Relies on user namespaces — a kernel namespace bug breaks isolation.
No built-in resource limits beyond what you wire up via cgroups separately.
Less battle-tested in adversarial multi-tenant production than Docker.

Use it when: you’re running on Linux, the code you’re executing is trusted-but-isolated (agent tools you control, but you want defense in depth), and cold-start latency matters.
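As a rough illustration of what a per-tool-call bubblewrap invocation can look like, the sketch below builds a `bwrap` argument list in Python. The function name, the `/work` mount point, and the choice of bind mounts are assumptions for this example (it also assumes a merged-`/usr` distribution); it is not OpenClawMU's actual command line, and it does not cover seccomp or cgroup limits.

```python
def bwrap_argv(work_dir, command, allow_network=False):
    """Build a bwrap command line for one tool call (illustrative sketch)."""
    argv = [
        "bwrap",
        "--unshare-all",        # unshare every namespace bwrap supports, network included
        "--die-with-parent",    # kill the sandbox if the spawning process exits
        "--new-session",        # detach from the controlling terminal
        "--clearenv",           # start from an empty environment
        "--setenv", "PATH", "/usr/bin:/bin",
        "--ro-bind", "/usr", "/usr",          # system libraries, read-only
        "--symlink", "usr/bin", "/bin",       # assumes a merged-/usr layout
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", work_dir, "/work",          # the tenant's only writable path
        "--chdir", "/work",
    ]
    if allow_network:
        argv.append("--share-net")            # keep the host network namespace
    return argv + list(command)
```

The list can be handed to `subprocess.run(argv, timeout=...)`, which also gives a simple wall-clock timeout; memory and CPU caps would still have to come from cgroups.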

Docker

Docker’s runc runtime (or any OCI-compatible runtime) gives you full container isolation: namespaces, cgroups, seccomp, AppArmor / SELinux, capability dropping, and optional GPU passthrough.

Cold start: 200–500 ms for a typical sandbox image, faster if you keep a warm pool of pre-spawned containers.

Isolation mechanism: kernel namespaces + cgroups + seccomp + AppArmor + capability drops.

Pros:

Cross-platform. Linux native; macOS and Windows via Docker Desktop / OrbStack.
Mature security ecosystem. Default-deny seccomp profile, AppArmor profiles, gVisor runtime option.
Battle-tested. Every public container service uses some variant.
Easy to plug into your existing container infrastructure.
Supports GPU access if your agent needs ML model inference inside the sandbox.

Cons:

Slower cold start. Roughly 7–17x bubblewrap (200–500 ms vs ~30 ms).
Daemon required. Adds an operational dependency.
Larger audit surface. More features means more potential bugs.
Resource overhead. Memory + CPU per container.

Use it when: you’re running code from genuinely untrusted sources (user-submitted scripts, plugins from unverified publishers), you need cross-platform support, or you’re already deeply invested in container infrastructure.
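For comparison, a hardened `docker run` invocation for a single tool call might be assembled as below. The function name, the `/work` mount point, and the default limits are assumptions made for this sketch; the flags themselves are standard `docker run` options.

```python
def docker_argv(image, work_dir, command, memory_mb=512, cpus=0.5, runtime=None):
    """Build a locked-down `docker run` command line (illustrative sketch)."""
    argv = [
        "docker", "run", "--rm",
        "--network=none",                     # no network unless a policy adds one
        "--cap-drop=ALL",                     # drop every Linux capability
        "--security-opt=no-new-privileges",   # block privilege gain via setuid binaries
        "--read-only",                        # read-only root filesystem
        "--tmpfs", "/tmp",
        "--pids-limit=128",                   # cap the number of processes
        f"--memory={memory_mb}m",
        f"--cpus={cpus}",
        "--volume", f"{work_dir}:/work:rw",   # the tenant's only writable path
        "--workdir", "/work",
    ]
    if runtime:
        argv.append(f"--runtime={runtime}")   # e.g. "runsc" for gVisor
    return argv + [image] + list(command)
```

Docker applies its default seccomp profile automatically unless one is overridden, so it does not need to be named here.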

The hybrid pattern

Real deployments use both. OpenClawMU’s default: bubblewrap for the standard agent tool surface (shell, file_read, file_write, package install), Docker for tools explicitly marked as untrusted (custom plugins from unverified ClawHub publishers, user-submitted code).

The choice is per-tool, configurable in the tenant config:

sandbox:
  default_mode: bwrap
  modes:
    untrusted_code:
      runtime: docker
      image: openclaw/sandbox-untrusted:latest
      memory_limit_mb: 512
      cpu_quota: 0.5
      runtime_class: runsc  # gVisor

Tools annotated @sandbox("untrusted_code") get the heavier isolation. Everything else gets the fast bubblewrap path.
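One way such an annotation could work is sketched below: the decorator tags the function, and a resolver looks the tag up in the config. The dictionary mirrors the YAML above, but the decorator body, `resolve_mode`, and the two example tools are hypothetical and say nothing about how OpenClawMU implements `@sandbox` internally.

```python
# Hypothetical in-memory mirror of the tenant config.
SANDBOX_CONFIG = {
    "default_mode": "bwrap",
    "modes": {
        "untrusted_code": {
            "runtime": "docker",
            "image": "openclaw/sandbox-untrusted:latest",
            "memory_limit_mb": 512,
            "cpu_quota": 0.5,
            "runtime_class": "runsc",
        },
    },
}


def sandbox(mode):
    """Decorator that tags a tool function with a sandbox mode name."""
    def decorator(func):
        func.sandbox_mode = mode
        return func
    return decorator


def resolve_mode(tool, config=SANDBOX_CONFIG):
    """Return (runtime, settings) for a tool; untagged tools get the default."""
    mode = getattr(tool, "sandbox_mode", None)
    if mode is None:
        return config["default_mode"], {}
    settings = config["modes"][mode]  # KeyError on an unknown mode is deliberate
    return settings["runtime"], settings


@sandbox("untrusted_code")
def run_user_plugin(source):
    ...


def file_read(path):
    ...
```

Failing loudly on an unknown mode name is a deliberate choice in this sketch: a typo in an annotation should never fall back to the lighter sandbox.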

Cold-start cost in production numbers

A typical agent run issues 3–10 tool calls. At the top of that range, bubblewrap × 10 = ~300 ms of total sandboxing overhead, negligible against the 2–5 second LLM response time. Docker × 10 = 2–5 seconds, as long as the LLM response itself, which roughly doubles the perceived latency.

If you can keep a warm Docker pool, the cold-start cost drops to ~50 ms per call. That’s a reasonable tradeoff for the heavier isolation surface.
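The arithmetic behind these figures is simple enough to check directly. The helper name is made up for this example; the per-call numbers are the ones quoted in this article.

```python
def sandbox_overhead_ms(tool_calls, cold_start_ms):
    """Total sandbox start-up cost for one agent run, in milliseconds."""
    return tool_calls * cold_start_ms


CALLS = 10  # upper end of the 3-10 tool calls per run

bwrap_total = sandbox_overhead_ms(CALLS, 30)         # 300 ms
docker_cold_low = sandbox_overhead_ms(CALLS, 200)    # 2000 ms
docker_cold_high = sandbox_overhead_ms(CALLS, 500)   # 5000 ms
docker_warm = sandbox_overhead_ms(CALLS, 50)         # 500 ms with a warm pool

print(f"bubblewrap:    {bwrap_total} ms")
print(f"docker (cold): {docker_cold_low}-{docker_cold_high} ms")
print(f"docker (warm): {docker_warm} ms")
```

With a warm pool, ten Docker-sandboxed calls cost about 500 ms in total, which is in the same range as bubblewrap's 300 ms rather than the same range as the LLM response.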

Seccomp profiles

Both bubblewrap and Docker accept seccomp profiles that restrict the syscalls a process can issue. A reasonable default for agent code:

Allow: read, write, openat, close, execve, fork, clone, mmap, brk, exit, exit_group, futex, clock_gettime, getpid, getuid, getgid (the boring stuff).
Deny: ptrace, mount, umount2, reboot, kexec_load, _sysctl, perf_event_open (anything that touches the kernel or other processes).

OpenClawMU ships a default-deny seccomp profile that allows the syscalls a typical Python / Node / shell tool needs. Custom profiles are configurable per sandbox mode.
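A default-deny profile in the JSON format Docker accepts can be generated with a few lines of Python. This is a sketch under assumptions: the allow-list below uses kernel syscall names and is deliberately tiny, far short of what a real Python or Node runtime needs, and it is not the profile OpenClawMU ships.

```python
import json

# Illustrative allow-list only; a real interpreter needs many more syscalls.
ALLOWED_SYSCALLS = [
    "read", "write", "openat", "close", "execve", "clone", "mmap", "brk",
    "exit_group", "futex", "clock_gettime", "getpid", "getuid", "getgid",
]


def seccomp_profile(allowed):
    """Build a default-deny seccomp profile in Docker's JSON format."""
    return {
        # Any syscall not explicitly allowed fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


profile_json = json.dumps(seccomp_profile(ALLOWED_SYSCALLS), indent=2)
```

Because the default action is to deny, ptrace, mount, and the rest of the deny list need no entries of their own. Docker loads such a file with `--security-opt seccomp=<path>`; bubblewrap's `--seccomp` option instead expects an already compiled BPF program on a file descriptor, so the same policy has to be compiled first (for example with libseccomp).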

Network policy

Default to no network. Tools that need network access opt in with an allow-list:

sandbox:
  network:
    default: deny
    allow:
      - "api.weather.gov:443"
      - "*.anthropic.com:443"

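Implementation differs: bubblewrap shares the host network unless you pass --unshare-net (or --unshare-all), which places the sandbox in an empty network namespace; Docker’s --network=none gives the container a namespace with only a loopback interface. Neither enforces a hostname allow-list on its own, so allow-listing means attaching the sandbox to a network whose egress goes through a filtering proxy or firewall rules.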
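The allow-list check itself is small. The sketch below shows one possible matching rule for `host:port` entries with a leading `*.` wildcard; the function names are hypothetical, and where the check runs (for instance inside an egress proxy) is an assumption, not something this article specifies.

```python
ALLOW = ["api.weather.gov:443", "*.anthropic.com:443"]


def parse_rule(rule):
    """Split 'host:port' into a lowercase host pattern and an integer port."""
    host, _, port = rule.rpartition(":")
    return host.lower(), int(port)


def is_allowed(host, port, allow=ALLOW):
    """Return True if host:port matches an allow-list entry; deny otherwise."""
    host = host.lower().rstrip(".")
    for rule in allow:
        pattern, rule_port = parse_rule(rule)
        if port != rule_port:
            continue
        if pattern.startswith("*."):
            suffix = pattern[1:]  # e.g. ".anthropic.com"
            # Require a non-empty subdomain label in front of the suffix.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:
            return True
    return False
```

Matching on the dotted suffix rather than a plain substring means a look-alike such as `evil-anthropic.com` is rejected, and the bare apex `anthropic.com` does not match the wildcard either.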

When neither is enough

For truly hostile workloads (security research, user-submitted attack payloads), neither bubblewrap nor Docker-on-runc is sufficient. Step up to:

Docker + gVisor (runsc): a user-space kernel intercepts the workload’s syscalls instead of passing them straight to the host kernel, at ~30% syscall overhead. The pragmatic next step.
Kata Containers: lightweight VMs as containers. Stronger isolation, heavier cold start.
Firecracker: AWS’s microVM monitor, used by Lambda. Cold start ~125 ms; very strong isolation.

For most multi-tenant AI gateway use cases, Docker + gVisor is the right ceiling. Beyond that you’re paying overhead you don’t need.

Recommendation

Linux + trusted-but-isolated workloads: bubblewrap. Fast, simple, well-suited.
Cross-platform or moderately-untrusted workloads: Docker with default seccomp + cap-drop.
Genuinely-untrusted (user-submitted code): Docker + gVisor.
Hostile workloads: Firecracker.

Pick per workload, not per cluster. The right choice for one tool isn’t the right choice for all of them.
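The recommendation reduces to a small decision function. The enum, the function name, and the returned labels are hypothetical; only the mapping itself comes from the list above.

```python
from enum import Enum


class Trust(Enum):
    TRUSTED_BUT_ISOLATED = "trusted"
    MODERATELY_UNTRUSTED = "moderate"
    UNTRUSTED = "untrusted"
    HOSTILE = "hostile"


def pick_runtime(trust, linux_host=True):
    """Map a workload's trust level to a sandbox runtime label."""
    if trust is Trust.HOSTILE:
        return "firecracker"
    if trust is Trust.UNTRUSTED:
        return "docker+runsc"
    # bubblewrap is Linux-only, so other hosts fall back to Docker.
    if trust is Trust.MODERATELY_UNTRUSTED or not linux_host:
        return "docker"
    return "bwrap"
```

Calling this once per tool, rather than once per cluster, is what "pick per workload" means in practice.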

Frequently asked

Can I use both in the same deployment?

Yes, that’s the recommended pattern. Use bubblewrap as the trusted-but-isolated default (fast cold start, low overhead) and Docker for genuinely untrusted workloads (full container isolation, at a cold-start cost of 200–500 ms). OpenClawMU lets you pick per workload.

What about gVisor / Kata / Firecracker?

gVisor sits between Docker-on-runc and the VM-based options on the cost/isolation curve: a user-space kernel in front of the host kernel, with somewhat higher overhead than runc. Kata and Firecracker are heavier (lightweight VMs). For multi-tenant AI gateways, Docker + gVisor (the runsc runtime) is a common production choice when you want stronger isolation than runc with manageable cold-start cost.

Are there security risks I should know about?

Bubblewrap is small and audited but relies on Linux user namespaces, so a kernel namespace bug breaks isolation. Docker has more surface area but a much more mature security ecosystem (seccomp, AppArmor, gVisor). Treat a sandbox escape as a CVE-grade incident in either case: subscribe to the upstream security mailing lists and patch promptly.