---
title: "Bubblewrap vs Docker: choosing a sandbox for AI agent tool calls"
description: "Sandboxing AI-agent tool calls is mandatory. Bubblewrap and Docker are both viable; here's the trade-off matrix on cold start, isolation surface, OS support, and operational complexity — and when each one wins."
url: https://openclawmu.neullabs.com/blog/bubblewrap-vs-docker-sandbox-for-agents
publishedAt: 2026-05-26T00:00:00.000Z
tags: ["sandbox", "bubblewrap", "docker", "ai agents", "security"]
cluster: technical
source: OpenClawMU
---

Sandboxing AI agent tool calls is no longer optional. The moment your agent executes a shell command, installs a package, runs a Python script, or fetches a URL, you're running code with the agent's privileges. In a multi-tenant context that means tenant A's prompt can trigger code that tries to read tenant B's data. The defense is a sandbox.

Two options dominate in 2026: **bubblewrap** and **Docker**. This article is the trade-off matrix.

## What the sandbox needs to do

Whatever you pick, the sandbox must:

1. **Isolate the filesystem.** The agent sees only the tenant's work directory and the system libraries it needs.
2. **Restrict network access.** Default-deny; opt-in to specific hostnames + ports.
3. **Drop capabilities.** No `CAP_SYS_ADMIN`, no raw network sockets, no kernel-module loading.
4. **Filter syscalls.** Default-deny seccomp profile; allow only what tool execution needs.
5. **Cap resources.** Memory limit, CPU quota, wall-clock timeout.
6. **Be cheap.** Cold start measured in tens of milliseconds, not seconds — because you'll spawn one per tool call.

Both bubblewrap and Docker can satisfy all six, though Docker only meets the cold-start target with a warm pool of pre-spawned containers. The question is which trade-offs each makes.
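Requirement 5 is the one that namespaces and seccomp do not cover by themselves. The following is a minimal sketch of a memory cap, CPU cap, and wall-clock timeout, assuming a POSIX host and Python's standard `resource` and `subprocess` modules. `run_limited` and its parameters are hypothetical names chosen for illustration, and rlimits are a coarser tool than the cgroup limits a production setup would more likely use.

```python
import resource
import subprocess
from typing import Optional, Sequence


def _clamp(limit: int, hard: int) -> int:
    """Never request a soft limit above the inherited hard limit."""
    if hard == resource.RLIM_INFINITY:
        return limit
    return min(limit, hard)


def run_limited(
    argv: Sequence[str],
    mem_bytes: int,
    cpu_seconds: int,
    timeout_s: float,
) -> Optional[subprocess.CompletedProcess]:
    """Run argv under rlimits and a wall-clock timeout.

    Returns None when the wall-clock timeout fires (the child is killed).
    """

    def apply_limits() -> None:
        # Runs in the child process between fork and exec.
        for res, limit in (
            (resource.RLIMIT_AS, mem_bytes),     # address-space (memory) cap
            (resource.RLIMIT_CPU, cpu_seconds),  # CPU-time cap
        ):
            _soft, hard = resource.getrlimit(res)
            resource.setrlimit(res, (_clamp(limit, hard), hard))

    try:
        return subprocess.run(
            list(argv),
            capture_output=True,
            text=True,
            timeout=timeout_s,
            preexec_fn=apply_limits,
        )
    except subprocess.TimeoutExpired:
        return None
```

In practice `argv` would be the sandbox launcher command rather than the tool itself, so the limits apply to everything the sandbox spawns.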

## Bubblewrap

[Bubblewrap](https://github.com/containers/bubblewrap) is the sandboxing primitive that powers Flatpak. It's a small binary that builds a sandbox from Linux user namespaces and mount namespaces using only kernel features. On hosts that allow unprivileged user namespaces it runs without setuid: no daemon, no Docker, no root.

**Cold start**: ~30 ms on a modern x86 box. That's per-tool-call cheap — you can spawn a fresh sandbox every time your agent issues a shell command.

**Isolation mechanism**: user namespaces + mount namespaces + seccomp.

**Pros**:
- Lightning fast cold start.
- No daemon. Bubblewrap is a CLI you invoke; nothing keeps running between calls.
- Rootless. Doesn't need elevated privileges to set up.
- Small audit surface. The bwrap binary is a few thousand lines.
- Plays nicely with the existing host filesystem.

**Cons**:
- Linux only. There is no macOS or Windows port.
- Relies on unprivileged user namespaces, which some distributions restrict by default, and shares the host kernel, so a kernel namespace bug breaks isolation.
- No built-in resource limits beyond what you wire up via cgroups separately.
- Less battle-tested in adversarial multi-tenant production than Docker.

**Use it when**: you're running on Linux, the code you're executing is trusted-but-isolated (agent tools you control, but you want defense in depth), and cold-start latency matters.
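As a rough illustration of what a per-tool-call invocation can look like, here is a sketch that only builds the `bwrap` command line. The function name `bwrap_argv`, the `/work` mount point, and the merged-`/usr` host layout are assumptions made for this example, not OpenClawMU's actual invocation. A real profile would also load a seccomp filter and may need more read-only binds.

```python
from typing import List, Sequence


def bwrap_argv(
    workdir: str,
    command: Sequence[str],
    allow_network: bool = False,
) -> List[str]:
    """Build a bwrap command line that exposes only workdir and /usr."""
    argv = [
        "bwrap",
        # Unshare user, ipc, pid, net, uts and cgroup namespaces where
        # available; bwrap always creates a new mount namespace.
        "--unshare-all",
        "--die-with-parent",  # kill the sandbox if the launcher dies
        "--new-session",      # detach from the controlling terminal
        "--clearenv",
        "--setenv", "PATH", "/usr/bin:/bin",
        # Assumes a merged-/usr host: system files live under /usr.
        "--ro-bind", "/usr", "/usr",
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        # The tenant's work directory is the only writable host path.
        "--bind", workdir, "/work",
        "--chdir", "/work",
    ]
    if allow_network:
        # Re-share the host network namespace after --unshare-all.
        argv.append("--share-net")
    return argv + list(command)
```

Passing the result to `subprocess.run` starts the tool inside the sandbox. Because nothing persists between calls, each tool call gets a fresh `/tmp` and a fresh process tree.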

## Docker

Docker's `runc` runtime (or any OCI-compatible runtime) gives you full container isolation: namespaces, cgroups, seccomp, AppArmor / SELinux, capability dropping, and optional GPU passthrough.

**Cold start**: 200–500 ms for a typical sandbox image, faster if you keep a warm pool of pre-spawned containers.

**Isolation mechanism**: kernel namespaces + cgroups + seccomp + AppArmor + capability drops.

**Pros**:
- Cross-platform. Linux native; macOS and Windows via Docker Desktop, or OrbStack on macOS, both of which run the containers inside a Linux VM.
- Mature security ecosystem. Default-deny seccomp profile, AppArmor profiles, gVisor runtime option.
- Battle-tested. Every public container service uses some variant.
- Easy to plug into your existing container infrastructure.
- Supports GPU access if your agent needs ML model inference inside the sandbox.

**Cons**:
- Slower cold start. Roughly 7–17x bubblewrap at the figures quoted here (200–500 ms against ~30 ms).
- Daemon required. Adds an operational dependency.
- Larger audit surface. More features means more potential bugs.
- Resource overhead. Memory + CPU per container.

**Use it when**: you're running code from genuinely untrusted sources (user-submitted scripts, plugins from unverified publishers), you need cross-platform support, or you're already deeply invested in container infrastructure.
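The hardening options described in this section map onto standard `docker run` flags. The sketch below builds such a command line. `docker_argv`, the `/work` mount point, and the `--pids-limit` value are illustrative assumptions rather than OpenClawMU defaults.

```python
from typing import List, Optional, Sequence


def docker_argv(
    image: str,
    workdir: str,
    command: Sequence[str],
    memory_mb: int = 512,
    cpus: float = 0.5,
    runtime: Optional[str] = None,
) -> List[str]:
    """Build a locked-down `docker run` command line for one tool call."""
    argv = [
        "docker", "run",
        "--rm",                              # remove the container on exit
        "--network=none",                    # loopback only
        "--cap-drop=ALL",                    # drop every capability
        "--security-opt=no-new-privileges",  # block setuid escalation
        "--read-only",                       # read-only root filesystem
        "--tmpfs", "/tmp",
        f"--memory={memory_mb}m",
        f"--cpus={cpus}",
        "--pids-limit=128",                  # illustrative fork-bomb guard
        "--volume", f"{workdir}:/work",
        "--workdir", "/work",
    ]
    if runtime is not None:
        # For example "runsc" when gVisor is installed on the host.
        argv.append(f"--runtime={runtime}")
    return argv + [image, *command]
```

Docker applies its default seccomp profile automatically unless one is overridden with `--security-opt seccomp=...`, so the sketch does not pass one explicitly.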

## The hybrid pattern

Real deployments use both. OpenClawMU's default: bubblewrap for the standard agent tool surface (shell, file_read, file_write, package install), Docker for tools explicitly marked as untrusted (custom plugins from unverified ClawHub publishers, user-submitted code).

The choice is per-tool, configurable in the tenant config:

```yaml
sandbox:
  default_mode: bwrap
  modes:
    untrusted_code:
      runtime: docker
      image: openclaw/sandbox-untrusted:latest
      memory_limit_mb: 512
      cpu_quota: 0.5
      runtime_class: runsc  # gVisor
```

Tools annotated `@sandbox("untrusted_code")` get the heavier isolation. Everything else gets the fast bubblewrap path.
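One way such an annotation could work is a decorator that records the sandbox mode per tool and falls back to the default. This is a hypothetical sketch: the registry, `mode_for`, and the two example tools are invented for illustration, and OpenClawMU's real decorator may differ.

```python
from typing import Callable, Dict, TypeVar

F = TypeVar("F", bound=Callable[..., object])

DEFAULT_MODE = "bwrap"  # mirrors `default_mode` in the tenant config
_TOOL_MODES: Dict[str, str] = {}


def sandbox(mode: str) -> Callable[[F], F]:
    """Mark a tool function as requiring the named sandbox mode."""

    def decorate(func: F) -> F:
        _TOOL_MODES[func.__name__] = mode
        return func

    return decorate


def mode_for(tool_name: str) -> str:
    """Return the sandbox mode for a tool, or the default if unmarked."""
    return _TOOL_MODES.get(tool_name, DEFAULT_MODE)


@sandbox("untrusted_code")
def run_user_plugin(source: str) -> str:
    """Placeholder tool: would execute user-submitted code."""
    return source


def file_read(path: str) -> str:
    """Placeholder tool: unmarked, so it takes the default path."""
    return path
```

A dispatcher would call `mode_for(tool.__name__)` before each tool call and pick the launcher accordingly, which keeps the trust decision next to the tool definition instead of in the call site.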

## Cold-start cost in production numbers

A typical agent run issues 3–10 tool calls. Bubblewrap × 10 = ~300 ms total sandboxing overhead — negligible against the 2–5 second LLM response time. Docker × 10 = 2–5 seconds, which doubles the perceived latency.

If you can keep a warm Docker pool, the cold-start cost drops to ~50 ms per call. That's a reasonable tradeoff for the heavier isolation surface.
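The arithmetic is simple enough to keep in a helper when you plug in your own measurements. This sketch uses only the per-call figures quoted in this article; the function and constant names are made up for the example.

```python
def sandbox_overhead_ms(tool_calls: int, per_call_ms: int) -> int:
    """Total sandbox start-up cost for one agent run, in milliseconds."""
    return tool_calls * per_call_ms


def overhead_share(tool_calls: int, per_call_ms: int, llm_ms: int) -> float:
    """Sandbox overhead as a fraction of total (LLM + sandbox) latency."""
    overhead = sandbox_overhead_ms(tool_calls, per_call_ms)
    return overhead / (overhead + llm_ms)


# Per-call figures quoted in this article, in milliseconds.
BWRAP_MS = 30
DOCKER_COLD_MS = (200, 500)
DOCKER_WARM_MS = 50
```

For a 10-call run this gives 300 ms for bubblewrap, 2,000 to 5,000 ms for cold Docker, and 500 ms for a warm Docker pool. With a 2,000 ms LLM response, cold Docker at 200 ms per call accounts for half of the total latency.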

## Seccomp profiles

Both bubblewrap and Docker accept seccomp profiles that restrict the syscalls a process can issue. A reasonable default for agent code:

- **Allow**: read, write, openat, close, execve, fork/clone, mmap, brk, exit/exit_group, futex, clock_gettime, getpid, getuid, getgid (the boring stuff).
- **Deny**: ptrace, mount, umount2, reboot, kexec_load, _sysctl, perf_event_open (anything that reconfigures the kernel or reaches into other processes).

OpenClawMU ships a default-deny seccomp profile that allows the syscalls a typical Python / Node / shell tool needs. Custom profiles are configurable per sandbox mode.
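For the Docker path, a default-deny profile is a JSON document in Docker's seccomp profile format. The sketch below generates one. The allow-list is deliberately tiny and is an illustration only, since a real Python or Node process needs many more syscalls than this. `build_profile` is a hypothetical helper, not part of OpenClawMU.

```python
import json
from typing import Dict, Iterable, List

# Illustrative subset only; real interpreters need a much longer list.
ALLOWED_SYSCALLS: List[str] = [
    "read", "write", "openat", "close", "execve", "clone",
    "mmap", "brk", "exit_group", "futex", "clock_gettime",
    "getpid", "getuid", "getgid",
]


def build_profile(allowed: Iterable[str]) -> Dict[str, object]:
    """Return a default-deny profile in Docker's seccomp JSON format."""
    return {
        # Any syscall not listed below fails with an errno.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


def write_profile(path: str, allowed: Iterable[str]) -> None:
    """Serialize the profile so Docker can load it from disk."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_profile(allowed), fh, indent=2)
```

Docker loads the file with `--security-opt seccomp=/path/to/profile.json`. Bubblewrap does not read this JSON: its `--seccomp FD` option expects a compiled BPF program, so the same allow-list has to be compiled first, for example with libseccomp.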

## Network policy

Default to no network. Tools that need network access opt in with an allow-list:

```yaml
sandbox:
  network:
    default: deny
    allow:
      - "api.weather.gov:443"
      - "*.anthropic.com:443"
```

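Implementation differs. Bubblewrap shares the host network unless you pass `--unshare-net` (an `unshare(CLONE_NEWNET)` under the hood), which leaves the sandbox with only a loopback interface. Docker's `--network=none` does the same for a container. Neither flag understands hostnames, so allow-listing means giving the sandbox its own network namespace whose only route out is a filtering proxy or firewall that enforces the list.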
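Wherever the allow-list is enforced, the matching logic is small. Here is a sketch of the check an egress proxy might run for each outbound connection, assuming entries in the `host:port` form shown in the config and shell-style `*` wildcards. `is_allowed` is a hypothetical helper, not OpenClawMU's implementation.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_allowed(host: str, port: int, allow: Iterable[str]) -> bool:
    """Return True if host:port matches any allow-list entry.

    With no matching entry the answer is False, i.e. default deny.
    """
    # Hostnames are case-insensitive; drop a trailing root dot.
    target = host.lower().rstrip(".")
    for entry in allow:
        pattern, _, port_text = entry.rpartition(":")
        if not pattern or not port_text.isdigit():
            continue  # skip malformed entries rather than guess
        if int(port_text) == port and fnmatchcase(target, pattern.lower()):
            return True
    return False


ALLOW = ["api.weather.gov:443", "*.anthropic.com:443"]
```

Note that `*.anthropic.com` matches subdomains at any depth but not the bare `anthropic.com`, because the pattern requires the leading dot. The check should run on the hostname the proxy actually connects to, not on one supplied by the sandboxed code.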

## When neither is enough

For truly hostile workloads (security research, user-submitted attack payloads), neither bubblewrap nor Docker-on-runc is sufficient. Step up to:

- **Docker + gVisor (`runsc`)**: a user-space kernel intercepts the container's syscalls, so far fewer reach the host kernel; ~30% syscall overhead. The pragmatic next step.
- **Kata Containers**: lightweight VMs as containers. Stronger isolation, heavier cold start.
- **Firecracker**: AWS's MicroVM. Used by Lambda. Cold start ~125 ms; very strong isolation.

For most multi-tenant AI gateway use cases, Docker + gVisor is the right ceiling. Beyond that you're paying overhead you don't need.

## Recommendation

- **Linux + trusted-but-isolated workloads**: bubblewrap. Fast, simple, well-suited.
- **Cross-platform or moderately-untrusted workloads**: Docker with default seccomp + cap-drop.
- **Genuinely-untrusted (user-submitted code)**: Docker + gVisor.
- **Hostile workloads**: Firecracker.

Pick per workload, not per cluster. The right choice for one tool isn't the right choice for all of them.
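The recommendation list can be encoded as a small per-workload selector. The sketch below is one possible encoding: the `Trust` levels and the returned runtime labels are names invented for this example, and the ordering of the checks reflects the list above rather than any OpenClawMU API.

```python
from enum import Enum


class Trust(Enum):
    TRUSTED_ISOLATED = "trusted-but-isolated"
    MODERATE = "moderately-untrusted"
    UNTRUSTED = "genuinely-untrusted"
    HOSTILE = "hostile"


def pick_runtime(trust: Trust, host_is_linux: bool = True) -> str:
    """Map a workload's trust level to a sandbox runtime label."""
    if trust is Trust.HOSTILE:
        return "firecracker"
    if trust is Trust.UNTRUSTED:
        return "docker+gvisor"
    if trust is Trust.MODERATE or not host_is_linux:
        # Bubblewrap is Linux-only, so other hosts fall back to Docker.
        return "docker"
    return "bwrap"
```

Calling it once per tool, rather than once per deployment, keeps the fast path for trusted tools while routing the risky ones to heavier isolation.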