# OpenClawMU — full content corpus

> OpenClawMU is the multi-tenant fork of OpenClaw, a self-hosted AI gateway across WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix and more. It adds tenant isolation, sandboxed agents, per-tenant tokens and quotas, web terminals, and S3 backup. Apache-2.0, self-hosted, no SaaS lock-in.

Source: https://openclawmu.neullabs.com
Repo: https://github.com/neul-labs/openclawmu
Upstream: https://github.com/openclaw/openclaw
Generated: 2026-06-04T11:50:17.141Z

---

# What is a multi-tenant AI gateway? The architecture, explained.

URL: https://openclawmu.neullabs.com/blog/multi-tenant-ai-gateway-explained
Published: 2026-06-03
Tags: multi-tenant, AI gateway, architecture, infrastructure, LLM
Cluster: cornerstone

> A multi-tenant AI gateway is the layer between your messaging channels and your LLM that isolates per-customer state. Here's how the pattern works, why it matters now, and what a defensible implementation looks like.

A **multi-tenant AI gateway** is the layer that sits between your messaging channels (WhatsApp, Slack, your in-app chat widget) and your LLM provider, and that gives each customer their own isolated workspace inside a single deployment. Sessions, memory, sandboxes, channel pairings, cron jobs, and cost accounting are all scoped to a tenant boundary.

This article walks the architecture: what a multi-tenant AI gateway actually does, what the alternatives are, why the pattern matters now, and what a defensible implementation looks like.

## The problem the pattern solves

You're building an AI product. The MVP works: one user, one chat, one assistant. To productize you need to support N customers from the same deployment. Each customer has their own conversation history, their own preferences, possibly their own LLM key, their own scheduled jobs, their own channel pairing (their WhatsApp number, their Slack workspace).

You can grow into this from three directions, none of them great:

1. **Spin up a new server per customer.** Works for 5 customers, breaks at 50.
2. **Add `tenant_id` to every table and pray your code path doesn't forget it.** This is the classic mistake — it survives until the day the cache key, the cron schedule, or the file path doesn't carry the ID and a customer sees another customer's data.
3. **Buy a closed-source bot platform.** Solves it, but they take a margin forever and your customer data lives in their database.

The multi-tenant AI gateway is the fourth option: one process, structural isolation at the directory/sandbox boundary, no per-customer server cost, no platform fee.

## What gets isolated

A defensible multi-tenant AI gateway isolates everything that bears state.

- **Sessions.** Each tenant has its own chat-session store. A session ID is meaningful within a tenant but never across tenants.
- **Memory.** Vector embeddings + content store, per tenant. Sharing memory across tenants would leak whatever the previous user told the assistant.
- **Plugins / skills.** Tenant A can install a custom tool without Tenant B seeing it.
- **Sandboxes.** When the agent executes code, it does so in a sandbox whose root is the tenant's directory. No cross-tenant filesystem access.
- **Cron jobs.** Scheduled "remind me on Friday" tasks belong to the tenant that created them.
- **Channel credentials.** Tenant A's WhatsApp pairing and Tenant B's Slack OAuth are stored separately, encrypted at rest.
- **Devices and nodes.** Paired clients for distributed setups, per tenant.
- **Config overlay.** Each tenant has a YAML overlay that can override model choice, max tokens, system prompt — but cannot override admin-only keys (API credentials, rate cards).

What's *shared* across tenants is everything stateless: the Node process, the LLM HTTP client (calls tagged with tenant ID for metering), the channel adapter classes, the dispatcher logic.

## The shape of the tenant boundary

The cleanest implementation makes the tenant ID structural — the root of the filesystem tree, the prefix of the auth token, the dimension of every billing row. Treating the tenant as a *namespace* rather than a *column* eliminates whole categories of bugs.

```
data/
├── tenants/
│   ├── acme/
│   │   ├── sessions/
│   │   ├── memory/
│   │   ├── plugins/
│   │   ├── sandbox/
│   │   ├── cron/
│   │   ├── channels/
│   │   └── config.yaml
│   ├── globex/
│   │   └── (same layout)
│   └── initech/
│       └── (same layout)
└── gateway.log
```

If every code path that handles tenant-relevant input must build its paths through a helper that requires a tenant ID, then forgetting the tenant becomes a *type error* or a path-resolution failure rather than a silent runtime data leak.
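One way to get that compile-time guarantee in TypeScript is a branded type: storage helpers accept only a `TenantPath`, and the only way to build one is a function that requires a `TenantId`. This is a minimal sketch, not OpenClawMU's code; `TenantId`, `TenantPath`, `asTenantId`, `tenantPath`, and `sessionFile` are hypothetical names, and the data root is assumed to be `data/tenants` as in the tree above. Segments that come from untrusted input still need the path-traversal check described below.

```typescript
import * as path from "node:path";

// Branded types: plain strings at runtime, but the compiler will not accept
// a bare string where a TenantId or TenantPath is expected.
type TenantId = string & { readonly __brand: "TenantId" };
type TenantPath = string & { readonly __brand: "TenantPath" };

const DATA_ROOT = path.resolve("data", "tenants"); // assumed layout, per the tree above

function asTenantId(raw: string): TenantId {
  // Restrict IDs to a safe slug so an ID can never itself contain "../".
  if (!/^[a-z0-9][a-z0-9-]{0,62}$/.test(raw)) throw new Error(`invalid tenant id: ${raw}`);
  return raw as TenantId;
}

// The only constructor for TenantPath: it cannot be called without a tenant.
function tenantPath(tenant: TenantId, ...segments: string[]): TenantPath {
  return path.join(DATA_ROOT, tenant, ...segments) as TenantPath;
}

// Session IDs are percent-encoded so characters like ":" and "/" cannot form path segments.
function sessionFile(tenant: TenantId, sessionId: string): TenantPath {
  return tenantPath(tenant, "sessions", `${encodeURIComponent(sessionId)}.json`);
}

const acme = asTenantId("acme");
const file = sessionFile(acme, "wa:+15551234567:default");
// sessionFile("acme", "x");  // compile error: string is not assignable to TenantId
```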

## The token model

Each tenant gets a token. The token authenticates every inbound request — JSON-RPC, the OpenAI-compatible HTTP shim, channel webhooks, terminal WebSocket. Three properties matter:

- **Hashed at rest.** Store SHA-256(token), never the plaintext. A gateway-disk compromise should not leak live tokens.
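- **Constant-time compared.** Use `crypto.timingSafeEqual`. A comparison that exits early at the first mismatching byte leaks, through response timing, how much of a guessed token was correct, which lets an attacker recover it piece by piece.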
- **Tenant-prefixed.** Token format `tk_<tenant_id>_<32 hex chars>` lets you grep log lines without leaking the secret half.

Rotation should be a single command (`tenants token rotate acme`) and take effect immediately. Because only the hash is stored, a lost token cannot be recovered; rotate to issue a new one.
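A minimal sketch of issuing and rotating tokens under this model, assuming the `tk_<tenant_id>_<32 hex chars>` format (16 random bytes) and that only the SHA-256 hash is persisted. The in-memory `Map` stands in for whatever store the gateway actually uses; `issueToken` and `rotateToken` are illustrative names, not the CLI's implementation.

```typescript
import { createHash, randomBytes } from "node:crypto";

// tenantId -> hex SHA-256 of that tenant's current token (stand-in for a persistent store).
const tokenHashes = new Map<string, string>();

function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// 16 random bytes = 128 bits = 32 hex chars. The plaintext is returned to the
// caller once and never stored.
function issueToken(tenantId: string): string {
  const token = `tk_${tenantId}_${randomBytes(16).toString("hex")}`;
  tokenHashes.set(tenantId, hashToken(token));
  return token;
}

// Rotation is re-issuance: overwriting the stored hash invalidates the old token at once.
function rotateToken(tenantId: string): string {
  if (!tokenHashes.has(tenantId)) throw new Error(`unknown tenant: ${tenantId}`);
  return issueToken(tenantId);
}
```

Since nothing but the hash survives on disk, rotation is the only recovery path, which is exactly why it needs to be a one-step operation.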

## Path-traversal protection

Every API that takes a path string is an opportunity for one tenant to escape into another. The defense is uniform:

1. Resolve the path with `path.resolve` against the tenant root.
2. Assert the resolved path is a descendant of the tenant root.
3. Reject symlinks that point outside the root.
4. Reject absolute paths in user input.

Apply this to: file_read / file_write tools, plugin loaders, sandbox mount specs, S3 backup target keys, S3 restore source keys, config overlay file paths. It's repetitive code; it's the most important repetitive code in the system.
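The four steps above can be sketched with Node's `fs` and `path` modules. This is illustrative rather than OpenClawMU's code; `safeTenantPath` and `PathEscapeError` are hypothetical names. Any check-then-open sequence is also subject to time-of-check/time-of-use races if a tenant can create symlinks concurrently, which is one more reason the sandbox stays the last line of defense.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

class PathEscapeError extends Error {}

function isInside(root: string, candidate: string): boolean {
  // Equal to the root, or strictly below it ("acme-evil" must not match "acme").
  return candidate === root || candidate.startsWith(root + path.sep);
}

function safeTenantPath(tenantRoot: string, userPath: string): string {
  // Step 4: reject absolute paths in user input outright.
  if (path.isAbsolute(userPath)) throw new PathEscapeError("absolute path rejected");

  // Step 1: resolve against the canonical (symlink-free) tenant root.
  const root = fs.realpathSync(tenantRoot);
  const resolved = path.resolve(root, userPath);

  // Step 2: lexical containment check catches "../" escapes.
  if (!isInside(root, resolved)) throw new PathEscapeError("path escapes tenant root");

  // Step 3: walk up to the deepest existing ancestor, resolve its symlinks, and
  // re-check containment. This also covers paths that do not exist yet (writes).
  let existing = resolved;
  while (!fs.existsSync(existing)) {
    // existsSync follows links, so a dangling symlink looks absent; refuse it explicitly.
    if (fs.lstatSync(existing, { throwIfNoEntry: false })?.isSymbolicLink()) {
      throw new PathEscapeError("dangling symlink rejected");
    }
    existing = path.dirname(existing);
  }
  if (!isInside(root, fs.realpathSync(existing))) {
    throw new PathEscapeError("symlink escapes tenant root");
  }
  return resolved;
}
```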

## Per-tenant cost accounting

If you're charging customers, you need to know what each customer cost you. The gateway records a billing row for every LLM call: tenant, model, input tokens, output tokens, cached tokens, reasoning tokens, timestamp, rate-card snapshot.

The rate-card snapshot is critical — if you change your pricing later, historical reports still reflect what was billed at the time. Audit-friendly.
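A sketch of pricing one call and snapshotting the rates into the row itself. It assumes a per-million-token rate card, that input counts exclude cached tokens, and that the provider reports reasoning tokens separately from output; the field names and prices are placeholders, not OpenClawMU's schema or any provider's real pricing.

```typescript
interface ModelRates {
  inputPerMTok: number;   // USD per 1M uncached input tokens
  cachedPerMTok: number;  // USD per 1M cached input tokens
  outputPerMTok: number;  // USD per 1M output tokens; reasoning tokens billed at this rate here
}

interface Usage { input: number; cached: number; output: number; reasoning: number; }

interface BillingRow {
  tenant: string;
  model: string;
  usage: Usage;
  timestamp: string;
  rates: ModelRates; // snapshot copied at write time, not a reference to the live card
  costUsd: number;
}

function priceCall(
  tenant: string, model: string, usage: Usage,
  rateCard: Record<string, ModelRates>, now: Date = new Date(),
): BillingRow {
  const live = rateCard[model];
  if (!live) throw new Error(`no rate configured for model ${model}`);
  const rates: ModelRates = { ...live }; // later edits to rateCard cannot rewrite this row
  const costUsd =
    (usage.input * rates.inputPerMTok +
      usage.cached * rates.cachedPerMTok +
      (usage.output + usage.reasoning) * rates.outputPerMTok) / 1_000_000;
  return { tenant, model, usage: { ...usage }, timestamp: now.toISOString(), rates, costUsd };
}
```

A production ledger would likely store integer micro-dollars rather than floating-point USD to avoid rounding drift across many rows.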

```csv
date,tenant,model,tokens_in,tokens_out,cost_usd
2026-06-03,acme,claude-opus-4-7,142500,38200,4.78
2026-06-03,acme,claude-sonnet-4-6,891200,201400,5.92
```
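A report like this is a group-by over billing rows. The sketch below assumes rows shaped as `{ timestamp, tenant, model, tokensIn, tokensOut, costUsd }` (a hypothetical shape) and buckets dates in UTC so they line up with the quota reset boundary.

```typescript
interface Row { timestamp: string; tenant: string; model: string; tokensIn: number; tokensOut: number; costUsd: number; }

// Aggregate one tenant's rows into date,tenant,model lines. Tenant IDs and model
// names are assumed to be simple slugs, so no CSV quoting is done here.
function dailyCsv(rows: Row[], tenant: string): string {
  const groups = new Map<string, { tokensIn: number; tokensOut: number; costUsd: number }>();
  for (const r of rows) {
    if (r.tenant !== tenant) continue; // the report is scoped to a single tenant
    const date = new Date(r.timestamp).toISOString().slice(0, 10); // UTC calendar day
    const key = `${date},${r.tenant},${r.model}`;
    const g = groups.get(key) ?? { tokensIn: 0, tokensOut: 0, costUsd: 0 };
    g.tokensIn += r.tokensIn;
    g.tokensOut += r.tokensOut;
    g.costUsd += r.costUsd;
    groups.set(key, g);
  }
  const lines = ["date,tenant,model,tokens_in,tokens_out,cost_usd"];
  for (const [key, g] of [...groups].sort(([a], [b]) => a.localeCompare(b))) {
    lines.push(`${key},${g.tokensIn},${g.tokensOut},${g.costUsd.toFixed(2)}`);
  }
  return lines.join("\n");
}
```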

Pipe the CSV into Stripe Billing's usage-record API on a cron and you have automatic invoicing for your customers.

## Quota enforcement

Three knobs, hard-stop semantics:

- **Tokens per day.** Sum of input + output + reasoning, reset at UTC midnight.
- **Cost per day (USD).** Rate-card-driven, reset at UTC midnight.
- **Requests per minute.** Sliding-window count of inbound calls.

Exceed any quota and the gateway returns `429 Too Many Requests` with `Retry-After` set to the next reset boundary. Quotas are the difference between "a runaway tenant blows up your AWS bill" and "a runaway tenant gets throttled and pages you to investigate."
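The three knobs can be sketched as a per-tenant tracker that answers either `ok` or a `Retry-After` value in seconds. The in-memory counters and the `QuotaTracker` name are assumptions for illustration; a real gateway would persist counters so a restart does not reset them.

```typescript
interface Limits { tokensPerDay: number; usdPerDay: number; requestsPerMinute: number; }

type Verdict = { ok: true } | { ok: false; retryAfterSec: number; reason: string };

class QuotaTracker {
  private readonly limits: Limits;
  private day = "";               // UTC date (YYYY-MM-DD) the daily counters belong to
  private tokens = 0;             // input + output + reasoning tokens today
  private usd = 0;                // priced cost today
  private recent: number[] = [];  // request timestamps (ms) within the last 60 s

  constructor(limits: Limits) {
    this.limits = limits;
  }

  private rollDay(now: number): void {
    const today = new Date(now).toISOString().slice(0, 10);
    if (today !== this.day) { this.day = today; this.tokens = 0; this.usd = 0; }
  }

  // Call before serving a request; a rejection maps to 429 plus Retry-After.
  check(now = Date.now()): Verdict {
    this.rollDay(now);
    const d = new Date(now);
    const nextMidnight = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() + 1);
    const toMidnight = Math.ceil((nextMidnight - now) / 1000);
    if (this.tokens >= this.limits.tokensPerDay) return { ok: false, retryAfterSec: toMidnight, reason: "tokens/day" };
    if (this.usd >= this.limits.usdPerDay) return { ok: false, retryAfterSec: toMidnight, reason: "usd/day" };
    this.recent = this.recent.filter((t) => t > now - 60_000);
    if (this.recent.length >= this.limits.requestsPerMinute) {
      // A sliding window frees a slot when its oldest request ages out.
      const wait = Math.ceil((this.recent[0] + 60_000 - now) / 1000);
      return { ok: false, retryAfterSec: wait, reason: "requests/minute" };
    }
    this.recent.push(now);
    return { ok: true };
  }

  // Call after the LLM call completes, with its token count and priced cost.
  record(tokens: number, usd: number, now = Date.now()): void {
    this.rollDay(now);
    this.tokens += tokens;
    this.usd += usd;
  }
}
```

Because usage is only known after the call, one in-flight request can overshoot the daily limit; the hard stop then applies to the next request.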

## Why now

The pattern is suddenly important because the underlying ingredients are suddenly cheap. LLM API costs are falling 4x/year. Channel SDKs (Baileys for WhatsApp, grammY for Telegram, Bolt for Slack) are mature. Sandboxing primitives (bubblewrap, Docker, gVisor) are battle-tested. The hard part used to be the LLM; now the hard part is the multi-tenant glue.

The teams that ship this glue cleanly will eat the bot-platform incumbents. The teams that don't will end up paying the bot-platform margin or building a fragile in-house version.

## What good looks like

A defensible multi-tenant AI gateway has:

- **Structural tenant isolation** — per-tenant directories, not per-row columns.
- **Hashed token auth** with constant-time comparison and rotation.
- **Sandboxed code execution** per tool call, with the sandbox rooted in the tenant directory.
- **Path-traversal protection** on every path-taking API.
- **Admin / tenant key separation** — config overlay cannot override credentials.
- **Per-tenant quotas and cost accounting** with rate-card snapshotting.
- **Backup/restore** for tenant portability.
- **A web terminal** for operator and tenant access (because someone always needs a shell).
- **An audit log** capturing every state-changing operation.

Build it yourself in 6–8 weeks, or adopt one and ship the product on top. Either way, the multi-tenant AI gateway is the layer you need.

---

# Self-hosted WhatsApp bot platform: the open-source playbook

URL: https://openclawmu.neullabs.com/blog/self-hosted-whatsapp-bot-platform
Published: 2026-06-02
Tags: WhatsApp, bot platform, self-hosted, Baileys, open source
Cluster: cornerstone

> Run a multi-tenant WhatsApp bot platform on your own infra without the WhatsApp Business API contract. Open-source, BYO-LLM, with per-customer isolation and cost tracking. Here's the architecture.

You want to run a WhatsApp bot platform, either as a SaaS product for customers or as the messaging layer for your own product. The path most people take is signing up for the WhatsApp Business API via a BSP (Business Solution Provider, such as Twilio, Vonage, or 360dialog). That works, but it binds you to per-message fees, a contract, and their infrastructure.

There's a self-hosted alternative that's a much better fit for many use cases: run your own multi-tenant gateway on a VM you control, connect each tenant to their WhatsApp via the multi-device protocol, and pay only for your LLM provider and your VM.

## Why self-host?

- **No per-message fee.** The WhatsApp Cloud API bills per conversation or per template message, depending on Meta's pricing model at the time; self-hosting via Baileys has no marginal cost beyond your VM.
- **Bring your own LLM.** Anthropic, OpenAI, Llama, Mistral — your choice. The cloud bot platforms typically lock you to one.
- **Data residency.** Conversations stay on your hardware in your region.
- **Customization.** Drop in any tool, any prompt, any agent behavior. No proprietary flow language.
- **Per-customer billing.** Meter each tenant's LLM cost and charge them what makes sense for your business.

The trade-offs are real: you operate the gateway, you handle QR-code re-pairing when WhatsApp deauthorizes a session, and Baileys is unofficial. A protocol or policy change on Meta's side could break it, and WhatsApp's terms prohibit unauthorized or automated access, so a paired number can be banned. For most SMB use cases, the trade-offs land in your favor.

## The stack

A self-hosted WhatsApp bot platform needs four things:

1. **A WhatsApp adapter.** Baileys is the standard for the multi-device protocol.
2. **An agent runtime.** Something that takes an inbound message and produces a reply, with tool-use, memory, and personality.
3. **Tenant isolation.** Each customer's conversations, memory, and credentials kept separate.
4. **Cost accounting.** Per-tenant token tracking so you can bill rationally.

OpenClawMU bundles all four. The flow:

```
WhatsApp ──Baileys──► OpenClawMU ──tenant-routed──► Agent runtime
                          │                              │
                          ├── per-tenant session store ──┘
                          ├── per-tenant memory (sqlite-vec)
                          ├── per-tenant sandbox
                          └── per-tenant cost accounting
```

## Pairing a tenant's WhatsApp

The CLI walks the QR-code dance. The end-user's phone scans the QR; Baileys negotiates the device-paired session; the credentials are stored in the tenant's directory.

```bash
openclaw channels pair whatsapp --tenant acme
# → scans QR; on success, /tenants/acme/channels/whatsapp.json is written
```

Once paired, inbound messages from that WhatsApp account route to the `acme` tenant's agent. The agent's reply is sent back through the same Baileys session.

## Inbound message flow

Every inbound is normalized into a tenant-tagged envelope:

```json
{
  "tenant": "acme",
  "channel": "whatsapp",
  "user": {
    "id": "wa:+15551234567",
    "display_name": "Jane Doe"
  },
  "session_id": "wa:+15551234567:default",
  "content": { "type": "text", "text": "How many invoices are overdue?" },
  "received_at": "2026-06-03T10:14:22Z"
}
```
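In TypeScript, the envelope might be typed and built as below. The `RawWhatsAppText` input shape and the `toEnvelope` function are hypothetical simplifications, not Baileys' actual message types; the `wa:<E.164 number>` user ID and the `:default` session suffix follow the example above.

```typescript
interface InboundEnvelope {
  tenant: string;
  channel: "whatsapp" | "telegram" | "slack";
  user: { id: string; display_name: string };
  session_id: string;
  content: { type: "text"; text: string };
  received_at: string; // ISO 8601, UTC, second precision
}

// Hypothetical, already-decoded inbound text message from the WhatsApp adapter.
interface RawWhatsAppText { fromE164: string; pushName: string; text: string; receivedAtMs: number; }

function toEnvelope(tenant: string, raw: RawWhatsAppText): InboundEnvelope {
  const userId = `wa:${raw.fromE164}`;
  return {
    tenant, // comes from the channel pairing, never from the message body
    channel: "whatsapp",
    user: { id: userId, display_name: raw.pushName },
    session_id: `${userId}:default`,
    content: { type: "text", text: raw.text },
    received_at: new Date(raw.receivedAtMs).toISOString().replace(/\.\d{3}Z$/, "Z"),
  };
}
```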

The agent runtime processes this envelope, executes whatever tools it needs (looking up the invoice DB, etc.), and produces a reply. The reply goes back to the Baileys adapter, which translates it into WhatsApp-native form (markdown → text formatting, line breaks preserved) and sends it.
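The markdown-to-WhatsApp step is mostly a handful of rewrites, since WhatsApp uses `*bold*`, `_italic_`, and `~strikethrough~` and has no heading or link markup. A rough regex-based sketch; `markdownToWhatsApp` is a hypothetical name, and it does not handle code spans, nested emphasis, or escaped asterisks, which a production adapter would want a real markdown parser for.

```typescript
function markdownToWhatsApp(md: string): string {
  return md
    // Bold: **text** becomes a NUL-delimited placeholder so the italic pass skips it.
    .replace(/\*\*(.+?)\*\*/g, "\u0000$1\u0000")
    // Italic: *text* -> _text_
    .replace(/\*([^*\n]+)\*/g, "_$1_")
    // Restore bold using WhatsApp's single-asterisk syntax.
    .replace(/\u0000/g, "*")
    // Strikethrough: ~~text~~ -> ~text~
    .replace(/~~(.+?)~~/g, "~$1~")
    // Headings: "# Title" -> "*Title*" (WhatsApp has no heading syntax).
    .replace(/^#{1,6}\s+(.+)$/gm, "*$1*")
    // Links: [label](url) -> "label (url)"; line breaks pass through untouched.
    .replace(/\[([^\]]+)\]\(([^)\s]+)\)/g, "$1 ($2)");
}
```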

## Handling media

WhatsApp messages can include images, videos, voice notes, documents. Each gets normalized into a `content` block with a type and a (locally-stored) path:

- **image** → routed to a vision-capable model (Claude Opus, GPT-4o).
- **voice** → transcribed via Whisper (local or API), then treated as text.
- **document** → text extracted (for example with PDF.js for PDFs, or a DOCX parser for Word files), then included as context.

Outbound media is symmetric: the agent can attach an image (e.g., a generated chart) and the adapter uploads it via WhatsApp's media endpoints.
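The per-type routing reads naturally as a discriminated union with an exhaustive `switch`. The route names here are placeholders for whatever the gateway actually calls its handlers; the shape of the dispatch is the point, in particular that adding a new content type without a route becomes a compile error.

```typescript
type Content =
  | { type: "text"; text: string }
  | { type: "image"; path: string; mime: string }
  | { type: "voice"; path: string; durationSec: number }
  | { type: "document"; path: string; mime: string };

type Route = "llm" | "vision" | "transcribe-then-llm" | "extract-then-llm";

function routeContent(c: Content): Route {
  switch (c.type) {
    case "text": return "llm";
    case "image": return "vision";              // needs a vision-capable model
    case "voice": return "transcribe-then-llm"; // e.g. Whisper, then treated as text
    case "document": return "extract-then-llm"; // text extraction, then added as context
    default: {
      // Exhaustiveness check: if a new Content variant is added, this assignment fails to compile.
      const unreachable: never = c;
      throw new Error(`unhandled content: ${JSON.stringify(unreachable)}`);
    }
  }
}
```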

## Cost accounting

Every LLM call records a billing row scoped to the tenant. At the end of the month, generate a CSV:

```bash
openclaw billing report acme --period current-month --csv > acme-2026-06.csv
```

Pipe that into Stripe Billing, QuickBooks, or your own invoicing flow. The customer sees an itemized usage statement; you pocket the margin over your LLM provider's cost.

## Reliability concerns

- **Session expiry.** WhatsApp will occasionally invalidate a multi-device session. The fix is to re-pair the QR. Build a re-pair UX for your customers to handle this without paging your support team.
- **Rate limits.** WhatsApp throttles per-account; respect their guidance on message-send rates.
- **Backups.** `openclaw tenants backup acme --to s3://...` snapshots the full tenant state, including the WhatsApp credentials. Schedule nightly.
- **Multi-region resilience.** Run a hot-standby gateway in a second region with cross-region S3 replication. RTO ~10 minutes via restore.
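Respecting send-rate guidance usually means a per-account token bucket in front of the adapter's send call. A minimal sketch; the capacity and refill rate used in `bucketFor` are placeholders you would set from WhatsApp's guidance, not values taken from it, and `SendBucket` is a hypothetical name.

```typescript
class SendBucket {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private tokens: number;
  private last: number;

  constructor(capacity: number, refillPerSec: number, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full, which allows a short burst
    this.last = now;
  }

  // Returns 0 if a message may be sent now (consuming a token), otherwise ms to wait.
  tryTake(now = Date.now()): number {
    const elapsedSec = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil(((1 - this.tokens) / this.refillPerSec) * 1000);
  }
}

// One bucket per paired WhatsApp account, keyed by tenant here. Placeholder limits.
const buckets = new Map<string, SendBucket>();
function bucketFor(tenant: string): SendBucket {
  let b = buckets.get(tenant);
  if (!b) {
    b = new SendBucket(5, 1);
    buckets.set(tenant, b);
  }
  return b;
}
```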

## When *not* to self-host

- **Very high volume.** Above a few thousand messages/day per account, the official WhatsApp Cloud API or a BSP becomes operationally cleaner.
- **Regulated industries with strict approval flows.** Healthcare, banking, and some government contexts require the official API (button-style templates, opt-in flows).
- **You don't want to operate a VM.** Run the gateway via a managed hosting partner instead. (Hosted-ops contracts available — see /pricing.)

## The stack, end-to-end

1. **VM**: Hetzner CCX13 ($35/mo) or AWS t3.medium ($30/mo).
2. **OpenClawMU**: Apache-2.0, self-hosted.
3. **LLM**: Anthropic, OpenAI, or local Llama / Mistral.
4. **TLS / public URL**: Tailscale Funnel (free), Cloudflare Tunnel, or your own nginx.
5. **Backups**: S3, R2, or MinIO.
6. **Monitoring**: any Prometheus scraper for /metrics; any log forwarder for the audit log.

Total fixed cost: $50–80/month depending on VM choice. Variable cost: your LLM bill, which you can pass through to your customers with margin.

That's the entire playbook. The platform is free; the LLM you pay for; the customers you charge.

---

# OpenClaw vs OpenClawMU: when to fork from single-user to multi-tenant

URL: https://openclawmu.neullabs.com/blog/openclaw-vs-openclawmu
Published: 2026-06-01
Tags: openclaw, openclawmu, multi-tenant, fork, migration
Cluster: cornerstone

> OpenClaw is a brilliant single-user AI gateway. OpenClawMU is the multi-tenant fork. Here's exactly what differs, when the fork is worth it, and how to migrate cleanly without losing state.

Upstream OpenClaw is the AI gateway. One Node process bridges WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Teams, Matrix, LINE, Lark, Google Chat, and WebChat to your LLM of choice, with skills, memory, sandboxing, voice wake-words, and a canvas UI. It's a remarkable single-user product.

OpenClawMU is the same codebase with a multi-tenant surface added. The "MU" stands for multi-user / multi-tenant. This article is the precise diff: what's different, what's the same, when to switch, and how to migrate without losing state.

## What's the same

Everything you love about upstream is in MU, unchanged:

- **All channel adapters.** Same Baileys for WhatsApp, same grammY for Telegram, same Bolt for Slack.
- **The Pi agent runtime.** Same SDK, same skill format, same canvas UI hooks.
- **ClawHub.** Same skill registry, same install flow.
- **CLI surface.** `openclaw agent`, `openclaw skills`, `openclaw onboard` — all unchanged.
- **Voice wake-word + Whisper.** Same pipeline.
- **The Discord community.** Shared with upstream.

If you're an upstream user, the MU gateway will feel identical except for a handful of new commands.

## What's different

The diff is in the parts that bear state and credentials. Specifically:

### 1. Tenant token authentication

Every JSON-RPC method and every HTTP API endpoint requires a Bearer token. The dispatcher hashes it (SHA-256, constant-time compared) and resolves the tenant ID. No tenant-scoped method accepts an unauthenticated request.

### 2. Isolated directory layout

```
# upstream
~/.openclaw/sessions/
~/.openclaw/memory/
~/.openclaw/plugins/
~/.openclaw/cron/

# MU
~/.openclaw/tenants/acme/sessions/
~/.openclaw/tenants/acme/memory/
~/.openclaw/tenants/acme/plugins/
~/.openclaw/tenants/acme/cron/
~/.openclaw/tenants/globex/sessions/
...
```

Every state-bearing directory is per-tenant. The dispatcher resolves the tenant from the auth token and routes all path-taking operations against that tenant's root.

### 3. Sandboxed per tenant

Both upstream and MU support bubblewrap and Docker sandboxes. The difference: in MU, the sandbox root is the tenant's `sandbox/` directory. A code-executing tool call run by tenant A cannot read tenant B's data because B's directory is never mounted inside A's sandbox; the per-tenant root is defense in depth on top of the sandbox's own isolation.

### 4. Per-tenant quotas

Upstream has no quota concept (you're the user; you set your own limits). MU adds three knobs per tenant: tokens/day, USD/day, requests/minute. Exceed any and the gateway returns 429 with the next-reset timestamp.

### 5. Cost accounting

Upstream records LLM usage in a single bucket. MU adds a tenant column to every billing row, plus a CSV report generator scoped per tenant per period.

### 6. Web terminal

Upstream has terminal commands in the local CLI. MU adds a browser-based xterm.js UI that attaches to a per-tenant sandboxed pty over WebSocket. The terminal token-authenticates on every frame, not just the handshake.

### 7. Control-plane HTTP API

Upstream's admin surface is the CLI. MU adds an HTTP control plane (admin-key authenticated) for tenant CRUD, quota updates, usage queries, backup triggers. The CLI is now a thin wrapper around this API.

### 8. S3 backup / restore

Upstream doesn't bundle backup. MU has `openclaw tenants backup <name> --to s3://...` and a matching restore; both are path-traversal hardened.

### 9. Admin / tenant key separation

In upstream, all keys live in the same config. In MU, certain keys are admin-only — LLM credentials, the rate card, S3 credentials. Tenant config overlays cannot override these (it's checked at load time).

### 10. OpenAI-compatible HTTP endpoints scoped per tenant

MU exposes `/v1/chat/completions` and `/v1/responses` that accept a tenant Bearer token and route to that tenant's session/memory/tools. Point any OpenAI SDK at your gateway and it Just Works.
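For example, a plain `fetch` call against the tenant-scoped endpoint needs only the gateway URL and a tenant token; with the official `openai` package, the equivalent is setting `baseURL` to the gateway's `/v1` URL and `apiKey` to the tenant token. The sketch assumes the gateway returns the standard chat-completions response shape; `chatViaGateway` and the URL are placeholders, and `fetchImpl` is injectable so the function can be exercised without a live gateway.

```typescript
interface ChatCompletion { choices: { message: { content: string | null } }[] }

async function chatViaGateway(
  gatewayOrigin: string, // e.g. "https://gateway.example.com" (placeholder)
  tenantToken: string,   // tk_<tenant>_... ; selects the tenant's session, memory and tools
  model: string,
  userText: string,
  fetchImpl: typeof fetch = fetch,
): Promise<string> {
  const res = await fetchImpl(new URL("/v1/chat/completions", gatewayOrigin), {
    method: "POST",
    headers: { Authorization: `Bearer ${tenantToken}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages: [{ role: "user", content: userText }] }),
  });
  // A 401 means a bad token; a 429 means the tenant is over quota.
  if (!res.ok) throw new Error(`gateway returned HTTP ${res.status}`);
  const data = (await res.json()) as ChatCompletion;
  return data.choices[0]?.message.content ?? "";
}
```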

## When the MU fork is worth it

Switch to MU if **any** of these apply:

- More than one person uses your gateway.
- You want to charge customers and need per-customer cost accounting.
- You're building a SaaS / multi-customer product on top of OpenClaw.
- You need sandbox isolation between code-executing workloads.
- You want S3-portable tenant snapshots.
- Your security or compliance team wants admin/tenant key separation.

Stay on upstream if:

- It's just you, on your own machine, no plans to share.
- You contribute to upstream and want a smaller diff to reason about.
- You're using a feature that's leading-edge in upstream and not yet rebased into MU.

## Migration: upstream → MU

The MU CLI ships a one-shot importer that converts an upstream state directory into a default tenant:

```bash
# 1. Stop upstream
openclaw service stop

# 2. Back up upstream state; the importer reads from this copy
cp -a ~/.openclaw ~/.openclaw-upstream-backup

# 3. Install MU from the `mu` dist-tag
npm install -g openclaw@mu

# 4. Import upstream state as a tenant named 'default'
openclaw tenants import-from-upstream \
  --upstream-dir ~/.openclaw-upstream-backup \
  --as default

# 5. Start MU
openclaw service start

# 6. Get the default tenant's token
openclaw tenants token show default
```

After import, your sessions, memory, channels, and cron jobs are accessible via the `default` tenant. Channels stay paired (no need to re-scan WhatsApp QR). You can then add more tenants alongside.

## Migration: MU → upstream

Less common but possible. Pick a single tenant, back it up, install upstream, and point upstream at that tenant's data directory. Data belonging to the other tenants is left behind, so back it up as well.

## Rebasing cadence

The MU branch rebases onto upstream every 2–4 weeks. New channels, agent runtime improvements, skill features all land in MU automatically. The `OPENCLAWMU ADDITION` markers make the merge surface tractable — when upstream changes a function MU also touches, the conflict resolution is straightforward.

If you're running MU in production, pin to a tagged release; if you want the leading edge, run from `main`.

## What stays free forever

OpenClaw upstream: free, Apache-2.0. OpenClawMU: free, Apache-2.0. Neither charges per seat or per active user. Neither runs SaaS infrastructure you depend on.

The bet is that openness drives more usage and more contributions, and that those outweigh anything we could extract via licensing. The bot-platform incumbents charge a margin because they own the chokepoint; the open-source fork model says no thanks.

## Summary

OpenClaw is the brilliant single-user product. OpenClawMU is the same code with the multi-tenant surface. Switch the moment you need more than one user or want to charge customers. Migration is one command. Rebase against upstream every few weeks. Both will be free forever.

---

# Tenant isolation for LLM agents: the patterns that actually hold up

URL: https://openclawmu.neullabs.com/blog/tenant-isolation-for-llm-agents
Published: 2026-05-29
Tags: tenant isolation, LLM agents, security, multi-tenant, architecture
Cluster: technical

> Isolating LLM agents per tenant is harder than adding a tenant_id column. Here are the patterns that survive contact with adversarial users, leaky tools, and the inevitable refactor.

You're building a multi-tenant LLM-agent system. Tenants A and B both have their own data, their own prompts, their own tools. The job is to ensure A cannot read B's data — not by accident, not by adversarial prompting, not after a refactor.

The first instinct is "I'll add `tenant_id` to every database row and filter on it." That works until it doesn't, which is usually the day you ship the third feature. Here are the patterns that actually hold up.

## Make the boundary structural

The cleanest defense is to make the tenant ID the *root* of every namespace it influences, not just a *column*. Files live under `tenants/<name>/`, not in a shared directory. Cache keys are prefixed `tenant:<name>:`, not just keyed by content. Cron schedules carry a tenant in their type signature. Sandboxes are spawned with the tenant root as the only writable filesystem path.

When the tenant ID is *structural*, forgetting it is a build error or a path-resolution failure — observable. When the tenant ID is *a column*, forgetting it is silent data leakage — invisible until it isn't.

## Token-rooted dispatch

Every inbound request carries a tenant token in the `Authorization` header. The dispatcher's first job: hash the token, look up the tenant, attach the tenant ID to the request context. Every downstream handler reads the tenant ID from the context — never from a parameter the caller can spoof.

```ts
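// Gateway helpers assumed to exist elsewhere; signatures shown so the sketch type-checks.
declare function extractBearer(req: Request): string | null;
declare function resolveTenant(token: string): Promise<string | null>; // SHA-256 + timing-safe lookup
declare function handler(req: Request, ctx: RequestContext): Promise<Response>;
declare const baseContext: Omit<RequestContext, "tenantId">;

interface RequestContext {
  readonly tenantId: string; // set only here, from the verified token
}

async function dispatch(req: Request): Promise<Response> {
  const token = extractBearer(req);
  const tenantId = token ? await resolveTenant(token) : null;
  if (!tenantId) return new Response("unauthorized", { status: 401 });
  const ctx: RequestContext = { ...baseContext, tenantId };
  return handler(req, ctx);
}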
```

If a downstream handler accidentally accepts a `tenantId` parameter from the user, you have a privilege escalation. Make `ctx.tenantId` the only path; lint against any handler signature that takes it as a parameter.
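`resolveTenant` itself can be sketched as: read the tenant ID from the non-secret `tk_<tenant>_` prefix, hash the presented token with SHA-256, and compare against the stored hash with `crypto.timingSafeEqual`. The in-memory store is a stand-in, and the sketch assumes tenant IDs contain no underscore, so the prefix parses unambiguously.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// tenantId -> SHA-256 digest of that tenant's current token (stand-in for the real store).
const storedHashes = new Map<string, Buffer>();

const sha256 = (s: string): Buffer => createHash("sha256").update(s, "utf8").digest();

async function resolveTenant(token: string | null): Promise<string | null> {
  if (!token) return null;
  // The prefix is not secret; it only selects which stored hash to compare against.
  const m = /^tk_([a-z0-9-]+)_([0-9a-f]{32})$/.exec(token);
  if (!m) return null;
  const tenantId = m[1];
  const expected = storedHashes.get(tenantId);
  if (!expected) return null;
  // Both buffers are 32-byte digests, so timingSafeEqual's equal-length requirement holds.
  return timingSafeEqual(sha256(token), expected) ? tenantId : null;
}
```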

## Path-traversal protection, everywhere

Every API that takes a path string is a potential escape hatch. The defense is mechanical:

```ts
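import * as path from "node:path";

class ForbiddenError extends Error {}

function resolveTenantPath(tenantRoot: string, userPath: string): string {
  const root = path.resolve(tenantRoot); // normalize: absolute, no trailing separator
  if (path.isAbsolute(userPath)) throw new ForbiddenError("absolute paths are not allowed");
  const resolved = path.resolve(root, userPath);
  // Allow the root itself or anything strictly below it; "acme-evil" must not pass as "acme".
  if (resolved !== root && !resolved.startsWith(root + path.sep)) {
    throw new ForbiddenError("path escapes tenant root");
  }
  return resolved;
}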
```

Apply this to: `file_read` / `file_write` tools, plugin install paths, sandbox mount specs, backup target keys, restore source keys, config overlay file paths. It's tedious; it's the most important tedium in the system.

Bonus: also reject symlinks that point outside the tenant root, after resolution. Bubblewrap can be configured to refuse symlink-following in mount specs.

## Sandbox the agent's tool calls

When the agent executes code (running a shell command, installing a package, running a Python script), do it inside a sandbox whose root is the tenant directory. Two choices:

- **bubblewrap** — fast (~30 ms cold start), Linux-only, rootless. Default for trusted-but-isolated code.
- **Docker** — slower (~200–500 ms), cross-platform, full container-isolation surface. Use for genuinely untrusted code.

In both cases:

- No network egress by default.
- Read-only host filesystem.
- Writable tmpfs scratch.
- Writable tenant work dir only.
- Default-deny seccomp filter.
- All capabilities dropped.

The sandbox is the load-bearing layer. Even if every other check fails, a properly-configured sandbox prevents tenant A's code from reading tenant B's data because it physically can't see it.
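A minimal sketch of launching one tool call under bubblewrap with the constraints above, assuming the tenant's sandbox directory is passed in as `workDir`. The flags are standard `bwrap` options; `bwrapArgs` and `runInSandbox` are hypothetical helpers, not OpenClawMU's code. Rather than binding the whole host read-only, it mounts only system directories, so other tenants' directories do not exist inside the sandbox at all. Real tools may need more read-only binds (parts of `/etc`, for instance), and the seccomp filter is omitted because `bwrap` expects it as a compiled BPF program on a file descriptor.

```typescript
import { spawn } from "node:child_process";

interface SandboxOptions {
  workDir: string;        // the tenant's writable work directory on the host
  allowNetwork?: boolean; // default: no network
}

function bwrapArgs(opts: SandboxOptions, command: string[]): string[] {
  const args = [
    "--ro-bind", "/usr", "/usr",           // read-only system files
    "--ro-bind-try", "/bin", "/bin",
    "--ro-bind-try", "/lib", "/lib",
    "--ro-bind-try", "/lib64", "/lib64",
    "--proc", "/proc",
    "--dev", "/dev",
    "--tmpfs", "/tmp",                     // writable scratch, discarded on exit
    "--bind", opts.workDir, "/work",       // the only writable host path
    "--chdir", "/work",
    "--unshare-all",                       // unshare every namespace bwrap supports, including network
    "--die-with-parent",
    "--new-session",
    "--cap-drop", "ALL",
  ];
  if (opts.allowNetwork) args.push("--share-net"); // explicit opt-in to host networking
  return [...args, "--", ...command];
}

function runInSandbox(opts: SandboxOptions, command: string[]) {
  return spawn("bwrap", bwrapArgs(opts, command), { stdio: "pipe" });
}
```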

## Per-tenant retrieval indexes

Shared embedding indexes are a footgun. The vector similarity that powers retrieval doesn't respect access control — if you have one shared index, an unrelated tenant's documents can surface for any tenant's query that happens to match.

Default to per-tenant indexes. Use sqlite-vec or a per-tenant Pinecone namespace or per-tenant tables. Yes, it's more expensive on storage; the alternative is "tenant A's customer list shows up in tenant B's chat reply".

If you absolutely must share an index (multi-tenant retrieval over a shared knowledge base, say), enforce ACLs at retrieval time with metadata filters — and treat any retrieval bug as a tenant-data-leak bug.
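The structural version of "per-tenant indexes" is an API that has no way to search across tenants. Below is a brute-force, in-memory stand-in for sqlite-vec or per-tenant namespaces; `TenantIndexes` is a hypothetical name, and the linear scan shows the API shape, not an efficient index.

```typescript
type Vec = number[];

interface Doc { id: string; text: string; embedding: Vec; }

function cosine(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// One index per tenant. There is no "search all tenants" method to misuse:
// a query only ever sees the index of the tenant passed in.
class TenantIndexes {
  private readonly byTenant = new Map<string, Doc[]>();

  add(tenantId: string, doc: Doc): void {
    const docs = this.byTenant.get(tenantId) ?? [];
    docs.push(doc);
    this.byTenant.set(tenantId, docs);
  }

  search(tenantId: string, query: Vec, k = 3): Doc[] {
    const docs = this.byTenant.get(tenantId) ?? [];
    return docs
      .map((d) => ({ d, score: cosine(d.embedding, query) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((x) => x.d);
  }
}
```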

## Shared LLM clients are fine

The LLM HTTP client itself can (and should) be shared. Each API call to Anthropic / OpenAI is stateless from the provider's perspective — the message history is in the request body, not in a server-side session. As long as the gateway only puts one tenant's data into any given request, the LLM provider's multi-tenant isolation does the rest.

Tag each call with the tenant ID for cost accounting and logging. Don't pass the tenant ID through to the LLM as a content field — there's no need, and it adds a small leak surface if the LLM ever echoes it back.

## Audit log captures every state change

A JSONL append-only log of every state-changing operation gives you forensics when something goes sideways. Capture: tenant create/delete, token rotate, config overlay write, channel pair/unpair, cron add/remove, backup/restore, quota update.

Each line: timestamp, actor (admin key ID or tenant token hash prefix), action, target. Ship to your SIEM. When an incident lands, the audit log is the difference between "we'll figure it out" and "here's the exact sequence of events".
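A JSONL audit writer is only a few lines. In this sketch the action names are hypothetical labels for the operations listed above, and the actor is whatever non-secret identifier the gateway uses (an admin key ID or a token-hash prefix, never the token itself).

```typescript
import * as fs from "node:fs";

type AuditAction =
  | "tenant.create" | "tenant.delete" | "token.rotate" | "config.write"
  | "channel.pair" | "channel.unpair" | "cron.add" | "cron.remove"
  | "backup" | "restore" | "quota.update";

interface AuditEntry {
  ts: string;     // ISO 8601, UTC
  actor: string;  // admin key ID or token-hash prefix, never a token
  action: AuditAction;
  target: string; // e.g. the tenant name
}

function audit(logFile: string, actor: string, action: AuditAction, target: string, now = new Date()): AuditEntry {
  const entry: AuditEntry = { ts: now.toISOString(), actor, action, target };
  // One JSON object per line; appendFileSync opens the file in append mode.
  fs.appendFileSync(logFile, JSON.stringify(entry) + "\n");
  return entry;
}
```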

## Admin / tenant key separation

A tenant should be able to override its model choice, max tokens, system prompt — fine. It should *not* be able to override the gateway's Anthropic API key, the rate card, or the S3 credentials. That separation needs to be enforced *at config-load time*, with a clear error if a tenant overlay tries to set an admin-only key.

```yaml
# Tenant overlay — allowed
model: claude-opus-4-7
max_tokens: 8192
system_prompt: "You are Acme Corp's assistant."

# Tenant overlay — rejected at load time
anthropic_api_key: "..."   # ✗ admin-only
rate_card: { ... }         # ✗ admin-only
s3_credentials: { ... }    # ✗ admin-only
```

The runtime never falls back; an attempted override is a fatal-on-load error.
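The load-time check can be sketched as validation of the already-parsed overlay (the object a YAML parser returns). Both key lists are illustrative; the real names depend on the gateway's config schema. Using an allow-list rather than only a deny-list means a new admin key added later is safe by default.

```typescript
const TENANT_KEYS = new Set(["model", "max_tokens", "system_prompt"]);
const ADMIN_ONLY_KEYS = new Set(["anthropic_api_key", "rate_card", "s3_credentials"]);

interface TenantOverlay { model?: string; max_tokens?: number; system_prompt?: string; }

class ConfigLoadError extends Error {}

// Value types are not checked in this sketch; only which keys are present.
function validateOverlay(tenant: string, raw: Record<string, unknown>): TenantOverlay {
  for (const key of Object.keys(raw)) {
    if (ADMIN_ONLY_KEYS.has(key)) {
      throw new ConfigLoadError(`tenant ${tenant}: '${key}' is admin-only and cannot be overridden`);
    }
    if (!TENANT_KEYS.has(key)) {
      throw new ConfigLoadError(`tenant ${tenant}: unknown key '${key}'`);
    }
  }
  return raw as TenantOverlay;
}
```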

## Hashed tokens, rotated easily

Tokens are 128-bit secrets. Store SHA-256(token), never plaintext. Compare with `crypto.timingSafeEqual`. Provide a one-command rotation path — `tenants token rotate <name>` — so that "this token might be compromised" is a 5-second mitigation instead of a deploy.

Prefix tokens with the tenant ID (`tk_acme_...`) so log lines are greppable without leaking the secret half. The prefix is *not* the secret; the 32 hex chars after are.

## The refactor test

The real test of tenant isolation is the next refactor. When someone six months from now adds a new tool, a new caching layer, a new background job — does the tenant boundary survive?

If the boundary is a `WHERE tenant_id = $1` clause, the answer is "only if they remember". If the boundary is structural — token-rooted dispatch, per-tenant directories, sandboxed execution, type-checked context — the answer is "yes, because the alternative is a build error".

Optimize for the second outcome. Future-you and your customers will thank present-you.

---

# Bubblewrap vs Docker: choosing a sandbox for AI agent tool calls

URL: https://openclawmu.neullabs.com/blog/bubblewrap-vs-docker-sandbox-for-agents
Published: 2026-05-26
Tags: sandbox, bubblewrap, docker, ai agents, security
Cluster: technical

> Sandboxing AI-agent tool calls is mandatory. Bubblewrap and Docker are both viable; here's the trade-off matrix on cold start, isolation surface, OS support, and operational complexity — and when each one wins.

Sandboxing AI agent tool calls is no longer optional. The moment your agent executes a shell command, installs a package, runs a Python script, or fetches a URL, you're running code with the agent's privileges. In a multi-tenant context that means tenant A's prompt can issue code that tries to read tenant B's data. The defense is a sandbox.

Two options dominate in 2026: **bubblewrap** and **Docker**. This article is the trade-off matrix.

## What the sandbox needs to do

Whatever you pick, the sandbox must:

1. **Isolate the filesystem.** The agent sees only the tenant's work directory and the system libraries it needs.
2. **Restrict network access.** Default-deny; opt-in to specific hostnames + ports.
3. **Drop capabilities.** No `CAP_SYS_ADMIN`, no raw network sockets, no kernel-module loading.
4. **Filter syscalls.** Default-deny seccomp profile; allow only what tool execution needs.
5. **Cap resources.** Memory limit, CPU quota, wall-clock timeout.
6. **Be cheap.** Cold start measured in tens of milliseconds, not seconds — because you'll spawn one per tool call.

Both bubblewrap and Docker can satisfy all six. The question is which trade-offs each makes.
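Item 5 is partly the caller's job regardless of sandbox: the wall-clock timeout and output cap are easiest to enforce from the parent process, while memory and CPU limits belong to the sandbox (cgroups for bubblewrap, `--memory`/`--cpus` for Docker). A sketch using Node's `child_process.spawn`; `runWithDeadline` is a hypothetical name, and the 1 MiB default output cap is an arbitrary placeholder.

```typescript
import { spawn } from "node:child_process";

interface RunResult {
  code: number | null;
  signal: NodeJS.Signals | null;
  timedOut: boolean;
  truncated: boolean;
  stdout: string;
}

function runWithDeadline(cmd: string, args: string[], timeoutMs: number, maxStdoutBytes = 1 << 20): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ["ignore", "pipe", "ignore"] });
    const chunks: Buffer[] = [];
    let bytes = 0;
    let timedOut = false;
    let truncated = false;
    // Wall-clock deadline: kill hard, since a runaway tool may ignore SIGTERM.
    const timer = setTimeout(() => { timedOut = true; child.kill("SIGKILL"); }, timeoutMs);
    child.stdout?.on("data", (chunk: Buffer) => {
      if (truncated) return;
      bytes += chunk.length;
      if (bytes > maxStdoutBytes) { truncated = true; child.kill("SIGKILL"); return; }
      chunks.push(chunk);
    });
    child.on("error", (err) => { clearTimeout(timer); reject(err); });
    child.on("close", (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut, truncated, stdout: Buffer.concat(chunks).toString("utf8") });
    });
  });
}
```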

## Bubblewrap

[Bubblewrap](https://github.com/containers/bubblewrap) is the sandboxing primitive that powers Flatpak. It's a small binary that builds a Linux user-namespace + mount-namespace sandbox using only kernel features; on kernels that allow unprivileged user namespaces it needs no setuid bit. No daemon, no Docker, no root.

**Cold start**: ~30 ms on a modern x86 box. That's per-tool-call cheap — you can spawn a fresh sandbox every time your agent issues a shell command.

**Isolation mechanism**: user namespaces + mount namespaces + seccomp.

**Pros**:
- Lightning fast cold start.
- No daemon. Bubblewrap is a CLI you invoke; nothing keeps running between calls.
- Rootless. Doesn't need elevated privileges to set up.
- Small audit surface. The bwrap binary is a few thousand lines.
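- Works directly on the host filesystem: you bind-mount the paths a tool needs (read-only `/usr`, the tenant's work directory), so there is no image to build or pull.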

**Cons**:
- Linux only. macOS and Windows have no equivalent.
- Relies on user namespaces — a kernel namespace bug breaks isolation.
- No built-in resource limits beyond what you wire up via cgroups separately.
- Less battle-tested in adversarial multi-tenant production than Docker.

**Use it when**: you're running on Linux, the code you're executing is trusted-but-isolated (agent tools you control, but you want defense in depth), and cold-start latency matters.
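A minimal sketch of what a per-call bubblewrap invocation might look like from a TypeScript gateway. The flags are standard bwrap options; the helper names `bwrapArgs` and `runSandboxed`, the `/work` mount point, and the assumption of a merged-`/usr` distro (where `/bin`, `/lib`, `/lib64` are symlinks into `/usr`) are all illustrative. Seccomp and cgroup limits are deliberately left out, since bubblewrap needs those wired up separately.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: argv for one sandboxed tool call.
// Assumes a merged-/usr layout, so /bin, /lib and /lib64 are recreated as symlinks.
function bwrapArgs(workDir: string, command: string[], allowNetwork = false): string[] {
  const args = [
    "--ro-bind", "/usr", "/usr",        // system libraries, read-only
    "--symlink", "usr/bin", "/bin",
    "--symlink", "usr/lib", "/lib",
    "--symlink", "usr/lib64", "/lib64",
    "--proc", "/proc",                  // fresh procfs for the new PID namespace
    "--dev", "/dev",                    // minimal /dev
    "--tmpfs", "/tmp",                  // private scratch space per call
    "--bind", workDir, "/work",         // the tenant's work directory, read-write
    "--chdir", "/work",
    "--unshare-all",                    // new user, PID, IPC, UTS, cgroup and network namespaces
    "--die-with-parent",                // tear the sandbox down if the gateway dies
    "--new-session",                    // detach from the controlling terminal
    "--cap-drop", "ALL",
  ];
  // --share-net must come after --unshare-all to re-join the host network namespace.
  if (allowNetwork) args.push("--share-net");
  return [...args, "--", ...command];
}

// Runs one tool call with a wall-clock timeout (Node's spawn `timeout` sends SIGTERM).
function runSandboxed(workDir: string, command: string[], timeoutMs = 10_000): Promise<number | null> {
  return new Promise((resolve, reject) => {
    const child = spawn("bwrap", bwrapArgs(workDir, command), { stdio: "inherit", timeout: timeoutMs });
    child.on("error", reject);
    child.on("close", (code) => resolve(code));
  });
}
```

A call such as `await runSandboxed("/srv/tenants/acme/work", ["python3", "tool.py"])` spawns a fresh sandbox for that single tool call and discards it afterwards, which is the per-call pattern the cold-start numbers above make affordable.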

## Docker

Docker's `runc` runtime (or any OCI-compatible runtime) gives you full container isolation: namespaces, cgroups, seccomp, AppArmor / SELinux, capability dropping, and optional GPU passthrough.

**Cold start**: 200–500 ms for a typical sandbox image, faster if you keep a warm pool of pre-spawned containers.

**Isolation mechanism**: kernel namespaces + cgroups + seccomp + AppArmor + capability drops.

**Pros**:
- Cross-platform. Linux native; macOS and Windows via Docker Desktop / OrbStack, which run containers inside a Linux VM.
- Mature security ecosystem. Default-deny seccomp profile, AppArmor profiles, gVisor runtime option.
- Battle-tested. Widely deployed in multi-tenant production, so the isolation model has been exercised under adversarial conditions for years.
- Easy to plug into your existing container infrastructure.
- Supports GPU access if your agent needs ML model inference inside the sandbox.

**Cons**:
- Slower cold start. Roughly 7–17x bubblewrap at the figures above (200–500 ms vs ~30 ms).
- Daemon required. Adds an operational dependency.
- Larger audit surface. More features means more potential bugs.
- Resource overhead. Memory + CPU per container.

**Use it when**: you're running code from genuinely untrusted sources (user-submitted scripts, plugins from unverified publishers), you need cross-platform support, or you're already deeply invested in container infrastructure.
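For comparison, a hardened `docker run` invocation can be assembled the same way. This is a sketch, not OpenClawMU's code: `dockerRunArgs` and the `DockerSandboxOpts` shape are hypothetical, while every flag is a standard `docker run` option. The `runtime` field assumes gVisor's `runsc` is installed and registered with the Docker daemon.

```typescript
// Hypothetical options type; field names are illustrative.
interface DockerSandboxOpts {
  image: string;
  workDir: string;     // host path of the tenant's work directory
  memoryMb: number;
  cpus: number;        // fractional CPUs, e.g. 0.5
  runtime?: string;    // e.g. "runsc" when gVisor is configured
}

function dockerRunArgs(o: DockerSandboxOpts, command: string[]): string[] {
  const args = [
    "run", "--rm",
    "--network=none",                        // default-deny networking
    "--cap-drop=ALL",                        // no Linux capabilities
    "--security-opt=no-new-privileges:true", // block setuid-style escalation
    "--read-only",                           // immutable root filesystem
    "--tmpfs=/tmp",                          // writable scratch space
    `--memory=${o.memoryMb}m`,
    `--cpus=${o.cpus}`,
    "--pids-limit=256",                      // fork-bomb guard
    "--user=65534:65534",                    // run as nobody
    `--volume=${o.workDir}:/work:rw`,
    "--workdir=/work",
  ];
  if (o.runtime) args.push(`--runtime=${o.runtime}`);
  return [...args, o.image, ...command];
}
```

Spawned with `spawn("docker", dockerRunArgs(...))`, this leaves Docker's default seccomp profile in force. Because the container runs as `nobody`, the tenant work directory has to be writable by UID 65534, which is an assumption about how you provision it.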

## The hybrid pattern

Real deployments use both. OpenClawMU's default: bubblewrap for the standard agent tool surface (shell, file_read, file_write, package install), Docker for tools explicitly marked as untrusted (custom plugins from unverified ClawHub publishers, user-submitted code).

The choice is per-tool, configurable in the tenant config:

```yaml
sandbox:
  default_mode: bwrap
  modes:
    untrusted_code:
      runtime: docker
      image: openclaw/sandbox-untrusted:latest
      memory_limit_mb: 512
      cpu_quota: 0.5
      runtime_class: runsc  # gVisor
```

Tools annotated `@sandbox("untrusted_code")` get the heavier isolation. Everything else gets the fast bubblewrap path.
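How an annotation might be resolved against the YAML above can be sketched like this. The interfaces mirror the config keys after parsing; `resolveMode` is a hypothetical name, and treating `default_mode` as a runtime shorthand is an assumption about the config's semantics.

```typescript
// Shapes mirror the sandbox YAML after parsing.
interface SandboxMode {
  runtime: "bwrap" | "docker";
  image?: string;
  memory_limit_mb?: number;
  cpu_quota?: number;
  runtime_class?: string;
}

interface SandboxConfig {
  default_mode: "bwrap" | "docker";
  modes: Record<string, SandboxMode>;
}

// Unannotated tools get the default; annotated tools get their named mode.
function resolveMode(cfg: SandboxConfig, annotation?: string): SandboxMode {
  if (annotation === undefined) return { runtime: cfg.default_mode };
  const mode = cfg.modes[annotation];
  // Fail closed: a typo in an annotation must not silently downgrade to the lighter sandbox.
  if (!mode) throw new Error(`unknown sandbox mode: ${annotation}`);
  return mode;
}
```

Throwing on an unknown mode is the important design choice: falling back to the default would quietly run "untrusted" code under bubblewrap.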

## Cold-start cost in production numbers

A typical agent run issues 3–10 tool calls. Bubblewrap × 10 = ~300 ms total sandboxing overhead, negligible against a 2–5 second LLM response time. Docker × 10 = 2–5 seconds, which roughly doubles the perceived latency, and more than doubles it when a fast LLM response meets slow container starts.

If you can keep a warm Docker pool, the cold-start cost drops to ~50 ms per call. That's a reasonable trade-off for the heavier isolation surface.
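The arithmetic above fits in a few lines. The per-call figures are the ones quoted in this article, not new measurements, and the function names are illustrative.

```typescript
// Per-call sandbox cost in ms, as quoted in this article.
const PER_CALL_MS: Record<string, number> = {
  bwrap: 30,
  dockerColdLow: 200,
  dockerColdHigh: 500,
  dockerWarm: 50,
};

// Total sandboxing overhead for one agent run.
function overheadMs(toolCalls: number, perCallMs: number): number {
  return toolCalls * perCallMs;
}

// How much longer the run feels once sandbox overhead is added to LLM time.
function slowdown(llmMs: number, sandboxMs: number): number {
  return (llmMs + sandboxMs) / llmMs;
}

for (const [name, ms] of Object.entries(PER_CALL_MS)) {
  const total = overheadMs(10, ms);
  console.log(`${name}: ${total} ms for 10 calls, ${slowdown(2000, total).toFixed(2)}x at a 2 s LLM response`);
}
```

Pairing 10 cold Docker starts with a 5 s LLM response gives exactly 2x; pairing them with a 2 s response is where the "more than doubles" case comes from.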

## Seccomp profiles

Both bubblewrap and Docker accept seccomp profiles that restrict the syscalls a process can issue. A reasonable default for agent code:

- **Allow**: read, write, openat, close, execve, clone (which backs fork), mmap, brk, exit, exit_group, futex, clock_gettime, getpid, getuid, getgid (the boring stuff).
- **Deny**: ptrace, mount, umount2, reboot, kexec_load, init_module, perf_event_open (anything that touches the kernel or other processes). Under a default-deny profile these are blocked anyway; naming them documents intent.

OpenClawMU ships a default-deny seccomp profile that allows the syscalls a typical Python / Node / shell tool needs. Custom profiles are configurable per sandbox mode.
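As an illustration of the file format only (not OpenClawMU's shipped profile), a default-deny profile in Docker's seccomp JSON format can be generated like this. `defaultAction`, `architectures` and `syscalls` are Docker's real profile keys; the allow-list is the short one from the text and is far too small for real Python or Node tools, which also need the stat family, `getdents64`, signal handling and more.

```typescript
// Docker-format seccomp profile: any syscall not listed fails with an errno.
interface SeccompProfile {
  defaultAction: string;
  architectures: string[];
  syscalls: { names: string[]; action: string }[];
}

// Illustrative allow-list only; in practice start from Docker's default profile and trim.
const ALLOWED = [
  "read", "write", "openat", "close", "execve", "clone", "mmap", "brk",
  "exit", "exit_group", "futex", "clock_gettime", "getpid", "getuid", "getgid",
];

function buildProfile(allowed: string[]): SeccompProfile {
  return {
    defaultAction: "SCMP_ACT_ERRNO",
    architectures: ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
    syscalls: [{ names: [...new Set(allowed)].sort(), action: "SCMP_ACT_ALLOW" }],
  };
}

console.log(JSON.stringify(buildProfile(ALLOWED), null, 2));
```

Docker takes the resulting file via `docker run --security-opt seccomp=profile.json`. Bubblewrap's `--seccomp FD` expects a compiled BPF program on a file descriptor instead, so the same rules would have to be compiled (for example with libseccomp) before bwrap can use them.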

## Network policy

Default to no network. Tools that need network access opt in with an allow-list:

```yaml
sandbox:
  network:
    default: deny
    allow:
      - "api.weather.gov:443"
      - "*.anthropic.com:443"
```

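Implementation differs. Bubblewrap either shares the host network or, with `--unshare-net`, gives the sandbox an empty network namespace with only loopback. Docker's `--network=none` does the same for a container. Neither enforces a hostname allow-list on its own: allow-listing typically means attaching the sandbox to a network whose only route out is a filtering egress proxy.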
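The check such an egress proxy would apply to each outbound connection can be sketched as below. `isAllowed` is a hypothetical helper, and the wildcard semantics are an assumption: `*.anthropic.com` matches any subdomain but not the bare `anthropic.com`, and every entry must carry a port.

```typescript
// Match "host:port" against entries like "api.weather.gov:443" or "*.anthropic.com:443".
function isAllowed(allowList: string[], host: string, port: number): boolean {
  const h = host.toLowerCase().replace(/\.$/, ""); // normalise case and a trailing dot
  return allowList.some((entry) => {
    const sep = entry.lastIndexOf(":");
    if (sep < 0) return false;                     // port is mandatory in this sketch
    if (Number(entry.slice(sep + 1)) !== port) return false;
    const pattern = entry.slice(0, sep).toLowerCase();
    if (pattern.startsWith("*.")) {
      // "*.anthropic.com" → suffix ".anthropic.com"; the leading dot blocks "evilanthropic.com".
      return h.endsWith(pattern.slice(1));
    }
    return h === pattern;
  });
}
```

Matching on a dot-prefixed suffix rather than a plain `endsWith("anthropic.com")` is the detail that matters; the naive version lets any domain ending in the same letters through.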

## When neither is enough

For truly hostile workloads (security research, user-submitted attack payloads), neither bubblewrap nor Docker-on-runc is sufficient. Step up to:

- **Docker + gVisor (`runsc`)**: a user-space kernel handles the workload's syscalls, so sandboxed code never calls into the host kernel directly; ~30% syscall overhead. The pragmatic next step.
- **Kata Containers**: lightweight VMs as containers. Stronger isolation, heavier cold start.
- **Firecracker**: AWS's MicroVM. Used by Lambda. Cold start ~125 ms; very strong isolation.

For most multi-tenant AI gateway use cases, Docker + gVisor is the right ceiling. Beyond that you're paying overhead you don't need.

## Recommendation

- **Linux + trusted-but-isolated workloads**: bubblewrap. Fast, simple, well-suited.
- **Cross-platform or moderately-untrusted workloads**: Docker with default seccomp + cap-drop.
- **Genuinely-untrusted (user-submitted code)**: Docker + gVisor.
- **Hostile workloads**: Firecracker.

Pick per workload, not per cluster. The right choice for one tool isn't the right choice for all of them.

---

# Per-tenant LLM cost accounting: meter, attribute, charge

URL: https://openclawmu.neullabs.com/blog/billing-per-tenant-llm-usage
Published: 2026-05-22
Tags: billing, LLM cost, per-tenant, Stripe, metering
Cluster: supporting

> If you're charging customers for an AI product, you need per-tenant token tracking. Here's the data model, the rate-card pattern, and the integration with Stripe Billing or your invoicing of choice.

If you're charging customers for an AI product, the question "what does this customer cost me?" needs a precise answer. Per-tenant LLM cost accounting is the boring infrastructure that makes that precision possible.

This article: the data model, the rate-card pattern, the integration paths to invoicing systems.

## What you need to capture

Every LLM call should write a billing row with:

- `timestamp` — when the call happened.
- `tenant_id` — which tenant made it.
- `model` — which model (`claude-opus-4-7`, `gpt-4o`, etc.).
- `input_tokens` — fresh input tokens (un-cached).
- `cache_read_tokens` — input tokens served from prompt cache (cheap).
- `cache_write_tokens` — input tokens written to the prompt cache (expensive, one-time).
- `output_tokens` — completion tokens.
- `reasoning_tokens` — extended-thinking / o-series reasoning tokens.
- `tool_calls` — count of tool invocations triggered in this turn.
- `request_id` — provider's ID for the call (audit trail).
- `cost_usd` — computed at write time, snapshotted against the rate card in effect.

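The cost is computed at write time and stored, not recomputed on read. This is critical for audit: if you change your rate card in July, June's reports still reflect June's prices. Keep `reasoning_tokens` out of `output_tokens`: providers typically fold reasoning into the completion count, and recording it in both columns bills it twice.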
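A minimal sketch of pricing a row at write time. The field names follow the list above; `rate_card_version` and `priceRow` are hypothetical additions to show the snapshot, and the rate card is assumed to be keyed by model with USD-per-1M-token prices per token class.

```typescript
type TokenClass = "input" | "output" | "cache_read" | "cache_write" | "reasoning";
// Per-1M-token prices for one model; classes a model doesn't have are absent.
type Rates = Partial<Record<TokenClass, number>>;

interface Usage {
  tenant_id: string;
  model: string;
  input_tokens: number;
  cache_read_tokens: number;
  cache_write_tokens: number;
  output_tokens: number;      // excludes reasoning tokens
  reasoning_tokens: number;
  tool_calls: number;
  request_id: string;
}

interface BillingRow extends Usage {
  timestamp: string;
  rate_card_version: string;  // hypothetical: which rate card priced this row
  cost_usd: number;           // frozen at write time, never recomputed on read
}

function priceRow(u: Usage, rateCard: Record<string, Rates>, version: string, now = new Date()): BillingRow {
  const rates = rateCard[u.model];
  if (!rates) throw new Error(`no rate card entry for ${u.model}`); // refuse to write an unpriced row
  // Tokens in a class the model has no price for are an error, not free usage.
  const charge = (tokens: number, cls: TokenClass): number => {
    if (tokens === 0) return 0;
    const perMillion = rates[cls];
    if (perMillion === undefined) throw new Error(`${u.model} has no "${cls}" rate`);
    return (tokens * perMillion) / 1_000_000;
  };
  const cost_usd =
    charge(u.input_tokens, "input") +
    charge(u.cache_read_tokens, "cache_read") +
    charge(u.cache_write_tokens, "cache_write") +
    charge(u.output_tokens, "output") +
    charge(u.reasoning_tokens, "reasoning");
  return { ...u, timestamp: now.toISOString(), rate_card_version: version, cost_usd };
}
```

Failing loudly on a missing rate is deliberate: silently pricing an unknown class at zero is exactly the kind of drift the snapshot is meant to prevent.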

## The rate card

The rate card is a YAML map from (model, token-class) → USD-per-1M-tokens. Every model you call needs an entry; reasoning and cache classes are priced separately from the base input and output rates.

```yaml
billing:
  currency: USD
  rate_card:
    "claude-opus-4-7":
      input:        15.00
      output:       75.00
      cache_read:    1.50
      cache_write:  18.75
      reasoning:    75.00
    "claude-sonnet-4-6":
      input:         3.00
      output:       15.00
      cache_read:    0.30
      cache_write:   3.75
    "gpt-4o":
      input:         2.50
      output:       10.00
    "gpt-5":
      input:         5.00
      output:       25.00
```

The rate card lives in the gateway's admin-only config. Tenant overlays can't override it — otherwise a tenant could quietly make their own bills cheaper.
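Enforcing that rule at merge time might look like the sketch below. The top-level key names in `ADMIN_ONLY` are assumptions (`billing` matches the config above; `quota` is hypothetical), and the shallow merge stands in for whatever merge strategy the gateway actually uses.

```typescript
// Top-level keys only the gateway operator may set (names assumed).
const ADMIN_ONLY = new Set(["billing", "quota"]);

type Config = Record<string, unknown>;

function applyTenantOverlay(base: Config, overlay: Config): Config {
  for (const key of Object.keys(overlay)) {
    if (ADMIN_ONLY.has(key)) {
      // Reject rather than silently drop, so a tampered overlay shows up in logs.
      throw new Error(`tenant overlay may not set admin-only key "${key}"`);
    }
  }
  return { ...base, ...overlay }; // shallow merge: tenant values win everywhere else
}
```

Rejecting the whole overlay, rather than ignoring the forbidden key, makes a tenant's attempt to reprice itself visible instead of merely ineffective.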

## Margin on top

You're billing customers; you need a margin. Two patterns:

**Markup at meter time**: multiply the provider rate by 1.3 (a 30% markup, or whatever margin you target) when computing `cost_usd`. The customer sees a single cost that includes your margin; the provider rate is invisible to them.

**Pass-through + flat fee**: bill exactly the provider rate, add a flat per-tenant per-month platform fee. Customers like the transparency; you take less variance risk.

The hybrid (5–15% markup + small platform fee) is common.
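The three patterns differ only in where the margin enters. A sketch with illustrative function names, rounding to cents once at the end rather than per line item:

```typescript
// Round a USD amount to whole cents.
function roundCents(usd: number): number {
  return Math.round(usd * 100) / 100;
}

// Markup at meter time: margin folded into the rate.
function markupCharge(providerCostUsd: number, markup: number): number {
  return roundCents(providerCostUsd * (1 + markup));
}

// Pass-through plus a flat monthly platform fee.
function passThroughCharge(monthlyProviderCostUsd: number, platformFeeUsd: number): number {
  return roundCents(monthlyProviderCostUsd + platformFeeUsd);
}

// Hybrid: small markup plus a small platform fee.
function hybridCharge(providerCostUsd: number, markup: number, platformFeeUsd: number): number {
  return roundCents(providerCostUsd * (1 + markup) + platformFeeUsd);
}
```

For example, a $5.42595 provider cost with a 10% markup comes to `markupCharge(5.42595, 0.10)`, i.e. $5.97.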

## Quotas — the hard stop

Three knobs per tenant:

- **Tokens per day** — sum of input + output + reasoning, reset at UTC midnight.
- **Cost per day (USD)** — rate-card-driven, reset at UTC midnight.
- **Requests per minute** — sliding window.

Exceed any and the gateway returns `429 Too Many Requests` with `Retry-After: <seconds>`. This is the difference between "a runaway tenant blows up your AWS bill" and "a runaway tenant gets throttled at 5 minutes past midnight."

Quotas live in the tenant's admin-only config; the tenant can request an increase but can't set their own.
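A sketch of the three knobs as a single pre-call check. The names are hypothetical, and keeping counters in process memory is a simplification; a multi-process gateway would keep them in shared storage. The check runs before the LLM call, so it can't know that call's cost in advance, which means a tenant can overshoot a daily limit by at most one request.

```typescript
interface Quota {
  tokensPerDay: number;
  costPerDayUsd: number;
  requestsPerMinute: number;
}

// Mutable per-tenant counters.
interface Counters {
  day: string;      // UTC date the daily counters belong to, "YYYY-MM-DD"
  tokens: number;   // input + output + reasoning so far today
  costUsd: number;
  recent: number[]; // epoch-ms timestamps of recent requests, oldest first
}

type Decision = { allowed: true } | { allowed: false; status: 429; retryAfterSeconds: number };

function checkQuota(q: Quota, c: Counters, nowMs: number): Decision {
  const now = new Date(nowMs);
  const today = now.toISOString().slice(0, 10);
  if (c.day !== today) {
    // First request after UTC midnight: reset the daily counters.
    c.day = today;
    c.tokens = 0;
    c.costUsd = 0;
  }
  if (c.tokens >= q.tokensPerDay || c.costUsd >= q.costPerDayUsd) {
    const nextMidnight = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
    return { allowed: false, status: 429, retryAfterSeconds: Math.ceil((nextMidnight - nowMs) / 1000) };
  }
  // Sliding one-minute window for requests per minute.
  c.recent = c.recent.filter((t) => nowMs - t < 60_000);
  if (c.recent.length >= q.requestsPerMinute) {
    const retryAfterSeconds = Math.max(1, Math.ceil((c.recent[0] + 60_000 - nowMs) / 1000));
    return { allowed: false, status: 429, retryAfterSeconds };
  }
  c.recent.push(nowMs);
  return { allowed: true };
}

// Called after the LLM responds, with the metered tokens and rate-card cost.
function recordUsage(c: Counters, tokens: number, costUsd: number): void {
  c.tokens += tokens;
  c.costUsd += costUsd;
}
```

On a denial, the HTTP layer would send status 429 with `Retry-After` set to `String(retryAfterSeconds)`.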

## Reports

One CSV row per tenant, per model, per day, with the rate-card-snapshotted cost:

```csv
date,tenant,model,tokens_in,tokens_out,tokens_cached,reasoning_tokens,tool_calls,cost_usd
2026-06-03,acme,claude-opus-4-7,142500,38200,12300,5400,42,5.97
2026-06-03,acme,claude-sonnet-4-6,891200,201400,84300,0,128,6.29
2026-06-03,globex,claude-sonnet-4-6,512300,98400,32100,0,67,3.32
```

Aggregate to month-end for invoicing. The CSV columns are the contract: keep them stable so customers' downstream pipelines don't break when you add a new dimension.
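Rolling per-call rows up into that CSV might look like the sketch below. `dailyCsv` is a hypothetical name; it assumes `tokens_cached` means cache-read tokens, and that tenant IDs and model names contain no commas (real CSV output would need quoting).

```typescript
interface BillingRow {
  timestamp: string;   // ISO-8601, UTC
  tenant_id: string;
  model: string;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  reasoning_tokens: number;
  tool_calls: number;
  cost_usd: number;    // snapshotted at write time
}

// Column order is the contract: append new columns at the end, never reorder or rename.
const COLUMNS = ["date", "tenant", "model", "tokens_in", "tokens_out", "tokens_cached", "reasoning_tokens", "tool_calls", "cost_usd"];

function dailyCsv(rows: BillingRow[]): string {
  const groups = new Map<string, number[]>(); // "date,tenant,model" → summed counters
  for (const r of rows) {
    const key = `${r.timestamp.slice(0, 10)},${r.tenant_id},${r.model}`;
    const acc = groups.get(key) ?? [0, 0, 0, 0, 0, 0];
    acc[0] += r.input_tokens;
    acc[1] += r.output_tokens;
    acc[2] += r.cache_read_tokens;
    acc[3] += r.reasoning_tokens;
    acc[4] += r.tool_calls;
    acc[5] += r.cost_usd; // sum full-precision costs, round only on output
    groups.set(key, acc);
  }
  const lines = [...groups.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, s]) => [key, s[0], s[1], s[2], s[3], s[4], s[5].toFixed(2)].join(","));
  return [COLUMNS.join(","), ...lines].join("\n");
}
```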

## Stripe Billing integration

If you're invoicing via Stripe:

1. Create a Stripe `Product` per offering ("OpenClawMU Pro", "OpenClawMU Team").
2. Create a metered `Price` on that product (`recurring.usage_type: "metered"`), so the subscription item bills whatever usage you report.
3. Nightly cron: for each tenant, sum the day's `cost_usd` and post it to Stripe as a usage record on the tenant's subscription item (`stripe.subscriptionItems.createUsageRecord(subscriptionItemId, { quantity: costCents, timestamp })` in stripe-node).
4. Stripe handles the invoice and the credit card charge at month-end.

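The mapping tenant → Stripe subscription lives in your customer database; the gateway only knows tenant IDs and dollar amounts. Newer Stripe API versions replace usage records with Billing Meters (`stripe.billing.meterEvents.create`), so check which API version your account is pinned to before copying the snippet below.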

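```ts
// nightly cron, in your invoicing service (ESM module: uses top-level await)
import Stripe from "stripe";
import { openclaw } from "./openclaw-client";
import { db } from "./db"; // your customer database: tenant ID → Stripe subscription item

const stripe = new Stripe(process.env.STRIPE_KEY!);

for (const customer of await db.customers.findAll()) {
  // Yesterday's cost for this tenant, already snapshotted against the rate card.
  const usage = await openclaw.billing.report(customer.tenantId, "yesterday");
  const costCents = Math.round(usage.total_cost_usd * 100);
  if (costCents > 0) {
    await stripe.subscriptionItems.createUsageRecord(
      customer.stripeSubItemId,
      { quantity: costCents, timestamp: usage.date_unix },
      // Stable key per tenant and day, so a retried cron run can't post the same day twice.
      { idempotencyKey: `usage-${customer.tenantId}-${usage.date_unix}` }
    );
  }
}
```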

## Showing customers their usage

Customers want to see what they're paying for. Build a simple dashboard backed by the same CSV / API:

- Daily tokens (input vs output vs cached, stacked).
- Daily cost (with model breakdown).
- Top 10 sessions by cost.
- Quota progress (current period).

OpenClawMU's control-plane API exposes `GET /v1/tenants/<id>/usage?period=current-month` returning the same data the CSV does, JSON-formatted. Front-end query, render charts, ship.
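Two of the dashboard panels reduce to small client-side aggregations. A sketch under an assumed row shape: `session_id` is a hypothetical field the usage API would need to expose, and both function names are illustrative.

```typescript
// Assumed per-call row shape.
interface CallCost {
  session_id: string;
  cost_usd: number;
}

// "Top 10 sessions by cost": sum per session, sort descending, keep the first n.
function topSessionsByCost(rows: CallCost[], n = 10): CallCost[] {
  const totals = new Map<string, number>();
  for (const r of rows) totals.set(r.session_id, (totals.get(r.session_id) ?? 0) + r.cost_usd);
  return [...totals.entries()]
    .map(([session_id, cost_usd]) => ({ session_id, cost_usd }))
    .sort((a, b) => b.cost_usd - a.cost_usd)
    .slice(0, n);
}

// "Quota progress": fraction of the period's limit used, clamped to [0, 1] for a progress bar.
function quotaProgress(used: number, limit: number): number {
  return limit > 0 ? Math.min(1, Math.max(0, used / limit)) : 1;
}
```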

## Audit-friendly trail

Every billing row carries the rate-card snapshot. If a customer disputes a charge, you can reproduce the math:

> "On 2026-06-03 you used 142,500 input tokens of claude-opus-4-7 at $15/M = $2.138. Plus 38,200 output tokens at $75/M = $2.865. Plus 12,300 cached at $1.50/M = $0.018. Plus 5,400 reasoning at $75/M = $0.405. Subtotal $5.426, plus our 10% margin of $0.543. Charge: $5.97."

Customers respect that level of transparency. Plus your audit team likes it.

## What not to do

- **Don't sample.** Recording one in ten LLM calls and extrapolating is a lawsuit waiting to happen.
- **Don't bill on requests instead of tokens.** Requests are a coarser unit; tokens are what your provider charges you for.
- **Don't change the rate card retroactively.** If you raise prices, snapshot the new rate going forward; historical reports stay at the old rate.
- **Don't bury margin in opaque per-message fees.** Customers find out, and they don't appreciate it.
- **Don't share LLM keys across tenants without a meter.** Even if it's "cheaper" in the short term, you've lost the cost attribution.

## Build vs adopt

You can build all of this in a sprint. The interesting work is your invoicing UX, your customer dashboard, your pricing strategy. The plumbing (meter every LLM call, snapshot a rate card, emit a CSV) is identical across every multi-tenant AI gateway. OpenClawMU ships it; LiteLLM has a thinner version; you can absolutely roll your own.

The point is: don't ship the product before you ship the meter. Customers without bills become very expensive customers very quickly.

---

# From personal Telegram bot to multi-customer bot platform: field notes

URL: https://openclawmu.neullabs.com/blog/from-personal-bot-to-bot-platform
Published: 2026-05-18
Tags: field notes, telegram, multi-tenant, bot platform, migration
Cluster: narrative

> Field notes on graduating a single-user Telegram bot into a multi-tenant SaaS. The architectural choices that survived, the mistakes that didn't, and the migration timeline.

This is a write-up of taking a working Telegram bot — single tenant, my own conversations, three friends had been pilot-testing it — and turning it into a multi-customer SaaS. The bot's domain isn't important; what matters is the architectural transition.

## Where we started

The bot was about 1,200 lines of TypeScript. One Node process. One Telegram bot token. A `sqlite` file with the chat history. A folder full of "skills" (TypeScript modules that exposed tools to the agent). Anthropic Claude on the back end. A `cron` running on the box for scheduled reminders.

It worked. The four of us (me + three friends) had paired our Telegram accounts to the same bot. The conversations stayed coherent because the bot's session-keying logic used the Telegram user ID. The memory used the same user ID. It was — accidentally — a half-baked multi-tenant system. Half-baked because:

- The skills folder was shared, so installing one for myself meant installing it for everyone.
- The cron folder was shared, so scheduling a reminder for myself meant it ran with whatever identity the scheduler picked.
- The audit log was a single text file, no per-user separation.
- The cost was a single Anthropic bill with no attribution.
- The vector retrieval used a shared index, which retrieved across users in surprising ways.
- There was one Anthropic API key for everyone: no quotas, no rate limits, no "this customer overused".

None of this was disastrous because we were four friends. The moment we wanted to offer it to a customer, every problem turned into a real problem.

## The naive migration

The first instinct: "add `tenant_id` to every table". This is exactly what every engineer reading this is currently considering. We tried it. It worked for the sessions table and the memory table. It didn't work for:

- The skills folder (filesystem, not a database).
- The cron folder (same).
- The "tmp/" directory the bot used as scratch space.
- The audit log (single file).
- The Telegram token (one token, one bot).

We then considered "one bot per customer" — a separate Node process for each. This works but the operational complexity is awful. Five customers means five processes, five cron rotations, five audit logs to ship.

## The pivot

We adopted [OpenClawMU](/) — the multi-tenant fork of OpenClaw. The architecture was exactly what we were trying to build:

- Per-tenant directories for sessions, memory, skills, cron, channels, sandbox.
- Per-tenant Telegram bot pairings (each customer gets their own BotFather token).
- Per-tenant cost accounting.
- One process, many tenants, structural isolation.
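The filesystem half of that isolation is what "add `tenant_id` to every table" couldn't cover. A minimal sketch of tenant-scoped path resolution, not OpenClawMU's code: the `<root>/<tenant>/<area>` layout, the tenant-ID pattern and `tenantPath` are assumptions, and symlink escapes would additionally need an `fs.realpath` check.

```typescript
import * as path from "node:path";

// Hypothetical layout: <root>/<tenant>/<area>/...; area names follow the list above.
const AREAS = new Set(["sessions", "memory", "skills", "cron", "channels", "sandbox"]);
const TENANT_ID = /^[a-z0-9][a-z0-9-]{0,62}$/;

function tenantPath(root: string, tenantId: string, area: string, ...rest: string[]): string {
  if (!TENANT_ID.test(tenantId)) throw new Error(`invalid tenant id: ${tenantId}`);
  if (!AREAS.has(area)) throw new Error(`unknown area: ${area}`);
  const base = path.resolve(root, tenantId, area);
  const full = path.resolve(base, ...rest);
  // Reject "../" escapes so one tenant can never address another tenant's files.
  if (full !== base && !full.startsWith(base + path.sep)) throw new Error("path escapes tenant scope");
  return full;
}
```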

The decision wasn't free: adopting a primitive means giving up some control. But we traded three engineering weeks of migration for two months of multi-tenant infrastructure work we didn't want to do.

## Day 1: install and bootstrap

```bash
npm install -g openclaw@latest
openclaw onboard --install-daemon
openclaw tenants create me  # for me, the original user
```

We pointed the MU CLI's `tenants import-from-upstream` command at our old data directory, and it converted that into the `me` tenant. Sessions, memory, channel pairing: all migrated. About 30 minutes for the import; another hour validating that conversations still made sense.

## Day 2: per-tenant Telegram tokens

Each customer needs their own Telegram bot. BotFather makes this trivial (`/newbot`, name it, get a token). For each customer:

```bash
openclaw tenants create customer-acme
openclaw channels pair telegram --tenant customer-acme --token <BotFather-token>
```

Then we sent each customer the bot's @username so they could open a chat. New chats land in the customer's tenant; their messages route to their agent; replies come from their bot.

## Day 3: skills and prompts

Each customer had asked for slightly different agent behavior. With OpenClawMU we could put different skill packages and system prompts in each tenant's overlay:

```yaml
# /tenants/customer-acme/config.yaml
model: claude-sonnet-4-6
system_prompt: |
  You are Acme Corp's reservation assistant.
  Be concise. Always ask for booking dates before suggesting venues.
allowed_tools:
  - book_table
  - check_calendar
  - search_venues
```

```yaml
# /tenants/customer-globex/config.yaml
model: claude-opus-4-7
system_prompt: |
  You are Globex's translation assistant.
  Default to professional tone. Preserve technical terms.
allowed_tools:
  - translate
  - dictionary_lookup
```

Two customers, two completely different bot personalities, one Node process.

## Day 4: cost and quota

We set up the rate card and the quotas:

```bash
# rate card lives in gateway-wide config (admin-only)
yq -i '.billing.rate_card."claude-sonnet-4-6".input = 3.00' ~/.openclaw/config.yaml
yq -i '.billing.rate_card."claude-sonnet-4-6".output = 15.00' ~/.openclaw/config.yaml
# (and so on for the other models we use)

# per-tenant quotas
openclaw tenants quota update customer-acme \
  --tokens-per-day 1000000 \
  --cost-per-day-usd 25
```

The first nightly CSV showed us exactly what each customer was costing us. The variance was eye-opening — one customer was 3x the others in token usage. They were doing the right thing; the bot was just useful enough that they leaned on it harder. We adjusted their quota and their plan.

## Day 5: dashboards and reporting

The customer-facing dashboard was a simple SvelteKit app calling OpenClawMU's `/v1/tenants/<id>/usage` endpoint. Customers see their token usage, their cost, their quota progress.

We didn't add a chat UI for them — they use Telegram. The dashboard is purely for transparency on cost.

## What broke

- **Memory bleed**. The old shared vector index had been retrieving across users in subtle ways: phrases from one customer's conversations showed up as "relevant context" in another's. The fix was per-tenant sqlite-vec indexes, which is OpenClawMU's default. The lesson: shared retrieval is a leak surface regardless of how small "shared" is.
- **Telegram media uploads**. The old bot stored uploaded files in a shared `tmp/`. With per-tenant sandboxes, the media path is now per-tenant. Migrating existing files was a one-shot rsync, but I'd recommend doing it before going live to avoid 404 confusion.
- **Cron rotation**. We had a "Friday digest" cron that worked under the old shared scheduler. Per-tenant cron means each tenant needs to opt in. We pre-seeded each existing customer's tenant with their cron; new customers configure it via a dashboard toggle.

## What didn't break

- **Skill code**. TypeScript modules dropped into the tenant's `plugins/` directory worked identically. Same imports, same interfaces.
- **Anthropic API key**. One admin-side key, all tenants billed against it, cost attributed correctly. We never had to issue per-tenant keys.
- **The bot's personality**. System prompts and model choices moved into per-tenant config, which is exactly where they belonged.
- **Telegram pairing**. Once per tenant; survives gateway restarts; we haven't needed to re-pair anyone yet.

## What we'd do differently

- **Plan the data model on day -10**, not day 2. Knowing that you'll be multi-tenant *eventually* means designing the skill loader and the vector store with per-tenant dimensions from the start, even if the v1 only ever ships one tenant.
- **Don't try to write multi-tenant infrastructure yourself**. We tried. It works for the first three customers. The next ten customers, plus the customer who has a security review, plus the customer who needs SOX-compliant audit logs, plus the customer who wants a quota — each of those is a week.

## Six months later

We have 47 paying tenants on one gateway, running on a single Hetzner CCX13 VM. Nightly S3 backups to Cloudflare R2. Anthropic spend is 8x what it was as a personal bot. Net margin on the operation is fine. Operations: I check the audit log on Monday mornings, I respond to GitHub issues when something looks off, and I update the Anthropic SDK every few weeks.

The architecture we adopted on day 1 is the architecture we still run. Nothing has had to be re-thought. That, more than anything else, is the case for adopting the primitive vs. rolling your own.

---