Tenant isolation for LLM agents: the patterns that actually hold up

Isolating LLM agents per tenant is harder than adding a tenant_id column. Here are the patterns that survive contact with adversarial users, leaky tools, and the inevitable refactor.

By Dipankar Sarkar, May 29, 2026

You’re building a multi-tenant LLM-agent system. Tenants A and B both have their own data, their own prompts, their own tools. The job is to ensure A cannot read B’s data — not by accident, not by adversarial prompting, not after a refactor.

The first instinct is “I’ll add tenant_id to every database row and filter on it.” That works until it doesn’t, which is usually the day you ship the third feature. Here are the patterns that actually hold up.

Make the boundary structural

The cleanest defense is to make the tenant ID the root of every namespace it influences, not just a column. Files live under tenants/<name>/, not in a shared directory. Cache keys are prefixed tenant:<name>:, not just keyed by content. Cron schedules carry a tenant in their type signature. Sandboxes are spawned with the tenant root as the only writable filesystem path.

When the tenant ID is structural, forgetting it is a build error or a path-resolution failure — observable. When the tenant ID is a column, forgetting it is silent data leakage — invisible until it isn’t.
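One way to get that build error in TypeScript is a branded type. This is a minimal sketch, not a prescribed implementation: TenantId, asTenantId, cacheKey, tenantDir, and CronJob are hypothetical names, and the ID pattern is an assumption.

```typescript
// A branded string: plain strings are not assignable to TenantId,
// so a helper that requires one cannot be called without it.
type TenantId = string & { readonly __brand: "TenantId" };

// Assumed ID format: lowercase letters, digits, and hyphens, so an ID
// can never contain "/" or ".." when it is used as a path segment.
function asTenantId(raw: string): TenantId {
  if (!/^[a-z0-9][a-z0-9-]{0,62}$/.test(raw)) {
    throw new Error(`invalid tenant id: ${JSON.stringify(raw)}`);
  }
  return raw as TenantId;
}

// Every namespace is derived from the tenant, never from content alone.
function cacheKey(tenant: TenantId, key: string): string {
  return `tenant:${tenant}:${key}`;
}

function tenantDir(tenant: TenantId): string {
  return `tenants/${tenant}/`;
}

// A schedule cannot be constructed without saying whose schedule it is.
interface CronJob {
  tenant: TenantId;
  schedule: string;
  task: string;
}

const acme = asTenantId("acme");
const job: CronJob = { tenant: acme, schedule: "0 3 * * *", task: "reindex" };
// cacheKey("acme", "profile:42") would not compile: string is not a TenantId.
```

The only way to obtain a TenantId is through asTenantId, so validation happens once at the edge and everything downstream inherits it.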

Token-rooted dispatch

Every inbound request carries a tenant token in the Authorization header. The dispatcher’s first job: hash the token, look up the tenant, attach the tenant ID to the request context. Every downstream handler reads the tenant ID from the context — never from a parameter the caller can spoof.

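async function dispatch(req: Request) {
  // Authenticate first: the bearer token is the only source of tenant identity.
  const token = extractBearer(req);
  if (!token) return reject(401);
  const tenantId = await resolveTenant(token);  // SHA-256 + timing-safe lookup
  if (!tenantId) return reject(401);

  // Freeze the context so downstream handlers cannot swap the tenant.
  const ctx = Object.freeze({ ...baseContext, tenantId });
  return handler(req, ctx);
}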

If a downstream handler accidentally accepts a tenantId parameter from the user, you have a privilege escalation. Make ctx.tenantId the only path; lint against any handler signature that takes it as a parameter.
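Besides a lint rule, the type system can carry part of that load. The sketch below assumes a hypothetical Handler type and a hypothetical assertNoTenantField guard; neither comes from a real framework.

```typescript
interface RequestContext {
  readonly tenantId: string;
}

// Any payload that supplies a tenantId stops type-checking,
// because the only value allowed for that field is undefined.
type NoTenantField = { tenantId?: never };

type Handler<P, R> = (payload: P & NoTenantField, ctx: RequestContext) => R;

// Runtime counterpart for untyped JSON bodies, where the compiler cannot help.
function assertNoTenantField(body: Record<string, unknown>): void {
  for (const key of ["tenantId", "tenant_id"]) {
    if (key in body) {
      throw new Error(`"${key}" must come from the request context, not the payload`);
    }
  }
}

// The handler reads the tenant from ctx only.
const listNotes: Handler<{ limit: number }, string> = (payload, ctx) =>
  `notes for ${ctx.tenantId} (limit ${payload.limit})`;
```

The runtime guard matters because request bodies arrive as untyped JSON; the compile-time constraint only protects calls made from typed code.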

Path-traversal protection, everywhere

Every API that takes a path string is a potential escape hatch. The defense is mechanical:

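import * as path from "node:path";

function resolveTenantPath(tenantRoot: string, userPath: string): string {
  // Normalize the root first so a relative root or a trailing slash
  // cannot break the prefix comparison below.
  const root = path.resolve(tenantRoot);
  const resolved = path.resolve(root, userPath);
  // The path.sep suffix stops a sibling such as tenants/acme-evil
  // from passing as a child of tenants/acme.
  if (resolved !== root && !resolved.startsWith(root + path.sep)) {
    throw new ForbiddenError("path escapes tenant root");
  }
  return resolved;
}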

Apply this to: file_read / file_write tools, plugin install paths, sandbox mount specs, backup target keys, restore source keys, config overlay file paths. It’s tedious; it’s the most important tedium in the system.

Bonus: path.resolve is purely lexical and never follows symlinks, so also resolve the real path (fs.realpath) and reject anything that lands outside the tenant root. Bubblewrap can be configured to refuse symlink-following in mount specs.
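A sketch of the symlink check, layered on top of the lexical one. resolveRealTenantPath and isInside are hypothetical names. The sketch assumes the tenant root already exists on disk, and it does not close the race between checking a path and opening it.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

function isInside(root: string, candidate: string): boolean {
  return candidate === root || candidate.startsWith(root + path.sep);
}

function resolveRealTenantPath(tenantRoot: string, userPath: string): string {
  // Canonicalize the root itself, in case it sits behind a symlink.
  const root = fs.realpathSync(tenantRoot);

  // Step 1: lexical check, catches "../" style escapes.
  const lexical = path.resolve(root, userPath);
  if (!isInside(root, lexical)) throw new Error("path escapes tenant root");

  // Step 2: symlink check. realpath needs an existing path, so for a file
  // that does not exist yet, check its parent directory instead. lstat (not
  // stat) is used so a dangling symlink counts as existing and then fails
  // inside realpathSync rather than slipping through.
  const exists = fs.lstatSync(lexical, { throwIfNoEntry: false }) !== undefined;
  const real = fs.realpathSync(exists ? lexical : path.dirname(lexical));
  if (!isInside(root, real)) throw new Error("symlink escapes tenant root");

  return lexical;
}
```

If the parent directory is missing too, realpathSync throws, which fails closed. Callers that need to create nested directories would create them first and then re-check.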

Sandbox the agent’s tool calls

When the agent executes code (running a shell command, installing a package, running a Python script), do it inside a sandbox whose root is the tenant directory. Two choices:

bubblewrap — fast (~30 ms cold start), Linux-only, rootless. Default for trusted-but-isolated code.
Docker — slower (~200–500 ms), cross-platform, full container-isolation surface. Use for genuinely untrusted code.

In both cases:

No network egress by default.
Read-only host filesystem, limited to the system paths the tools need (never the parent tenants/ directory).
Writable tmpfs scratch.
Writable tenant work dir only.
Default-deny seccomp filter.
All capabilities dropped.

The sandbox is the load-bearing layer. Even if every other check fails, a properly-configured sandbox prevents tenant A’s code from reading tenant B’s data because it physically can’t see it.
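As a rough illustration, the checklist maps onto bubblewrap flags like this. bwrapArgs and SandboxSpec are hypothetical names. The /usr bind and the symlinks assume a merged-/usr Linux layout. The seccomp filter is left out because bwrap's --seccomp expects a file descriptor holding a compiled BPF program, which is outside the scope of this sketch.

```typescript
interface SandboxSpec {
  tenantWorkDir: string; // absolute path, already validated against the tenant root
  command: string[];
}

function bwrapArgs(spec: SandboxSpec): string[] {
  return [
    "--unshare-all",      // fresh namespaces, including network: no egress
    "--die-with-parent",  // sandbox dies if the gateway process dies
    "--new-session",      // detach from the controlling terminal
    "--cap-drop", "ALL",  // all capabilities dropped
    // Read-only system paths only. The tenants/ parent is never mounted,
    // so other tenants' directories do not exist inside the sandbox.
    "--ro-bind", "/usr", "/usr",
    "--symlink", "usr/bin", "/bin",
    "--symlink", "usr/lib", "/lib",
    "--symlink", "usr/lib64", "/lib64",
    "--proc", "/proc",
    "--dev", "/dev",
    "--tmpfs", "/tmp",    // writable scratch, discarded on exit
    "--bind", spec.tenantWorkDir, "/work", // the only persistent writable path
    "--chdir", "/work",
    "--", ...spec.command,
  ];
}
```

The resulting array would be passed to a process spawner with bwrap as the executable. Building the arguments as an array, rather than a shell string, keeps tenant-controlled values out of shell parsing.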

Per-tenant retrieval indexes

Shared embedding indexes are a footgun. The vector similarity that powers retrieval doesn’t respect access control — if you have one shared index, an unrelated tenant’s documents can surface for any tenant’s query that happens to match.

Default to per-tenant indexes: a sqlite-vec database per tenant, a per-tenant Pinecone namespace, or per-tenant tables. Yes, it's more expensive on storage; the alternative is "tenant A's customer list shows up in tenant B's chat reply".
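The shape of a per-tenant index is easy to see with an in-memory stand-in. TenantIndexes below is a hypothetical class, not the sqlite-vec or Pinecone API; the point is only that a search can touch exactly one tenant's documents.

```typescript
interface Doc {
  id: string;
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

class TenantIndexes {
  // One independent index per tenant; there is no shared collection.
  private readonly indexes = new Map<string, Doc[]>();

  add(tenantId: string, doc: Doc): void {
    const index = this.indexes.get(tenantId) ?? [];
    index.push(doc);
    this.indexes.set(tenantId, index);
  }

  // The search only ever iterates over the caller's own index, so another
  // tenant's documents cannot be ranked, however similar they are.
  search(tenantId: string, query: number[], k: number): Doc[] {
    const index = this.indexes.get(tenantId) ?? [];
    return [...index]
      .sort((x, y) => cosine(y.embedding, query) - cosine(x.embedding, query))
      .slice(0, k);
  }
}
```

In a real deployment the tenantId argument would come from the request context, not from the caller.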

If you absolutely must share an index (multi-tenant retrieval over a shared knowledge base, say), enforce ACLs at retrieval time with metadata filters — and treat any retrieval bug as a tenant-data-leak bug.
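If sharing is unavoidable, the filter has to run before ranking, not after. A minimal sketch, assuming each document carries a hypothetical allowedTenants list and that embeddings are already normalized so a dot product is a fair score:

```typescript
interface SharedDoc {
  id: string;
  embedding: number[];
  allowedTenants: string[]; // ACL metadata stored alongside the vector
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

function searchShared(
  index: SharedDoc[],
  tenantId: string,
  query: number[],
  k: number,
): SharedDoc[] {
  return index
    // ACL first: documents the tenant may not see never reach the ranker.
    .filter((doc) => doc.allowedTenants.includes(tenantId))
    .map((doc) => ({ doc, score: dot(doc.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((hit) => hit.doc);
}
```

Filtering after the top-k cut would be wrong in two ways: forbidden documents would occupy result slots, and a bug in the post-filter would leak them. Hosted vector stores usually expose this as a metadata filter on the query itself.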

Shared LLM clients are fine

The LLM HTTP client itself can (and should) be shared. Each API call to Anthropic / OpenAI is stateless from the provider’s perspective — the message history is in the request body, not in a server-side session. As long as the gateway only puts one tenant’s data into any given request, the LLM provider’s multi-tenant isolation does the rest.

Tag each call with the tenant ID for cost accounting and logging. Don’t pass the tenant ID through to the LLM as a content field — there’s no need, and it adds a small leak surface if the LLM ever echoes it back.
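In code, that means the tenant ID travels next to the request, not inside it. The sketch below uses hypothetical names (buildLlmCall, LlmCall); the body fields are loosely modeled on a chat-style API and are an assumption, not a specific provider's schema.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface TenantContext {
  tenantId: string;
  model: string;
  maxTokens: number;
  systemPrompt: string;
}

interface LlmCall {
  // Sent to the provider: contains one tenant's data and no tenant ID.
  body: { model: string; max_tokens: number; system: string; messages: Message[] };
  // Kept on the gateway side for cost accounting and logs.
  tags: { tenantId: string; requestId: string };
}

function buildLlmCall(ctx: TenantContext, history: Message[], requestId: string): LlmCall {
  return {
    body: {
      model: ctx.model,
      max_tokens: ctx.maxTokens,
      system: ctx.systemPrompt,
      messages: [...history], // fresh copy per request, never a shared buffer
    },
    tags: { tenantId: ctx.tenantId, requestId },
  };
}
```

The shared HTTP client sends only body; tags go to the logger and the billing counter.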

Audit log captures every state change

A JSONL append-only log of every state-changing operation gives you forensics when something goes sideways. Capture: tenant create/delete, token rotate, config overlay write, channel pair/unpair, cron add/remove, backup/restore, quota update.

Each line: timestamp, actor (admin key ID or tenant token hash prefix), action, target. Ship to your SIEM. When an incident lands, the audit log is the difference between “we’ll figure it out” and “here’s the exact sequence of events”.
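A sketch of the line format and the append, using the four fields listed above. formatAuditLine and appendAudit are hypothetical helpers, and the actor and target strings are made-up examples.

```typescript
import * as fs from "node:fs";

interface AuditEntry {
  timestamp: string; // ISO 8601, UTC
  actor: string;     // admin key ID or tenant token hash prefix
  action: string;    // e.g. "token.rotate"
  target: string;    // e.g. "tenant:acme"
}

// One JSON object per line; the trailing newline is what makes it JSONL.
function formatAuditLine(actor: string, action: string, target: string, now: Date = new Date()): string {
  const entry: AuditEntry = { timestamp: now.toISOString(), actor, action, target };
  return JSON.stringify(entry) + "\n";
}

// appendFileSync opens the file in append mode by default, so existing
// lines are never rewritten. The mode applies only when the file is created.
function appendAudit(file: string, line: string): void {
  fs.appendFileSync(file, line, { encoding: "utf8", mode: 0o600 });
}
```

Append mode keeps this code from rewriting history, but real tamper resistance comes from shipping the lines off the host.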

Admin / tenant key separation

A tenant should be able to override its model choice, max tokens, system prompt — fine. It should not be able to override the gateway’s Anthropic API key, the rate card, or the S3 credentials. That separation needs to be enforced at config-load time, with a clear error if a tenant overlay tries to set an admin-only key.

# Tenant overlay — allowed
model: claude-opus-4-7
max_tokens: 8192
system_prompt: "You are Acme Corp's assistant."

# Tenant overlay — rejected at load time
anthropic_api_key: "..."   # ✗ admin-only
rate_card: { ... }         # ✗ admin-only
s3_credentials: { ... }    # ✗ admin-only

The runtime never falls back; an attempted override is a fatal-on-load error.
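A load-time check along these lines would enforce the split. It is a sketch: validateTenantOverlay and ConfigError are hypothetical, and the key sets simply mirror the example overlay above.

```typescript
const TENANT_KEYS = new Set(["model", "max_tokens", "system_prompt"]);
const ADMIN_ONLY_KEYS = new Set(["anthropic_api_key", "rate_card", "s3_credentials"]);

class ConfigError extends Error {}

// Runs once when the overlay is loaded. Any violation throws, so the
// gateway refuses to start rather than ignoring the key silently.
function validateTenantOverlay(overlay: Record<string, unknown>): Record<string, unknown> {
  for (const key of Object.keys(overlay)) {
    if (ADMIN_ONLY_KEYS.has(key)) {
      throw new ConfigError(`tenant overlay may not set admin-only key "${key}"`);
    }
    if (!TENANT_KEYS.has(key)) {
      throw new ConfigError(`unknown tenant overlay key "${key}"`);
    }
  }
  return { ...overlay };
}
```

Rejecting unknown keys as well is a design choice in this sketch: an allowlist means a newly added admin setting is protected by default, even if nobody remembers to add it to the deny list.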

Hashed tokens, rotated easily

Tokens are 128-bit secrets. Store SHA-256(token), never plaintext. Compare with crypto.timingSafeEqual. Provide a one-command rotation path — tenants token rotate <name> — so that “this token might be compromised” is a 5-second mitigation instead of a deploy.

Prefix tokens with the tenant ID (tk_acme_...) so log lines are greppable without leaking the secret half. The prefix is not the secret; the 32 hex chars after are.
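Put together, the token lifecycle can be sketched with Node's crypto module. generateToken, hashToken, verifyToken, rotateToken, and the in-memory tokenHashes map are hypothetical; a real gateway would persist the hashes.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// 16 random bytes = 128 bits = 32 hex chars after the greppable prefix.
function generateToken(tenantName: string): string {
  return `tk_${tenantName}_${randomBytes(16).toString("hex")}`;
}

function hashToken(token: string): Buffer {
  return createHash("sha256").update(token, "utf8").digest();
}

// Both buffers are SHA-256 digests, so they have equal length, which
// timingSafeEqual requires. The length check guards against a corrupt store.
function verifyToken(presented: string, storedHash: Buffer): boolean {
  const candidate = hashToken(presented);
  return candidate.length === storedHash.length && timingSafeEqual(candidate, storedHash);
}

// Only hashes are stored; the plaintext is returned once and never kept.
const tokenHashes = new Map<string, Buffer>();

function rotateToken(tenantName: string): string {
  const token = generateToken(tenantName);
  tokenHashes.set(tenantName, hashToken(token)); // old hash is overwritten
  return token;
}
```

Because rotation replaces the stored hash, the previous token stops verifying immediately, with no deploy involved.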

The refactor test

The real test of tenant isolation is the next refactor. When someone six months from now adds a new tool, a new caching layer, a new background job — does the tenant boundary survive?

If the boundary is a WHERE tenant_id = $1 clause, the answer is “only if they remember”. If the boundary is structural — token-rooted dispatch, per-tenant directories, sandboxed execution, type-checked context — the answer is “yes, because the alternative is a build error”.

Optimize for the second outcome. Future-you and your customers will thank present-you.

Frequently asked

Is a `WHERE tenant_id` clause enough?

Only if every code path remembers it. The moment a cache key, a temp file path, a sandbox mount, a cron schedule, a plugin install, or a backup file forgets the tenant_id, the boundary leaks. Structural isolation (per-tenant directories, per-tenant namespaces) makes 'forgetting' a type error rather than a data leak.

Can the LLM itself leak data across tenants?

Not if it never receives data from more than one tenant in a single context window. The gateway is responsible for ensuring that: tenant A's messages, tools, system prompt, and memory are loaded into a fresh context per request. The LLM provider (Anthropic, OpenAI) doesn't mix data across API calls, so cross-tenant contamination in a context window is a gateway-design failure.

What about shared embeddings / shared retrieval indexes?

Tempting (it's cheaper) and risky. If two tenants share an embedding index, an unfiltered similarity search can surface tenant A's docs for tenant B's query. Default to per-tenant indexes; share only with explicit per-tenant ACLs enforced at retrieval time.