---
title: "Self-hosted WhatsApp bot platform: the open-source playbook"
description: "Run a multi-tenant WhatsApp bot platform on your own infra without the WhatsApp Business API contract. Open-source, BYO-LLM, with per-customer isolation and cost tracking. Here's the architecture."
url: https://openclawmu.neullabs.com/blog/self-hosted-whatsapp-bot-platform
publishedAt: 2026-06-02T00:00:00.000Z
tags: ["WhatsApp", "bot platform", "self-hosted", "Baileys", "open source"]
cluster: cornerstone
source: OpenClawMU
---

You want to run a WhatsApp bot platform, either as a SaaS product for customers or as the messaging layer for your own product. The path most people take is signing up for the WhatsApp Business API via a BSP (Twilio, Vonage, 360dialog). That works, but it binds you to per-message fees, a contract, and their infrastructure.

There's a self-hosted alternative that's a much better fit for many use cases: run your own multi-tenant gateway on a VM you control, connect each tenant to their WhatsApp via the multi-device protocol, and pay only for your LLM provider and your VM.

## Why self-host?

- **No per-message fee.** The WhatsApp Cloud API bills per delivered template message (this replaced per-conversation pricing in July 2025); self-hosting via Baileys has no marginal cost beyond your VM.
- **Bring your own LLM.** Anthropic, OpenAI, Llama, Mistral — your choice. The cloud bot platforms typically lock you to one.
- **Data residency.** Conversations stay on your hardware in your region.
- **Customization.** Drop in any tool, any prompt, any agent behavior. No proprietary flow language.
- **Per-customer billing.** Meter each tenant's LLM cost and charge them what makes sense for your business.

The trade-offs are real: you operate the gateway, you handle QR-code re-pairing when WhatsApp deauthorizes a session, and Baileys is unofficial, so a Meta protocol or policy change could break it and numbers that run on unofficial clients risk being banned. For most SMB use cases, the trade-offs land in your favor.

## The stack

A self-hosted WhatsApp bot platform needs four things:

1. **A WhatsApp adapter.** Baileys is the de facto open-source library for WhatsApp's multi-device protocol.
2. **An agent runtime.** Something that takes an inbound message and produces a reply, with tool-use, memory, and personality.
3. **Tenant isolation.** Each customer's conversations, memory, and credentials kept separate.
4. **Cost accounting.** Per-tenant token tracking so you can bill rationally.

OpenClawMU bundles all four. The flow:

```
WhatsApp ──Baileys──► OpenClawMU ──tenant-routed──► Agent runtime
                          │                              │
                          ├── per-tenant session store ──┘
                          ├── per-tenant memory (sqlite-vec)
                          ├── per-tenant sandbox
                          └── per-tenant cost accounting
```
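
One way to picture the "per-tenant" boxes above is a path helper that refuses to resolve anything outside a tenant's own directory. This is a minimal Python sketch, not OpenClawMU's actual code: the `tenant_path` function, the `TENANT_ID` naming rule, and the root directory layout are assumptions made for illustration.

```python
import re
from pathlib import Path

# Hypothetical naming rule: lowercase letters, digits, "_" and "-", max 63 chars.
TENANT_ID = re.compile(r"[a-z0-9][a-z0-9_-]{0,62}")


def tenant_path(root: Path, tenant: str, *parts: str) -> Path:
    """Resolve a path inside one tenant's directory, rejecting escapes."""
    if not TENANT_ID.fullmatch(tenant):
        raise ValueError(f"invalid tenant id: {tenant!r}")
    base = (root / tenant).resolve()
    target = base.joinpath(*parts).resolve()
    # resolve() collapses "..", so a traversal attempt ends up outside base.
    if not target.is_relative_to(base):
        raise ValueError(f"path escapes tenant directory: {target}")
    return target
```

Routing every file access (session store, memory database, credentials) through a helper like this means a bug in one tenant's agent cannot be pointed at another tenant's files by way of a crafted path.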

## Pairing a tenant's WhatsApp

The CLI walks you through QR-code pairing. The end user scans the QR code with their phone, Baileys negotiates the linked-device session, and the credentials are stored in the tenant's directory.

```bash
openclaw channels pair whatsapp --tenant acme
# → prints a QR code for the phone to scan; on success, /tenants/acme/channels/whatsapp.json is written
```

Once paired, inbound messages from that WhatsApp account route to the `acme` tenant's agent. The agent's reply is sent back through the same Baileys session.
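
The routing step can be sketched as a lookup from the paired WhatsApp account to its tenant. This is an illustrative Python sketch under assumptions: the `TenantRouter` class and the `wa:+…` account key format are hypothetical, not OpenClawMU's internals.

```python
class TenantRouter:
    """Maps each paired WhatsApp account to exactly one tenant."""

    def __init__(self) -> None:
        self._by_account: dict[str, str] = {}

    def pair(self, account: str, tenant: str) -> None:
        owner = self._by_account.get(account)
        if owner is not None and owner != tenant:
            # One account feeding two tenants would break isolation.
            raise ValueError(f"{account} is already paired to tenant {owner!r}")
        self._by_account[account] = tenant

    def route(self, account: str) -> str:
        try:
            return self._by_account[account]
        except KeyError:
            raise LookupError(f"no tenant paired to {account}") from None


router = TenantRouter()
router.pair("wa:+15550000001", "acme")
tenant = router.route("wa:+15550000001")  # "acme"
```

Rejecting a second pairing for the same account is a deliberate choice in this sketch: it fails loudly at pairing time instead of silently delivering one customer's messages to another customer's agent.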

## Inbound message flow

Every inbound is normalized into a tenant-tagged envelope:

```json
{
  "tenant": "acme",
  "channel": "whatsapp",
  "user": {
    "id": "wa:+15551234567",
    "display_name": "Jane Doe"
  },
  "session_id": "wa:+15551234567:default",
  "content": { "type": "text", "text": "How many invoices are overdue?" },
  "received_at": "2026-06-03T10:14:22Z"
}
```
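
To make the envelope's shape concrete, here is a minimal Python sketch that builds one from already-parsed message fields. The `build_envelope` function and the rule that `session_id` is the user id plus a `:default` thread suffix are assumptions inferred from the example above, not a documented OpenClawMU API.

```python
from datetime import datetime, timezone


def build_envelope(tenant: str, phone: str, display_name: str,
                   text: str, received_at: datetime,
                   thread: str = "default") -> dict:
    """Build a tenant-tagged envelope for one inbound text message."""
    if received_at.tzinfo is None:
        # A naive timestamp would be silently read as local time.
        raise ValueError("received_at must be timezone-aware")
    user_id = f"wa:{phone}"
    stamp = received_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "tenant": tenant,
        "channel": "whatsapp",
        "user": {"id": user_id, "display_name": display_name},
        "session_id": f"{user_id}:{thread}",  # assumed: one thread per user
        "content": {"type": "text", "text": text},
        "received_at": stamp,
    }
```

Because the tenant is stamped on the envelope at the edge, every later stage (memory lookup, tool execution, billing) can scope its work from the envelope alone.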

The agent runtime processes this envelope, executes whatever tools it needs (looking up the invoice DB, etc.), and produces a reply. The reply goes back to the Baileys adapter, which converts it to WhatsApp-native form (Markdown becomes WhatsApp's own text formatting, with line breaks preserved) and sends it.
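
The Markdown conversion is worth a sketch because WhatsApp's markers differ from Markdown's: bold is `*text*`, italic is `_text_`, and strikethrough is `~text~`. The Python function below is a simplified illustration, not the adapter's real converter; `markdown_to_whatsapp` is a hypothetical name, and it does not protect code spans or handle nested emphasis.

```python
import re

_BOLD = "\x00"  # temporary marker so converted bold is not re-read as italic


def markdown_to_whatsapp(md: str) -> str:
    """Convert a small subset of Markdown to WhatsApp text formatting."""
    def wrap(m: re.Match) -> str:
        return f"{_BOLD}{m.group(1)}{_BOLD}"

    text = re.sub(r"\*\*(.+?)\*\*", wrap, md)              # **bold** -> marker
    text = re.sub(r"(?<!\*)\*(?!\s)(.+?)(?<!\s)\*(?!\*)",  # *italic* -> _italic_
                  r"_\1_", text)
    text = re.sub(r"~~(.+?)~~", r"~\1~", text)             # ~~strike~~ -> ~strike~
    text = re.sub(r"^#{1,6}\s+(.+)$", wrap, text, flags=re.M)  # heading -> bold line
    text = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"\1 (\2)", text)  # link -> label (url)
    return text.replace(_BOLD, "*")
```

Bold is converted through a placeholder first because Markdown's italic marker (`*`) is WhatsApp's bold marker; converting in a single pass would turn `**bold**` into italics. Newlines are never touched, so line breaks survive as the text above describes.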

## Handling media

WhatsApp messages can include images, videos, voice notes, and documents. Each gets normalized into a `content` block with a type and a locally stored path:

- **image** → routed to a vision-capable model (Claude Opus, GPT-4o).
- **voice** → transcribed via Whisper (local or API), then treated as text.
- **document** → text-extracted via pdfjs / docx / etc., then included as context.

Outbound media is symmetric: the agent can attach an image (e.g., a generated chart) and the adapter uploads it via WhatsApp's media endpoints.
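
The inbound routing rules can be sketched as a dispatch table keyed on the content type. This Python sketch is illustrative only: the step names (`vision_model`, `transcribe`, `extract_text`, `agent`) and the `plan_content` function are hypothetical labels, not OpenClawMU identifiers.

```python
# Ordered processing steps per content type (step names are hypothetical).
PIPELINES: dict[str, list[str]] = {
    "text": ["agent"],
    "image": ["vision_model"],               # needs a vision-capable model
    "voice": ["transcribe", "agent"],        # speech -> text, then treat as text
    "document": ["extract_text", "agent"],   # extracted text becomes context
}


def plan_content(content: dict) -> list[str]:
    """Return the processing steps for one normalized content block."""
    kind = content["type"]
    try:
        return list(PIPELINES[kind])
    except KeyError:
        # Unknown types fail loudly so the adapter can send a fallback reply.
        raise ValueError(f"unsupported content type: {kind!r}") from None


steps = plan_content({"type": "voice", "path": "/tmp/note.ogg"})
```

Keeping the mapping in data instead of branching code makes it easy to add a type later without touching the dispatcher.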

## Cost accounting

Every LLM call records a billing row scoped to the tenant. At the end of the month, generate a CSV:

```bash
openclaw billing report acme --period current-month --csv > acme-2026-06.csv
```

Pipe that into Stripe Billing, QuickBooks, or your own invoicing flow. The customer sees an itemized usage statement; you pocket the margin over your LLM provider's cost.
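
For a sense of what sits behind such a report, the following Python sketch aggregates billing rows for one tenant and applies a markup. The row fields, the `usage_csv` function, and the 30% default markup are assumptions for illustration; the real report's columns may differ.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def usage_csv(rows: list[dict], tenant: str,
              markup: Decimal = Decimal("0.30")) -> str:
    """Aggregate one tenant's billing rows per model and render a CSV."""
    totals = defaultdict(lambda: {"in": 0, "out": 0, "cost": Decimal("0")})
    for row in rows:
        if row["tenant"] != tenant:
            continue  # never mix another tenant's usage into this report
        t = totals[row["model"]]
        t["in"] += row["input_tokens"]
        t["out"] += row["output_tokens"]
        t["cost"] += Decimal(row["provider_cost_usd"])  # strings avoid float error

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["model", "input_tokens", "output_tokens",
                     "provider_cost_usd", "billed_usd"])
    for model in sorted(totals):
        t = totals[model]
        cost = t["cost"].quantize(CENT, rounding=ROUND_HALF_UP)
        billed = (t["cost"] * (1 + markup)).quantize(CENT, rounding=ROUND_HALF_UP)
        writer.writerow([model, t["in"], t["out"], cost, billed])
    return out.getvalue()
```

Using `Decimal` instead of floats keeps the invoice totals exact to the cent, which matters once these numbers land in an accounting system.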

## Reliability concerns

- **Session expiry.** WhatsApp will occasionally invalidate a multi-device session. The fix is to re-pair by scanning a fresh QR code. Build a re-pair UX so your customers can handle this without paging your support team.
- **Rate limits.** WhatsApp throttles per account, so pace outbound sends for each paired number instead of bursting.
- **Backups.** `openclaw tenants backup acme --to s3://...` snapshots the full tenant state, including the WhatsApp credentials. Schedule nightly.
- **Multi-region resilience.** Keep a standby gateway in a second region, with cross-region S3 replication of the backups. Restoring from the latest snapshot gives an RTO of roughly 10 minutes.
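
For the rate-limit point, one common approach is a token bucket per paired account. The Python sketch below is a generic illustration, not part of OpenClawMU; the `TokenBucket` class is hypothetical, and the rate and burst values are placeholders to tune, not figures published by WhatsApp.

```python
import time


class TokenBucket:
    """Allows `burst` sends at once, then refills at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock          # injectable so tests can control time
        self.updated = clock()

    def try_send(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should queue the message and retry later


# One bucket per paired account, so a busy tenant cannot starve another.
buckets: dict[str, TokenBucket] = {}


def may_send(account: str) -> bool:
    bucket = buckets.setdefault(account, TokenBucket(rate_per_sec=1.0, burst=5))
    return bucket.try_send()
```

When `may_send` returns `False`, queueing the reply and retrying after a short delay is usually safer than dropping it.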

## When *not* to self-host

- **Very high volume.** Above a few thousand messages/day per account, the official WhatsApp Cloud API or a BSP becomes operationally cleaner.
- **Regulated industries with strict approval flows.** Healthcare, banking, and some government contexts require the official API (button-style templates, opt-in flows).
- **You don't want to operate a VM.** Run the gateway via a managed hosting partner instead. (Hosted-ops contracts available — see /pricing.)

## The stack, end-to-end

1. **VM**: Hetzner CCX13 ($35/mo) or AWS t3.medium ($30/mo).
2. **OpenClawMU**: Apache-2.0, self-hosted.
3. **LLM**: Anthropic, OpenAI, or local Llama / Mistral.
4. **TLS / public URL**: Tailscale Funnel (free), Cloudflare Tunnel, or your own nginx.
5. **Backups**: S3, R2, or MinIO.
6. **Monitoring**: any Prometheus scraper for /metrics; any log forwarder for the audit log.

Total fixed cost: $50–80/month depending on VM choice. Variable cost: your LLM bill, which you can pass through to your customers with margin.

That's the entire playbook. The platform is free; the LLM you pay for; the customers you charge.