You want to run a WhatsApp bot platform, either as a SaaS product for customers or as the messaging layer for your own product. The path most people take is signing up for the WhatsApp Business API via a BSP (Twilio, Vonage, 360dialog), which works but binds you to per-message fees, a contract, and their infrastructure.
There’s a self-hosted alternative that’s a much better fit for many use cases: run your own multi-tenant gateway on a VM you control, connect each tenant to their WhatsApp via the multi-device protocol, and pay only for your LLM provider and your VM.
Why self-host?
- No per-message fee. The WhatsApp Cloud API bills for usage (per conversation historically, per delivered template message since July 2025); self-hosting via Baileys has no marginal cost beyond your VM.
- Bring your own LLM. Anthropic, OpenAI, Llama, Mistral — your choice. The cloud bot platforms typically lock you to one.
- Data residency. Conversations stay on your hardware in your region.
- Customization. Drop in any tool, any prompt, any agent behavior. No proprietary flow language.
- Per-customer billing. Meter each tenant’s LLM cost and charge them what makes sense for your business.
The trade-offs are real: you operate the gateway, you handle QR-code re-pairing when WhatsApp deauthorizes a session, and Baileys is an unofficial client, so a protocol change or stricter enforcement by Meta could break it or get a paired number restricted. For most SMB use cases, the trade-offs land in your favor.
The stack
A self-hosted WhatsApp bot platform needs four things:
- A WhatsApp adapter. Baileys is the standard for the multi-device protocol.
- An agent runtime. Something that takes an inbound message and produces a reply, with tool-use, memory, and personality.
- Tenant isolation. Each customer’s conversations, memory, and credentials kept separate.
- Cost accounting. Per-tenant token tracking so you can bill rationally.
OpenClawMU bundles all four. The flow:
WhatsApp ──Baileys──► OpenClawMU ──tenant-routed──► Agent runtime
                      │                              │
                      ├── per-tenant session store ──┘
                      ├── per-tenant memory (sqlite-vec)
                      ├── per-tenant sandbox
                      └── per-tenant cost accounting
Pairing a tenant’s WhatsApp
The CLI walks you through QR-code pairing. It displays a QR code, the end-user’s phone scans it, Baileys negotiates the device-paired session, and the credentials are stored in the tenant’s directory.
openclaw channels pair whatsapp --tenant acme
# → displays a QR code to scan; on success, /tenants/acme/channels/whatsapp.json is written
Once paired, inbound messages from that WhatsApp account route to the acme tenant’s agent. The agent’s reply is sent back through the same Baileys session.
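Keeping credentials per tenant means every path lookup has to be scoped to one tenant's directory. A minimal sketch of that idea follows, assuming the /tenants/<tenant>/channels/<channel>.json layout implied by the pairing output above. The function name, the allowed-name pattern, and the TENANT_ROOT constant are hypothetical and not taken from OpenClawMU itself.

```python
import re
from pathlib import PurePosixPath

# Assumed root, inferred from the pairing output path; hypothetical constant.
TENANT_ROOT = PurePosixPath("/tenants")

# Hypothetical naming rule: lowercase slug, no dots or slashes, so a tenant
# name can never climb out of its own directory.
_SLUG = re.compile(r"[a-z0-9][a-z0-9_-]{0,62}")


def channel_credentials_path(tenant: str, channel: str = "whatsapp") -> PurePosixPath:
    """Return the credentials path for one tenant's channel, rejecting unsafe names."""
    for label, value in (("tenant", tenant), ("channel", channel)):
        if not _SLUG.fullmatch(value):
            raise ValueError(f"invalid {label} name: {value!r}")
    return TENANT_ROOT / tenant / "channels" / f"{channel}.json"
```

Validating the name before building the path is the design choice worth copying: a tenant called "../other" is rejected outright rather than normalized.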
Inbound message flow
Every inbound is normalized into a tenant-tagged envelope:
{
  "tenant": "acme",
  "channel": "whatsapp",
  "user": {
    "id": "wa:+15551234567",
    "display_name": "Jane Doe"
  },
  "session_id": "wa:+15551234567:default",
  "content": { "type": "text", "text": "How many invoices are overdue?" },
  "received_at": "2026-06-03T10:14:22Z"
}
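As an illustration of the normalization step, here is a rough sketch that builds this envelope from a raw text message. It assumes the sender arrives as a WhatsApp JID of the form 15551234567@s.whatsapp.net, optionally with a device suffix. The function name normalize_inbound and its parameters are hypothetical, and the real adapter may handle more cases.

```python
from datetime import datetime, timezone
from typing import Optional


def normalize_inbound(
    tenant: str,
    sender_jid: str,
    display_name: str,
    text: str,
    received_at: Optional[datetime] = None,
) -> dict:
    """Build a tenant-tagged envelope for a plain-text WhatsApp message."""
    # Strip the server part ("@s.whatsapp.net") and any ":<device>" suffix.
    number = sender_jid.split("@", 1)[0].split(":", 1)[0]
    user_id = f"wa:+{number}"
    # Expect a timezone-aware datetime; default to "now" in UTC.
    ts = (received_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "tenant": tenant,
        "channel": "whatsapp",
        "user": {"id": user_id, "display_name": display_name},
        "session_id": f"{user_id}:default",
        "content": {"type": "text", "text": text},
        "received_at": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```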
The agent runtime processes this envelope, executes whatever tools it needs (looking up the invoice DB, etc.), and produces a reply. The reply goes back to the Baileys adapter, which translates it into WhatsApp-native form (markdown → text formatting, line breaks preserved) and sends it.
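The markdown translation can be sketched with a few regular expressions. WhatsApp renders *bold*, _italic_, and ~strikethrough~, while LLMs usually emit **bold**, *italic*, and ~~strikethrough~~. The function below is an illustrative approximation covering only those three cases, not the adapter's actual converter; the name markdown_to_whatsapp is made up for this example.

```python
import re

_PLACEHOLDER = "\x00"  # temporary marker so the italic pass cannot touch bold


def markdown_to_whatsapp(text: str) -> str:
    """Convert common markdown emphasis to WhatsApp's text formatting."""
    # 1. **bold** -> placeholder-wrapped, restored as *bold* at the end.
    text = re.sub(
        r"\*\*(.+?)\*\*",
        lambda m: _PLACEHOLDER + m.group(1) + _PLACEHOLDER,
        text,
    )
    # 2. *italic* -> _italic_. Requiring a non-space after the opening star
    #    leaves "* item" style bullets alone.
    text = re.sub(r"(?<!\*)\*(?!\s)(.+?)(?<!\s)\*(?!\*)", r"_\1_", text)
    # 3. Restore bold using WhatsApp's single-star syntax.
    text = text.replace(_PLACEHOLDER, "*")
    # 4. ~~strike~~ -> ~strike~.
    text = re.sub(r"~~(.+?)~~", r"~\1~", text)
    return text
```

Line breaks pass through untouched because none of the patterns match across a newline.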
Handling media
WhatsApp messages can include images, videos, voice notes, and documents. Each gets normalized into a content block with a type and a locally stored path:
- image → routed to a vision-capable model (Claude Opus, GPT-4o).
- voice → transcribed via Whisper (local or API), then treated as text.
- document → text-extracted via pdfjs / docx / etc., then included as context.
Outbound media is symmetric: the agent can attach an image (e.g., a generated chart) and the adapter uploads it via WhatsApp’s media endpoints.
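The inbound routing rules can be expressed as a small dispatcher. The sketch below is a guess at the shape, not the runtime's real interface: the transcriber and text extractor are passed in as plain callables (stand-ins for Whisper and a PDF/DOCX extractor), and the returned dictionary keys are hypothetical.

```python
from typing import Callable


def to_model_input(
    block: dict,
    transcribe: Callable[[str], str],
    extract_text: Callable[[str], str],
) -> dict:
    """Turn one normalized content block into something the agent can consume."""
    kind = block["type"]
    if kind == "text":
        return {"kind": "text", "text": block["text"]}
    if kind == "image":
        # Passed by path to a vision-capable model.
        return {"kind": "vision", "image_path": block["path"]}
    if kind == "voice":
        # Transcribed first, then handled exactly like typed text.
        return {"kind": "text", "text": transcribe(block["path"])}
    if kind == "document":
        # Extracted text is attached as context rather than as the user turn.
        return {"kind": "context", "text": extract_text(block["path"])}
    raise ValueError(f"unsupported content type: {kind}")
```

Injecting the two helpers keeps the dispatcher testable without a speech model or a document parser installed.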
Cost accounting
Every LLM call records a billing row scoped to the tenant. At the end of the month, generate a CSV:
openclaw billing report acme --period current-month --csv > acme-2026-06.csv
Pipe that into Stripe Billing, QuickBooks, or your own invoicing flow. The customer sees an itemized usage statement; you pocket the margin over your LLM provider’s cost.
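To make the accounting concrete, here is a small sketch of how tenant-scoped billing rows could be rolled into a CSV with a margin applied. Everything in it is an assumption for illustration: the BillingRow fields, the column names, the 30% margin, and the per-million-token prices for "example-model" are invented and do not reflect the real report format or any provider's price list.

```python
import csv
import io
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class BillingRow:
    tenant: str
    model: str
    input_tokens: int
    output_tokens: int


# Hypothetical USD prices per million tokens: (input, output).
PRICES = {"example-model": (Decimal("3.00"), Decimal("15.00"))}


def provider_cost(row: BillingRow) -> Decimal:
    """Provider cost of one LLM call in USD."""
    price_in, price_out = PRICES[row.model]
    return (row.input_tokens * price_in + row.output_tokens * price_out) / Decimal(1_000_000)


def tenant_report_csv(rows, tenant: str, margin: Decimal = Decimal("0.30")) -> str:
    """Render one tenant's rows as CSV, adding a billed amount with margin."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["tenant", "model", "input_tokens", "output_tokens",
                     "provider_cost_usd", "billed_usd"])
    for row in rows:
        if row.tenant != tenant:
            continue  # never leak another tenant's usage into the report
        cost = provider_cost(row)
        writer.writerow([row.tenant, row.model, row.input_tokens,
                         row.output_tokens, f"{cost:.4f}", f"{cost * (1 + margin):.4f}"])
    return buf.getvalue()
```

Decimal is used instead of float so that per-call costs add up exactly when summed into an invoice.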
Reliability concerns
- Session expiry. WhatsApp will occasionally invalidate a multi-device session. The fix is to re-pair by scanning a fresh QR code. Build a re-pair UX so your customers can do this themselves without paging your support team.
- Rate limits. WhatsApp throttles per-account; respect their guidance on message-send rates.
- Backups. openclaw tenants backup acme --to s3://... snapshots the full tenant state, including the WhatsApp credentials. Schedule it nightly.
- Multi-region resilience. Run a hot-standby gateway in a second region with cross-region S3 replication. RTO ~10 minutes via restore.
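For the rate-limit point, one common approach is a per-account token bucket in front of the send path. The sketch below is a generic example of that technique, not something OpenClawMU is known to ship. The class name is invented, and the rate and burst values are placeholders, since WhatsApp does not publish exact limits for multi-device sessions.

```python
import time
from typing import Callable, Dict, Tuple


class SendThrottle:
    """Per-account token bucket: allows short bursts, caps the sustained rate."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        # account -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def try_send(self, account: str) -> bool:
        """Return True and consume a token if this account may send now."""
        now = self.clock()
        tokens, last = self._buckets.get(account, (float(self.burst), now))
        # Refill in proportion to elapsed time, never above the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[account] = (tokens, now)
        return allowed
```

When try_send returns False, the caller would queue the reply and retry later rather than drop it. Each account gets its own bucket, so one busy tenant cannot starve another.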
When not to self-host
- Very high volume. Above a few thousand messages/day per account, the official WhatsApp Cloud API or a BSP becomes operationally cleaner.
- Regulated industries with strict approval flows. Healthcare, banking, and some government contexts require the official API (button-style templates, opt-in flows).
- You don’t want to operate a VM. Run the gateway via a managed hosting partner instead. (Hosted-ops contracts available — see /pricing.)
The stack, end-to-end
- VM: Hetzner CCX13 ($35/mo) or AWS t3.medium ($30/mo).
- OpenClawMU: Apache-2.0, self-hosted.
- LLM: Anthropic, OpenAI, or local Llama / Mistral.
- TLS / public URL: Tailscale Funnel (free), Cloudflare Tunnel, or your own nginx.
- Backups: S3, R2, or MinIO.
- Monitoring: any Prometheus scraper for /metrics; any log forwarder for the audit log.
Total fixed cost: $50–80/month depending on VM choice. Variable cost: your LLM bill, which you can pass through to your customers with margin.
That’s the entire playbook. The platform is free; the LLM you pay for; the customers you charge.