---
title: "From personal Telegram bot to multi-customer bot platform: field notes"
description: "Field notes on graduating a single-user Telegram bot into a multi-tenant SaaS. The architectural choices that survived, the mistakes that didn't, and the migration timeline."
url: https://openclawmu.neullabs.com/blog/from-personal-bot-to-bot-platform
publishedAt: 2026-05-18T00:00:00.000Z
tags: ["field notes", "telegram", "multi-tenant", "bot platform", "migration"]
cluster: narrative
source: OpenClawMU
---

This is a write-up of taking a working Telegram bot — single tenant, my own conversations, three friends had been pilot-testing it — and turning it into a multi-customer SaaS. The bot's domain isn't important; what matters is the architectural transition.

## Where we started

The bot was about 1,200 lines of TypeScript. One Node process. One Telegram bot token. A `sqlite` file with the chat history. A folder full of "skills" (TypeScript modules that exposed tools to the agent). Anthropic Claude on the back end. A `cron` running on the box for scheduled reminders.

It worked. The four of us (me plus three friends) had paired our Telegram accounts with the same bot. Conversations stayed coherent because the bot keyed sessions by Telegram user ID, and memory was keyed by the same ID. It was, accidentally, a half-baked multi-tenant system. Half-baked because:

- The skills folder was shared, so installing one for myself meant installing it for everyone.
- The cron folder was shared, so scheduling a reminder for myself meant it ran with whatever identity the scheduler picked.
- The audit log was a single text file, no per-user separation.
- The cost was a single Anthropic bill with no attribution.
- The vector retrieval used a shared index, which retrieved across users in surprising ways.
- There was one Anthropic API key for everyone: no quotas, no rate limits, no way to tell that one user was overusing it.
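The accidental isolation came entirely from keying. Below is a minimal sketch of the difference, assuming an in-memory store; `SessionStore`, `userKey`, and `tenantKey` are hypothetical names, not the original bot's code. Keying by Telegram user ID separates people, but once several customers' bots share a process, the tenant has to be part of the key too. Anything not keyed at all (the skills and cron folders) is shared by construction.

```typescript
// Hypothetical sketch: how session keying isolates (or fails to isolate) state.
type Message = { role: "user" | "assistant"; text: string };

class SessionStore {
  private sessions = new Map<string, Message[]>();

  append(key: string, msg: Message): void {
    const history = this.sessions.get(key) ?? [];
    history.push(msg);
    this.sessions.set(key, history);
  }

  history(key: string): readonly Message[] {
    return this.sessions.get(key) ?? [];
  }
}

// What the personal bot effectively did: one key per Telegram user.
function userKey(telegramUserId: number): string {
  return `user:${telegramUserId}`;
}

// What a multi-customer bot needs: the tenant is part of every key, so the
// same Telegram user talking to two customers' bots gets two histories.
function tenantKey(tenantId: string, telegramUserId: number): string {
  return `tenant:${tenantId}:user:${telegramUserId}`;
}

// Example: one person (user 42) chatting with two different customers' bots.
const store = new SessionStore();
store.append(tenantKey("customer-acme", 42), { role: "user", text: "book a table" });
store.append(tenantKey("customer-globex", 42), { role: "user", text: "translate this" });
// Keyed with userKey(42), both messages would have landed in one shared history.
```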

None of this was disastrous because we were four friends. The moment we wanted to offer it to a customer, every problem turned into a real problem.

## The naive migration

The first instinct: "add `tenant_id` to every table". It's probably what you're considering right now, too. We tried it. It worked for the sessions table and the memory table. It didn't work for:

- The skills folder (filesystem, not a database).
- The cron folder (same).
- The "tmp/" directory the bot used as scratch space.
- The audit log (single file).
- The Telegram token (one token, one bot).

We then considered "one bot per customer": a separate Node process for each. This works, but the operational complexity is awful: five customers means five processes, five cron rotations, five audit logs to ship.

## The pivot

We adopted [OpenClawMU](/) — the multi-tenant fork of OpenClaw. The architecture was exactly what we were trying to build:

- Per-tenant directories for sessions, memory, skills, cron, channels, sandbox.
- Per-tenant Telegram bot pairings (each customer gets their own BotFather token).
- Per-tenant cost accounting.
- One process, many tenants, structural isolation.
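"Per-tenant directories" is simple to state. The part worth getting right is deriving every path from a validated tenant ID, so that no request can escape its own directory. Below is a minimal sketch, assuming the `/tenants/<id>/` layout used for the config files later in this post and the subdirectory names from the bullets above. `tenantDir` and `tenantLayout` are hypothetical helpers, not OpenClawMU's loader.

```typescript
// Hypothetical sketch of per-tenant path derivation; not OpenClawMU's code.
const TENANT_SUBDIRS = ["sessions", "memory", "skills", "cron", "channels", "sandbox"] as const;
type TenantSubdir = (typeof TENANT_SUBDIRS)[number];

// Lowercase slugs only: no slashes, dots, or "..", so an ID can't traverse paths.
const TENANT_ID = /^[a-z0-9][a-z0-9-]{0,62}$/;

function tenantDir(root: string, tenantId: string, subdir: TenantSubdir): string {
  if (!TENANT_ID.test(tenantId)) {
    throw new Error(`invalid tenant id: ${JSON.stringify(tenantId)}`);
  }
  // Strip trailing slashes from the root so joined paths stay canonical.
  const base = root.replace(/\/+$/, "");
  return `${base}/${tenantId}/${subdir}`;
}

// Every per-tenant resource resolves under the tenant's own directory.
function tenantLayout(root: string, tenantId: string): Record<TenantSubdir, string> {
  const layout = {} as Record<TenantSubdir, string>;
  for (const sub of TENANT_SUBDIRS) {
    layout[sub] = tenantDir(root, tenantId, sub);
  }
  return layout;
}
```

Rejecting anything that isn't a plain slug is cheaper than sanitizing IDs after the fact. In this sketch, the old shared `tmp/` scratch space would simply become the tenant's `sandbox` directory.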

The decision wasn't free: adopting a primitive means giving up some control. But we spent three engineering weeks instead of two months on multi-tenant infrastructure work we didn't want to do.

## Day 1: install and bootstrap

```bash
npm install -g openclaw@latest
openclaw onboard --install-daemon
openclaw tenants create me  # for me, the original user
```

We pointed the MU CLI's `tenants import-from-upstream` command at our old data directory, and it converted that data into the `me` tenant. Sessions, memory, and the channel pairing all migrated. The import took about 30 minutes; validating that conversations still made sense took another hour.

## Day 2: per-tenant Telegram tokens

Each customer needs their own Telegram bot. BotFather makes this trivial (`/newbot`, name it, get a token). For each customer:

```bash
openclaw tenants create customer-acme
openclaw channels pair telegram --tenant customer-acme --token <BotFather-token>
```

Then we sent each customer their bot's @username so they could open a chat. New chats land in the customer's tenant; their messages route to their agent; replies come from their bot.
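The routing rule is what lets one process serve many bots: the tenant is determined by *which bot* received the update, never by anything inside the message. Below is a minimal sketch, assuming each pairing records the bot's numeric ID (a BotFather token has the form `<bot_id>:<secret>`). `PairingRegistry`, `IncomingUpdate`, and `routeUpdate` are hypothetical names.

```typescript
// Hypothetical sketch: route incoming Telegram updates to tenants by bot.
interface IncomingUpdate {
  botId: number; // which of our bots received the update
  chatId: number;
  text: string;
}

class PairingRegistry {
  private tenantByBot = new Map<number, string>();

  // A BotFather token looks like "<bot_id>:<secret>"; the prefix is the bot's ID.
  pair(tenantId: string, token: string): void {
    const botId = Number(token.split(":")[0]);
    if (!Number.isInteger(botId) || botId <= 0) {
      throw new Error("malformed bot token");
    }
    if (this.tenantByBot.has(botId)) {
      throw new Error(`bot ${botId} is already paired`);
    }
    this.tenantByBot.set(botId, tenantId);
  }

  tenantFor(botId: number): string | undefined {
    return this.tenantByBot.get(botId);
  }
}

// Unknown bots are dropped rather than falling back to a default tenant.
function routeUpdate(registry: PairingRegistry, update: IncomingUpdate): string | null {
  return registry.tenantFor(update.botId) ?? null;
}
```

Failing closed on unknown bots matters more than it looks: a fallback tenant would quietly hand one customer's messages to another.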

## Day 3: skills and prompts

Each customer had asked for slightly different agent behavior. With OpenClawMU we could put different skill packages and system prompts in each tenant's overlay:

```yaml
# /tenants/customer-acme/config.yaml
model: claude-sonnet-4-6
system_prompt: |
  You are Acme Corp's reservation assistant.
  Be concise. Always ask for booking dates before suggesting venues.
allowed_tools:
  - book_table
  - check_calendar
  - search_venues
```

```yaml
# /tenants/customer-globex/config.yaml
model: claude-opus-4-7
system_prompt: |
  You are Globex's translation assistant.
  Default to professional tone. Preserve technical terms.
allowed_tools:
  - translate
  - dictionary_lookup
```

Two customers, two completely different bot personalities, one Node process.
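The `allowed_tools` list does the real isolation work in those configs. Below is a minimal sketch of how a loader might apply it, assuming the YAML has already been parsed into an object with the same three fields. `TenantConfig`, `Tool`, and `toolsForTenant` are hypothetical names, not OpenClawMU's loader.

```typescript
// Hypothetical shapes mirroring the YAML fields above.
interface TenantConfig {
  model: string;
  system_prompt: string;
  allowed_tools: string[];
}

interface Tool {
  name: string;
  description: string;
}

// Allow-list filter: tools not named in the tenant's config are never exposed.
function toolsForTenant(registry: Tool[], config: TenantConfig): Tool[] {
  const unknown = config.allowed_tools.filter((n) => !registry.some((t) => t.name === n));
  if (unknown.length > 0) {
    // Fail loudly on typos instead of silently running with fewer tools.
    throw new Error(`unknown tools in config: ${unknown.join(", ")}`);
  }
  const allowed = new Set(config.allowed_tools);
  return registry.filter((t) => allowed.has(t.name));
}
```

An allow-list rather than a deny-list means a newly installed skill stays invisible to a tenant until their config names it.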

## Day 4: cost and quota

We set up the rate card and the quotas:

```bash
# rate card lives in gateway-wide config (admin-only)
yq -i '.billing.rate_card."claude-sonnet-4-6".input = 3.00' ~/.openclaw/config.yaml
yq -i '.billing.rate_card."claude-sonnet-4-6".output = 15.00' ~/.openclaw/config.yaml
# (and so on for the other models we use)

# per-tenant quotas
openclaw tenants quota update customer-acme \
  --tokens-per-day 1_000_000 \
  --cost-per-day-usd 25
```
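Once usage is tagged with a tenant, attributing cost is one multiplication per direction. Below is a minimal sketch of that arithmetic plus a daily-quota check, assuming the rate card values are USD per million tokens. `RATE_CARD`, `costUsd`, and `checkQuota` are hypothetical; only the Sonnet entry from the commands above is included.

```typescript
// Hypothetical sketch: cost attribution and daily quota checks.
// Assumption: rate card prices are USD per million tokens.
interface Rate { input: number; output: number }
const RATE_CARD: Record<string, Rate> = {
  "claude-sonnet-4-6": { input: 3.0, output: 15.0 },
};

interface Usage { model: string; inputTokens: number; outputTokens: number }
interface Quota { tokensPerDay: number; costPerDayUsd: number }

function costUsd(u: Usage): number {
  const rate = RATE_CARD[u.model];
  if (!rate) throw new Error(`no rate card entry for ${u.model}`);
  return (u.inputTokens * rate.input + u.outputTokens * rate.output) / 1_000_000;
}

// Sum today's usage and report whether either limit has been crossed.
function checkQuota(today: Usage[], quota: Quota): { tokens: number; cost: number; exceeded: boolean } {
  let tokens = 0;
  let cost = 0;
  for (const u of today) {
    tokens += u.inputTokens + u.outputTokens;
    cost += costUsd(u);
  }
  return { tokens, cost, exceeded: tokens > quota.tokensPerDay || cost > quota.costPerDayUsd };
}
```

At those rates, a day with 200,000 input and 50,000 output Sonnet tokens costs $0.60 + $0.75 = $1.35, well inside the $25 daily cap set for `customer-acme`.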

The first nightly CSV showed us exactly what each customer was costing us. The variance was eye-opening — one customer was 3x the others in token usage. They were doing the right thing; the bot was just useful enough that they leaned on it harder. We adjusted their quota and their plan.
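The nightly CSV is essentially a group-by over the day's usage records. Below is a minimal sketch, assuming hypothetical per-call records that already carry a tenant ID and a computed cost; the column set is illustrative, not OpenClawMU's export format.

```typescript
// Hypothetical per-call usage record.
interface UsageRecord {
  tenantId: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Aggregate per tenant and emit one CSV row each, sorted by tenant ID.
// Tenant IDs are assumed to be plain slugs, so no CSV quoting is needed.
function nightlyCsv(records: UsageRecord[]): string {
  const totals = new Map<string, { input: number; output: number; cost: number }>();
  for (const r of records) {
    const t = totals.get(r.tenantId) ?? { input: 0, output: 0, cost: 0 };
    t.input += r.inputTokens;
    t.output += r.outputTokens;
    t.cost += r.costUsd;
    totals.set(r.tenantId, t);
  }
  const rows = Array.from(totals.entries())
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([id, t]) => `${id},${t.input},${t.output},${t.cost.toFixed(2)}`);
  return ["tenant_id,input_tokens,output_tokens,cost_usd", ...rows].join("\n");
}
```

Sorting by cost instead of tenant ID is a one-line change, and it makes an outlier like the 3x customer stand out immediately.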

## Day 5: dashboards and reporting

The customer-facing dashboard was a simple SvelteKit app calling OpenClawMU's `/v1/tenants/<id>/usage` endpoint. Customers see their token usage, their cost, their quota progress.

We didn't add a chat UI for them — they use Telegram. The dashboard is purely for transparency on cost.
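The dashboard logic on top of that endpoint is small. Below is a minimal sketch of building the request URL and turning the response into a single quota-progress figure. The `UsageResponse` fields are an assumption, since the endpoint's actual response shape isn't shown here; the HTTP call itself is omitted.

```typescript
// Assumed response shape; the real endpoint's fields may differ.
interface UsageResponse {
  tokensToday: number;
  costTodayUsd: number;
  tokensPerDay: number;
  costPerDayUsd: number;
}

function usageUrl(baseUrl: string, tenantId: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/v1/tenants/${encodeURIComponent(tenantId)}/usage`;
}

// Percentage of the daily quota used, by whichever limit is closer to its cap.
function quotaProgressPercent(u: UsageResponse): number {
  const byTokens = u.tokensPerDay > 0 ? u.tokensToday / u.tokensPerDay : 0;
  const byCost = u.costPerDayUsd > 0 ? u.costTodayUsd / u.costPerDayUsd : 0;
  return Math.min(100, Math.round(Math.max(byTokens, byCost) * 100));
}
```

In a real dashboard, the tenant ID should come from the customer's authenticated session on the server, never from a parameter the browser controls.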

## What broke

- **Memory bleed**. The old shared vector index had been retrieving across users in subtle ways: phrases from one customer's conversations showed up as "relevant context" in another's. The fix was a per-tenant sqlite-vec index, which is OpenClawMU's default. The lesson: shared retrieval is a leak surface no matter how small "shared" is.
- **Telegram media uploads**. The old bot stored uploaded files in a shared `tmp/`. With per-tenant sandboxes, the media path is now per-tenant. Migrating existing files was a one-shot rsync, but I'd recommend doing it before going live to avoid confusing 404s.
- **Cron rotation**. We had a "Friday digest" cron that worked under the old shared scheduler. Per-tenant cron means each tenant needs to opt in. We pre-seeded each existing customer's tenant with their cron; new customers configure it via a dashboard toggle.
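The memory-bleed fix is structural rather than a filter: each tenant gets its own index, so a query cannot score another tenant's vectors at all. Below is a minimal in-memory sketch of that shape, using brute-force cosine similarity as a stand-in for sqlite-vec. `PerTenantVectorStore` and its methods are hypothetical names.

```typescript
type Vec = number[];

function cosine(a: Vec, b: Vec): number {
  if (a.length !== b.length) throw new Error("embedding dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

interface Doc { id: string; embedding: Vec }

class PerTenantVectorStore {
  // One index per tenant; there is no shared index to query by mistake.
  private indexes = new Map<string, Doc[]>();

  add(tenantId: string, doc: Doc): void {
    const index = this.indexes.get(tenantId) ?? [];
    index.push(doc);
    this.indexes.set(tenantId, index);
  }

  // Searching requires a tenant ID and only ever scans that tenant's index.
  search(tenantId: string, query: Vec, k: number): string[] {
    const index = this.indexes.get(tenantId) ?? [];
    return index
      .map((d) => ({ id: d.id, score: cosine(query, d.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((r) => r.id);
  }
}
```

A shared index with a `tenant_id` filter on every query can work too, but one missed filter reintroduces the leak. Separate indexes make the safe path the only path.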

## What didn't break

- **Skill code**. TypeScript modules dropped into the tenant's `plugins/` directory worked identically. Same imports, same interfaces.
- **Anthropic API key**. One admin-side key, all tenants billed against it, cost attributed correctly. We never had to issue per-tenant keys.
- **The bot's personality**. System prompts and model choices moved into per-tenant config, which is exactly where they belonged.
- **Telegram pairing**. Once per tenant; survives gateway restarts; we haven't needed to re-pair anyone yet.

## What we'd do differently

- **Plan the data model on day -10**, not day 2. Knowing that you'll be multi-tenant *eventually* means designing the skill loader and the vector store with per-tenant dimensions from the start, even if the v1 only ever ships one tenant.
- **Don't try to write multi-tenant infrastructure yourself**. We tried. It worked for the first three customers. Then come the next ten customers, plus the customer with a security review, plus the customer who needs SOX-compliant audit logs, plus the customer who wants a quota, and each of those is another week of work.

## Six months later

We have 47 paying tenants on one gateway, running on a single Hetzner CCX13 VM, with nightly backups to Cloudflare R2 over its S3-compatible API. Anthropic spend is 8x what it was when this was a personal bot. Net margin on the operation is fine. Operations: I check the audit log on Monday mornings, I respond to GitHub issues when something looks off, and I update the Anthropic SDK every few weeks.

The architecture we adopted on day 1 is the architecture we still run. Nothing has had to be re-thought. That, more than anything else, is the case for adopting the primitive vs. rolling your own.