If you’re charging customers for an AI product, the question “what does this customer cost me?” needs a precise answer. Per-tenant LLM cost accounting is the boring infrastructure that makes that precision possible.
This article: the data model, the rate-card pattern, the integration paths to invoicing systems.
What you need to capture
Every LLM call should write a billing row with:
timestamp— when the call happened.tenant_id— which tenant made it.model— which model (claude-opus-4-7,gpt-4o, etc.).input_tokens— fresh input tokens (un-cached).cache_read_tokens— input tokens served from prompt cache (cheap).cache_write_tokens— input tokens written to the prompt cache (expensive, one-time).output_tokens— completion tokens.reasoning_tokens— extended-thinking / o-series reasoning tokens.tool_calls— count of tool invocations triggered in this turn.request_id— provider’s ID for the call (audit trail).cost_usd— computed at write time, snapshotted against the rate card in effect.
The cost is computed at write time and stored, not recomputed on read. This is critical for audit: if you change your rate card in July, June’s reports still reflect June’s prices.
The rate card
The rate card is a YAML map from (model, token-class) → USD-per-1M-tokens. Each LLM provider you use needs a row; reasoning and cache classes are separate from base rates.
billing:
currency: USD
rate_card:
"claude-opus-4-7":
input: 15.00
output: 75.00
cache_read: 1.50
cache_write: 18.75
reasoning: 75.00
"claude-sonnet-4-6":
input: 3.00
output: 15.00
cache_read: 0.30
cache_write: 3.75
"gpt-4o":
input: 2.50
output: 10.00
"gpt-5":
input: 5.00
output: 25.00
The rate card lives in the gateway’s admin-only config. Tenant overlays can’t override it — otherwise a tenant could quietly make their own bills cheaper.
Margin on top
You’re billing customers; you need a margin. Two patterns:
Markup at meter time: multiply the provider rate by 1.3x (or whatever) when computing cost_usd. The customer sees an aggregated cost that includes your margin; provider rate is invisible to them.
Pass-through + flat fee: bill exactly the provider rate, add a flat per-tenant per-month platform fee. Customers like the transparency; you take less variance risk.
The hybrid (5–15% markup + small platform fee) is common.
Quotas — the hard stop
Three knobs per tenant:
- Tokens per day — sum of input + output + reasoning, reset at UTC midnight.
- Cost per day (USD) — rate-card-driven, reset at UTC midnight.
- Requests per minute — sliding window.
Exceed any and the gateway returns 429 Too Many Requests with Retry-After: <seconds>. This is the difference between “a runaway tenant blows up your AWS bill” and “a runaway tenant gets throttled at 5 minutes past midnight.”
Quotas live in the tenant’s admin-only config; the tenant can request an increase but can’t set their own.
Reports
A single CSV row per tenant per day, with the rate-card-snapshotted cost:
date,tenant,model,tokens_in,tokens_out,tokens_cached,reasoning_tokens,tool_calls,cost_usd
2026-06-03,acme,claude-opus-4-7,142500,38200,12300,5400,42,4.78
2026-06-03,acme,claude-sonnet-4-6,891200,201400,84300,0,128,5.92
2026-06-03,globex,claude-sonnet-4-6,512300,98400,32100,0,67,3.41
Aggregate to month-end for invoicing. The CSV columns are the contract — keep them stable so customer downstream pipelines don’t break when you add a new dimension.
Stripe Billing integration
If you’re invoicing via Stripe:
- Create a Stripe
Productper offering (“OpenClawMU Pro”, “OpenClawMU Team”). - Create a
usage-based Pricelinked to a metered usage record. - Nightly cron: for each tenant, sum the day’s
cost_usd, post to Stripe viausageRecords.create({ subscription_item, quantity: cost_cents, timestamp }). - Stripe handles the invoice and the credit card charge at month-end.
The mapping tenant → Stripe subscription lives in your customer database; the gateway only knows tenant IDs and dollar amounts.
// nightly cron, in your invoicing service
import Stripe from "stripe";
import { openclaw } from "./openclaw-client";
const stripe = new Stripe(process.env.STRIPE_KEY!);
for (const customer of await db.customers.findAll()) {
const usage = await openclaw.billing.report(customer.tenantId, "yesterday");
const costCents = Math.round(usage.total_cost_usd * 100);
if (costCents > 0) {
await stripe.subscriptionItems.createUsageRecord(
customer.stripeSubItemId,
{ quantity: costCents, timestamp: usage.date_unix }
);
}
}
Showing customers their usage
Customers want to see what they’re paying for. Build a simple dashboard backed by the same CSV / API:
- Daily tokens (input vs output vs cached, stacked).
- Daily cost (with model breakdown).
- Top 10 sessions by cost.
- Quota progress (current period).
OpenClawMU’s control-plane API exposes GET /v1/tenants/<id>/usage?period=current-month returning the same data the CSV does, JSON-formatted. Front-end query, render charts, ship.
Audit-friendly trail
Every billing row carries the rate-card snapshot. If a customer disputes a charge, you can reproduce the math:
“On 2026-06-03 you used 142,500 input tokens of claude-opus-4-7 at $15/M = $2.138. Plus 38,200 output tokens at $75/M = $2.865. Plus 12,300 cached at $1.50/M = $0.018. Plus 5,400 reasoning at $75/M = $0.405. Plus our 10% margin = $5.97. Charge: $5.97.”
Customers respect that level of transparency. Plus your audit team likes it.
What not to do
- Don’t sample. Recording one in ten LLM calls and extrapolating is a lawsuit waiting to happen.
- Don’t bill on requests instead of tokens. Requests are a coarser unit; tokens are what your provider charges you for.
- Don’t change the rate card retroactively. If you raise prices, snapshot the new rate going forward; historical reports stay at the old rate.
- Don’t bury margin in opaque per-message fees. Customers find out, and they don’t appreciate it.
- Don’t share LLM keys across tenants without a meter. Even if it’s “cheaper” in the short term, you’ve lost the cost attribution.
Build vs adopt
You can build all of this in a sprint. The interesting work is your invoicing UX, your customer dashboard, your pricing strategy. The plumbing (meter every LLM call, snapshot a rate card, emit a CSV) is identical across every multi-tenant AI gateway. OpenClawMU ships it; LiteLLM has a thinner version; you can absolutely roll your own.
The point is: don’t ship the product before you ship the meter. Customers without bills become very expensive customers very quickly.