---
title: "Per-tenant LLM cost accounting: meter, attribute, charge"
description: "If you're charging customers for an AI product, you need per-tenant token tracking. Here's the data model, the rate-card pattern, and the integration with Stripe Billing or your invoicing of choice."
url: https://openclawmu.neullabs.com/blog/billing-per-tenant-llm-usage
publishedAt: 2026-05-22T00:00:00.000Z
tags: ["billing", "LLM cost", "per-tenant", "Stripe", "metering"]
cluster: supporting
source: OpenClawMU
---

If you're charging customers for an AI product, the question "what does this customer cost me?" needs a precise answer. Per-tenant LLM cost accounting is the boring infrastructure that makes that precision possible.

This article: the data model, the rate-card pattern, the integration paths to invoicing systems.

## What you need to capture

Every LLM call should write a billing row with:

- `timestamp` — when the call happened.
- `tenant_id` — which tenant made it.
- `model` — which model (`claude-opus-4-7`, `gpt-4o`, etc.).
- `input_tokens` — fresh input tokens (un-cached).
- `cache_read_tokens` — input tokens served from prompt cache (cheap).
- `cache_write_tokens` — input tokens written to the prompt cache (expensive, one-time).
- `output_tokens` — completion tokens.
- `reasoning_tokens` — extended-thinking / o-series reasoning tokens.
- `tool_calls` — count of tool invocations triggered in this turn.
- `request_id` — provider's ID for the call (audit trail).
- `cost_usd` — computed at write time, snapshotted against the rate card in effect.

The cost is computed at write time and stored, not recomputed on read. This is critical for audit: if you change your rate card in July, June's reports still reflect June's prices.

## The rate card

The rate card is a YAML map from (model, token-class) → USD-per-1M-tokens. Each LLM provider you use needs a row; reasoning and cache classes are separate from base rates.

```yaml
billing:
  currency: USD
  rate_card:
    "claude-opus-4-7":
      input:        15.00
      output:       75.00
      cache_read:    1.50
      cache_write:  18.75
      reasoning:    75.00
    "claude-sonnet-4-6":
      input:         3.00
      output:       15.00
      cache_read:    0.30
      cache_write:   3.75
    "gpt-4o":
      input:         2.50
      output:       10.00
    "gpt-5":
      input:         5.00
      output:       25.00
```

The rate card lives in the gateway's admin-only config. Tenant overlays can't override it — otherwise a tenant could quietly make their own bills cheaper.

## Margin on top

You're billing customers; you need a margin. Two patterns:

**Markup at meter time**: multiply the provider rate by 1.3x (or whatever) when computing `cost_usd`. The customer sees an aggregated cost that includes your margin; provider rate is invisible to them.

**Pass-through + flat fee**: bill exactly the provider rate, add a flat per-tenant per-month platform fee. Customers like the transparency; you take less variance risk.

The hybrid (5–15% markup + small platform fee) is common.

## Quotas — the hard stop

Three knobs per tenant:

- **Tokens per day** — sum of input + output + reasoning, reset at UTC midnight.
- **Cost per day (USD)** — rate-card-driven, reset at UTC midnight.
- **Requests per minute** — sliding window.

Exceed any and the gateway returns `429 Too Many Requests` with `Retry-After: <seconds>`. This is the difference between "a runaway tenant blows up your AWS bill" and "a runaway tenant gets throttled at 5 minutes past midnight."

Quotas live in the tenant's admin-only config; the tenant can request an increase but can't set their own.

## Reports

A single CSV row per tenant per day, with the rate-card-snapshotted cost:

```csv
date,tenant,model,tokens_in,tokens_out,tokens_cached,reasoning_tokens,tool_calls,cost_usd
2026-06-03,acme,claude-opus-4-7,142500,38200,12300,5400,42,4.78
2026-06-03,acme,claude-sonnet-4-6,891200,201400,84300,0,128,5.92
2026-06-03,globex,claude-sonnet-4-6,512300,98400,32100,0,67,3.41
```

Aggregate to month-end for invoicing. The CSV columns are the contract — keep them stable so customer downstream pipelines don't break when you add a new dimension.

## Stripe Billing integration

If you're invoicing via Stripe:

1. Create a Stripe `Product` per offering ("OpenClawMU Pro", "OpenClawMU Team").
2. Create a `usage-based Price` linked to a metered usage record.
3. Nightly cron: for each tenant, sum the day's `cost_usd`, post to Stripe via `usageRecords.create({ subscription_item, quantity: cost_cents, timestamp })`.
4. Stripe handles the invoice and the credit card charge at month-end.

The mapping tenant → Stripe subscription lives in your customer database; the gateway only knows tenant IDs and dollar amounts.

```ts
// nightly cron, in your invoicing service
import Stripe from "stripe";
import { openclaw } from "./openclaw-client";

const stripe = new Stripe(process.env.STRIPE_KEY!);

for (const customer of await db.customers.findAll()) {
  const usage = await openclaw.billing.report(customer.tenantId, "yesterday");
  const costCents = Math.round(usage.total_cost_usd * 100);
  if (costCents > 0) {
    await stripe.subscriptionItems.createUsageRecord(
      customer.stripeSubItemId,
      { quantity: costCents, timestamp: usage.date_unix }
    );
  }
}
```

## Showing customers their usage

Customers want to see what they're paying for. Build a simple dashboard backed by the same CSV / API:

- Daily tokens (input vs output vs cached, stacked).
- Daily cost (with model breakdown).
- Top 10 sessions by cost.
- Quota progress (current period).

OpenClawMU's control-plane API exposes `GET /v1/tenants/<id>/usage?period=current-month` returning the same data the CSV does, JSON-formatted. Front-end query, render charts, ship.

## Audit-friendly trail

Every billing row carries the rate-card snapshot. If a customer disputes a charge, you can reproduce the math:

> "On 2026-06-03 you used 142,500 input tokens of claude-opus-4-7 at $15/M = $2.138. Plus 38,200 output tokens at $75/M = $2.865. Plus 12,300 cached at $1.50/M = $0.018. Plus 5,400 reasoning at $75/M = $0.405. Plus our 10% margin = $5.97. Charge: $5.97."

Customers respect that level of transparency. Plus your audit team likes it.

## What not to do

- **Don't sample.** Recording one in ten LLM calls and extrapolating is a lawsuit waiting to happen.
- **Don't bill on requests instead of tokens.** Requests are a coarser unit; tokens are what your provider charges you for.
- **Don't change the rate card retroactively.** If you raise prices, snapshot the new rate going forward; historical reports stay at the old rate.
- **Don't bury margin in opaque per-message fees.** Customers find out, and they don't appreciate it.
- **Don't share LLM keys across tenants without a meter.** Even if it's "cheaper" in the short term, you've lost the cost attribution.

## Build vs adopt

You can build all of this in a sprint. The interesting work is your invoicing UX, your customer dashboard, your pricing strategy. The plumbing (meter every LLM call, snapshot a rate card, emit a CSV) is identical across every multi-tenant AI gateway. OpenClawMU ships it; LiteLLM has a thinner version; you can absolutely roll your own.

The point is: don't ship the product before you ship the meter. Customers without bills become very expensive customers very quickly.