AI Usage & Credit Billing
Every AI call is metered and billed against the organization (this kit is B2B multi-tenant; the org is the billing unit, which degrades naturally to "per user" for single-member orgs). Billing layers onto the existing Stripe subscription model — there is no parallel per-user credit system.
A call's tokens are charged in a fixed order:
- Monthly allotment — the plan's included token budget, resets each billing period.
- Credit-pack balance — purchased, non-expiring tokens.
- Overage — usage beyond both. Blocked by default (see Overage settings).
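The charge order above can be sketched as a small pure function. This is a minimal illustration, not the kit's code: `BucketSplit` and `split_tokens` are hypothetical names, and the real service also debits the buckets and writes the usage log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketSplit:
    """Tokens drawn from each bucket for one call (hypothetical type)."""
    allotment: int
    credit: int
    overage: int


def split_tokens(tokens: int, allotment_left: int, credit_balance: int) -> BucketSplit:
    """Charge the allotment first, then the credit balance, then overage."""
    if tokens < 0 or allotment_left < 0 or credit_balance < 0:
        raise ValueError("token counts must be non-negative")
    from_allotment = min(tokens, allotment_left)
    remaining = tokens - from_allotment
    from_credit = min(remaining, credit_balance)
    overage = remaining - from_credit
    return BucketSplit(from_allotment, from_credit, overage)


# 12,000 tokens against 5,000 of allotment and 4,000 of credits.
split = split_tokens(tokens=12_000, allotment_left=5_000, credit_balance=4_000)
print(split)  # BucketSplit(allotment=5000, credit=4000, overage=3000)
```

With overage blocked, a non-zero `overage` here corresponds to a call that would be refused rather than charged.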
How a call is billed
backend/app/service/ai_usage_billing_service.py owns this. Two entry points:
- has_ai_capacity(org_id, estimated_tokens): a pre-flight check the copilot routes call before streaming, so an org that's out of capacity gets a clean QuotaExceeded instead of a half-streamed answer. It estimates from input length (~4 chars/token).
- stream_and_record(...) / record_llm_usage(...): after the call, the actual TokenUsage is split across allotment → credits → overage, the consumed buckets are debited, the USD cost is computed (see Model pricing), and a row is written to llm_usage_log tagged with the bucket it drew from.
For streaming calls, stream_and_record forwards text chunks to the client and bills the final TokenUsage once the stream ends — identical accounting to a non-streamed call.
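To make the pre-flight estimate concrete, here is a rough sketch of a character-based estimate and capacity check. The names `estimate_tokens` and `has_capacity`, the round-up behavior, and the comparison against allotment plus credits are all assumptions for illustration; the real has_ai_capacity looks up the org's state itself and may apply different rules.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic from the text above, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate tokens from character count, rounding up (rounding is an assumption)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def has_capacity(estimated_tokens: int, allotment_left: int, credit_balance: int,
                 overage_allowed: bool = False) -> bool:
    """Hypothetical pre-flight check: is there room for the estimated tokens?"""
    if overage_allowed:
        return True
    return estimated_tokens <= allotment_left + credit_balance


prompt = "Summarize last quarter's invoices by customer."
needed = estimate_tokens(prompt)
# An org with nothing left in either bucket is refused before streaming starts.
print(needed, has_capacity(needed, allotment_left=0, credit_balance=0))
```

The estimate only covers input, so the final bill (which includes output tokens) can exceed it; the check exists to fail fast, not to predict cost.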
The monthly allotment
The allotment is just one feature in the generic Plans & Usage system — the token_limit FeatureKey, metered per billing period and resetting each period. Set each plan's token budget alongside its other limits; it isn't a separate AI-only mechanism. An org with no token_limit has a zero allotment and must rely on credit packs.
Credit packs
Packs are configured in settings — settings.credit_packs, with a default in backend/app/config/settings.py and overridable via the FS_CREDIT_PACKS env var (JSON). The defaults:
| Pack | Tokens | Price |
|---|---|---|
| pack_5m | 5,000,000 | $39 |
| pack_10m | 10,000,000 | $69 |
| pack_25m | 25,000,000 | $149 |
Change tokens or pricing in settings — no code edit, and no Stripe products to create (checkout builds the price inline via Stripe price_data). Purchased tokens are non-expiring and live in organization_credit_balance, separate from the per-period allotment.
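As an illustration of inline pricing, the sketch below builds the parameters for a one-off Stripe Checkout Session from a pack definition. `line_items[].price_data` with `currency`, `unit_amount`, and `product_data` is Stripe's documented shape; the `CreditPack` class, the metadata keys other than `type`, and the return URL paths are assumptions, and the kit's actual settings schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreditPack:
    """Hypothetical pack definition mirroring the defaults table."""
    pack_id: str
    tokens: int
    price_usd: int  # whole dollars


DEFAULT_PACKS = {
    "pack_5m": CreditPack("pack_5m", 5_000_000, 39),
    "pack_10m": CreditPack("pack_10m", 10_000_000, 69),
    "pack_25m": CreditPack("pack_25m", 25_000_000, 149),
}


def build_checkout_params(pack: CreditPack, customer_id: str, org_id: str,
                          return_base_url: str) -> dict:
    """Build Checkout Session params with an inline price (no Stripe product needed)."""
    return {
        "mode": "payment",  # one-off purchase, not a subscription
        "customer": customer_id,
        "line_items": [{
            "quantity": 1,
            "price_data": {
                "currency": "usd",
                "unit_amount": pack.price_usd * 100,  # Stripe expects cents
                "product_data": {"name": f"{pack.tokens:,} AI tokens"},
            },
        }],
        # Stripe metadata values must be strings; keys besides "type" are assumed.
        "metadata": {
            "type": "credit_pack",
            "pack_id": pack.pack_id,
            "org_id": org_id,
            "tokens": str(pack.tokens),
        },
        "success_url": f"{return_base_url}/billing?credit_pack=success",  # assumed path
        "cancel_url": f"{return_base_url}/billing?credit_pack=cancelled",  # assumed path
    }


params = build_checkout_params(DEFAULT_PACKS["pack_5m"], "cus_123", "org_1",
                               "https://app.example.com")
print(params["line_items"][0]["price_data"])
```

These parameters would then be passed to Stripe's Checkout Session create call, whose response carries the redirect URL.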
Purchase flow (backend/app/api/route/credit_pack_route.py, all ORG_ADMIN):
| Endpoint | Purpose |
|---|---|
| GET /billing/credit-packs | List available packs |
| POST /billing/credit-packs/checkout | { pack_id, return_base_url } → { url }; redirect the user to this Stripe Checkout URL |
The org must already have a Stripe customer (complete subscription billing setup first). Fulfillment is handled on the Stripe checkout.session.completed webhook (backend/app/api/route/stripe_webhook_route.py): CreditPackService.fulfill_purchase checks metadata.type == "credit_pack", credits the balance, and appends an organization_credit_transaction audit row.
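The fulfillment step can be pictured with an in-memory stand-in. This is a sketch under assumptions, not CreditPackService itself: `InMemoryLedger`, the `org_id` and `tokens` metadata keys, and the deduplication on session ID are all hypothetical. Stripe can deliver the same webhook event more than once, so some form of idempotency is worth having; whether the kit's implementation deduplicates is not stated here.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryLedger:
    """Hypothetical stand-in for the credit balance and transaction tables."""
    balances: dict = field(default_factory=dict)
    transactions: list = field(default_factory=list)
    seen_sessions: set = field(default_factory=set)


def fulfill_purchase(event: dict, ledger: InMemoryLedger) -> bool:
    """Credit the org if the event is a completed credit-pack checkout."""
    if event.get("type") != "checkout.session.completed":
        return False
    session = event["data"]["object"]
    metadata = session.get("metadata") or {}
    if metadata.get("type") != "credit_pack":
        return False  # some other checkout, e.g. a subscription
    if session["id"] in ledger.seen_sessions:
        return False  # duplicate delivery: already credited (assumed behavior)
    org_id = metadata["org_id"]       # assumed metadata key
    tokens = int(metadata["tokens"])  # assumed metadata key
    ledger.balances[org_id] = ledger.balances.get(org_id, 0) + tokens
    ledger.transactions.append(
        {"org_id": org_id, "tokens": tokens, "session_id": session["id"]}
    )
    ledger.seen_sessions.add(session["id"])
    return True


event = {
    "type": "checkout.session.completed",
    "data": {"object": {
        "id": "cs_test_1",
        "metadata": {"type": "credit_pack", "org_id": "org_1", "tokens": "5000000"},
    }},
}
ledger = InMemoryLedger()
print(fulfill_purchase(event, ledger), fulfill_purchase(event, ledger))
print(ledger.balances)
```

In a real handler the balance update and the audit row would be written in one database transaction so they cannot diverge.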
Model pricing
USD cost per call is computed from the model_price table (seeded by migrations 006_model_price.sql + 008_more_model_prices.sql), which ships current OpenAI GPT-4.1 and GPT-5 series pricing (standard tier, per 1M tokens). A sample:
| Provider | Model | Input / 1M | Output / 1M |
|---|---|---|---|
| openai | gpt-5-mini (default) | $0.25 | $2.00 |
| openai | gpt-5 | $1.25 | $10.00 |
| openai | gpt-5-nano | $0.05 | $0.40 |
| openai | gpt-4.1-mini | $0.40 | $1.60 |
| openai | gpt-4o-mini | $0.15 | $0.60 |
…plus gpt-5.1, gpt-5.2, gpt-4.1, gpt-4o, and the -pro variants. LlmPricingService.compute_cost_usd looks up the price (cached) and multiplies by input/output tokens. If you point the copilot at a model that isn't in this table, calls will fail — add a row (or a migration) for any model you enable.
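The cost arithmetic is input tokens times the input price plus output tokens times the output price, each divided by one million. A minimal sketch using three rows from the sample table follows; the in-memory `PRICES` dict and this standalone `compute_cost_usd` are illustrative stand-ins for the cached table lookup, and raising `LookupError` for an unknown model is an assumption about how the failure surfaces.

```python
from decimal import Decimal

# (provider, model) -> (input USD per 1M tokens, output USD per 1M tokens)
PRICES = {
    ("openai", "gpt-5-mini"): (Decimal("0.25"), Decimal("2.00")),
    ("openai", "gpt-5"): (Decimal("1.25"), Decimal("10.00")),
    ("openai", "gpt-4o-mini"): (Decimal("0.15"), Decimal("0.60")),
}
PER_MILLION = Decimal(1_000_000)


def compute_cost_usd(provider: str, model: str,
                     input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one call; unknown models raise LookupError."""
    try:
        price_in, price_out = PRICES[(provider, model)]
    except KeyError:
        raise LookupError(f"no price row for {provider}/{model}") from None
    return (price_in * input_tokens + price_out * output_tokens) / PER_MILLION


# 12,000 input + 3,000 output tokens on the default model.
cost = compute_cost_usd("openai", "gpt-5-mini", 12_000, 3_000)
print(cost)
```

`Decimal` is used here to avoid binary floating-point rounding in money values; for the call above the result equals $0.009.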
Usage reporting
backend/app/api/route/usage_route.py exposes:
| Endpoint | Role | Returns |
|---|---|---|
| GET /usage/summary | MEMBER | used / limit / credit tokens for the period, plus the caller's own usage |
| GET /usage/history | MEMBER (own rows) · ORG_ADMIN+ (whole org) | paginated per-call log (limit, offset) |
| GET /usage/top-users | ORG_ADMIN | top users by tokens this period |
| GET /usage/fleet | SYSTEM_ADMIN | fleet-wide rollup over days; backs the admin AI-usage page |
The user-facing usage page reads /summary + /history; the system-admin AI-usage page reads /fleet.
Overage settings
Overage is gated by two settings, both false by default, so usage beyond allotment + credits is blocked (a hard cap) rather than silently charged:
- overage_enabled (org setting): an org opts into overage.
- ai_overage_enabled (system setting, SYSTEM_ADMIN): an operator-level kill switch; an org's overage_enabled only takes effect if this is also on.
Both use the existing generic setting mechanism (org settings via /organization/{id}/{key}, system settings via /system/{key}). Actually charging for overage (Stripe metered billing / invoice items) is intentionally not implemented yet — leave overage off until you wire it up.
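The two-level gate reduces to a logical AND with both inputs defaulting to off. The function below is a hypothetical illustration of that rule only; it does not read the real settings store.

```python
def overage_allowed(org_overage_enabled: bool = False,
                    system_ai_overage_enabled: bool = False) -> bool:
    """Overage applies only when the org opt-in and the system switch are both on."""
    return org_overage_enabled and system_ai_overage_enabled


# Truth table: only the last combination permits overage.
for org_flag in (False, True):
    for system_flag in (False, True):
        print(org_flag, system_flag, overage_allowed(org_flag, system_flag))
```

Because both flags default to false, a fresh install behaves as a hard cap until an operator and an org admin each opt in.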
Schema
Migration 007_ai_credit_billing.sql adds:
- llm_usage_log: append-only per-request log (tokens, cost, bucket = allotment|credit|overage).
- organization_credit_balance: running non-expiring credit-pack balance.
- organization_credit_transaction: audit log of credit purchases/debits.
Next steps
- AI Integration — the LLM client, the sample copilot, and streaming.
- Stripe Integration — the subscription billing this builds on.