
Attribute cost per tenant for chargeback

The problem

You run a multi-tenant SaaS where each customer makes LLM calls through your platform. Finance needs to know exactly how much each customer spent last month on each model from each provider, with enough detail to either (a) bill the customer back for their usage or (b) allocate cost internally to their cost center. A raw OpenAI invoice can't tell you any of this — it's one lump sum for all of your traffic.

The approach

DVARA issues an API key per tenant, stamps every request with the tenant's identity, and tracks cost at the request level (tenant + API key + model + provider + tokens + USD). A scheduled chargeback job rolls this up into a monthly report per tenant, downloadable as PDF or CSV and ready to attach to the invoice.

Prerequisites

  • A running DVARA instance (Quickstart)
  • Admin / owner access on DVARA Flightdeck or an admin PAT
  • Provider credentials configured (environment, per-tenant BYOK, or vault — see Credentials & BYOK)
  • Model pricing entries for each (provider, model) pair you use — DVARA ships defaults for the major providers, but verify your pricing is current under Cost Management → Pricing

The steps

1. Create one tenant per customer

For each customer, create a tenant. The tenant carries the customer's name, status, region, and optional metadata:

TENANT_ID=$(curl -s -X POST https://dvara.internal.example.com/v1/admin/tenants \
-H "Authorization: Bearer <admin-pat>" \
-H "Content-Type: application/json" \
-d '{
"name": "Acme Corporation",
"region": "us-east-1"
}' | jq -r .id)

echo "Tenant ID: $TENANT_ID"

The Automation API generates a server-side UUID for the tenant. Capture it from the response (a client-supplied id is ignored) and use $TENANT_ID in every URL below. Tenant IDs show up in every audit event, cost record, and chargeback report, so record the mapping (UUID → customer name) somewhere your billing system can read.
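One way to keep the UUID → customer name mapping is a small append-only file that your billing job reads. This is a minimal sketch, assuming a local CSV file; the file name tenant_map.csv and both helper names are hypothetical and not part of DVARA:

```python
import csv
from pathlib import Path

# Hypothetical location; any store your billing system can read works.
MAP_FILE = Path("tenant_map.csv")


def record_tenant(tenant_id: str, customer_name: str, path: Path = MAP_FILE) -> None:
    """Append one tenant UUID -> customer name row, writing a header on first use."""
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(["tenant_id", "customer_name"])
        writer.writerow([tenant_id, customer_name])


def load_tenant_map(path: Path = MAP_FILE) -> dict[str, str]:
    """Read the file back into a dict keyed by tenant UUID."""
    with path.open(newline="", encoding="utf-8") as fh:
        return {row["tenant_id"]: row["customer_name"] for row in csv.DictReader(fh)}
```

Call record_tenant with the value of $TENANT_ID right after the tenant is created, and have the billing side call load_tenant_map when it resolves report rows to customers.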

2. Issue an API key per tenant

Each customer-facing application gets its own API key, scoped to one tenant. The gateway reads the key on every request, hashes it, looks up the tenant, and stamps the request:

curl -s -X POST https://dvara.internal.example.com/v1/admin/tenants/$TENANT_ID/keys \
-H "Authorization: Bearer <admin-pat>" \
-H "Content-Type: application/json" \
-d '{
"name": "acme-production-app"
}'

The response contains the plaintext key once. Store it in the customer app's secret manager; you can't retrieve it again after the response.

For customers with multiple applications (a web UI, a batch job, a mobile backend), issue one key per application. That way cost records separate naturally and you can revoke one without affecting the others.
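Issuing one key per application is the same POST repeated with a different name. The sketch below builds those requests with the Python standard library without sending them; the application names and the placeholder tenant UUID are hypothetical, and the endpoint and body shape are taken from the curl example above:

```python
import json
import urllib.request

BASE_URL = "https://dvara.internal.example.com"  # host used in the examples above


def key_request(tenant_id: str, key_name: str, admin_pat: str) -> urllib.request.Request:
    """Build (but do not send) the POST that issues one API key for a tenant."""
    body = json.dumps({"name": key_name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/admin/tenants/{tenant_id}/keys",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_pat}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical application names for one customer; the UUID is a placeholder.
apps = ["acme-web-ui", "acme-batch-job", "acme-mobile-backend"]
tenant_id = "00000000-0000-0000-0000-000000000000"
requests_to_send = [key_request(tenant_id, name, "<admin-pat>") for name in apps]

for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode())
    # To issue the key for real, pass req to urllib.request.urlopen and store
    # the plaintext key from the response immediately; it is shown only once.
```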

3. Route application traffic through DVARA

The customer's applications keep their existing code and SDKs — they just point at DVARA and use the issued key. See Add DVARA to an existing OpenAI-SDK app for the client-side change. On every request, DVARA:

  1. Hashes the Authorization: Bearer <key> value
  2. Looks up the tenant (sub-millisecond — the gateway caches the key-to-tenant mapping)
  3. Dispatches the request and records the outcome under that tenant
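
The hash-then-look-up flow in the steps above can be sketched as follows. This is an illustration only: this guide does not state which hash function or cache DVARA uses, so SHA-256 and an in-memory dict are assumptions, and every function name here is hypothetical:

```python
import hashlib
from typing import Optional

# Hypothetical in-memory stand-in for the gateway's key-to-tenant cache.
# Only digests are stored as dict keys, never the plaintext API key.
_TENANT_BY_KEY_HASH: dict[str, str] = {}


def hash_key(api_key: str) -> str:
    """Digest of the raw key; SHA-256 is an assumption for illustration."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def register_key(api_key: str, tenant_id: str) -> None:
    """Remember which tenant a key belongs to, indexed by the key's digest."""
    _TENANT_BY_KEY_HASH[hash_key(api_key)] = tenant_id


def resolve_tenant(authorization_header: str) -> Optional[str]:
    """Return the tenant ID for a 'Bearer <key>' header value, or None."""
    scheme, _, key = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not key:
        return None
    return _TENANT_BY_KEY_HASH.get(hash_key(key.strip()))
```

Because the lookup is keyed by the digest, the plaintext key never needs to be stored on the gateway side, which is consistent with the key being shown only once at creation.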

4. Let cost records accumulate

For every successful request, DVARA writes a cost record: tenant ID, API key ID, model, provider, input tokens, output tokens, USD cost (input tokens × input price + output tokens × output price from the pricing table), and a timestamp.
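
As a worked example of that formula, the sketch below computes the cost of one request. The prices are made up, and it assumes prices are quoted in USD per 1 million tokens; check the unit used by your pricing table before reusing the arithmetic:

```python
from decimal import Decimal

# Assumption: pricing entries are quoted per 1M tokens.
TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price: Decimal,
    output_price: Decimal,
) -> Decimal:
    """input tokens x input price + output tokens x output price, scaled to USD."""
    raw = Decimal(input_tokens) * input_price + Decimal(output_tokens) * output_price
    return raw / TOKENS_PER_PRICE_UNIT


# Hypothetical prices: 2.50 USD per 1M input tokens, 10.00 USD per 1M output tokens.
cost = request_cost_usd(1200, 350, Decimal("2.50"), Decimal("10.00"))
print(cost)  # (1200 * 2.50 + 350 * 10.00) / 1_000_000
```

Decimal avoids the rounding drift that floats accumulate when millions of small per-request amounts are summed into an invoice line.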

Verify cost is flowing under your tenant:

curl -s "https://dvara.internal.example.com/v1/admin/costs/summary?tenantId=$TENANT_ID&from=2026-04-01T00:00:00Z&to=2026-04-19T00:00:00Z" \
-H "Authorization: Bearer <admin-pat>"

Use the /summary variant for aggregated totals over a date range; the bare /v1/admin/costs endpoint lists raw rows and does not accept from / to filters. Dates must be full ISO-8601 instants — bare YYYY-MM-DD is rejected with a 400.

Alternatively, open the Cost Management dashboard in DVARA Flightdeck and filter by tenant. See Cost Management for the full walkthrough, including forecasts and anomaly alerts.
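
Since the summary endpoint rejects bare dates, it helps to generate the from / to instants instead of typing them. A minimal sketch, assuming UTC month boundaries; the helper names are hypothetical, and the URL shape follows the curl example above:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "https://dvara.internal.example.com"  # host used in the examples above


def instant(dt: datetime) -> str:
    """Format an aware datetime as a full ISO-8601 UTC instant."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def month_window(year: int, month: int) -> tuple[str, str]:
    """Return (from, to): the first and last second of the month in UTC."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # month // 12 rolls December over into January of the next year.
    next_start = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    return instant(start), instant(next_start - timedelta(seconds=1))


def summary_url(tenant_id: str, start: str, end: str) -> str:
    """Build the cost summary URL with properly encoded query parameters."""
    query = urlencode({"tenantId": tenant_id, "from": start, "to": end})
    return f"{BASE_URL}/v1/admin/costs/summary?{query}"


start, end = month_window(2026, 4)
print(summary_url("<tenant-id>", start, end))
```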

5. Generate a monthly chargeback report

At month-end, generate a chargeback report per tenant:

curl -s -X POST https://dvara.internal.example.com/v1/admin/chargeback \
-H "Authorization: Bearer <admin-pat>" \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "'"$TENANT_ID"'",
"from": "2026-04-01T00:00:00Z",
"to": "2026-04-30T23:59:59Z"
}'

Body field naming on the Automation API isn't uniform: it follows each DTO's @JsonProperty annotations. The chargeback request body uses tenant_id (snake_case); the budget cap body under "Extending the attribution" on this page uses tenantId (camelCase). When in doubt, match the example for the endpoint you're calling. As with the cost summary query, dates must be full ISO-8601 instants; a bare YYYY-MM-DD is rejected with a 400.
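
If you script both endpoints, keeping each body in its own builder makes the casing difference hard to get wrong. A minimal sketch; the function names are hypothetical, and the field names are copied from the request examples on this page:

```python
import json


def chargeback_body(tenant_id: str, start: str, end: str) -> str:
    """Chargeback request body: snake_case tenant_id, full ISO-8601 instants."""
    return json.dumps({"tenant_id": tenant_id, "from": start, "to": end})


def budget_body(tenant_id: str, name: str, limit_usd: int, soft_limit_pct: int) -> str:
    """Budget cap request body: camelCase fields and the uppercase period enum."""
    return json.dumps(
        {
            "tenantId": tenant_id,
            "name": name,
            "period": "MONTHLY",
            "limitUsd": limit_usd,
            "softLimitPct": soft_limit_pct,
            "enabled": True,
        }
    )


print(chargeback_body("<tenant-id>", "2026-04-01T00:00:00Z", "2026-04-30T23:59:59Z"))
print(budget_body("<tenant-id>", "acme-monthly", 500, 80))
```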

The response carries the report ID. Download it in either format:

# PDF for attaching to invoices
curl -s "https://dvara.internal.example.com/v1/admin/chargeback/<report-id>/pdf" \
-H "Authorization: Bearer <admin-pat>" -o acme-april-2026.pdf

# CSV for loading into a billing system
curl -s "https://dvara.internal.example.com/v1/admin/chargeback/<report-id>/csv" \
-H "Authorization: Bearer <admin-pat>" -o acme-april-2026.csv

Or schedule it: set dvara.llm-gateway.finops.chargeback-schedule to a cron expression ("0 0 1 * *" for the 1st of each month) and DVARA generates reports for every active tenant automatically.

Extending the attribution

Attribute by cost center, project, or product line, not just by tenant. If a single tenant runs multiple products that need separate cost allocation internally, stamp requests with tags via the metadata.tags map. The OpenAI Python SDK passes the field through with extra_body:

client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_body={"metadata": {"tags": {"Project": "customer-support-bot"}}},
)

Tags must be string-to-string. DVARA records them on every cost record from this request and indexes them for tag-aware queries. Then aggregate cost by tag:

curl -s "https://dvara.internal.example.com/v1/admin/costs/summary/by-tag?key=Project&from=2026-04-01T00:00:00Z&to=2026-04-30T23:59:59Z" \
-H "Authorization: Bearer <admin-pat>"

See Cost Management → Tags for the full tag cascade.
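
Because tags must be string-to-string, it can be worth checking them on the client before the request leaves your application. A minimal sketch; tags_extra_body is a hypothetical helper, and how DVARA itself reacts to non-string tag values is not covered here:

```python
def tags_extra_body(tags: dict) -> dict:
    """Check that tags are string-to-string and wrap them as metadata.tags."""
    for key, value in tags.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"tag {key!r} must map a string to a string, got {value!r}")
    return {"metadata": {"tags": dict(tags)}}


extra_body = tags_extra_body({"Project": "customer-support-bot"})
print(extra_body)
```

Pass the result as extra_body=... in the client.chat.completions.create call shown earlier.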

Enforce budgets, don't just measure. Attribution tells you what was spent after the fact; a budget cap stops the bleeding before it hits the invoice. Create a per-tenant monthly hard cap:

curl -s -X POST https://dvara.internal.example.com/v1/admin/budgets \
-H "Authorization: Bearer <admin-pat>" \
-H "Content-Type: application/json" \
-d '{
"tenantId": "'"$TENANT_ID"'",
"name": "acme-monthly",
"period": "MONTHLY",
"limitUsd": 500,
"softLimitPct": 80,
"enabled": true
}'

Field names are camelCase (tenantId, limitUsd, softLimitPct); period values are the uppercase enum (DAILY / WEEKLY / MONTHLY).

When the tenant hits 80%, DVARA fires a soft-limit webhook and audit event. At 100%, requests are rejected with BUDGET_CAP_HARD (HTTP 402). See Cost Management → Budget Caps.
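
The threshold behaviour can be sketched as a small function. It assumes both limits are inclusive (the soft limit triggers at exactly 80% of the cap and the hard limit at exactly 100%); the function name and return values are hypothetical, not DVARA identifiers:

```python
from decimal import Decimal


def budget_state(spend_usd: Decimal, limit_usd: Decimal, soft_limit_pct: int) -> str:
    """Classify current spend as 'ok', 'soft' (alert only) or 'hard' (reject)."""
    if spend_usd >= limit_usd:
        return "hard"
    # Compare spend * 100 with limit * pct to stay in exact arithmetic.
    if spend_usd * 100 >= limit_usd * soft_limit_pct:
        return "soft"
    return "ok"


# The acme-monthly cap from the example above: 500 USD with a soft limit at 80%.
for spend in ("399.99", "400.00", "499.99", "500.00"):
    print(spend, budget_state(Decimal(spend), Decimal(500), 80))
```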

Verification

Check | What you should see
GET /v1/admin/costs?tenantId=$TENANT_ID | Rows with matching tenant, model, provider, and non-zero USD cost
GET /v1/admin/costs/summary?tenantId=$TENANT_ID&from=…&to=… | Aggregate totals for the period that sum to the expected USD
Chargeback PDF | Tenant name, period, line items per model/provider, total USD
Audit events | GATEWAY_REQUEST events carrying the tenant's UUID

If costs look zero despite traffic flowing, check that pricing entries exist for the (provider, model) pairs being called. When a pricing row is missing, DVARA still records the tokens but the USD cost is zero. Add the missing entries via POST /v1/admin/pricing or the Pricing UI.
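
To find the affected pairs quickly, scan the raw cost rows for records that carry tokens but no cost. A minimal sketch over already-fetched rows; the field names (provider, model, inputTokens, outputTokens, costUsd) are assumptions about the response shape, so adjust them to what your instance returns:

```python
def missing_pricing(records: list[dict]) -> set[tuple[str, str]]:
    """Return (provider, model) pairs that recorded tokens but zero USD."""
    return {
        (r["provider"], r["model"])
        for r in records
        if (r["inputTokens"] + r["outputTokens"]) > 0 and r["costUsd"] == 0
    }


# Hypothetical rows, shaped like the cost record fields described in step 4.
sample = [
    {"provider": "openai", "model": "gpt-4o", "inputTokens": 1200, "outputTokens": 350, "costUsd": 0.0065},
    {"provider": "openai", "model": "new-model", "inputTokens": 800, "outputTokens": 100, "costUsd": 0},
]
print(missing_pricing(sample))
```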

Next steps

  • Multi-Tenancy — how tenant isolation is enforced end-to-end (API keys, budgets, policies, audit)
  • Cost Management (Flightdeck) — dashboard, forecasts, anomaly alerts, tags, budget caps, chargeback
  • SIEM & Webhooks — route soft-limit alerts to Slack or PagerDuty instead of just emitting audit events