Analytics
Every request through DVARA emits cost, latency, token, policy-decision, audit, and trace telemetry. That data fans out across three lenses — operator dashboards in the DVARA Flight Deck, self-service tenant views in the Portal, and raw machine-readable feeds (Prometheus, OpenTelemetry, structured logs, SIEM streams) you point your own observability stack at.
This page is a survey of what's available and where to find it. For deep dives, follow the links into the per-feature documentation.
The three lenses
| Lens | Audience | What it gives you |
|---|---|---|
| Console dashboards (/, /costs, /audit, /latency, /analytics, …) | Platform operators, security admins, finance | Cross-tenant operational view with HTMX-polled fragments (3–10s refresh), role-based access controls, and click-through detail. The fastest way to answer "what's happening right now across the fleet". |
| Portal pages (/portal, /portal/usage, /portal/audit, …) | Tenant users — application developers, team leads, auditors | The same dashboards narrowed to the caller's own tenant. Self-service for tenant admin / developer / viewer roles; the operator never has to grant Console access. |
| Raw feeds (Prometheus scrape, OTLP push, stdout JSON logs, Kafka/Splunk/CloudWatch SIEM) | Your own observability stack — Grafana, Datadog, Tempo, Jaeger, Splunk, ELK, … | Machine-readable telemetry meant to feed your existing dashboards, alerts, and SIEM. DVARA does not try to replace your observability platform; it emits standard formats that your existing tools can ingest directly. |
Use the dashboards for day-to-day operations and the raw feeds for long-term retention, custom dashboards, and alerting. Reports (chargeback, compliance) are the one-shot PDF / CSV artifacts you hand to finance or auditors.
Quick orientation by audience
| You are a … | Start here |
|---|---|
| Platform operator running the gateway | Console Dashboard (/) for live health; Console Latency (/latency) for routing decisions; Prometheus scrape for alerting |
| Security / compliance lead | Console Audit (/audit) for the live event stream; Compliance Reports (/compliance) for SOC2 / HIPAA / GDPR rollups; SIEM stream for long-term retention |
| Finance / FinOps | Console Cost Dashboard (/costs) for live spend; Chargeback Reports (/chargeback) for monthly per-tenant PDFs; gateway_cost_dollars_total Prometheus counter for time-series budget burn |
| Tenant developer or team lead | Portal Dashboard (/portal) for your tenant's request counts, tokens, and cost; Portal Usage (/portal/usage) for per-model breakdowns; Portal Audit for your own audit trail |
| Agent platform team | Console Agent Analytics (/analytics) for session outcomes, approval funnels, tool-call heatmaps; Console MCP Sessions (/mcp/sessions) for per-session timelines |
The full surface
Every analytic the platform exposes, what it shows, where to view it, and the underlying endpoint.
Live dashboards
| Analytic | What it shows | Where to view | Endpoint |
|---|---|---|---|
| Operational dashboard | Req/s, P95 latency, error rate, tokens/min, provider health, cache stats, priority-admission stats | Console / | GET /actuator/prometheus + GET /actuator/gateway-status |
| Per-tenant dashboard | Request count, token usage, 30-day cost, active API keys, recent audit events — scoped to caller's tenant | Portal /portal | PortalDashboardController (direct repo reads) |
| Cost dashboard | Total cost, Cost-by-Provider donut, Cost-by-Model bar, budget status, forecast cards, anomaly alerts, raw records table | Console /costs | GET /v1/admin/costs, /costs/summary, /costs/forecast, /costs/anomalies |
| Latency dashboard | EWMA latency per (provider, model) — sample count, raw last sample, freshness badge | Console /latency (owner + policy-admin) | GET /v1/admin/latency |
| Priority admission stats | Concurrent / max concurrent / load %, per-tier admitting vs throttling | Dashboard priority card (when routing.priority.enabled=true) | GET /v1/admin/priority/stats |
| Semantic cache stats | Hit rate, miss count, cache size, average similarity on hits | Console /cache (owner only) | GET /v1/admin/cache/stats |
| License status | Licensee, type, expiry, days remaining, runtime status badge | Console /license, Portal license panel | GET /actuator/gateway-status |
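
The latency dashboard reports an exponentially weighted moving average (EWMA) per (provider, model), alongside a sample count and the raw last sample. As a rough illustration of how such a value evolves, the sketch below applies the textbook EWMA update. The smoothing factor `alpha=0.2`, the class name, and the choice to seed the average with the first sample are assumptions made for this example, not DVARA's documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EwmaLatency:
    """Textbook EWMA tracker. alpha is an assumed smoothing factor, not a DVARA setting."""

    alpha: float = 0.2
    value: Optional[float] = None  # current EWMA, in milliseconds
    samples: int = 0               # number of samples observed
    last: Optional[float] = None   # raw last sample, in milliseconds

    def observe(self, latency_ms: float) -> float:
        # The first sample seeds the average; later samples are blended in
        # with weight alpha, so older samples decay geometrically.
        if self.value is None:
            self.value = latency_ms
        else:
            self.value = self.alpha * latency_ms + (1 - self.alpha) * self.value
        self.samples += 1
        self.last = latency_ms
        return self.value


tracker = EwmaLatency(alpha=0.2)
for sample_ms in (100.0, 200.0, 100.0):
    tracker.observe(sample_ms)

# 100 -> 120 -> 116: a single slow sample moves the average only partway.
print(round(tracker.value, 2), tracker.samples, tracker.last)
```

A larger `alpha` reacts faster to a latency spike; a smaller one smooths out noise at the cost of slower reaction.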
Telemetry & event streams
| Analytic | What it shows | Where to view | Endpoint |
|---|---|---|---|
| Token usage telemetry | Per-request input/output/total tokens, summary KPI card, masked API key, estimated-flag tooltip | Console /token-usage, Portal /portal/usage | GET /v1/admin/token-usage, …/summary |
| Audit event stream | Append-only HMAC-chained events — policy decisions, auth, config imports, license transitions; click-to-expand detail; CSV / JSON export | Console /audit, Portal /portal/audit | GET /v1/admin/audit/events, …/export, …/export/json |
| Budget cap usage | Current period spend vs limit, color-coded progress bar, period boundaries, dollars remaining | Console /budgets, Portal /portal/budgets | GET /v1/admin/budgets/{id}/usage |
| MCP tool-call telemetry | Per-call records (tenant, session, server, tool, status, latency, response bytes, PII flag); aggregated counts | Console /mcp/tool-calls, Portal /portal/mcp/tool-calls | GET /v1/admin/mcp/tool-calls, …/summary |
| Agent session timeline | Session list + detail with summary stats (avg latency, error rate, duration, response bytes) + distinct-tools table + per-call timeline + JSON export | Console /mcp/sessions, Portal /portal/mcp/sessions | GET /v1/admin/sessions, …/{id}/timeline |
| Approval queue stats | Pending + decision history, action badges, nav badge count | Console /approvals, Portal /portal/approvals | GET /v1/admin/approvals/pending, …/history, …/count |
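
The audit stream is described above as append-only and HMAC-chained. The general idea of such a chain is that each event's MAC covers the previous event's MAC, so editing or removing one event invalidates every later link. The sketch below shows that idea in generic form. The field names (`payload`, `mac`), the event type strings, the canonical JSON encoding, and the use of SHA-256 are all assumptions for illustration; the actual DVARA event schema and chaining scheme are not specified on this page.

```python
import hashlib
import hmac
import json
from typing import Dict, List


def event_mac(key: bytes, prev_mac: str, payload: Dict) -> str:
    """MAC over the previous link plus a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, prev_mac.encode("utf-8") + body, hashlib.sha256).hexdigest()


def verify_chain(key: bytes, events: List[Dict], genesis: str = "") -> int:
    """Return the index of the first broken link, or -1 if the chain is intact."""
    prev = genesis
    for index, event in enumerate(events):
        expected = event_mac(key, prev, event["payload"])
        # compare_digest avoids leaking the mismatch position through timing.
        if not hmac.compare_digest(expected, event["mac"]):
            return index
        prev = event["mac"]
    return -1


# Build a small demo chain. The key and event shapes are hypothetical.
key = b"demo-key"
payloads = [
    {"type": "AUTH", "n": 1},
    {"type": "POLICY_DECISION", "n": 2},
    {"type": "CONFIG_IMPORT", "n": 3},
]
chain, prev = [], ""
for payload in payloads:
    mac = event_mac(key, prev, payload)
    chain.append({"payload": payload, "mac": mac})
    prev = mac

intact = verify_chain(key, chain)
chain[1]["payload"]["n"] = 99          # tamper with the middle event
tampered = verify_chain(key, chain)
print(intact, tampered)                # -1 1
```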
Comparison & experimentation analytics
| Analytic | What it shows | Where to view | Endpoint |
|---|---|---|---|
| Canary A/B report | Per-variant metrics on a canary-routed route; reset + live split-percentage update | Console /routes/{id}/canary | GET /v1/admin/routes/{id}/canary/report |
| Shadow policy report | Divergence stats (1h / 24h / 7d) + per-event diff between active and SHADOW policy | Console /policies/{id}/shadow | GET /v1/admin/policies/{id}/shadow/stats, …/events |
| Shadow routing report | Provider-vs-shadow comparison on a route (latency, status, response delta) | Console /routes/{id}/shadow | GET /v1/admin/routes/{id}/shadow/report |
| Prompt experiment metrics | Per-variant A/B metrics snapshot for a prompt experiment | Console + Portal experiment pages | GET /v1/admin/prompts/experiments/{id}/report |
| Agent behaviour analytics | Session-outcomes donut, approval-funnel bar, top-20 agent leaderboard, tool-call heatmap (server × tool), policy firing frequency | Console /analytics | AnalyticsController (composed from MCP + audit repos) |
| Model fingerprint drift | Drift report per golden prompt — when an upstream model's response semantics shifted | Console eval/fingerprint surface | GET /v1/admin/golden-prompts/{id}/drift |
| Eval pipeline reports | Per-model eval suite results vs golden corpus, drift detection, time-ranged report queries | Console /eval/reports | GET /v1/admin/eval/reports, …/model/{model} |
Downloadable reports
| Analytic | What it shows | Where to view | Endpoint |
|---|---|---|---|
| Chargeback reports | Per-tenant monthly rollup: tenant summary, model/provider breakdown, daily cost trend, forecasts, anomalies — PDF + CSV | Console /chargeback, Portal /portal/chargeback | POST /v1/admin/chargeback, GET …/{id}/pdf, …/{id}/csv |
| Compliance reports | SOC2 / HIPAA / GDPR rollups aggregated from audit + token usage + tenant + policy data — PDF | Console /compliance, Portal /portal/compliance | POST /v1/admin/reports, GET …/{id}/pdf |
Raw feeds — point your own stack at these
| Analytic | What it shows | Where to view | Endpoint |
|---|---|---|---|
| Prometheus metrics | ~35 named counters / histograms — gateway_requests_total, gateway_latency_seconds, gateway_tokens_total, gateway_cost_dollars_total, gateway_guardrail_blocked_total, gateway_budget_blocked_total, mcp_tool_calls_total, dvara_emails_sent_total, etc. | Your Grafana / Datadog / Mimir | GET /actuator/prometheus on gateway-server, mcp-proxy-server, flightdeck — Bearer $DVARA_ACTUATOR_METRICS_API_KEY |
| OpenTelemetry traces | Distributed spans — gateway.provider.chat, …stream, …embed; mcp.filter.*, mcp.server.call — with traceparent propagation | Your OTel collector → Jaeger / Tempo / Datadog | OTLP push to OTEL_EXPORTER_OTLP_ENDPOINT (default http://localhost:4318/v1/traces) |
| Structured access log | One JSON line per request: trace_id, method, path, status, latency_ms, model, provider, cache_status, masked api_key, tenant_id, token counts, error_code | Your log aggregator | stdout (Logback LogstashEncoder) |
| SIEM stream | All audit events forwarded to Splunk HEC, CloudWatch Logs, or Kafka | Your SIEM | dvara.llm-gateway.siem.{splunk\|cloudwatch\|kafka}.* |
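
If you want to spot-check the Prometheus feed before wiring up a full scraper, a small script is enough. The sketch below fetches `/actuator/prometheus` with the Bearer key and sums one counter across its label sets. The base URL you pass in, the label names in the sample text, and the helper names are assumptions for this example; only the endpoint path, the Bearer scheme, and the metric names come from the table above.

```python
import urllib.request


def fetch_metrics(base_url: str, api_key: str, timeout: float = 5.0) -> str:
    """GET <base_url>/actuator/prometheus with a Bearer token and return the body."""
    request = urllib.request.Request(
        f"{base_url}/actuator/prometheus",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def sum_counter(exposition: str, metric: str) -> float:
    """Sum every sample of `metric` in Prometheus text exposition format."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP / TYPE comments
        if "{" in line:
            name, _, rest = line.partition("{")
            _, _, tail = rest.rpartition("}")  # text after the label set
        else:
            name, _, tail = line.partition(" ")
        if name != metric:
            continue
        total += float(tail.split()[0])  # first token is the value
    return total


# Offline sample; the label names and values here are hypothetical.
SAMPLE = """\
# TYPE gateway_cost_dollars_total counter
gateway_cost_dollars_total{provider="a",model="x"} 12.5
gateway_cost_dollars_total{provider="b",model="y"} 7.25
gateway_requests_total 42
"""

print(sum_counter(SAMPLE, "gateway_cost_dollars_total"))  # 19.75
```

Against a live gateway you would call `fetch_metrics(...)` with your own host and the value of `DVARA_ACTUATOR_METRICS_API_KEY`, then pass the result to `sum_counter`. For anything beyond a spot check, a real Prometheus-compatible scraper is the better tool.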
Per-tenant vs platform scoping
Almost every analytic in the tables above respects DVARA's multi-tenant scoping rules:
- Platform-role callers (owner / policy-admin / billing-admin) see cross-tenant data by default. Most list endpoints accept an optional ?tenant_id= filter to narrow.
- Tenant-role callers (admin / developer / viewer) only ever see their own tenant's rows. Passing another tenant's id returns HTTP 403 with a TENANT_SCOPE_VIOLATION audit event.
The Portal pages enforce the same rule at the URL level — a tenant user lands at /portal/* and physically cannot construct a path that crosses tenants. See Multi-Tenancy for the full isolation model.
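
The scoping rule above can be summarised as a small decision function. This is a minimal sketch of the rule as described on this page, not the gateway's actual code: the function name, the exception class, and the use of `None` to mean "no tenant filter" are assumptions for illustration.

```python
from typing import Optional

PLATFORM_ROLES = {"owner", "policy-admin", "billing-admin"}
TENANT_ROLES = {"admin", "developer", "viewer"}


class TenantScopeViolation(Exception):
    """Stands in for the HTTP 403 plus TENANT_SCOPE_VIOLATION audit event."""

    status = 403


def effective_tenant_filter(
    role: str, caller_tenant: Optional[str], requested_tenant: Optional[str] = None
) -> Optional[str]:
    """Return the tenant id to filter by, or None for an unfiltered cross-tenant view."""
    if role in PLATFORM_ROLES:
        # Platform roles see everything unless they narrow with ?tenant_id=.
        return requested_tenant
    if role in TENANT_ROLES:
        if requested_tenant is not None and requested_tenant != caller_tenant:
            raise TenantScopeViolation(f"{role} may not read tenant {requested_tenant}")
        # Tenant roles are always pinned to their own tenant.
        return caller_tenant
    raise ValueError(f"unknown role: {role}")


print(effective_tenant_filter("owner", None))                 # None
print(effective_tenant_filter("owner", None, "acme"))         # acme
print(effective_tenant_filter("developer", "acme"))           # acme
try:
    effective_tenant_filter("viewer", "acme", "globex")
except TenantScopeViolation as exc:
    print(exc.status)                                         # 403
```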
What's deliberately not in the box
A few things the platform deliberately leaves to your existing tools:
- Long-term metric retention beyond the Prometheus scrape window — point Mimir, Thanos, or Datadog at the scrape endpoint.
- Custom dashboards beyond the built-in Console / Portal views — Grafana on top of the Prometheus + OTLP feeds covers the long tail.
- Alerting — Prometheus AlertManager or Grafana alerts on the same metrics; the platform emits the data, not the alert routing.
- Cross-system correlation (joining DVARA spans against your service's app spans) — handled by your APM via the traceparent propagation DVARA already emits.
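
For ad-hoc correlation outside an APM, the W3C `traceparent` header carries the trace id you can join on. The sketch below parses a version-00 header and matches it against structured access log lines by their `trace_id` field. It assumes the access log's `trace_id` is the same 32-character hex value that appears in the header, which this page does not state explicitly; the sample header and log lines are made up for the example.

```python
import json
import re
from typing import Dict, List, Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> Optional[Dict]:
    """Return trace_id, parent_id, and sampled flag, or None if the header is invalid."""
    match = TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def log_lines_for_trace(log_lines: List[str], trace_id: str) -> List[Dict]:
    """Pick the JSON access log records whose trace_id matches."""
    records = (json.loads(line) for line in log_lines if line.strip())
    return [record for record in records if record.get("trace_id") == trace_id]


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
logs = [
    '{"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "path": "/v1/chat", "status": 200}',
    '{"trace_id": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "path": "/v1/chat", "status": 429}',
]

context = parse_traceparent(header)
matches = log_lines_for_trace(logs, context["trace_id"])
print(context["sampled"], len(matches), matches[0]["status"])  # True 1 200
```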
Where to go next
- DVARA Flight Deck — the full Console tour, including every dashboard listed above with screenshots.
- Cost Management — cost dashboard, forecasts, anomaly alerts, budget caps, chargeback reports.
- Observability — audit event taxonomy, Prometheus metrics list, OpenTelemetry span names, SIEM configuration.
- Multi-Tenancy — the scoping rules that govern who sees what data.
- Admin API reference — programmatic access to every analytic above for CI/CD, IaC, and custom integrations.