Skip to main content

Dashboard

The landing page at / displays live operational panels that auto-refresh without a page reload. The live-metrics and provider-health panels poll every 3 seconds; the latency and semantic-cache panels poll every 5 seconds. Data is powered by the LLM Gateway's Prometheus scrape endpoint at /actuator/prometheus (the DVARA Flightdeck authenticates with DVARA_ACTUATOR_METRICS_API_KEY automatically) and by in-process state (latency tracker, response cache).

DVARA Flightdeck with live metrics, provider health, and connection indicatorDVARA Flightdeck with live metrics, provider health, and connection indicator
Figure 1. DVARA Flightdeck with live metrics, provider health, and connection indicator

Live metrics

Four metric cards at the top of the page:

  • Requests/sec — average request throughput since startup
  • P95 Latency — 95th percentile response time (milliseconds)
  • Error Rate — percentage of 4xx/5xx responses
  • Tokens/min — average token throughput

Metrics are derived from counters and histograms exposed by the LLM Gateway. If the connection indicator at the top-right is red (LLM Gateway unreachable), these cards show stale or zero values but the DVARA Flightdeck continues to function.

Dashboard metric cards row — req/s, P95, error rate, tokensDashboard metric cards row — req/s, P95, error rate, tokens
Figure 2. Dashboard metric cards row — req/s, P95, error rate, tokens

Provider health

A table showing every registered LLM provider with:

  • Provider name and type
  • Health status (HEALTHY / DEGRADED / UNHEALTHY) with color-coded badges
  • Capability pills: Streaming, Vision, Tools, Structured, JSON
  • Maximum context token window

Use this panel to quickly spot provider outages, rate-limit throttling, or capability mismatches before they start failing end-user requests.

Provider Health panel with capability pillsProvider Health panel with capability pills
Figure 3. Provider Health panel with capability pills

Provider latency (EWMA)

A table showing every tracked provider + model pair with its current EWMA latency, color-coded against fixed thresholds, and sorted slowest-first so outliers surface at the top:

  • Provider + model pair
  • Current EWMA as a health-badge: green under 1 second, amber 1–5 seconds, red above 5 seconds
  • Sample count so you can tell whether the EWMA is built from enough data to trust

This is the same data the latency-aware routing strategy reads — seeing it on the Dashboard lets operators validate the numbers the router is making decisions against without running a Prometheus query. The tile polls every 5 seconds and shows an empty-state message until the first samples arrive.

Semantic cache

Four metric cards (Hit rate, Cached entries, Hits, Misses) plus a Clear cache button. The cache is a fleet-wide resource, so the clear button carries a confirmation dialog — any in-flight request that would have hit the cache will run against the upstream provider instead.

Tile visibility is gated on dvara.llm-gateway.cache.enabled. When the feature is disabled, neither the tile nor its 5-second poll fires — no wasted network round trips and no stale-looking panel on a deployment where caching is off.

The Cost Dashboard is the better place to see the dollar impact of cache hits; this tile is purely the operational view of hit rate and size.

Priority admission

When dvara.llm-gateway.routing.priority.enabled=true, a priority admission card appears on the dashboard showing concurrent load (now / max), the current load percentage, and a per-tier table (premium / standard / bulk) with each tier's throttle threshold and a live admitting-vs-throttling badge. The card polls every 5 seconds. When priority is disabled (the default), the card is hidden — no wasted poll, no empty panel.

For historical trends, alerting, and per-tenant cardinality, the same data is exported to Prometheus as gateway_priority_requests_total and gateway_priority_throttled_total (labelled by tenant and tier). See Routing → SLA-aware priority routing.

What's not on this page

Some observability signals are intentionally Prometheus-only — they're high-cardinality (per-tenant, per-model counters) and belong in the same pane of glass as the rest of your service metrics:

  • Per-tenant / per-model latency histograms — the EWMA table above gives a point-in-time summary; the full histogram with percentiles and over-time trending lives in Prometheus as gateway_latency_seconds.
  • Per-tenant cost and token throughput — see the Cost Dashboard for the dollar / token rollups, and gateway_cost_dollars_total / gateway_tokens_total in Prometheus for the raw counters.

Point Grafana at /actuator/prometheus on the gateway for the full counter and histogram set. If you scrape /actuator/prometheus directly from Prometheus or Grafana Agent, set the scrape's bearer-token to the same DVARA_ACTUATOR_METRICS_API_KEY value — that endpoint is no longer anonymous, and a distinct secret from DVARA_ACTUATOR_API_KEY so a leaked scrape token can't unlock the license envelope. A worked Prometheus scrape config is deferred to the Security section in 1.1.0-GA.