Observability

DVARA provides structured JSON logging, Prometheus metrics, token usage metering, and request tracing out of the box.

X-Trace-ID Propagation

Every response includes an X-Trace-ID header for request correlation. This applies to both the LLM gateway (port 8080) and the MCP Proxy (port 8070).

Behavior:

  • If the incoming request includes an X-Trace-ID header, the same value is echoed back.
  • When OpenTelemetry tracing is active, the OTel 32-character hex trace ID is used if no client header is present.
  • Otherwise, the gateway generates a new random 32-character hex ID.
  • The trace ID is embedded in every error response body as error.trace_id.
  • The trace ID is added to SLF4J MDC as trace_id for inclusion in all structured log lines.
# With custom trace ID
curl -i -H "X-Trace-ID: my-custom-trace-001" http://localhost:8080/v1/models
# → X-Trace-ID: my-custom-trace-001

# Without trace ID (auto-generated)
curl -i http://localhost:8080/v1/models
# → X-Trace-ID: a6783439db1f46a6bfed511a0011e955
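
The precedence rules listed above can be sketched in a few lines of Python. This is an illustration only, not the gateway's implementation: the function name `resolve_trace_id` is hypothetical, and treating an empty header the same as a missing one is an assumption.

```python
import uuid
from typing import Optional


def resolve_trace_id(client_header: Optional[str], otel_trace_id: Optional[str]) -> str:
    """Pick the X-Trace-ID value using the precedence described above."""
    if client_header:        # 1. a client-supplied value is echoed back
        return client_header
    if otel_trace_id:        # 2. otherwise reuse the active OTel trace ID
        return otel_trace_id
    return uuid.uuid4().hex  # 3. otherwise generate a random 32-character hex ID


print(resolve_trace_id("my-custom-trace-001", None))
print(resolve_trace_id(None, "abcdef1234567890abcdef1234567890"))
print(len(resolve_trace_id(None, None)))
```

`uuid.uuid4().hex` is used here because it yields 32 lowercase hex characters, which matches the length of the generated ID described above.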

X-Session-Id Header

Both the LLM gateway and MCP Proxy accept an optional X-Session-Id header for agent session correlation. When present, the session ID is:

  • Stored as a servlet request attribute (sessionId)
  • Added to SLF4J MDC as session_id for structured log correlation
  • Attached as a high-cardinality attribute on OpenTelemetry spans

This allows traces from multiple LLM turns and MCP tool calls within a single agent session to be correlated by session ID.

# LLM request with session ID
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Session-Id: agent-session-42" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

# MCP request with same session ID
curl http://localhost:8070/mcp/filesystem/tools/call \
-H "Content-Type: application/json" \
-H "Authorization: Bearer gw_mykey" \
-H "X-Session-Id: agent-session-42" \
-d '{"name":"read_file","arguments":{"path":"/data/file.txt"}}'

Structured JSON Logging

DVARA emits structured JSON logs by default. Every log line is a JSON object, ready to ingest into ELK, Loki, Datadog, Splunk, or any other JSON-aware log pipeline.

Configuration

  • JSON mode (default) — Every log line is a JSON object with @timestamp, level, message, logger_name, and MDC fields.
  • Plain-text mode — Human-readable output for local development. Activate with the log-plain profile (SPRING_PROFILES_ACTIVE=log-plain).

Log fields

Every access log line, and every gateway log line emitted within the scope of a request, carries these fields:

| Field | Description |
| --- | --- |
| trace_id | Request correlation ID (see above) |
| session_id | Agent session ID (from X-Session-Id header, if present) |
| tenant_id | Tenant identifier |
| model | Requested model name |
| provider | Selected provider |
| method | HTTP method |
| path | Request URI |
| status | HTTP response status |
| latency_ms | Request duration in milliseconds |
| api_key | API key (masked to first 8 chars) |
| cache_status | HIT or MISS |
| stream | Whether request was streaming |
| tokens_prompt | Prompt token count |
| tokens_completion | Completion token count |
| tokens_total | Total token count |
| error_code | Gateway error code (if any) |
| priority_tier | Priority admission tier the request was admitted under (premium / standard / bulk), present only when priority admission control is enabled |

Access Log Example

Every request produces a single structured access log entry:

{
"@timestamp": "2026-02-25T10:23:45.123Z",
"level": "INFO",
"message": "Request completed",
"logger_name": "dvara.access",
"trace_id": "a6783439db1f46a6bfed511a0011e955",
"tenant_id": "acme-corp",
"model": "gpt-4o",
"provider": "openai",
"method": "POST",
"path": "/v1/chat/completions",
"status": "200",
"latency_ms": "342",
"api_key": "sk-prod-1...",
"cache_status": "MISS",
"tokens_prompt": "150",
"tokens_completion": "85",
"tokens_total": "235",
"service": "dvara-gateway"
}
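
Because each access log entry is a single JSON object, per-tenant summaries can be computed straight from the log stream. The sketch below assumes one JSON object per line, that access entries use the dvara.access logger name shown in the example above, and that numeric fields arrive as strings (as they do in that example). The function name `summarize_access_logs` is hypothetical.

```python
import json
from collections import defaultdict


def summarize_access_logs(lines):
    """Aggregate request count, total tokens, and mean latency per tenant."""
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0, "latency_ms": 0})
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip plain-text or truncated lines
        if not isinstance(entry, dict) or entry.get("logger_name") != "dvara.access":
            continue  # only access log entries are counted
        bucket = totals[entry.get("tenant_id", "unknown")]
        bucket["requests"] += 1
        # Numeric fields are strings in the example entry, so convert explicitly.
        bucket["tokens"] += int(entry.get("tokens_total", 0))
        bucket["latency_ms"] += int(entry.get("latency_ms", 0))
    return {
        tenant: {
            "requests": b["requests"],
            "tokens": b["tokens"],
            "mean_latency_ms": b["latency_ms"] / b["requests"],
        }
        for tenant, b in totals.items()
    }


sample = [
    '{"logger_name":"dvara.access","tenant_id":"acme-corp","latency_ms":"342","tokens_total":"235"}',
    '{"logger_name":"dvara.access","tenant_id":"acme-corp","latency_ms":"158","tokens_total":"65"}',
    "not a JSON line",
]
print(summarize_access_logs(sample))
```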

Prometheus Metrics

DVARA exposes a Prometheus-format metrics endpoint out of the box.

Scrape Endpoint

GET /actuator/prometheus
Authorization: Bearer $DVARA_ACTUATOR_METRICS_API_KEY

The endpoint is authenticated: configure your Prometheus job's bearer_token_file to point at a file containing the DVARA_ACTUATOR_METRICS_API_KEY value. The metrics secret is intentionally distinct from DVARA_ACTUATOR_API_KEY (which guards /actuator/gateway-status), so a leaked scrape token can't unlock the gateway-status surface. A worked Prometheus scrape config (bearer_token_file placement, metrics_path: /actuator/prometheus) is deferred to the Security section in 1.1.0-GA. Until then, configure Prometheus per its standard documentation, pointing the job at the gateway's /actuator/prometheus endpoint with the metrics key as the bearer token.

Available Metrics

Core Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| gateway_requests_total | Counter | tenant, model, provider, status, region | Total gateway requests |
| gateway_latency_seconds | Histogram | tenant, model, provider, status, region | Request latency (P50/P95/P99) |
| gateway_tokens_total | Counter | tenant, model, direction | Token usage (direction=input/output) |
| gateway_provider_errors_total | Counter | provider, error_code | Provider errors |
| gateway_retries_total | Counter | provider | Retry attempts |
| gateway_fallbacks_total | Counter | from_provider, to_provider | Fallback activations |

Policy & Routing Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| gateway_policy_shadow_divergence_total | Counter | policy_id, rule_id, divergence_type | Shadow policy divergence events |
| gateway_canary_requests_total | Counter | route_id, variant, model | Canary A/B test requests |
| gateway_shadow_requests_total | Counter | route_id, primary_provider, shadow_provider | Shadow traffic routing events |
| gateway_priority_requests_total | Counter | tenant, tier | Priority-routed requests |
| gateway_priority_throttled_total | Counter | tenant, tier | Priority admission rejections |

FinOps Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| gateway_cost_dollars_total | Counter | tenant, model, provider | Cumulative cost in USD |
| gateway_budget_blocked_total | Counter | tenant, budget_id | Hard budget cap rejections |
| gateway_budget_soft_alert_total | Counter | tenant, budget_id | Soft budget cap alerts |
| gateway_budget_warning_total | Counter | tenant, budget_id | Budget warning policy triggers |
| gateway_model_downgrades_total | Counter | tenant, original_model, downgraded_model | Automatic model downgrades |
| gateway_cost_anomaly_total | Counter | tenant, model | Cost anomaly detections |

Guardrail & Security Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| gateway_guardrail_blocked_total | Counter | tenant, category | Guardrail BLOCK actions |
| gateway_guardrail_flagged_total | Counter | tenant, category | Guardrail FLAG actions |
| gateway_schema_validations_total | Counter | tenant, model, result | Output schema validations |
| gateway_schema_retries_total | Counter | tenant, model | Schema validation retries |
| gateway_context_window_warnings_total | Counter | tenant, model | Context window threshold warnings |
| gateway_context_window_pruned_total | Counter | tenant, model, strategy | Context window pruning events |
| gateway_mcp_injection_detections_total | Counter | tenant, server_id, action | MCP injection detections |
| gateway_ml_guardrail_total | Counter | provider, category, action | ML-classifier guardrail decisions (Lakera, ShieldGemini) |
| gateway_plugin_guardrail_total | Counter | plugin, category, action | External guardrail plugin decisions |
| gateway_grounding_check_total | Counter | grounded, action | Embedding-based grounding check outcomes |

Intelligent Routing & Config Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| gateway_intelligent_routing_total | Counter | complexity, selected_model | Intelligent routing selections by complexity tier |
| gateway_config_import_total | Counter | mode, dry_run | Config import operations via the Automation API |

Flightdeck-only metrics

These counters live in the flightdeck app's MeterRegistry (not in gateway-server's), so they're only scrapeable from flightdeck:8090/actuator/prometheus with the flightdeck's own DVARA_ACTUATOR_METRICS_API_KEY. Configure a second Prometheus job pointed at the flightdeck pod if you want email-pipeline health on your dashboards.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| dvara_emails_sent_total | Counter | template, transport, result | Email send attempts. result: SUCCESS / TRANSIENT / PERMANENT / MAX_ATTEMPTS_EXCEEDED. transport: log / smtp / resend. |
| dvara_emails_retried_total | Counter | template, attempt | Retries scheduled by the durability layer (counts retry attempt N, not the original send). |

MCP Proxy Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| mcp_tool_calls_total | Counter | tenant, server_id, tool_name, status | MCP tool call count |
| mcp_tool_call_latency_seconds | Histogram | server_id, tool_name | MCP tool call latency |
| mcp_approval_requests_total | Counter | tenant, server_id, tool_name | Approval gate requests |
| mcp_approval_granted_total | Counter | tenant | Approvals granted |
| mcp_approval_denied_total | Counter | tenant | Approvals denied |
| mcp_approval_timeout_total | Counter | tenant | Approval timeouts |
| mcp_agent_loop_detected_total | Counter | tenant, loop_type | Agent loop detection fires |
| mcp_agent_sessions_killed_total | Counter | tenant | Agent sessions killed |

Configuration

Metrics are enabled by default in application.yml. The exact exposure.include list differs per app:

# gateway-server (port 8080)
management:
  endpoints:
    web:
      exposure:
        include: health,prometheus,gateway-status,info
  prometheus:
    metrics:
      export:
        enabled: true

| App | Default include |
| --- | --- |
| gateway-server (port 8080) | health,prometheus,gateway-status,info |
| mcp-proxy-server (port 8070) | health,prometheus,info (no gateway-status; that endpoint is gateway-server-only) |
| flightdeck (port 8090) | health,prometheus,info |

Dropping gateway-status from the gateway-server list breaks the Flightdeck License page (it reads license metadata from /actuator/gateway-status); dropping info breaks anonymous build-info polling by uptime monitors.

Grafana Dashboard

Example PromQL queries:

# Request rate by provider
sum by (provider) (rate(gateway_requests_total[5m]))

# P95 latency by model
histogram_quantile(0.95, sum by (le, model) (rate(gateway_latency_seconds_bucket[5m])))

# Token throughput by tenant
sum by (tenant) (rate(gateway_tokens_total[5m]))

# Error rate by provider
sum by (provider) (rate(gateway_provider_errors_total[5m]))

Token Usage Metering

Every non-streaming chat request records token usage to the DVARA token ledger, backed by PostgreSQL. Records are queryable through the Automation API below.

Query Endpoints

# List all token usage records
curl http://localhost:8090/v1/admin/token-usage

# Filter by tenant
curl "http://localhost:8090/v1/admin/token-usage?tenantId=acme-corp"

# Filter by API key
curl "http://localhost:8090/v1/admin/token-usage?apiKey=sk-prod-123"

# Filter by model
curl "http://localhost:8090/v1/admin/token-usage?model=gpt-4o"

# Aggregated summary
curl "http://localhost:8090/v1/admin/token-usage/summary?tenantId=acme-corp&model=gpt-4o"

Record Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique record ID (UUID) |
| tenantId | string | Tenant identifier |
| apiKey | string | API key used |
| model | string | Model requested |
| provider | string | Provider that served the request |
| inputTokens | int | Prompt tokens |
| outputTokens | int | Completion tokens |
| totalTokens | int | Total tokens |
| estimated | boolean | Whether counts are estimated |
| timestamp | ISO 8601 | When the request was made |

Summary Response

{
"tenantId": "acme-corp",
"model": "gpt-4o",
"totalInputTokens": 15000,
"totalOutputTokens": 8500,
"totalTokens": 23500,
"requestCount": 42
}
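
To make the relationship between individual records and the summary concrete, the following sketch rolls a list of records (using the field names from the Record Fields table) into the summary shape shown above. How the server actually computes the summary is not documented here, so treat this as a client-side illustration; `summarize_usage` is a hypothetical helper name.

```python
def summarize_usage(records, tenant_id, model):
    """Roll token usage records up into the summary response shape."""
    matching = [
        r for r in records
        if r["tenantId"] == tenant_id and r["model"] == model
    ]
    return {
        "tenantId": tenant_id,
        "model": model,
        "totalInputTokens": sum(r["inputTokens"] for r in matching),
        "totalOutputTokens": sum(r["outputTokens"] for r in matching),
        "totalTokens": sum(r["totalTokens"] for r in matching),
        "requestCount": len(matching),
    }


records = [
    {"tenantId": "acme-corp", "model": "gpt-4o", "inputTokens": 150, "outputTokens": 85, "totalTokens": 235},
    {"tenantId": "acme-corp", "model": "gpt-4o", "inputTokens": 50, "outputTokens": 15, "totalTokens": 65},
    {"tenantId": "other-co", "model": "gpt-4o", "inputTokens": 10, "outputTokens": 5, "totalTokens": 15},
]
print(summarize_usage(records, "acme-corp", "gpt-4o"))
```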

Pre-Built Grafana Dashboards

Five production-ready Grafana dashboards ship in the dvarahq/dvara-examples repo under grafana/dashboards/ — not in the gateway distribution itself. Clone or download that repo to use them; every path below is relative to its root.

| Dashboard | File | Description |
| --- | --- | --- |
| Gateway Overview | dvara-overview.json | Request volume, latency (P50/P95/P99), error rates, provider health, token usage, cost/hour |
| FinOps & Budget | dvara-finops.json | Cost by tenant/model/provider, budget enforcement, model downgrades, anomalies |
| MCP Proxy & Agentic | dvara-mcp.json | Tool calls, agent sessions, loop detection, approval gates, injection detection |
| Policy, Routing & Fleet | dvara-policy-routing.json | Shadow policy divergence, canary testing, priority routing, config sync, fleet health |
| Infrastructure | dvara-infrastructure.json | JVM, connection pool, Hazelcast cache, PostgreSQL (gateway-host process health) |

One-Command Setup

From inside the cloned dvara-examples repo:

docker compose -f docker-compose.yml -f grafana/docker-compose.monitoring.yml up

This starts Prometheus (port 9090) and Grafana (port 3000, admin/dvara) with dashboards auto-provisioned.

Manual Import

for f in grafana/dashboards/*.json; do
curl -X POST http://admin:dvara@localhost:3000/api/dashboards/db \
-H "Content-Type: application/json" \
-d "{\"dashboard\": $(cat "$f"), \"overwrite\": true}"
done
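
The shell loop above splices each file's contents directly into a JSON string. Where a parse check before upload is useful, the same import can be sketched in Python. The helper names `build_import_payload` and `import_dashboard` are hypothetical; the URL and the admin/dvara credentials are the ones from the one-command setup above.

```python
import base64
import json
import urllib.request


def build_import_payload(dashboard_json: str, overwrite: bool = True) -> bytes:
    """Wrap a dashboard definition in the body expected by /api/dashboards/db."""
    dashboard = json.loads(dashboard_json)  # fails fast on a malformed file
    body = {"dashboard": dashboard, "overwrite": overwrite}
    return json.dumps(body).encode("utf-8")


def import_dashboard(path, base_url="http://localhost:3000", user="admin", password="dvara"):
    """POST one dashboard file to Grafana and return the HTTP status code."""
    with open(path, encoding="utf-8") as handle:
        payload = build_import_payload(handle.read())
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        f"{base_url}/api/dashboards/db",
        data=payload,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `import_dashboard` for each file under grafana/dashboards/ mirrors the loop above, with the difference that a file that is not valid JSON raises an error before anything is sent.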

Alerting Rules

grafana/alerts/dvara-alerts.yml in dvarahq/dvara-examples defines 12 Prometheus alerting rules:

| Alert | Severity | Trigger |
| --- | --- | --- |
| DvaraHighErrorRate | critical | Error rate > 5% for 5 min |
| DvaraHighP95Latency | warning | P95 > 5s for 5 min |
| DvaraProviderErrorSpike | warning | Provider errors > 1/sec for 3 min |
| DvaraCircuitBreakerOpen | critical | Errors with zero successes for 2 min |
| DvaraBudgetHardLimit | critical | Hard budget cap hit |
| DvaraBudgetSoftLimit | warning | Soft limit breached |
| DvaraCostAnomaly | warning | Cost exceeds baseline |
| DvaraGuardrailBlocks | warning | Guardrail blocks > 0.1/sec |
| DvaraAgentLoopDetected | warning | Agent loop detected |
| DvaraApprovalTimeouts | warning | Approval gate timeouts |
| DvaraMcpToolErrorRate | warning | MCP error rate > 10% |
| DvaraInjectionDetected | critical | Prompt injection detected |

Alert names are Title-cased Dvara*, not all-caps DVARA* — match them exactly when wiring into PagerDuty / OpsGenie routing rules or runbook entries.

Datadog Integration

The Datadog assets — Agent config and pre-built monitors — also ship in the dvarahq/dvara-examples repo, under datadog/. Clone that repo to use them.

OpenMetrics Scraping

Copy the Datadog Agent config to scrape all DVARA Prometheus metrics:

cp datadog/conf.d/dvara.yaml /etc/datadog-agent/conf.d/openmetrics.d/dvara.yaml
sudo systemctl restart datadog-agent

OTLP Traces to Datadog

Configure the Datadog Agent as an OTLP collector, then point DVARA to it:

OTEL_EXPORTER_OTLP_ENDPOINT=http://datadog-agent:4318/v1/traces

Pre-built Datadog monitors are provided in datadog/monitors.yaml.

Health Endpoints

The gateway uses a two-key model for authenticated actuator endpoints, with the probe paths left anonymous so container orchestrators don't need credentials. The worked Prometheus bearer_token_file scrape config is deferred to the Security section in 1.1.0-GA.

| Endpoint | Auth | Description |
| --- | --- | --- |
| GET /actuator/health | Anonymous (permitAll) | Liveness summary. Returns {"status":"UP"} or {"status":"DOWN"} only; detail is gated by management.endpoint.health.show-details=when-authorized. |
| GET /actuator/health/liveness | Anonymous (k8s probe) | Liveness probe target. |
| GET /actuator/health/readiness | Anonymous (k8s probe) | Readiness probe target; fails closed when the database, license, or any other registered indicator reports DOWN. |
| GET /actuator/info | Anonymous | Build info (app.*, build.*). |
| GET /actuator/gateway-status | Authorization: Bearer $DVARA_ACTUATOR_API_KEY | Rich gateway status: mode, version, providers, routes, rate limits, license, warnings, uptime. Powers the DVARA Flightdeck License page. |
| GET /actuator/prometheus | Authorization: Bearer $DVARA_ACTUATOR_METRICS_API_KEY | Prometheus scrape endpoint. The metrics secret is intentionally distinct from DVARA_ACTUATOR_API_KEY: by the principle of least privilege, a leaked scrape token must not unlock the gateway-status surface. |
| GET /v1/models | Tenant API key | Lists all registered providers and models with capabilities. |
| GET /v1/admin/providers/{id}/capabilities | Automation API auth | Provider-specific capability details. |

/actuator/gateway-status exposes metadata only

Even with DVARA_ACTUATOR_API_KEY, the gateway-status endpoint returns license metadata (licensee name, expiry, ID, runtime status), never the raw DVARA_LICENSE_KEY envelope. Once validated at startup the envelope stays in process memory; no API surfaces it. Provider API keys, vault credentials, and the audit HMAC secret are likewise never returned by this endpoint; it returns only operator-facing metadata.

/actuator/env, /heapdump, /threaddump, /beans, /mappings, /configprops, /loggers, /scheduledtasks, /caches, /sessions, and /quartz are excluded from the registry and return 404 regardless of authentication.

OpenTelemetry Distributed Tracing

DVARA includes OpenTelemetry distributed tracing out of the box. Traces are automatically created for every request and provider call, with W3C traceparent headers propagated to upstream LLM providers.

How It Works

Both the DVARA LLM Gateway and the DVARA MCP Proxy have full OTLP tracing support:

  1. Server spans — one per incoming HTTP request
  2. LLM provider spans — one per provider call (chat, streamChat, embed), enriched with token usage and session ID
  3. MCP filter spans — one per stage of the MCP filter pipeline (server registry lookup, policy evaluation, PII scanning, upstream server call)
  4. Client spans — one per outbound HTTP call, with W3C traceparent headers propagated to upstream LLM and MCP providers
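
The traceparent header mentioned in item 4 follows the W3C Trace Context format: version, 32-hex trace ID, 16-hex parent span ID, and 2-hex flags, joined by hyphens. The sketch below formats and parses that header for the version 00 layout only. The function names are hypothetical and this is not DVARA's propagation code; lowercasing the input before matching is a lenient choice made for this example.

```python
import re

# version 00 layout: 00-<32 hex trace id>-<16 hex span id>-<2 hex flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def format_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a version 00 traceparent value from hex IDs."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str):
    """Return the header's parts as a dict, or None if it is not valid."""
    match = _TRACEPARENT.match(header.strip().lower())
    if not match:
        return None
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid per the specification
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest bit is the sampled flag
    }


header = format_traceparent("abcdef1234567890abcdef1234567890", "1234567890abcdef")
print(header)
print(parse_traceparent(header))
```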

LLM Span Hierarchy

HTTP POST /v1/chat/completions (server span)
└── gateway.provider.chat (gateway observation)
    ├── low-card: provider, model
    ├── high-card: input_tokens, output_tokens, session_id
    └── HTTP POST https://api.openai.com (client span)

MCP Span Hierarchy

HTTP POST /mcp/{serverId}/tools/call (server span)
└── dvara.mcp-gateway.request (parent MCP observation)
    ├── low-card: server_id, operation, tool_name
    ├── high-card: session_id, latency_ms, response_bytes, pii_in_response
    │
    ├── mcp.filter.registry (low-card: server_id)
    ├── mcp.filter.policy (low-card: decision)
    ├── mcp.filter.pii_args (low-card: action)
    ├── mcp.server.call (upstream HTTP call)
    │   ├── low-card: server_id, operation, tool_name
    │   ├── high-card: http_status, latency_ms, response_bytes
    │   ├── event: mcp_request_sent
    │   └── event: mcp_response_received
    └── mcp.filter.pii_response (low-card: action, conditional)

LLM Span Names and Attributes

| Observation Name | Operation | Low-Cardinality | High-Cardinality |
| --- | --- | --- | --- |
| gateway.provider.chat | Non-streaming chat | provider, model | input_tokens, output_tokens, session_id |
| gateway.provider.stream | Streaming chat | provider, model | session_id |
| gateway.provider.embed | Embeddings | provider, model | |

MCP Span Names and Attributes

| Observation Name | Operation | Low-Cardinality | High-Cardinality |
| --- | --- | --- | --- |
| dvara.mcp-gateway.request | Parent MCP request | server_id, operation, tool_name | session_id, latency_ms, response_bytes, pii_in_response |
| mcp.filter.registry | Server registry lookup | server_id | |
| mcp.filter.policy | Policy evaluation | decision | |
| mcp.filter.pii_args | PII scan on request args | action | |
| mcp.filter.pii_response | PII scan on response | action | |
| mcp.server.call | Upstream HTTP call | server_id, operation, tool_name | http_status, latency_ms, response_bytes |

Configuration

management:
  tracing:
    sampling:
      probability: ${TRACING_SAMPLING_PROBABILITY:1.0} # 0.0–1.0, default: sample all
  otlp:
    tracing:
      endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:http://localhost:4318/v1/traces}

| Environment Variable | Default | Description |
| --- | --- | --- |
| TRACING_SAMPLING_PROBABILITY | 1.0 | Fraction of traces to sample (0.0 = none, 1.0 = all) |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4318/v1/traces | OTLP HTTP endpoint for trace export |

X-Trace-ID Integration

When OpenTelemetry tracing is active, the X-Trace-ID response header uses the OTel 32-character hex trace ID instead of a random UUID. Client-supplied X-Trace-ID headers still take precedence.

| Scenario | X-Trace-ID Value |
| --- | --- |
| Client sends X-Trace-ID header | Client's value (echoed back) |
| OTel tracing active, no client header | OTel trace ID (32-char hex) |
| No tracing, no client header | Random UUID hex (32-char) |

Viewing Traces

Start a local Jaeger instance for trace visualization:

# Start Jaeger with OTLP collector
docker run -d --name jaeger -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one:latest

# Start the gateway with traces auto-exported to localhost:4318
docker run -d --name dvara-llm-proxy \
-p 8080:8080 \
-e MOCK_PROVIDER_ENABLED=true \
-e OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4318/v1/traces \
ghcr.io/dvarahq/dvara/dvara-llm-gateway:latest

# Send a request
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"mock/test","messages":[{"role":"user","content":"Hello"}]}'

# View traces at http://localhost:16686

Log Correlation

When tracing is active, OTel trace and span IDs are automatically added to every log line in the request scope:

{
"trace_id": "abcdef1234567890abcdef1234567890",
"traceId": "abcdef1234567890abcdef1234567890",
"spanId": "1234567890abcdef",
"span_id": "1234567890abcdef",
"message": "Request completed",
...
}

Disabling Tracing

Set the sampling probability to 0.0 to disable trace collection while keeping the infrastructure in place:

docker run -d -e TRACING_SAMPLING_PROBABILITY=0.0 ghcr.io/dvarahq/dvara/dvara-llm-gateway:latest

Audit Event Stream

Every API request through /v1/* generates audit events that are persisted and queryable.

How it works

  1. On the way out, the gateway writes a GATEWAY_RESPONSE audit event with a rich payload: model, provider, HTTP status, latency, tokens, masked API key, tenant ID, policy decision, and error code.
  2. Every event is both logged to stdout (for your log pipeline) and persisted to the audit store.

Query Endpoints

# List all audit events (newest first)
curl http://localhost:8090/v1/admin/audit/events

# Filter by tenant
curl "http://localhost:8090/v1/admin/audit/events?tenant_id=acme-corp"

# Filter by event type
curl "http://localhost:8090/v1/admin/audit/events?event_type=GATEWAY_RESPONSE"

# Filter by date range
curl "http://localhost:8090/v1/admin/audit/events?from=2026-01-01T00:00:00Z&to=2026-01-02T00:00:00Z"

# Export as CSV
curl -o audit.csv http://localhost:8090/v1/admin/audit/events/export

# Export as JSON
curl -o audit.json http://localhost:8090/v1/admin/audit/events/export/json
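
When these queries are issued from a script, building the URL with proper encoding avoids quoting mistakes, particularly for timestamps. The sketch below uses only the query parameters shown above (tenant_id, event_type, from, to). Combining tenant and event-type filters in one request is an assumption, since the examples above only combine from and to. `audit_events_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def audit_events_url(base_url, tenant_id=None, event_type=None, start=None, end=None):
    """Build an audit events query URL, omitting filters that are not set."""
    params = {
        "tenant_id": tenant_id,
        "event_type": event_type,
        "from": start,
        "to": end,
    }
    query = urlencode({key: value for key, value in params.items() if value is not None})
    path = f"{base_url.rstrip('/')}/v1/admin/audit/events"
    return f"{path}?{query}" if query else path


print(audit_events_url("http://localhost:8090", tenant_id="acme-corp"))
print(audit_events_url(
    "http://localhost:8090",
    start="2026-01-01T00:00:00Z",
    end="2026-01-02T00:00:00Z",
))
```

Note that `urlencode` percent-encodes the colons in the timestamps, which a standards-compliant server decodes back to the original value.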

Event Fields

| Field | Type | Description |
| --- | --- | --- |
| eventId | string | Unique event ID (UUID) |
| timestamp | ISO 8601 | When the event occurred |
| tenantId | string | Tenant identifier (may be null) |
| eventType | string | Event type: GATEWAY_RESPONSE on every data-plane request; many other types fire for policy decisions, PII / guardrail / MCP enforcement, and admin actions (e.g. POLICY_DENIED, PII_DETECTED, AGENT_LOOP_DETECTED, TENANT_CREATED). See SIEM and Webhooks for the full event-type catalog. |
| payload | object | Event-specific data (model, provider, status, latency, etc.) |

Storage

Audit events are persisted to PostgreSQL and indexed by tenant, event type, and timestamp. The audit store is append-only — events cannot be updated or deleted through the API.

Enterprise audit trail (with license)

When running with a valid enterprise license key, the audit subsystem is upgraded with:

  • HMAC-SHA256 signing and hash-chaining. Every audit event is wrapped in a signed envelope. Each envelope's HMAC includes the previous event's hash, creating a tamper-evident chain that can be verified end-to-end.
  • Event enrichment. Events are automatically enriched with trace_id from the request context and with actor_user_id / actor_user_name / actor_roles from the authenticated principal.
  • Prompt storage opt-in. By default, prompt and message content is stripped from audit events. Tenants opt in by setting audit.store-prompts: "true" in tenant metadata. The global default is controlled by dvara.audit.store-prompts-by-default.
  • SIEM export. Signed envelopes are fanned out to any combination of built-in exporters: a JSON log exporter (always active), Kafka (tenant-keyed partitioning, dead-letter topic, SASL support), Splunk HEC, and AWS CloudWatch Logs. Export failure never blocks audit persistence. See SIEM and Webhooks for the full exporter configuration.
  • Chain integrity verification. The whole hash chain or individual events can be verified by recomputing HMACs, either through the admin API or from a background integrity sweep.
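
The general idea behind HMAC signing with hash-chaining can be sketched as follows. This illustrates the technique only and does not reproduce DVARA's envelope format: the field names, the canonical JSON encoding, the all-zero genesis value, and the use of the previous envelope's HMAC as the "previous hash" are all assumptions made for the example.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical previous-hash value for the first event


def sign_event(secret: bytes, event: dict, prev_hash: str) -> dict:
    """Wrap an event in an envelope whose HMAC covers the previous hash."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    mac = hmac.new(secret, (prev_hash + body).encode("utf-8"), hashlib.sha256).hexdigest()
    return {"event": event, "prev_hash": prev_hash, "hmac": mac}


def build_chain(secret: bytes, events: list) -> list:
    """Sign events in order, linking each envelope to the one before it."""
    chain, prev = [], GENESIS
    for event in events:
        envelope = sign_event(secret, event, prev)
        chain.append(envelope)
        prev = envelope["hmac"]
    return chain


def verify_chain(secret: bytes, chain: list) -> bool:
    """Recompute every HMAC and check that each link points at its predecessor."""
    prev = GENESIS
    for envelope in chain:
        expected = sign_event(secret, envelope["event"], prev)["hmac"]
        if envelope["prev_hash"] != prev or not hmac.compare_digest(expected, envelope["hmac"]):
            return False
        prev = envelope["hmac"]
    return True


secret = b"your-hmac-secret"
chain = build_chain(secret, [
    {"eventType": "GATEWAY_RESPONSE", "status": 200},
    {"eventType": "POLICY_DENIED", "status": 403},
    {"eventType": "GATEWAY_RESPONSE", "status": 200},
])
print(verify_chain(secret, chain))
```

Because each HMAC depends on the one before it, editing, removing, or reordering any event invalidates every later link, which is what makes the chain tamper-evident.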

Enterprise audit configuration:

docker run -d --name dvara-llm-proxy \
-p 8080:8080 \
-e DVARA_LICENSE_KEY=<your-license-key> \
-e DVARA_AUDIT_HMAC_SECRET=your-hmac-secret \
-e DVARA_AUDIT_MAX_EVENTS=100000 \
-e DVARA_AUDIT_STORE_PROMPTS=false \
ghcr.io/dvarahq/dvara/dvara-llm-gateway:latest

DVARA Flightdeck

The audit log is viewable in the DVARA Flightdeck at /audit with live 3-second polling, filtering by tenant / event type / date range, click-to-expand event details, pause and resume, and CSV export.