Observability
Dvara provides structured JSON logging, Prometheus metrics, token usage metering, and request tracing out of the box.
X-Trace-ID Propagation
Every response includes an X-Trace-ID header for request correlation. This applies to both the LLM gateway (port 8080) and the MCP proxy (port 8070).
Behavior:
- If the incoming request includes an `X-Trace-ID` header, the same value is echoed back.
- When OpenTelemetry tracing is active, the OTel 32-character hex trace ID is used if no client header is present.
- Otherwise, the gateway generates a new random 32-character hex ID.
- The trace ID is embedded in every error response body as `error.trace_id`.
- The trace ID is added to SLF4J MDC as `trace_id` for inclusion in all structured log lines.
```bash
# With custom trace ID
curl -i -H "X-Trace-ID: my-custom-trace-001" http://localhost:8080/v1/models
# → X-Trace-ID: my-custom-trace-001

# Without trace ID (auto-generated)
curl -i http://localhost:8080/v1/models
# → X-Trace-ID: a6783439db1f46a6bfed511a0011e955
```
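The precedence rules above can be sketched as a small resolution function. This is an illustrative Python sketch of the documented behavior, not the gateway's actual Java code (the real logic lives in `TraceIdFilter`):

```python
import uuid
from typing import Optional

def resolve_trace_id(client_header: Optional[str], otel_trace_id: Optional[str]) -> str:
    """Resolve the X-Trace-ID value using the documented precedence:
    client header, then active OTel trace ID, then a fresh random hex ID."""
    if client_header:          # 1. echo the client's value back unchanged
        return client_header
    if otel_trace_id:          # 2. reuse the OTel 32-character hex trace ID
        return otel_trace_id
    return uuid.uuid4().hex    # 3. generate a new random 32-character hex ID

print(resolve_trace_id("my-custom-trace-001", None))  # → my-custom-trace-001
print(len(resolve_trace_id(None, None)))              # → 32
```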
X-Session-Id Header
Both the LLM gateway and MCP proxy accept an optional X-Session-Id header for agent session correlation. When present, the session ID is:
- Stored as a servlet request attribute (`sessionId`)
- Added to SLF4J MDC as `session_id` for structured log correlation
- Attached as a high-cardinality attribute on OpenTelemetry spans
This allows traces from multiple LLM turns and MCP tool calls within a single agent session to be correlated by session ID.
```bash
# LLM request with session ID
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Session-Id: agent-session-42" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

# MCP request with same session ID
curl http://localhost:8070/mcp/filesystem/tools/call \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer gw_mykey" \
  -H "X-Session-Id: agent-session-42" \
  -d '{"name":"read_file","arguments":{"path":"/data/file.txt"}}'
```
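Once both services carry the same session ID, correlating their structured logs is a group-by over the `session_id` MDC field. A minimal Python sketch over exported JSON log lines (the field names match Dvara's structured logs; the helper itself is hypothetical):

```python
import json
from collections import defaultdict

def group_by_session(log_lines):
    """Group structured JSON log lines by their session_id field."""
    sessions = defaultdict(list)
    for line in log_lines:
        entry = json.loads(line)
        session_id = entry.get("session_id")
        if session_id:
            sessions[session_id].append(entry)
    return sessions

lines = [
    '{"session_id": "agent-session-42", "path": "/v1/chat/completions"}',
    '{"session_id": "agent-session-42", "path": "/mcp/filesystem/tools/call"}',
    '{"path": "/actuator/health"}',
]
print(len(group_by_session(lines)["agent-session-42"]))  # → 2
```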
Structured JSON Logging
Dvara uses Logstash Logback Encoder for structured JSON logging. Every log line is valid JSON, compatible with ELK, Loki, Datadog, and Splunk log ingest.
Configuration
Logging is configured in logback-spring.xml:
- JSON mode (default) — Every log line is a JSON object with `@timestamp`, `level`, `message`, `logger_name`, and MDC fields.
- Plain-text mode — Human-readable output for local development. Activate with `spring.profiles.active=log-plain`.
MDC Fields
The following fields are automatically added to log context by servlet filters:
| Field | Source | Description |
|---|---|---|
| `trace_id` | TraceIdFilter / McpTraceIdFilter | Request correlation ID |
| `session_id` | TraceIdFilter / McpTraceIdFilter | Agent session ID (from X-Session-Id header, if present) |
| `tenant_id` | AccessLogFilter | Tenant identifier |
| `model` | AccessLogFilter | Requested model name |
| `provider` | AccessLogFilter | Selected provider |
| `method` | AccessLogFilter | HTTP method |
| `path` | AccessLogFilter | Request URI |
| `status` | AccessLogFilter | HTTP response status |
| `latency_ms` | AccessLogFilter | Request duration in milliseconds |
| `api_key` | AccessLogFilter | API key (masked to first 8 chars) |
| `cache_status` | AccessLogFilter | HIT or MISS |
| `stream` | AccessLogFilter | Whether the request was streaming |
| `tokens_prompt` | AccessLogFilter | Prompt token count |
| `tokens_completion` | AccessLogFilter | Completion token count |
| `tokens_total` | AccessLogFilter | Total token count |
| `error_code` | AccessLogFilter | Gateway error code (if any) |
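For readers replicating this pattern outside the JVM, the MDC-plus-JSON approach can be approximated in a few lines. A hedged Python sketch (the `mdc` attribute and formatter are illustrative, not part of Dvara):

```python
import json
import logging

class JsonMdcFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object,
    merging in MDC-style context fields such as trace_id."""
    def format(self, record):
        doc = {
            "level": record.levelname,
            "message": record.getMessage(),
            "logger_name": record.name,
        }
        doc.update(getattr(record, "mdc", {}))  # trace_id, session_id, ...
        return json.dumps(doc)

handler = logging.StreamHandler()
handler.setFormatter(JsonMdcFormatter())
log = logging.getLogger("access")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("Request completed",
         extra={"mdc": {"trace_id": "a6783439db1f46a6bfed511a0011e955"}})
```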
Access Log Example
Every request produces a single structured access log entry:
```json
{
  "@timestamp": "2026-02-25T10:23:45.123Z",
  "level": "INFO",
  "message": "Request completed",
  "logger_name": "ai.dvara.server.web.AccessLogFilter",
  "trace_id": "a6783439db1f46a6bfed511a0011e955",
  "tenant_id": "acme-corp",
  "model": "gpt-4o",
  "provider": "openai",
  "method": "POST",
  "path": "/v1/chat/completions",
  "status": "200",
  "latency_ms": "342",
  "api_key": "sk-prod-1...",
  "cache_status": "MISS",
  "tokens_prompt": "150",
  "tokens_completion": "85",
  "tokens_total": "235",
  "service": "dvara-gateway"
}
```
Prometheus Metrics
Dvara exposes Prometheus metrics via Spring Boot Actuator and Micrometer.
Scrape Endpoint
```
GET /actuator/prometheus
```
Available Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| `gateway_requests_total` | Counter | tenant, model, provider, status, region | Total gateway requests |
| `gateway_latency_seconds` | Histogram | tenant, model, provider, status, region | Request latency with P50/P95/P99 percentiles |
| `gateway_tokens_total` | Counter | tenant, model, direction | Token usage (direction = input or output) |
| `gateway_provider_errors_total` | Counter | provider, error_code | Provider errors |
| `gateway_retries_total` | Counter | provider | Retry attempts |
| `gateway_fallbacks_total` | Counter | from_provider, to_provider | Fallback activations |
| `gateway_config_sync_failures_total` | Counter | — | Config sync failures (enterprise) |
| `gateway_config_sync_version` | Gauge | — | Last successfully synced config version |
| `gateway_fleet_config_lag_versions` | Gauge | instance_id | Config version lag per fleet instance |
Configuration
Metrics are enabled by default in application.yml:
```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,prometheus,gateway-status
  prometheus:
    metrics:
      export:
        enabled: true
```
Grafana Dashboard
Example PromQL queries:
```
# Request rate by provider
rate(gateway_requests_total[5m])

# P95 latency by model
histogram_quantile(0.95, rate(gateway_latency_seconds_bucket[5m]))

# Token throughput by tenant
rate(gateway_tokens_total[5m])

# Error rate by provider
rate(gateway_provider_errors_total[5m])
```
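For intuition, `histogram_quantile` estimates a quantile by linear interpolation inside the cumulative bucket that contains the target rank. A simplified Python sketch of that estimate (ignoring PromQL details such as `+Inf` buckets):

```python
def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (le, count) histogram buckets,
    interpolating linearly inside the target bucket."""
    buckets = sorted(buckets)            # ascending by upper bound `le`
    rank = q * buckets[-1][1]            # target observation rank
    lower_bound, lower_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            # fraction of the way through this bucket's observations
            frac = (rank - lower_count) / (count - lower_count)
            return lower_bound + (le - lower_bound) * frac
        lower_bound, lower_count = le, count
    return buckets[-1][0]

# 50 obs under 0.1s, 90 under 0.5s, 100 under 1.0s → P95 lands in the last bucket
print(histogram_quantile(0.95, [(0.1, 50), (0.5, 90), (1.0, 100)]))  # → 0.75
```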
Token Usage Metering
Every non-streaming chat request records token usage to a `TokenUsageRepository`. The default implementation stores records in memory (capped at 10,000 entries). Enterprise deployments replace this with a database-backed implementation.
Query Endpoints
```bash
# List all token usage records
curl http://localhost:8080/admin/v1/token-usage

# Filter by tenant
curl http://localhost:8080/admin/v1/token-usage?tenantId=acme-corp

# Filter by API key
curl http://localhost:8080/admin/v1/token-usage?apiKey=sk-prod-123

# Filter by model
curl http://localhost:8080/admin/v1/token-usage?model=gpt-4o

# Aggregated summary
curl "http://localhost:8080/admin/v1/token-usage/summary?tenantId=acme-corp&model=gpt-4o"
```
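The summary endpoint is a simple rollup over individual usage records. An equivalent client-side aggregation, sketched in Python against the documented record fields (the `summarize` helper is hypothetical):

```python
def summarize(records):
    """Aggregate token-usage records the way the /summary endpoint does."""
    return {
        "totalInputTokens": sum(r["inputTokens"] for r in records),
        "totalOutputTokens": sum(r["outputTokens"] for r in records),
        "totalTokens": sum(r["totalTokens"] for r in records),
        "requestCount": len(records),
    }

records = [
    {"inputTokens": 150, "outputTokens": 85, "totalTokens": 235},
    {"inputTokens": 100, "outputTokens": 40, "totalTokens": 140},
]
print(summarize(records)["totalTokens"])   # → 375
print(summarize(records)["requestCount"])  # → 2
```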
Record Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique record ID (UUID) |
| `tenantId` | string | Tenant identifier |
| `apiKey` | string | API key used |
| `model` | string | Model requested |
| `provider` | string | Provider that served the request |
| `inputTokens` | int | Prompt tokens |
| `outputTokens` | int | Completion tokens |
| `totalTokens` | int | Total tokens |
| `estimated` | boolean | Whether counts are estimated |
| `timestamp` | ISO 8601 | When the request was made |
Summary Response
```json
{
  "tenantId": "acme-corp",
  "model": "gpt-4o",
  "totalInputTokens": 15000,
  "totalOutputTokens": 8500,
  "totalTokens": 23500,
  "requestCount": 42
}
```
Pre-Built Grafana Dashboards
Dvara ships with four production-ready Grafana dashboards in grafana/dashboards/:
| Dashboard | File | Description |
|---|---|---|
| Gateway Overview | dvara-overview.json | Request volume, latency (P50/P95/P99), error rates, provider health, token usage, cost/hour |
| FinOps & Budget | dvara-finops.json | Cost by tenant/model/provider, budget enforcement, model downgrades, anomalies |
| MCP Proxy & Agentic | dvara-mcp.json | Tool calls, agent sessions, loop detection, approval gates, injection detection |
| Policy, Routing & Fleet | dvara-policy-routing.json | Shadow policy divergence, canary testing, priority routing, config sync, fleet health |
One-Command Setup
```bash
docker compose -f docker-compose.yml -f grafana/docker-compose.monitoring.yml up
```
This starts Prometheus (port 9090) and Grafana (port 3000, admin/dvara) with dashboards auto-provisioned.
Manual Import
```bash
for f in grafana/dashboards/*.json; do
  curl -X POST http://admin:dvara@localhost:3000/api/dashboards/db \
    -H "Content-Type: application/json" \
    -d "{\"dashboard\": $(cat "$f"), \"overwrite\": true}"
done
```
Alerting Rules
grafana/alerts/dvara-alerts.yml defines 14 Prometheus alerting rules across three groups:
| Alert | Severity | Trigger |
|---|---|---|
| `DvaraHighErrorRate` | critical | Error rate > 5% for 5 min |
| `DvaraHighP95Latency` | warning | P95 > 5s for 5 min |
| `DvaraProviderErrorSpike` | warning | Provider errors > 1/sec for 3 min |
| `DvaraCircuitBreakerOpen` | critical | Errors with zero successes for 2 min |
| `DvaraBudgetHardLimit` | critical | Hard budget cap hit |
| `DvaraBudgetSoftLimit` | warning | Soft limit breached |
| `DvaraCostAnomaly` | warning | Cost exceeds baseline |
| `DvaraGuardrailBlocks` | warning | Guardrail blocks > 0.1/sec |
| `DvaraAgentLoopDetected` | warning | Agent loop detected |
| `DvaraApprovalTimeouts` | warning | Approval gate timeouts |
| `DvaraMcpToolErrorRate` | warning | MCP error rate > 10% |
| `DvaraInjectionDetected` | critical | Prompt injection detected |
| `DvaraConfigSyncFailure` | warning | Config sync failures |
| `DvaraFleetConfigLag` | warning | Instance lag > 5 versions |
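As an illustration of how the rate-based triggers evaluate, `DvaraHighErrorRate` reduces to a ratio check between two counter rates. In Prometheus both sides would be 5-minute `rate()` values; this Python sketch just shows the condition:

```python
def high_error_rate(errors_per_sec, requests_per_sec, threshold=0.05):
    """DvaraHighErrorRate-style condition: errors exceed 5% of requests.
    Returns False when there is no traffic, avoiding a divide-by-zero alert."""
    if requests_per_sec == 0:
        return False
    return errors_per_sec / requests_per_sec > threshold

print(high_error_rate(0.4, 10.0))  # 4% error rate → False
print(high_error_rate(0.8, 10.0))  # 8% error rate → True
```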
Datadog Integration
OpenMetrics Scraping
Copy the Datadog Agent config to scrape all Dvara Prometheus metrics:
```bash
cp datadog/conf.d/dvara.yaml /etc/datadog-agent/conf.d/openmetrics.d/dvara.yaml
sudo systemctl restart datadog-agent
```
OTLP Traces to Datadog
Configure the Datadog Agent as an OTLP collector, then point Dvara to it:
```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://datadog-agent:4318/v1/traces
```
Pre-built Datadog monitors are provided in datadog/monitors.yaml.
Health Endpoints
| Endpoint | Description |
|---|---|
| `GET /status` | Returns JSON with status, mode, providers, routes, config version, and warnings |
| `GET /actuator/gateway-status` | Full actuator status endpoint (same data as `/status`) |
| `GET /actuator/health` | Spring Boot health check |
| `GET /try` | Built-in browser-based chat test panel |
| `GET /v1/models` | Lists all registered providers and models with capabilities |
| `GET /admin/v1/providers/{id}/capabilities` | Provider-specific capability details |
OpenTelemetry Distributed Tracing
Dvara includes OpenTelemetry distributed tracing via Micrometer Tracing with the OTel bridge. Traces are automatically created for every request and provider call, with W3C traceparent headers propagated to upstream LLM providers.
How It Works
Spring Boot auto-configures the tracing infrastructure when the tracing dependencies are present. Both gateway-server (LLM) and mcp-proxy-server (MCP) have full OTLP tracing support:
- Server spans — Spring MVC auto-instruments every incoming HTTP request
- LLM provider spans — `ProviderDispatcher` creates child observations for `chat`, `streamChat`, and `embed` operations, enriched with token usage and session ID
- MCP filter spans — Each MCP filter creates a child observation: registry lookup, policy evaluation, PII scanning, and upstream server call
- Client spans — `RestClient` auto-instruments outbound HTTP calls, adding `traceparent` headers for W3C trace context propagation
LLM Span Hierarchy
```
HTTP POST /v1/chat/completions  (server span, auto by Spring MVC)
└── gateway.provider.chat  (custom observation in ProviderDispatcher)
    ├── low-card: provider, model
    ├── high-card: input_tokens, output_tokens, session_id
    └── HTTP POST https://api.openai.com  (client span, auto by RestClient)
```
MCP Span Hierarchy
```
HTTP POST /mcp/{serverId}/tools/call  (server span, auto by Spring MVC)
└── gateway.mcp.request  (parent observation in McpProxyController)
    ├── low-card: server_id, operation, tool_name
    ├── high-card: session_id, latency_ms, response_bytes, pii_in_response
    │
    ├── mcp.filter.registry  (low-card: server_id)
    ├── mcp.filter.policy  (low-card: decision)
    ├── mcp.filter.pii_args  (low-card: action)
    ├── mcp.server.call  (upstream HTTP call)
    │   ├── low-card: server_id, operation, tool_name
    │   ├── high-card: http_status, latency_ms, response_bytes
    │   ├── event: mcp_request_sent
    │   └── event: mcp_response_received
    └── mcp.filter.pii_response  (low-card: action, conditional)
```
LLM Span Names and Attributes
| Observation Name | Operation | Low-Cardinality | High-Cardinality |
|---|---|---|---|
| `gateway.provider.chat` | Non-streaming chat | provider, model | input_tokens, output_tokens, session_id |
| `gateway.provider.stream` | Streaming chat | provider, model | session_id |
| `gateway.provider.embed` | Embeddings | provider, model | — |
MCP Span Names and Attributes
| Observation Name | Operation | Low-Cardinality | High-Cardinality |
|---|---|---|---|
| `gateway.mcp.request` | Parent MCP request | server_id, operation, tool_name | session_id, latency_ms, response_bytes, pii_in_response |
| `mcp.filter.registry` | Server registry lookup | server_id | — |
| `mcp.filter.policy` | Policy evaluation | decision | — |
| `mcp.filter.pii_args` | PII scan on request args | action | — |
| `mcp.filter.pii_response` | PII scan on response | action | — |
| `mcp.server.call` | Upstream HTTP call | server_id, operation, tool_name | http_status, latency_ms, response_bytes |
Configuration
```yaml
management:
  tracing:
    sampling:
      probability: ${TRACING_SAMPLING_PROBABILITY:1.0}  # 0.0–1.0, default: sample all
  otlp:
    tracing:
      endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:http://localhost:4318/v1/traces}
```
| Environment Variable | Default | Description |
|---|---|---|
| `TRACING_SAMPLING_PROBABILITY` | 1.0 | Fraction of traces to sample (0.0 = none, 1.0 = all) |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | http://localhost:4318/v1/traces | OTLP HTTP endpoint for trace export |
X-Trace-ID Integration
When OpenTelemetry tracing is active, the X-Trace-ID response header uses the OTel 32-character hex trace ID instead of a random UUID. Client-supplied X-Trace-ID headers still take precedence.
| Scenario | X-Trace-ID Value |
|---|---|
| Client sends `X-Trace-ID` header | Client's value (echoed back) |
| OTel tracing active, no client header | OTel trace ID (32-char hex) |
| No tracing, no client header | Random UUID hex (32-char) |
Viewing Traces
Start a local Jaeger instance for trace visualization:
```bash
# Start Jaeger with OTLP collector
docker run -d --name jaeger -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one:latest

# Start gateway (traces auto-exported to localhost:4318)
MOCK_PROVIDER_ENABLED=true ./mvnw -pl gateway-server spring-boot:run

# Send a request
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"mock/test","messages":[{"role":"user","content":"Hello"}]}'

# View traces at http://localhost:16686
```
Log Correlation
When tracing is active, OTel trace and span IDs are automatically added to the SLF4J MDC and included in structured JSON logs:
```json
{
  "trace_id": "abcdef1234567890abcdef1234567890",
  "traceId": "abcdef1234567890abcdef1234567890",
  "spanId": "1234567890abcdef",
  "span_id": "1234567890abcdef",
  "message": "Request completed",
  ...
}
```
Disabling Tracing
Set the sampling probability to 0.0 to disable trace collection while keeping the infrastructure in place:
```bash
TRACING_SAMPLING_PROBABILITY=0.0 ./mvnw -pl gateway-server spring-boot:run
```
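The sampling probability behaves like a standard head-based sampler: each new trace is kept with probability p, so 0.0 drops everything and 1.0 keeps everything. A minimal Python sketch of the decision (illustrative, not the Micrometer implementation):

```python
import random

def should_sample(probability, rng=random.random):
    """Head-based sampling decision for a new trace.
    probability=0.0 disables collection; 1.0 samples every trace."""
    if probability <= 0.0:
        return False
    if probability >= 1.0:
        return True
    return rng() < probability

print(should_sample(1.0))  # TRACING_SAMPLING_PROBABILITY=1.0 → True
print(should_sample(0.0))  # TRACING_SAMPLING_PROBABILITY=0.0 → False
```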
Audit Event Stream
Every API request through /v1/* generates audit events that are persisted and queryable.
How It Works
- `AuditFilter` (gateway filter chain) writes `GATEWAY_REQUEST` events with the requested model
- `AuditResponseFilter` (servlet filter, order `HIGHEST_PRECEDENCE + 4`) writes `GATEWAY_RESPONSE` events after request completion with a rich payload: model, provider, HTTP status, latency, tokens, masked API key, tenant ID, and error code
- `PersistingAuditWriter` logs events to stdout and saves them to `AuditEventRepository`
Query Endpoints
```bash
# List all audit events (newest first)
curl http://localhost:8080/admin/v1/audit/events

# Filter by tenant
curl http://localhost:8080/admin/v1/audit/events?tenant_id=acme-corp

# Filter by event type
curl http://localhost:8080/admin/v1/audit/events?event_type=GATEWAY_RESPONSE

# Filter by date range
curl "http://localhost:8080/admin/v1/audit/events?from=2026-01-01T00:00:00Z&to=2026-01-02T00:00:00Z"

# Export as CSV
curl -o audit.csv http://localhost:8080/admin/v1/audit/events/export

# Export as JSON
curl -o audit.json http://localhost:8080/admin/v1/audit/events/export/json
```
Event Fields
| Field | Type | Description |
|---|---|---|
| `eventId` | string | Unique event ID (UUID) |
| `timestamp` | ISO 8601 | When the event occurred |
| `tenantId` | string | Tenant identifier (may be null) |
| `eventType` | string | GATEWAY_REQUEST or GATEWAY_RESPONSE |
| `payload` | object | Event-specific data (model, provider, status, latency, etc.) |
Storage
The default implementation uses `InMemoryAuditEventRepository` (capped at 10,000 events). Enterprise deployments replace this with `AppendOnlyAuditEventRepository` (capped at 100,000 events, configurable) via the `@ConditionalOnMissingBean` pattern.
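The capped behavior is equivalent to a bounded FIFO: once the cap is reached, the oldest event is evicted for each new one. An illustrative Python sketch (the real repository is a Java component; the names here are hypothetical):

```python
from collections import deque

class CappedEventStore:
    """Bounded in-memory event store: keeps at most `cap` events,
    silently discarding the oldest once full."""
    def __init__(self, cap=10_000):
        self._events = deque(maxlen=cap)

    def save(self, event):
        self._events.append(event)

    def find_all(self):
        return list(reversed(self._events))  # newest first

store = CappedEventStore(cap=3)
for i in range(5):
    store.save({"eventId": i})
print([e["eventId"] for e in store.find_all()])  # → [4, 3, 2]
```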
Enterprise Audit Trail (with License)
When running with a valid JWT enterprise license key, the audit subsystem is upgraded with:
- HMAC-SHA256 Signing: Every audit event is wrapped in a `SignedAuditEnvelope` with a cryptographic signature. Events are hash-chained — each event's HMAC includes the previous event's hash, creating a tamper-evident chain.
- Event Enrichment: Events are automatically enriched with `trace_id` from the request context and `actor_user_id` / `actor_user_name` / `actor_roles` from the authenticated principal.
- Prompt Storage Opt-In: By default, prompt/message content is stripped from audit events. Tenants can opt in by setting `audit.store-prompts: "true"` in their tenant metadata. The global default is controlled by `gateway.audit.store-prompts-by-default`.
- SIEM Export: Signed audit envelopes are fanned out to pluggable `SiemExporter` implementations. A `LoggingSiemExporter` (JSON to the `siem.export` logger) is included by default. Splunk HEC and CloudWatch stubs are provided for future integration. Export failure never blocks audit persistence.
- Chain Integrity Verification: `AuditIntegrityService` can verify the entire hash chain or individual events by recomputing HMACs.
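The hash-chaining idea can be illustrated compactly: each envelope's HMAC covers its event payload plus the previous envelope's signature, so altering any historical event invalidates every signature after it. A hedged Python sketch (Dvara's actual `SignedAuditEnvelope` wire format is internal; the field names here are hypothetical):

```python
import hashlib
import hmac
import json

SECRET = b"your-hmac-secret"  # stands in for GATEWAY_AUDIT_HMAC_SECRET

def sign_chain(events, secret=SECRET):
    """Wrap events in envelopes whose HMAC chains to the previous signature."""
    envelopes, prev_sig = [], ""
    for event in events:
        payload = json.dumps(event, sort_keys=True) + prev_sig
        sig = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
        envelopes.append({"event": event, "prev": prev_sig, "sig": sig})
        prev_sig = sig
    return envelopes

def verify_chain(envelopes, secret=SECRET):
    """Recompute every HMAC in order; any tampered event breaks the chain."""
    prev_sig = ""
    for env in envelopes:
        payload = json.dumps(env["event"], sort_keys=True) + prev_sig
        expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, env["sig"]):
            return False
        prev_sig = env["sig"]
    return True

chain = sign_chain([{"eventType": "GATEWAY_REQUEST"}, {"eventType": "GATEWAY_RESPONSE"}])
print(verify_chain(chain))            # → True
chain[0]["event"]["eventType"] = "X"  # tamper with history
print(verify_chain(chain))            # → False
```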
Enterprise audit configuration:
```bash
GATEWAY_ENTERPRISE_LICENSE_KEY=<jwt-token> \
GATEWAY_AUDIT_HMAC_SECRET=your-hmac-secret \
GATEWAY_AUDIT_MAX_EVENTS=100000 \
GATEWAY_AUDIT_STORE_PROMPTS=false \
./mvnw -pl enterprise-server spring-boot:run
```
Admin UI
The audit log is viewable in the admin UI at /audit with live 3-second polling, filtering by tenant/type/date range, click-to-expand event details, pause/resume, and CSV export.
Extension Point
Both AuditWriter and AuditEventRepository are overridable via @ConditionalOnMissingBean:
```java
public interface AuditWriter {
    void write(AuditEvent event);
}

public interface AuditEventRepository {
    void save(AuditEvent event);
    List<AuditEvent> findAll();
    // ... query methods
}
```
Enterprise deployments replace these with EnterpriseAuditWriter (enrichment + HMAC + SIEM) and AppendOnlyAuditEventRepository (hash-chained, signed append-only store).