Observability

Dvara provides structured JSON logging, Prometheus metrics, token usage metering, and request tracing out of the box.

X-Trace-ID Propagation

Every response includes an X-Trace-ID header for request correlation. This applies to both the LLM gateway (port 8080) and the MCP proxy (port 8070).

Behavior:

  • If the incoming request includes an X-Trace-ID header, the same value is echoed back.
  • When OpenTelemetry tracing is active, the OTel 32-character hex trace ID is used if no client header is present.
  • Otherwise, the gateway generates a new random 32-character hex ID.
  • The trace ID is embedded in every error response body as error.trace_id.
  • The trace ID is added to SLF4J MDC as trace_id for inclusion in all structured log lines.

# With custom trace ID
curl -i -H "X-Trace-ID: my-custom-trace-001" http://localhost:8080/v1/models
# → X-Trace-ID: my-custom-trace-001

# Without trace ID (auto-generated)
curl -i http://localhost:8080/v1/models
# → X-Trace-ID: a6783439db1f46a6bfed511a0011e955

X-Session-Id Header

Both the LLM gateway and MCP proxy accept an optional X-Session-Id header for agent session correlation. When present, the session ID is:

  • Stored as a servlet request attribute (sessionId)
  • Added to SLF4J MDC as session_id for structured log correlation
  • Attached as a high-cardinality attribute on OpenTelemetry spans

This allows traces from multiple LLM turns and MCP tool calls within a single agent session to be correlated by session ID.

# LLM request with session ID
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Session-Id: agent-session-42" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

# MCP request with same session ID
curl http://localhost:8070/mcp/filesystem/tools/call \
-H "Content-Type: application/json" \
-H "Authorization: Bearer gw_mykey" \
-H "X-Session-Id: agent-session-42" \
-d '{"name":"read_file","arguments":{"path":"/data/file.txt"}}'

Structured JSON Logging

Dvara uses Logstash Logback Encoder for structured JSON logging. Every log line is valid JSON, compatible with ELK, Loki, Datadog, and Splunk log ingest.

Configuration

Logging is configured in logback-spring.xml:

  • JSON mode (default) — Every log line is a JSON object with @timestamp, level, message, logger_name, and MDC fields.
  • Plain-text mode — Human-readable output for local development. Activate with:
    spring.profiles.active=log-plain

MDC Fields

The following fields are automatically added to log context by servlet filters:

| Field | Source | Description |
|---|---|---|
| trace_id | TraceIdFilter / McpTraceIdFilter | Request correlation ID |
| session_id | TraceIdFilter / McpTraceIdFilter | Agent session ID (from X-Session-Id header, if present) |
| tenant_id | AccessLogFilter | Tenant identifier |
| model | AccessLogFilter | Requested model name |
| provider | AccessLogFilter | Selected provider |
| method | AccessLogFilter | HTTP method |
| path | AccessLogFilter | Request URI |
| status | AccessLogFilter | HTTP response status |
| latency_ms | AccessLogFilter | Request duration in milliseconds |
| api_key | AccessLogFilter | API key (masked to first 8 chars) |
| cache_status | AccessLogFilter | HIT or MISS |
| stream | AccessLogFilter | Whether request was streaming |
| tokens_prompt | AccessLogFilter | Prompt token count |
| tokens_completion | AccessLogFilter | Completion token count |
| tokens_total | AccessLogFilter | Total token count |
| error_code | AccessLogFilter | Gateway error code (if any) |

Access Log Example

Every request produces a single structured access log entry:

{
  "@timestamp": "2026-02-25T10:23:45.123Z",
  "level": "INFO",
  "message": "Request completed",
  "logger_name": "ai.dvara.server.web.AccessLogFilter",
  "trace_id": "a6783439db1f46a6bfed511a0011e955",
  "tenant_id": "acme-corp",
  "model": "gpt-4o",
  "provider": "openai",
  "method": "POST",
  "path": "/v1/chat/completions",
  "status": "200",
  "latency_ms": "342",
  "api_key": "sk-prod-1...",
  "cache_status": "MISS",
  "tokens_prompt": "150",
  "tokens_completion": "85",
  "tokens_total": "235",
  "service": "dvara-gateway"
}
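Because every line is valid JSON, access logs can be sliced with standard tooling such as jq. A quick sketch (the log line below is a hypothetical sample matching the shape above, not real gateway output):

```shell
# Hypothetical access-log line with the same fields as the example above
log='{"trace_id":"a6783439db1f46a6bfed511a0011e955","model":"gpt-4o","status":"200","latency_ms":"342"}'

# Keep only requests slower than 300 ms and print their trace IDs
slow=$(echo "$log" | jq -r 'select((.latency_ms | tonumber) > 300) | .trace_id')
echo "$slow"
# → a6783439db1f46a6bfed511a0011e955
```

The same filter works against a whole log file (`jq -r 'select(...)' access.log`) since each line is an independent JSON object.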

Prometheus Metrics

Dvara exposes Prometheus metrics via Spring Boot Actuator and Micrometer.

Scrape Endpoint

GET /actuator/prometheus
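A minimal Prometheus scrape configuration for this endpoint might look like the following; the job name and target address are placeholders for your deployment:

```yaml
# prometheus.yml (sketch): scrape Dvara's actuator endpoint every 15s
scrape_configs:
  - job_name: dvara-gateway          # placeholder job name
    metrics_path: /actuator/prometheus
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:8080']  # gateway host:port in your environment
```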

Available Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| gateway_requests_total | Counter | tenant, model, provider, status, region | Total gateway requests |
| gateway_latency_seconds | Histogram | tenant, model, provider, status, region | Request latency with P50/P95/P99 percentiles |
| gateway_tokens_total | Counter | tenant, model, direction | Token usage (direction=input or output) |
| gateway_provider_errors_total | Counter | provider, error_code | Provider errors |
| gateway_retries_total | Counter | provider | Retry attempts |
| gateway_fallbacks_total | Counter | from_provider, to_provider | Fallback activations |
| gateway_config_sync_failures_total | Counter | (none) | Config sync failures (enterprise) |
| gateway_config_sync_version | Gauge | (none) | Last successfully synced config version |
| gateway_fleet_config_lag_versions | Gauge | instance_id | Config version lag per fleet instance |

Configuration

Metrics are enabled by default in application.yml:

management:
  endpoints:
    web:
      exposure:
        include: health,prometheus,gateway-status
  prometheus:
    metrics:
      export:
        enabled: true

Grafana Dashboard

Example PromQL queries:

# Request rate by provider
rate(gateway_requests_total[5m])

# P95 latency by model
histogram_quantile(0.95, rate(gateway_latency_seconds_bucket[5m]))

# Token throughput by tenant
rate(gateway_tokens_total[5m])

# Error rate by provider
rate(gateway_provider_errors_total[5m])

Token Usage Metering

Every non-streaming chat request records token usage to a TokenUsageRepository. The default implementation stores records in memory (capped at 10,000 entries). Enterprise deployments replace this with a database-backed implementation.

Query Endpoints

# List all token usage records
curl http://localhost:8080/admin/v1/token-usage

# Filter by tenant
curl "http://localhost:8080/admin/v1/token-usage?tenantId=acme-corp"

# Filter by API key
curl "http://localhost:8080/admin/v1/token-usage?apiKey=sk-prod-123"

# Filter by model
curl "http://localhost:8080/admin/v1/token-usage?model=gpt-4o"

# Aggregated summary
curl "http://localhost:8080/admin/v1/token-usage/summary?tenantId=acme-corp&model=gpt-4o"

Record Fields

| Field | Type | Description |
|---|---|---|
| id | string | Unique record ID (UUID) |
| tenantId | string | Tenant identifier |
| apiKey | string | API key used |
| model | string | Model requested |
| provider | string | Provider that served the request |
| inputTokens | int | Prompt tokens |
| outputTokens | int | Completion tokens |
| totalTokens | int | Total tokens |
| estimated | boolean | Whether counts are estimated |
| timestamp | ISO 8601 | When the request was made |

Summary Response

{
  "tenantId": "acme-corp",
  "model": "gpt-4o",
  "totalInputTokens": 15000,
  "totalOutputTokens": 8500,
  "totalTokens": 23500,
  "requestCount": 42
}
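A summary like this feeds naturally into cost reporting. A sketch with jq, where the per-1K-token prices are placeholders for illustration, not real model pricing:

```shell
# Summary payload as returned by /admin/v1/token-usage/summary
summary='{"tenantId":"acme-corp","model":"gpt-4o","totalInputTokens":15000,"totalOutputTokens":8500,"totalTokens":23500,"requestCount":42}'

# Recompute the total and derive a rough cost estimate.
# The prices below ($0.0025/1K input, $0.01/1K output) are hypothetical.
total=$(echo "$summary" | jq '.totalInputTokens + .totalOutputTokens')
cost=$(echo "$summary" | jq '(.totalInputTokens / 1000 * 0.0025) + (.totalOutputTokens / 1000 * 0.01)')
echo "$total tokens, estimated cost \$$cost"
```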

Pre-Built Grafana Dashboards

Dvara ships with four production-ready Grafana dashboards in grafana/dashboards/:

| Dashboard | File | Description |
|---|---|---|
| Gateway Overview | dvara-overview.json | Request volume, latency (P50/P95/P99), error rates, provider health, token usage, cost/hour |
| FinOps & Budget | dvara-finops.json | Cost by tenant/model/provider, budget enforcement, model downgrades, anomalies |
| MCP Proxy & Agentic | dvara-mcp.json | Tool calls, agent sessions, loop detection, approval gates, injection detection |
| Policy, Routing & Fleet | dvara-policy-routing.json | Shadow policy divergence, canary testing, priority routing, config sync, fleet health |

One-Command Setup

docker compose -f docker-compose.yml -f grafana/docker-compose.monitoring.yml up

This starts Prometheus (port 9090) and Grafana (port 3000, admin/dvara) with dashboards auto-provisioned.

Manual Import

for f in grafana/dashboards/*.json; do
  curl -X POST http://admin:dvara@localhost:3000/api/dashboards/db \
    -H "Content-Type: application/json" \
    -d "{\"dashboard\": $(cat "$f"), \"overwrite\": true}"
done

Alerting Rules

grafana/alerts/dvara-alerts.yml defines 14 Prometheus alerting rules across three groups:

| Alert | Severity | Trigger |
|---|---|---|
| DvaraHighErrorRate | critical | Error rate > 5% for 5 min |
| DvaraHighP95Latency | warning | P95 > 5s for 5 min |
| DvaraProviderErrorSpike | warning | Provider errors > 1/sec for 3 min |
| DvaraCircuitBreakerOpen | critical | Errors with zero successes for 2 min |
| DvaraBudgetHardLimit | critical | Hard budget cap hit |
| DvaraBudgetSoftLimit | warning | Soft limit breached |
| DvaraCostAnomaly | warning | Cost exceeds baseline |
| DvaraGuardrailBlocks | warning | Guardrail blocks > 0.1/sec |
| DvaraAgentLoopDetected | warning | Agent loop detected |
| DvaraApprovalTimeouts | warning | Approval gate timeouts |
| DvaraMcpToolErrorRate | warning | MCP error rate > 10% |
| DvaraInjectionDetected | critical | Prompt injection detected |
| DvaraConfigSyncFailure | warning | Config sync failures |
| DvaraFleetConfigLag | warning | Instance lag > 5 versions |
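The authoritative rule definitions live in grafana/alerts/dvara-alerts.yml. To illustrate the shape, a Prometheus rule in the spirit of DvaraHighErrorRate might look like this; the status label matcher and label set are assumptions for the sketch, not the shipped rule:

```yaml
# Sketch of a Prometheus alerting rule; see grafana/alerts/dvara-alerts.yml
# for the real definitions. The status=~"5.." matcher is an assumption.
groups:
  - name: dvara-example
    rules:
      - alert: DvaraHighErrorRate
        expr: |
          sum(rate(gateway_requests_total{status=~"5.."}[5m]))
            / sum(rate(gateway_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Gateway error rate above 5% for 5 minutes"
```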

Datadog Integration

OpenMetrics Scraping

Copy the Datadog Agent config to scrape all Dvara Prometheus metrics:

cp datadog/conf.d/dvara.yaml /etc/datadog-agent/conf.d/openmetrics.d/dvara.yaml
sudo systemctl restart datadog-agent

OTLP Traces to Datadog

Configure the Datadog Agent as an OTLP collector, then point Dvara to it:

OTEL_EXPORTER_OTLP_ENDPOINT=http://datadog-agent:4318/v1/traces
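Enabling the OTLP HTTP receiver in the Agent's datadog.yaml typically looks like the following (check your Agent version's documentation for the exact keys):

```yaml
# datadog.yaml (Agent config): enable the OTLP HTTP receiver on port 4318
otlp_config:
  receiver:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
```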

Pre-built Datadog monitors are provided in datadog/monitors.yaml.

Health Endpoints

| Endpoint | Description |
|---|---|
| GET /status | Returns JSON with status, mode, providers, routes, config version, and warnings |
| GET /actuator/gateway-status | Full actuator status endpoint (same data as /status) |
| GET /actuator/health | Spring Boot health check |
| GET /try | Built-in browser-based chat test panel |
| GET /v1/models | Lists all registered providers and models with capabilities |
| GET /admin/v1/providers/{id}/capabilities | Provider-specific capability details |

OpenTelemetry Distributed Tracing

Dvara includes OpenTelemetry distributed tracing via Micrometer Tracing with the OTel bridge. Traces are automatically created for every request and provider call, with W3C traceparent headers propagated to upstream LLM providers.

How It Works

Spring Boot auto-configures the tracing infrastructure when the tracing dependencies are present. Both gateway-server (LLM) and mcp-proxy-server (MCP) have full OTLP tracing support:

  1. Server spans — Spring MVC auto-instruments every incoming HTTP request
  2. LLM provider spans — ProviderDispatcher creates child observations for chat, streamChat, and embed operations, enriched with token usage and session ID
  3. MCP filter spans — Each MCP filter creates a child observation: registry lookup, policy evaluation, PII scanning, and upstream server call
  4. Client spans — RestClient auto-instruments outbound HTTP calls, adding traceparent headers for W3C trace context propagation

LLM Span Hierarchy

HTTP POST /v1/chat/completions              (server span, auto by Spring MVC)
└── gateway.provider.chat                   (custom observation in ProviderDispatcher)
    ├── low-card: provider, model
    ├── high-card: input_tokens, output_tokens, session_id
    └── HTTP POST https://api.openai.com    (client span, auto by RestClient)

MCP Span Hierarchy

HTTP POST /mcp/{serverId}/tools/call        (server span, auto by Spring MVC)
└── gateway.mcp.request                     (parent observation in McpProxyController)
    ├── low-card: server_id, operation, tool_name
    ├── high-card: session_id, latency_ms, response_bytes, pii_in_response
    ├── mcp.filter.registry                 (low-card: server_id)
    ├── mcp.filter.policy                   (low-card: decision)
    ├── mcp.filter.pii_args                 (low-card: action)
    ├── mcp.server.call                     (upstream HTTP call)
    │   ├── low-card: server_id, operation, tool_name
    │   ├── high-card: http_status, latency_ms, response_bytes
    │   ├── event: mcp_request_sent
    │   └── event: mcp_response_received
    └── mcp.filter.pii_response             (low-card: action, conditional)

LLM Span Names and Attributes

| Observation Name | Operation | Low-Cardinality | High-Cardinality |
|---|---|---|---|
| gateway.provider.chat | Non-streaming chat | provider, model | input_tokens, output_tokens, session_id |
| gateway.provider.stream | Streaming chat | provider, model | session_id |
| gateway.provider.embed | Embeddings | provider, model | |

MCP Span Names and Attributes

| Observation Name | Operation | Low-Cardinality | High-Cardinality |
|---|---|---|---|
| gateway.mcp.request | Parent MCP request | server_id, operation, tool_name | session_id, latency_ms, response_bytes, pii_in_response |
| mcp.filter.registry | Server registry lookup | server_id | |
| mcp.filter.policy | Policy evaluation | decision | |
| mcp.filter.pii_args | PII scan on request args | action | |
| mcp.filter.pii_response | PII scan on response | action | |
| mcp.server.call | Upstream HTTP call | server_id, operation, tool_name | http_status, latency_ms, response_bytes |

Configuration

management:
  tracing:
    sampling:
      probability: ${TRACING_SAMPLING_PROBABILITY:1.0} # 0.0–1.0, default: sample all
  otlp:
    tracing:
      endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:http://localhost:4318/v1/traces}

| Environment Variable | Default | Description |
|---|---|---|
| TRACING_SAMPLING_PROBABILITY | 1.0 | Fraction of traces to sample (0.0 = none, 1.0 = all) |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4318/v1/traces | OTLP HTTP endpoint for trace export |

X-Trace-ID Integration

When OpenTelemetry tracing is active, the X-Trace-ID response header uses the OTel 32-character hex trace ID instead of a random UUID. Client-supplied X-Trace-ID headers still take precedence.

| Scenario | X-Trace-ID Value |
|---|---|
| Client sends X-Trace-ID header | Client's value (echoed back) |
| OTel tracing active, no client header | OTel trace ID (32-char hex) |
| No tracing, no client header | Random UUID hex (32-char) |

Viewing Traces

Start a local Jaeger instance for trace visualization:

# Start Jaeger with OTLP collector
docker run -d --name jaeger -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one:latest

# Start gateway (traces auto-exported to localhost:4318)
MOCK_PROVIDER_ENABLED=true ./mvnw -pl gateway-server spring-boot:run

# Send a request
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"mock/test","messages":[{"role":"user","content":"Hello"}]}'

# View traces at http://localhost:16686

Log Correlation

When tracing is active, OTel trace and span IDs are automatically added to the SLF4J MDC and included in structured JSON logs:

{
  "trace_id": "abcdef1234567890abcdef1234567890",
  "traceId": "abcdef1234567890abcdef1234567890",
  "spanId": "1234567890abcdef",
  "span_id": "1234567890abcdef",
  "message": "Request completed",
  ...
}

Disabling Tracing

Set the sampling probability to 0.0 to disable trace collection while keeping the infrastructure in place:

TRACING_SAMPLING_PROBABILITY=0.0 ./mvnw -pl gateway-server spring-boot:run

Audit Event Stream

Every API request through /v1/* generates audit events that are persisted and queryable.

How It Works

  1. AuditFilter (gateway filter chain) writes GATEWAY_REQUEST events with the requested model
  2. AuditResponseFilter (servlet filter, order HIGHEST_PRECEDENCE + 4) writes GATEWAY_RESPONSE events after request completion with rich payload: model, provider, HTTP status, latency, tokens, masked API key, tenant ID, and error code
  3. PersistingAuditWriter logs events to stdout and saves them to AuditEventRepository

Query Endpoints

# List all audit events (newest first)
curl http://localhost:8080/admin/v1/audit/events

# Filter by tenant
curl "http://localhost:8080/admin/v1/audit/events?tenant_id=acme-corp"

# Filter by event type
curl "http://localhost:8080/admin/v1/audit/events?event_type=GATEWAY_RESPONSE"

# Filter by date range
curl "http://localhost:8080/admin/v1/audit/events?from=2026-01-01T00:00:00Z&to=2026-01-02T00:00:00Z"

# Export as CSV
curl -o audit.csv http://localhost:8080/admin/v1/audit/events/export

# Export as JSON
curl -o audit.json http://localhost:8080/admin/v1/audit/events/export/json

Event Fields

| Field | Type | Description |
|---|---|---|
| eventId | string | Unique event ID (UUID) |
| timestamp | ISO 8601 | When the event occurred |
| tenantId | string | Tenant identifier (may be null) |
| eventType | string | GATEWAY_REQUEST or GATEWAY_RESPONSE |
| payload | object | Event-specific data (model, provider, status, latency, etc.) |

Storage

The default implementation uses InMemoryAuditEventRepository (capped at 10,000 events). Enterprise deployments replace this with AppendOnlyAuditEventRepository (capped at 100,000 events, configurable) via the @ConditionalOnMissingBean pattern.

Enterprise Audit Trail (with License)

When running with a valid JWT enterprise license key, the audit subsystem is upgraded with:

  • HMAC-SHA256 Signing: Every audit event is wrapped in a SignedAuditEnvelope with a cryptographic signature. Events are hash-chained — each event's HMAC includes the previous event's hash, creating a tamper-evident chain.
  • Event Enrichment: Events are automatically enriched with trace_id from the request context, actor_user_id/actor_user_name/actor_roles from the authenticated principal.
  • Prompt Storage Opt-In: By default, prompt/message content is stripped from audit events. Tenants can opt in by setting audit.store-prompts: "true" in their tenant metadata. The global default is controlled by gateway.audit.store-prompts-by-default.
  • SIEM Export: Signed audit envelopes are fanned out to pluggable SiemExporter implementations. A LoggingSiemExporter (JSON to siem.export logger) is included by default. Splunk HEC and CloudWatch stubs are provided for future integration. Export failure never blocks audit persistence.
  • Chain Integrity Verification: AuditIntegrityService can verify the entire hash chain or individual events by recomputing HMACs.
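The hash-chaining idea can be illustrated with openssl. This is a standalone sketch of the concept only; Dvara's actual envelope format, serialization, and key handling are internal:

```shell
# Conceptual sketch of a tamper-evident HMAC chain (not Dvara's real format).
secret="demo-secret"
e1='{"eventId":"e1","payload":"a"}'
e2='{"eventId":"e2","payload":"b"}'

# HMAC of the first event
h1=$(printf '%s' "$e1" | openssl dgst -sha256 -hmac "$secret" -r | cut -d' ' -f1)
# The second event's HMAC also covers the previous hash, forming the chain:
# tampering with e1 would change h1 and invalidate h2
h2=$(printf '%s' "$h1$e2" | openssl dgst -sha256 -hmac "$secret" -r | cut -d' ' -f1)

# Verification recomputes each link and compares
v1=$(printf '%s' "$e1" | openssl dgst -sha256 -hmac "$secret" -r | cut -d' ' -f1)
v2=$(printf '%s' "$v1$e2" | openssl dgst -sha256 -hmac "$secret" -r | cut -d' ' -f1)
[ "$h2" = "$v2" ] && echo "chain intact"
# → chain intact
```

In Dvara, this recomputation is what AuditIntegrityService performs across the stored SignedAuditEnvelope records.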

Enterprise audit configuration:

GATEWAY_ENTERPRISE_LICENSE_KEY=<jwt-token> \
GATEWAY_AUDIT_HMAC_SECRET=your-hmac-secret \
GATEWAY_AUDIT_MAX_EVENTS=100000 \
GATEWAY_AUDIT_STORE_PROMPTS=false \
./mvnw -pl enterprise-server spring-boot:run

Admin UI

The audit log is viewable in the admin UI at /audit with live 3-second polling, filtering by tenant/type/date range, click-to-expand event details, pause/resume, and CSV export.

Extension Point

Both AuditWriter and AuditEventRepository are overridable via @ConditionalOnMissingBean:

public interface AuditWriter {
    void write(AuditEvent event);
}

public interface AuditEventRepository {
    void save(AuditEvent event);
    List<AuditEvent> findAll();
    // ... query methods
}

Enterprise deployments replace these with EnterpriseAuditWriter (enrichment + HMAC + SIEM) and AppendOnlyAuditEventRepository (hash-chained, signed append-only store).