
Production Security Checklist

This guide covers the essential security hardening steps before deploying Dvara Gateway to production. Each item includes the relevant configuration property and recommended action.


1. Audit HMAC Secret

The audit subsystem signs every event with HMAC-SHA256 and links events in a hash chain, which makes tampering detectable. The default secret (default-dev-secret-change-in-production) is intended for development only.

Action: Generate a cryptographically strong random secret (minimum 32 bytes) and set it via environment variable.

# Generate a 256-bit random secret
openssl rand -base64 32

# Set in your deployment
export GATEWAY_AUDIT_HMAC_SECRET="<generated-secret>"

Property: gateway.audit.hmac-secret

Risk if skipped

An attacker who knows the default secret can forge audit events and break chain integrity verification, undermining compliance reports (SOC2, HIPAA, GDPR).
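
To illustrate why the secret matters, here is a minimal sketch of an HMAC-SHA256 hash chain. It is not Dvara's implementation: the function names, the JSON serialization, and the way each signature covers the previous one are assumptions made for illustration only.

```python
import hashlib
import hmac
import json


def sign_event(secret: bytes, prev_signature: str, event: dict) -> str:
    """Return the hex HMAC-SHA256 over the previous signature plus the event body."""
    payload = prev_signature.encode() + json.dumps(event, sort_keys=True).encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def build_chain(secret: bytes, events: list[dict]) -> list[str]:
    """Sign events in order; each signature depends on the one before it."""
    signatures, prev = [], ""
    for event in events:
        prev = sign_event(secret, prev, event)
        signatures.append(prev)
    return signatures


def verify_chain(secret: bytes, events: list[dict], signatures: list[str]) -> bool:
    """Recompute every link and compare in constant time."""
    if len(events) != len(signatures):
        return False
    prev = ""
    for event, signature in zip(events, signatures):
        expected = sign_event(secret, prev, event)
        if not hmac.compare_digest(expected, signature):
            return False
        prev = signature
    return True
```

Anyone holding the secret can run `build_chain` on altered events and produce signatures that `verify_chain` accepts, which is why a known default secret defeats the integrity check.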


2. Enterprise License Key

The enterprise license key unlocks all enterprise modules (policy engine, PII detection, guardrails, FinOps, agentic governance, semantic cache, RBAC, and more). Without it, the gateway runs unlicensed and these modules fall back to no-op defaults.

Action: Obtain a signed JWT license key and set it at startup.

export GATEWAY_ENTERPRISE_LICENSE_KEY="<signed-jwt-token>"

Property: gateway.enterprise.license-key


3. OIDC/JWT Authentication

By default, all admin endpoints are unauthenticated. Enabling security requires an OIDC-compliant identity provider (Keycloak, Auth0, Okta, Azure AD, etc.).

Action: Enable security and configure your OIDC issuer.

gateway:
  security:
    enabled: true
    oidc:
      issuer-uri: https://your-idp.example.com/realms/dvara
      audience: dvara-gateway
      role-claim: realm_access.roles  # adjust for your IdP
      tenant-claim: tenant_id
    rbac:
      enabled: true  # enforce URL-pattern RBAC
    session:
      timeout-seconds: 3600

Environment variables:

  • GATEWAY_SECURITY_ENABLED=true
  • GATEWAY_OIDC_ISSUER_URI=https://your-idp.example.com/realms/dvara
  • GATEWAY_OIDC_AUDIENCE=dvara-gateway

Risk if skipped

Anyone with network access can read and modify tenants, routes, policies, budgets, and audit logs.
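
The role-claim value is a dotted path into the token's claims. The sketch below shows one way such a path could be resolved against an already-verified JWT payload. The helper names are hypothetical, and the gateway's actual claim handling may differ; signature, issuer, and expiry validation are assumed to have happened before this step.

```python
from typing import Any


def resolve_claim(claims: dict[str, Any], path: str) -> Any:
    """Walk a dotted path such as 'realm_access.roles' through nested claims."""
    current: Any = claims
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def has_role(claims: dict[str, Any], role_claim: str, audience: str, role: str) -> bool:
    """Check the audience and role of a token whose signature is already verified."""
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    roles = resolve_claim(claims, role_claim) or []
    return audience in audiences and role in roles
```

With a Keycloak-style payload, `resolve_claim(claims, "realm_access.roles")` returns the role list and `resolve_claim(claims, "tenant_id")` returns the tenant. An IdP that places roles elsewhere only needs a different path.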


4. Vault-Backed Secret Management

Storing API keys and credentials in environment variables is acceptable for development but not for production. Dvara supports three vault backends.

Action: Configure one of the supported vault backends.

HashiCorp Vault

export GATEWAY_VAULT_BACKEND=hashicorp
export VAULT_ADDR=https://vault.internal:8200
export VAULT_AUTH_METHOD=approle
export VAULT_ROLE_ID=<role-id>
export VAULT_SECRET_ID=<secret-id>
export VAULT_SECRET_PATH=secret/data/dvara

AWS Secrets Manager

export GATEWAY_VAULT_BACKEND=aws-secrets-manager
export AWS_VAULT_REGION=us-east-1
export AWS_SECRET_NAME=dvara/provider-credentials

Azure Key Vault

export GATEWAY_VAULT_BACKEND=azure-key-vault
export AZURE_VAULT_URL=https://dvara-kv.vault.azure.net
export AZURE_CLIENT_ID=<client-id>
export AZURE_CLIENT_SECRET=<client-secret>
export AZURE_TENANT_ID=<tenant-id>

Property: gateway.vault.backend

Risk if skipped

Provider API keys live in plaintext environment variables, which may be exposed via process inspection, container metadata, or log leaks.


5. IP Access Control

Restrict which IP addresses and CIDR ranges can access the gateway.

Action: Enable IP access control and configure allowlists/denylists.

gateway:
  ip-access:
    enabled: true
    scope: all  # 'all' or 'data-plane'
    global-allowlist:
      - 10.0.0.0/8
      - 172.16.0.0/12
    global-denylist:
      - 0.0.0.0/0  # deny all not in allowlist

Per-tenant overrides are configured via tenant metadata:

  • ip-access.allowlist (comma-separated CIDRs)
  • ip-access.denylist (comma-separated CIDRs)

Environment variable: GATEWAY_IP_ACCESS_ENABLED=true

Risk if skipped

The gateway is accessible from any IP address, increasing exposure to brute-force and reconnaissance attacks.
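
The following sketch shows how an allowlist combined with a catch-all denylist could be evaluated. It assumes that an allowlist match takes precedence over the denylist, as the "deny all not in allowlist" comment suggests; the gateway's real evaluation order is not confirmed here, and the function name is hypothetical.

```python
import ipaddress


def is_allowed(client_ip: str, allowlist: list[str], denylist: list[str]) -> bool:
    """Return True if the client IP may access the gateway.

    Assumption: an allowlist match wins; otherwise any denylist match rejects.
    """
    ip = ipaddress.ip_address(client_ip)

    def matches(cidrs: list[str]) -> bool:
        return any(ip in ipaddress.ip_network(cidr) for cidr in cidrs)

    if matches(allowlist):
        return True
    return not matches(denylist)
```

With the configuration above, `is_allowed("10.1.2.3", ...)` passes while a public address such as `8.8.8.8` is rejected by the `0.0.0.0/0` entry.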


6. TLS 1.3 Enforcement

Dvara can enforce TLS 1.3 on all outbound connections to LLM providers and optionally apply mTLS with client certificates.

Action: Ensure TLS 1.3 enforcement is enabled (it is by default).

gateway:
  tls:
    enforce-tls13: true

Environment variable: GATEWAY_TLS_ENFORCE_TLS13=true

Risk if skipped

Connections to providers may negotiate weaker TLS versions vulnerable to downgrade attacks.
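
For readers who want to see what "TLS 1.3 only" means on the client side, here is a small sketch using Python's standard ssl module. It illustrates the concept and is not how the gateway itself is configured; the function name is hypothetical.

```python
import ssl


def tls13_only_context() -> ssl.SSLContext:
    """Build a client context that refuses to negotiate anything below TLS 1.3."""
    context = ssl.create_default_context()  # verifies certificates and hostnames
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context
```

A handshake with a server that only offers TLS 1.2 fails under this context instead of silently downgrading.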


7. PII Detection and Enforcement

Configure PII scanning to prevent sensitive data (SSN, credit cards, emails, etc.) from reaching LLM providers.

Action: Set the default PII action to REDACT or BLOCK.

gateway:
  pii:
    enabled: true
    default-action: REDACT  # BLOCK, REDACT, or LOG
    scan-responses: true  # detect PII in LLM responses
    strip-before-cache: true  # redact PII before semantic caching
    token-encryption-password: "<strong-password>"

Per-tenant overrides via tenant metadata:

  • pii.enabled=true
  • pii.action=REDACT
  • pii.custom-patterns (Map of label to regex)

Environment variable: GATEWAY_PII_DEFAULT_ACTION=REDACT

Risk if skipped (LOG mode)

PII reaches LLM providers and may be stored in their training data or logs.
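
The difference between the three actions can be sketched as follows. The patterns and placeholder format are simplified assumptions for illustration; they are not the gateway's detectors, which are likely more thorough.

```python
import re

# Hypothetical label-to-regex map, similar in shape to pii.custom-patterns.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def apply_pii_action(text: str, action: str, patterns=PATTERNS):
    """Return (text to forward, labels found) according to the configured action."""
    found = [label for label, regex in patterns.items() if regex.search(text)]
    if action == "BLOCK" and found:
        raise ValueError(f"request blocked, PII detected: {found}")
    if action == "REDACT":
        for label, regex in patterns.items():
            text = regex.sub(f"[{label}]", text)
    # LOG: the text is forwarded unchanged; only the findings are reported.
    return text, found
```

In LOG mode the original text still reaches the provider, which is the exposure described under the risk above.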


8. Budget Caps

Prevent runaway costs by configuring per-tenant and per-API-key budget caps.

Action: Create budget caps via the admin API or UI.

# Create a monthly budget cap for a tenant
curl -X POST http://localhost:8080/admin/v1/budgets \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "tenant-prod",
    "name": "Monthly limit",
    "period": "MONTHLY",
    "limitUsd": 5000.00,
    "softLimitPct": 80,
    "enabled": true
  }'

Configure automatic model downgrade on soft limit breach via tenant metadata:

  • cost.downgrade-threshold-pct=80
  • cost.downgrade-rules=gpt-4o:gpt-4o-mini,claude-3-opus:claude-3-sonnet

Risk if skipped

A misconfigured client or prompt injection attack can generate unbounded LLM costs.
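
The downgrade metadata can be read as a list of source:target model pairs applied once spend crosses the threshold. The sketch below shows that interpretation; the function names and the behavior at 100% are assumptions, not the gateway's documented logic.

```python
from typing import Optional


def parse_downgrade_rules(raw: str) -> dict[str, str]:
    """Parse 'gpt-4o:gpt-4o-mini,claude-3-opus:claude-3-sonnet' into a mapping."""
    rules = {}
    for pair in raw.split(","):
        source, _, target = pair.strip().partition(":")
        if source and target:
            rules[source] = target
    return rules


def choose_model(requested: str, spent_usd: float, limit_usd: float,
                 threshold_pct: float, rules: dict[str, str]) -> Optional[str]:
    """Return the model to use, or None once the hard limit is reached (assumed)."""
    used_pct = 100.0 * spent_usd / limit_usd
    if used_pct >= 100:
        return None
    if used_pct >= threshold_pct:
        return rules.get(requested, requested)
    return requested
```

With a 5000 USD monthly limit and an 80% threshold, a request for gpt-4o at 4000 USD of spend would be served by gpt-4o-mini under these assumptions.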


9. Webhook Alerting

Configure webhooks to receive real-time notifications for security and operational events.

Action: Create webhooks for critical event types.

curl -X POST http://localhost:8080/admin/v1/webhooks \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Security alerts",
    "url": "https://your-siem.example.com/webhooks/dvara",
    "secret": "<webhook-signing-secret>",
    "eventTypes": [
      "POLICY_DENIAL",
      "PII_DETECTED",
      "IP_ACCESS_DENIED",
      "INJECTION_DETECTED",
      "BUDGET_CAP_HARD",
      "AGENT_LOOP_DETECTED",
      "GUARDRAIL_BLOCKED",
      "COST_ANOMALY"
    ],
    "status": "ACTIVE"
  }'

Properties:

  • gateway.webhooks.enabled=true
  • gateway.webhooks.max-retries=3
  • gateway.webhooks.delivery-timeout-ms=5000

Risk if skipped

Security events go unnoticed until the next manual audit review.
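
On the receiving side, the webhook secret is typically used to check that a delivery really came from the gateway. The sketch below assumes the signature is a hex HMAC-SHA256 of the raw request body; this guide does not specify the signing scheme or the header that carries the signature, so treat both as assumptions and confirm them before relying on this.

```python
import hashlib
import hmac


def verify_webhook(secret: str, body: bytes, received_signature: str) -> bool:
    """Compare the received signature with one computed over the raw body."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

The receiver should compute the signature over the exact bytes it received, before any JSON parsing or re-serialization.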


10. RBAC Role Assignments

Review and restrict role assignments. Follow the principle of least privilege.

Recommended role mapping:

| Team | Role | Permissions |
|---|---|---|
| Platform team | org-admin | Full access (use sparingly) |
| Security/compliance | policy-admin | Policy, PII, guardrail, audit management |
| Finance | billing-admin | Pricing, costs, budgets, chargeback reports |
| Engineering | developer | Routes, API keys, read-only for most resources |
| Stakeholders | viewer | Read-only access to all resources |

Action: Audit current user roles and remove unnecessary org-admin assignments.

# List all users
curl http://localhost:8080/admin/v1/users

# Update roles for a user
curl -X PUT http://localhost:8080/admin/v1/users/{id}/roles \
  -H "Content-Type: application/json" \
  -d '{"roles": ["developer"]}'

Risk if skipped

Over-privileged users can modify policies, delete audit logs, or change budget caps.


11. Rate Limiting

Enable rate limiting to protect against abuse and ensure fair usage across tenants.

Action: The enterprise rate limiter uses Bucket4j with Redis. Ensure Redis is provisioned and rate limiting is configured per tenant or API key.

Rate limits are enforced by RateLimitServletFilter on all /v1/* data-plane paths. The MCP proxy applies McpRateLimitFilter at filter order 200.

Risk if skipped

A single client can monopolize gateway capacity, causing denial of service for other tenants.
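
Bucket4j implements the token-bucket algorithm. The single-process sketch below shows the core idea only; it is not Bucket4j's API and omits the Redis-backed shared state the enterprise limiter relies on. The class and parameter names are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second` tokens."""

    def __init__(self, capacity: int, refill_per_second: float, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._now = now
        self._tokens = float(capacity)
        self._last = now()

    def try_consume(self, tokens: int = 1) -> bool:
        """Refill based on elapsed time, then take tokens if enough are available."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False
```

Keeping one bucket per tenant or API key means a noisy client exhausts only its own bucket rather than the gateway's overall capacity.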


12. API Key Rotation

Regularly rotate API keys to limit the blast radius of a compromised key.

Action: Use the key rotation endpoint and update clients.

# Rotate an API key (returns new key, old key is revoked)
curl -X POST http://localhost:8080/admin/v1/tenants/{tid}/keys/{kid}/rotate

Recommended cadence: Every 90 days, or immediately if a compromise is suspected.

For MCP server credentials:

# Rotate MCP server credentials
curl -X POST http://localhost:8080/admin/v1/mcp/servers/{id}/credentials/rotate \
  -H "Content-Type: application/json" \
  -d '{"newCredentialRef": "secret/data/dvara/mcp/new-cred"}'

Risk if skipped

A leaked API key provides indefinite access until manually revoked.
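
To keep to the 90-day cadence, it helps to list keys that are overdue. The sketch below assumes you already have each key's creation time, for example from an inventory export; the data shape and function name are hypothetical and not part of the admin API.

```python
from datetime import datetime, timedelta, timezone

ROTATION_INTERVAL = timedelta(days=90)


def keys_due_for_rotation(created_at: dict[str, datetime], now: datetime) -> list[str]:
    """Return the IDs of keys created at least 90 days before `now`, sorted."""
    return sorted(
        key_id for key_id, created in created_at.items()
        if now - created >= ROTATION_INTERVAL
    )
```

Each returned ID can then be passed to the rotate endpoint shown above.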


13. SIEM Export

Forward audit events to your Security Information and Event Management (SIEM) system for centralized monitoring and alerting.

Action: Configure SIEM export. The built-in LoggingSiemExporter writes JSON to the siem.export logger, which can be routed to Splunk, Elasticsearch, or CloudWatch via log shipping.

For direct integration, configure Splunk HEC or CloudWatch exporters (requires custom bean registration).

Ensure the following audit event types are monitored:

  • POLICY_DENIED -- blocked requests
  • PII_DETECTED / PII_REDACTED -- sensitive data handling
  • IP_ACCESS_DENIED -- unauthorized access attempts
  • AUTHORIZATION_DENIED -- RBAC violations
  • INJECTION_DETECTED -- prompt injection attempts
  • GUARDRAIL_BLOCKED -- content policy violations
  • BUDGET_CAP_HARD -- budget overruns
  • AGENT_LOOP_DETECTED -- runaway agent sessions
  • CONFIG_SYNC_FAILURE -- control plane connectivity issues

Risk if skipped

Security incidents are only visible in local logs, which may be lost or tampered with.
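
If you ship the siem.export logger output as JSON lines, a small filter can pick out the event types listed above before forwarding. The field name eventType is an assumption about the exported JSON shape and should be checked against real output.

```python
import json

MONITORED_EVENT_TYPES = {
    "POLICY_DENIED", "PII_DETECTED", "PII_REDACTED", "IP_ACCESS_DENIED",
    "AUTHORIZATION_DENIED", "INJECTION_DETECTED", "GUARDRAIL_BLOCKED",
    "BUDGET_CAP_HARD", "AGENT_LOOP_DETECTED", "CONFIG_SYNC_FAILURE",
}


def monitored_events(lines):
    """Yield parsed events whose (assumed) eventType field is on the watch list."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON log lines
        if isinstance(event, dict) and event.get("eventType") in MONITORED_EVENT_TYPES:
            yield event
```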


14. mTLS for Provider Connections

Configure mutual TLS (client certificates) for outbound connections to LLM providers, especially in regulated environments.

Action: Configure per-provider mTLS settings.

gateway:
  tls:
    enforce-tls13: true
    providers:
      openai:
        mtls-enabled: true
        client-cert-path: /etc/dvara/certs/openai-client.pem
        client-key-path: /etc/dvara/certs/openai-client-key.pem
        trust-store-path: /etc/dvara/certs/openai-truststore.p12
        trust-store-password: "${OPENAI_TRUSTSTORE_PASSWORD}"
      anthropic:
        mtls-enabled: true
        client-cert-path: /etc/dvara/certs/anthropic-client.pem
        client-key-path: /etc/dvara/certs/anthropic-client-key.pem

Risk if skipped

Provider connections use one-way TLS only; the provider cannot verify the gateway's identity.


15. Monitoring with Prometheus and Grafana

Set up comprehensive monitoring using the built-in Prometheus metrics endpoint.

Action: Configure Prometheus to scrape the gateway and set up Grafana dashboards.

Prometheus scrape config

scrape_configs:
  - job_name: dvara-gateway
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['gateway:8080']
  - job_name: dvara-mcp-proxy
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['mcp-proxy:8070']

Key metrics to alert on

| Metric | Condition | Severity |
|---|---|---|
| gateway_provider_errors_total | Rate > 10/min | Warning |
| gateway_latency_seconds (P99) | > 5s sustained | Warning |
| gateway_budget_blocked_total | Any increment | Info |
| gateway_guardrail_blocked_total | Rate > 5/min | Critical |
| gateway_policy_shadow_divergence_total | Any increment | Info |
| mcp_agent_loop_detected_total | Any increment | Warning |
| gateway_config_sync_failures_total | Any increment | Critical |
| gateway_cost_anomaly_total | Any increment | Warning |

OpenTelemetry distributed tracing

export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318/v1/traces
export TRACING_SAMPLING_PROBABILITY=0.1 # 10% sampling in production

Risk if skipped

Degraded performance, provider outages, and security events go undetected until users report issues.


Quick Reference: Minimum Production Configuration

The following environment variables represent the minimum set for a secure production deployment:

# Enterprise license
GATEWAY_ENTERPRISE_LICENSE_KEY=<jwt-token>

# Audit integrity
GATEWAY_AUDIT_HMAC_SECRET=<random-32-byte-base64>

# Authentication
GATEWAY_SECURITY_ENABLED=true
GATEWAY_OIDC_ISSUER_URI=https://your-idp.example.com/realms/dvara
GATEWAY_OIDC_AUDIENCE=dvara-gateway

# Secrets management
GATEWAY_VAULT_BACKEND=hashicorp # or aws-secrets-manager, azure-key-vault

# IP access control
GATEWAY_IP_ACCESS_ENABLED=true

# PII protection
GATEWAY_PII_DEFAULT_ACTION=REDACT

# TLS
GATEWAY_TLS_ENFORCE_TLS13=true

# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318/v1/traces
TRACING_SAMPLING_PROBABILITY=0.1
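
A pre-deployment check can catch missing or unsafe values from the list above. The sketch below is a hypothetical helper, not a tool that ships with the gateway. It covers only the security-related variables and two value checks taken from this guide: the default HMAC secret, and the recommended PII actions.

```python
DEFAULT_HMAC_SECRET = "default-dev-secret-change-in-production"

REQUIRED = [
    "GATEWAY_ENTERPRISE_LICENSE_KEY",
    "GATEWAY_AUDIT_HMAC_SECRET",
    "GATEWAY_SECURITY_ENABLED",
    "GATEWAY_OIDC_ISSUER_URI",
    "GATEWAY_OIDC_AUDIENCE",
    "GATEWAY_VAULT_BACKEND",
    "GATEWAY_IP_ACCESS_ENABLED",
    "GATEWAY_PII_DEFAULT_ACTION",
    "GATEWAY_TLS_ENFORCE_TLS13",
]


def check_production_env(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    if env.get("GATEWAY_AUDIT_HMAC_SECRET") == DEFAULT_HMAC_SECRET:
        problems.append("GATEWAY_AUDIT_HMAC_SECRET still uses the development default")
    action = env.get("GATEWAY_PII_DEFAULT_ACTION")
    if action and action.upper() not in {"REDACT", "BLOCK"}:
        problems.append("GATEWAY_PII_DEFAULT_ACTION should be REDACT or BLOCK")
    return problems
```

Calling `check_production_env(dict(os.environ))` from a deployment script (after `import os`) and failing the rollout when the list is non-empty is one way to use it.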