Production Security Checklist

This guide covers the essential security hardening steps before deploying Dvara Gateway to production. Each item includes the relevant configuration property and recommended action.

1. Audit HMAC Secret

The audit subsystem signs every event with HMAC-SHA256 and links events in a hash chain, so any modification or deletion is detectable (tamper-evident). The default secret (default-dev-secret-change-in-production) is intended for development only.

Action: Generate a cryptographically strong random secret (minimum 32 bytes) and set it via environment variable.

# Generate a 256-bit random secret
openssl rand -base64 32

# Set in your deployment
export GATEWAY_AUDIT_HMAC_SECRET="<generated-secret>"

Property: gateway.audit.hmac-secret

Risk if skipped

An attacker who knows the default secret can forge audit events and break chain integrity verification, undermining compliance reports (SOC2, HIPAA, GDPR).
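To illustrate why the secret matters, here is a minimal sketch of HMAC-SHA256 hash chaining. The event shape, the canonical JSON encoding, and all function names are assumptions made for illustration; Dvara's actual signing format is not documented here and may differ.

```python
import base64
import hashlib
import hmac
import json
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a base64-encoded random secret, comparable to `openssl rand -base64 32`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def sign_event(secret: bytes, event: dict, prev_signature: str) -> str:
    """HMAC-SHA256 over the previous signature plus the event as sorted-key JSON.

    Assumption: this payload layout is hypothetical, chosen only to show chaining.
    """
    payload = prev_signature.encode("utf-8") + json.dumps(event, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def build_chain(secret: bytes, events: list[dict]) -> list[str]:
    """Sign events in order; each signature depends on the one before it."""
    signatures, prev = [], ""
    for event in events:
        prev = sign_event(secret, event, prev)
        signatures.append(prev)
    return signatures


def verify_chain(secret: bytes, events: list[dict], signatures: list[str]) -> bool:
    """Recompute every link and compare in constant time."""
    if len(events) != len(signatures):
        return False
    prev = ""
    for event, signature in zip(events, signatures):
        expected = sign_event(secret, event, prev)
        if not hmac.compare_digest(expected, signature):
            return False
        prev = signature
    return True
```

With a private secret, editing any event invalidates verification from that point on. With the well-known default secret, anyone can rebuild a chain that verifies, which is the forgery risk described above.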

2. Enterprise License Key

The enterprise license key unlocks all enterprise modules (policy engine, PII detection, guardrails, FinOps, agentic governance, semantic cache, RBAC, and more). Without it, the gateway runs unlicensed and the enterprise modules fall back to no-op defaults.

Action: Obtain a signed JWT license key and set it at startup.

export GATEWAY_ENTERPRISE_LICENSE_KEY="<signed-jwt-token>"

Property: gateway.enterprise.license-key

3. OIDC/JWT Authentication

By default, all admin endpoints are unauthenticated. Enabling security requires an OIDC-compliant identity provider (Keycloak, Auth0, Okta, Azure AD, etc.).

Action: Enable security and configure your OIDC issuer.

gateway:
  security:
    enabled: true
    oidc:
      issuer-uri: https://your-idp.example.com/realms/dvara
      audience: dvara-gateway
      role-claim: realm_access.roles   # adjust for your IdP
      tenant-claim: tenant_id
    rbac:
      enabled: true                    # enforce URL-pattern RBAC
    session:
      timeout-seconds: 3600

Environment variables:

GATEWAY_SECURITY_ENABLED=true
GATEWAY_OIDC_ISSUER_URI=https://your-idp.example.com/realms/dvara
GATEWAY_OIDC_AUDIENCE=dvara-gateway

Risk if skipped

Anyone with network access can read and modify tenants, routes, policies, budgets, and audit logs.
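The role-claim value is a dot-separated path into the token's claims, which is why it needs adjusting per IdP. The sketch below shows one way such a path could be resolved against an already validated and decoded token payload. The function names are hypothetical and signature validation is out of scope; this is not the gateway's implementation.

```python
from typing import Any


def resolve_claim(claims: dict[str, Any], path: str) -> Any:
    """Walk a dot-separated path such as 'realm_access.roles' through nested claims."""
    current: Any = claims
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def extract_roles(claims: dict[str, Any], role_claim: str) -> list[str]:
    """Return roles as a list of strings; empty when the claim is absent or malformed."""
    value = resolve_claim(claims, role_claim)
    if isinstance(value, str):
        # Some IdPs emit roles as one space-separated string.
        return value.split()
    if isinstance(value, list):
        return [str(role) for role in value]
    return []
```

An empty result for a user who should have roles usually means the configured path does not match the token your IdP issues. Note that this simple splitter cannot address claim names that themselves contain dots, such as URL-namespaced claims.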

4. Vault-Backed Secret Management

Storing API keys and credentials in environment variables is acceptable for development but not for production. Dvara supports three vault backends.

Action: Configure one of the supported vault backends.

HashiCorp Vault

export GATEWAY_VAULT_BACKEND=hashicorp
export VAULT_ADDR=https://vault.internal:8200
export VAULT_AUTH_METHOD=approle
export VAULT_ROLE_ID=<role-id>
export VAULT_SECRET_ID=<secret-id>
export VAULT_SECRET_PATH=secret/data/dvara

AWS Secrets Manager

export GATEWAY_VAULT_BACKEND=aws-secrets-manager
export AWS_VAULT_REGION=us-east-1
export AWS_SECRET_NAME=dvara/provider-credentials

Azure Key Vault

export GATEWAY_VAULT_BACKEND=azure-key-vault
export AZURE_VAULT_URL=https://dvara-kv.vault.azure.net
export AZURE_CLIENT_ID=<client-id>
export AZURE_CLIENT_SECRET=<client-secret>
export AZURE_TENANT_ID=<tenant-id>

Property: gateway.vault.backend

Risk if skipped

Provider API keys live in plaintext environment variables, which may be exposed via process inspection, container metadata, or log leaks.

5. IP Access Control

Restrict which IP addresses and CIDR ranges can access the gateway.

Action: Enable IP access control and configure allowlists/denylists.

gateway:
  ip-access:
    enabled: true
    scope: all               # 'all' or 'data-plane'
    global-allowlist:
      - 10.0.0.0/8
      - 172.16.0.0/12
    global-denylist:
      - 0.0.0.0/0            # deny all not in allowlist

Per-tenant overrides are configured via tenant metadata:

ip-access.allowlist (comma-separated CIDRs)
ip-access.denylist (comma-separated CIDRs)

Environment variable: GATEWAY_IP_ACCESS_ENABLED=true

Risk if skipped

The gateway is accessible from any IP address, increasing exposure to brute-force and reconnaissance attacks.
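The allowlist and denylist above can be reasoned about with a small evaluation sketch. It assumes that an allowlist match takes precedence over the denylist, which is what the "deny all not in allowlist" comment suggests; the gateway's real evaluation order is not confirmed here, and the function names are hypothetical.

```python
import ipaddress


def parse_cidrs(csv: str) -> list:
    """Parse a comma-separated CIDR string, the format used by the tenant metadata keys."""
    return [ipaddress.ip_network(item.strip(), strict=False) for item in csv.split(",") if item.strip()]


def is_allowed(client_ip: str, allowlist: list, denylist: list) -> bool:
    """Assumed precedence: allowlist match wins, then denylist match denies, else allow."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in allowlist):
        return True
    if any(address in network for network in denylist):
        return False
    return True
```

In this sketch, 0.0.0.0/0 only matches IPv4 addresses. If the gateway is reachable over IPv6, it is worth checking whether an equivalent ::/0 entry is needed.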

6. TLS 1.3 Enforcement

Dvara can enforce TLS 1.3 on all outbound connections to LLM providers and optionally apply mTLS with client certificates.

Action: Ensure TLS 1.3 enforcement is enabled (it is by default).

gateway:
  tls:
    enforce-tls13: true

Environment variable: GATEWAY_TLS_ENFORCE_TLS13=true

Risk if skipped

Connections to providers may negotiate weaker TLS versions vulnerable to downgrade attacks.
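Before relying on enforcement, it can help to confirm that each provider endpoint actually negotiates TLS 1.3. The following sketch uses Python's standard ssl module to build a client context that refuses older versions; it is an independent check, not part of the gateway, and the function names are hypothetical.

```python
import socket
import ssl
from typing import Optional


def tls13_only_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


def negotiated_version(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect and return the negotiated protocol, for example 'TLSv1.3'.

    Raises ssl.SSLError when the server cannot negotiate TLS 1.3.
    """
    context = tls13_only_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Calling negotiated_version with a provider hostname from the network the gateway runs in shows whether that path, including any intercepting proxy, supports TLS 1.3.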

7. PII Detection and Enforcement

Configure PII scanning to prevent sensitive data (SSN, credit cards, emails, etc.) from reaching LLM providers.

Action: Set the default PII action to REDACT or BLOCK.

gateway:
  pii:
    enabled: true
    default-action: REDACT          # BLOCK, REDACT, or LOG
    scan-responses: true            # detect PII in LLM responses
    strip-before-cache: true        # redact PII before semantic caching
    token-encryption-password: "<strong-password>"

Per-tenant overrides are set via tenant metadata:

pii.enabled=true
pii.action=REDACT
pii.custom-patterns (Map of label to regex)

Environment variable: GATEWAY_PII_DEFAULT_ACTION=REDACT

Risk if skipped (LOG mode)

PII reaches LLM providers and may be stored in their training data or logs.
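The difference between the three actions can be sketched with two simple regular expressions. The patterns, the placeholder format, and the names below are assumptions for illustration only; the gateway's detectors are more extensive and its redaction format is not documented here.

```python
import re

# Hypothetical label-to-regex map, similar in shape to pii.custom-patterns.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


class PiiBlocked(Exception):
    """Raised when the action is BLOCK and PII is present."""


def apply_pii_action(text: str, action: str, patterns=PATTERNS) -> tuple[str, list[str]]:
    """Return (possibly redacted text, labels found). LOG leaves the text unchanged."""
    found = [label for label, pattern in patterns.items() if pattern.search(text)]
    if action == "BLOCK" and found:
        raise PiiBlocked(f"request contains PII: {', '.join(found)}")
    if action == "REDACT":
        for label in found:
            text = patterns[label].sub(f"[{label}]", text)
    return text, found
```

Under LOG, the labels are reported but the original text still goes to the provider, which is why REDACT or BLOCK is the recommended default.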

8. Budget Caps

Prevent runaway costs by configuring per-tenant and per-API-key budget caps.

Action: Create budget caps via the admin API or UI.

# Create a monthly budget cap for a tenant
curl -X POST http://localhost:8080/admin/v1/budgets \
  -H "Content-Type: application/json" \
  -d '{
    "tenantId": "tenant-prod",
    "name": "Monthly limit",
    "period": "MONTHLY",
    "limitUsd": 5000.00,
    "softLimitPct": 80,
    "enabled": true
  }'

Configure automatic model downgrade on soft limit breach via tenant metadata:

cost.downgrade-threshold-pct=80
cost.downgrade-rules=gpt-4o:gpt-4o-mini,claude-3-opus:claude-3-sonnet

Risk if skipped

A misconfigured client or prompt injection attack can generate unbounded LLM costs.
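The interaction between the hard cap, the soft-limit threshold, and the downgrade rules can be sketched as follows. The parsing matches the comma-separated source:target format shown above; the decision logic and function names are assumptions, not the gateway's implementation.

```python
from typing import Optional


def parse_downgrade_rules(value: str) -> dict[str, str]:
    """Parse 'gpt-4o:gpt-4o-mini,claude-3-opus:claude-3-sonnet' into a mapping."""
    rules = {}
    for pair in value.split(","):
        if not pair.strip():
            continue
        source, _, target = pair.partition(":")
        rules[source.strip()] = target.strip()
    return rules


def choose_model(requested: str, spent_usd: float, limit_usd: float,
                 threshold_pct: float, rules: dict[str, str]) -> Optional[str]:
    """Return the model to use, or None once the hard cap is reached (assumed behavior)."""
    if spent_usd >= limit_usd:
        return None
    # Multiply instead of dividing so a zero limit cannot cause a division error.
    if spent_usd * 100 >= threshold_pct * limit_usd:
        return rules.get(requested, requested)
    return requested
```

With the example cap of 5000.00 USD and a threshold of 80, requests for gpt-4o would be served by gpt-4o-mini once spend reaches 4000 USD, and models without a rule are left unchanged.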

9. Webhook Alerting

Configure webhooks to receive real-time notifications for security and operational events.

Action: Create webhooks for critical event types.

curl -X POST http://localhost:8080/admin/v1/webhooks \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Security alerts",
    "url": "https://your-siem.example.com/webhooks/dvara",
    "secret": "<webhook-signing-secret>",
    "eventTypes": [
      "POLICY_DENIAL",
      "PII_DETECTED",
      "IP_ACCESS_DENIED",
      "INJECTION_DETECTED",
      "BUDGET_CAP_HARD",
      "AGENT_LOOP_DETECTED",
      "GUARDRAIL_BLOCKED",
      "COST_ANOMALY"
    ],
    "status": "ACTIVE"
  }'

Properties:

gateway.webhooks.enabled=true
gateway.webhooks.max-retries=3
gateway.webhooks.delivery-timeout-ms=5000

Risk if skipped

Security events go unnoticed until the next manual audit review.

10. RBAC Role Assignments

Review and restrict role assignments. Follow the principle of least privilege.

Recommended role mapping:

| Team | Role | Permissions |
| --- | --- | --- |
| Platform team | `org-admin` | Full access (use sparingly) |
| Security/compliance | `policy-admin` | Policy, PII, guardrail, audit management |
| Finance | `billing-admin` | Pricing, costs, budgets, chargeback reports |
| Engineering | `developer` | Routes, API keys, read-only for most resources |
| Stakeholders | `viewer` | Read-only access to all resources |

Action: Audit current user roles and remove unnecessary org-admin assignments.

# List all users
curl http://localhost:8080/admin/v1/users

# Update roles for a user
curl -X PUT http://localhost:8080/admin/v1/users/{id}/roles \
  -H "Content-Type: application/json" \
  -d '{"roles": ["developer"]}'

Risk if skipped

Over-privileged users can modify policies, delete audit logs, or change budget caps.

11. Rate Limiting

Enable rate limiting to protect against abuse and ensure fair usage across tenants.

Action: Provision Redis and configure rate limits per tenant or API key. The enterprise rate limiter uses Bucket4j backed by Redis.

Rate limits are enforced by RateLimitServletFilter on all /v1/* data-plane paths. The MCP proxy applies its own McpRateLimitFilter at filter order 200.

Risk if skipped

A single client can monopolize gateway capacity, causing denial of service for other tenants.
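Bucket4j implements the token-bucket algorithm. The sketch below shows the idea with one in-memory bucket per tenant or API key; it is not Bucket4j's API, it has no Redis backing, and the class names and parameters are hypothetical.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens and refills continuously at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated_at = clock()

    def try_consume(self, tokens: int = 1) -> bool:
        """Refill for the elapsed time, then take tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


class PerKeyLimiter:
    """One bucket per tenant or API key, so one client cannot drain another's quota."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, key: str) -> bool:
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, self.clock)
            self.buckets[key] = bucket
        return bucket.try_consume()
```

Keeping the bucket state in Redis rather than in process memory is what lets several gateway instances share one limit per key.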

12. API Key Rotation

Regularly rotate API keys to limit the blast radius of a compromised key.

Action: Use the key rotation endpoint and update clients.

# Rotate an API key (returns new key, old key is revoked)
curl -X POST http://localhost:8080/admin/v1/tenants/{tid}/keys/{kid}/rotate

Recommended cadence: Every 90 days, or immediately if a compromise is suspected.

For MCP server credentials:

# Rotate MCP server credentials
curl -X POST http://localhost:8080/admin/v1/mcp/servers/{id}/credentials/rotate \
  -H "Content-Type: application/json" \
  -d '{"newCredentialRef": "secret/data/dvara/mcp/new-cred"}'

Risk if skipped

A leaked API key provides indefinite access until manually revoked.
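The 90-day cadence is easy to check mechanically. The sketch below flags keys whose age has reached the rotation interval; the key record shape (id and createdAt fields) is hypothetical and would need to be mapped from whatever your key inventory returns.

```python
from datetime import datetime, timedelta, timezone

ROTATION_INTERVAL = timedelta(days=90)


def rotation_due(created_at: datetime, now: datetime,
                 interval: timedelta = ROTATION_INTERVAL) -> bool:
    """True once the key is at least `interval` old."""
    return now - created_at >= interval


def keys_due(keys: list[dict], now: datetime) -> list[str]:
    """Return the ids of keys that should be rotated.

    Assumption: each record is a dict with 'id' and a timezone-aware 'createdAt'.
    """
    return [key["id"] for key in keys if rotation_due(key["createdAt"], now)]
```

Running such a check on a schedule and feeding the result to the rotation endpoint keeps rotation from depending on someone remembering the date.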

13. SIEM Export

Forward audit events to your Security Information and Event Management (SIEM) system for centralized monitoring and alerting.

Action: Configure SIEM export. The built-in LoggingSiemExporter writes JSON to the siem.export logger, which can be routed to Splunk, Elasticsearch, or CloudWatch via log shipping.

For direct integration, configure Splunk HEC or CloudWatch exporters (requires custom bean registration).

Ensure the following audit event types are monitored:

POLICY_DENIED -- blocked requests
PII_DETECTED / PII_REDACTED -- sensitive data handling
IP_ACCESS_DENIED -- unauthorized access attempts
AUTHORIZATION_DENIED -- RBAC violations
INJECTION_DETECTED -- prompt injection attempts
GUARDRAIL_BLOCKED -- content policy violations
BUDGET_CAP_HARD -- budget overruns
AGENT_LOOP_DETECTED -- runaway agent sessions
CONFIG_SYNC_FAILURE -- control plane connectivity issues

Risk if skipped

Security incidents are only visible in local logs, which may be lost or tampered with.
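When routing the siem.export logger through log shipping, a small filter can confirm that the monitored event types are actually arriving. This sketch assumes one JSON object per line and a field named eventType; both are assumptions about the exporter's output and should be checked against real log lines.

```python
import json

# Event types taken from the list above.
MONITORED = frozenset({
    "POLICY_DENIED", "PII_DETECTED", "PII_REDACTED", "IP_ACCESS_DENIED",
    "AUTHORIZATION_DENIED", "INJECTION_DETECTED", "GUARDRAIL_BLOCKED",
    "BUDGET_CAP_HARD", "AGENT_LOOP_DETECTED", "CONFIG_SYNC_FAILURE",
})


def monitored_events(lines, type_field: str = "eventType") -> list[dict]:
    """Parse JSON lines and keep events whose type is in MONITORED.

    Assumption: `type_field` is a hypothetical field name. Lines that are not
    valid JSON objects are skipped rather than treated as errors.
    """
    selected = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        event_type = event.get(type_field)
        if isinstance(event_type, str) and event_type in MONITORED:
            selected.append(event)
    return selected
```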

14. mTLS for Provider Connections

Configure mutual TLS (client certificates) for outbound connections to LLM providers, especially in regulated environments.

Action: Configure per-provider mTLS settings.

gateway:
  tls:
    enforce-tls13: true
    providers:
      openai:
        mtls-enabled: true
        client-cert-path: /etc/dvara/certs/openai-client.pem
        client-key-path: /etc/dvara/certs/openai-client-key.pem
        trust-store-path: /etc/dvara/certs/openai-truststore.p12
        trust-store-password: "${OPENAI_TRUSTSTORE_PASSWORD}"
      anthropic:
        mtls-enabled: true
        client-cert-path: /etc/dvara/certs/anthropic-client.pem
        client-key-path: /etc/dvara/certs/anthropic-client-key.pem

Risk if skipped

Provider connections use one-way TLS only; the provider cannot verify the gateway's identity.

15. Monitoring with Prometheus and Grafana

Set up comprehensive monitoring using the built-in Prometheus metrics endpoint.

Action: Configure Prometheus to scrape the gateway and set up Grafana dashboards.

Prometheus scrape config

scrape_configs:
  - job_name: dvara-gateway
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['gateway:8080']
  - job_name: dvara-mcp-proxy
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['mcp-proxy:8070']

Key metrics to alert on

| Metric | Condition | Severity |
| --- | --- | --- |
| `gateway_provider_errors_total` | Rate > 10/min | Warning |
| `gateway_latency_seconds` (P99) | > 5s sustained | Warning |
| `gateway_budget_blocked_total` | Any increment | Info |
| `gateway_guardrail_blocked_total` | Rate > 5/min | Critical |
| `gateway_policy_shadow_divergence_total` | Any increment | Info |
| `mcp_agent_loop_detected_total` | Any increment | Warning |
| `gateway_config_sync_failures_total` | Any increment | Critical |
| `gateway_cost_anomaly_total` | Any increment | Warning |
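The "Rate > N/min" conditions refer to how fast a counter grows, not to its absolute value. As a rough sketch of that arithmetic, the function below derives a per-minute rate from two counter samples. The names are hypothetical, and in practice the alerting system would compute this for you.

```python
def per_minute_rate(earlier: tuple[float, float], later: tuple[float, float]) -> float:
    """Per-minute increase of a counter between two (unix_seconds, value) samples."""
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    # A lower later value means the counter was reset; count from zero.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase * 60.0 / (t1 - t0)


def breaches(rate: float, threshold_per_minute: float) -> bool:
    """True when the rate is strictly above the threshold, matching 'Rate > N/min'."""
    return rate > threshold_per_minute
```

For example, a counter that grows from 100 to 130 over two minutes has a rate of 15 per minute, which would breach the 10/min warning threshold for provider errors.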

OpenTelemetry distributed tracing

export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318/v1/traces
export TRACING_SAMPLING_PROBABILITY=0.1   # 10% sampling in production

Risk if skipped

Degraded performance, provider outages, and security events go undetected until users report issues.

Quick Reference: Minimum Production Configuration

The following environment variables represent the minimum set for a secure production deployment:

# Enterprise license
GATEWAY_ENTERPRISE_LICENSE_KEY=<jwt-token>

# Audit integrity
GATEWAY_AUDIT_HMAC_SECRET=<random-32-byte-base64>

# Authentication
GATEWAY_SECURITY_ENABLED=true
GATEWAY_OIDC_ISSUER_URI=https://your-idp.example.com/realms/dvara
GATEWAY_OIDC_AUDIENCE=dvara-gateway

# Secrets management
GATEWAY_VAULT_BACKEND=hashicorp   # or aws-secrets-manager, azure-key-vault

# IP access control
GATEWAY_IP_ACCESS_ENABLED=true

# PII protection
GATEWAY_PII_DEFAULT_ACTION=REDACT

# TLS
GATEWAY_TLS_ENFORCE_TLS13=true

# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318/v1/traces
TRACING_SAMPLING_PROBABILITY=0.1
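A pre-deployment check can catch a missing or development-grade value before the gateway starts. The sketch below validates the variables listed above against an environment mapping. The accepted values and the case-insensitive comparison are assumptions, and the function name is hypothetical.

```python
from collections.abc import Mapping

DEFAULT_HMAC_SECRET = "default-dev-secret-change-in-production"

REQUIRED_VARIABLES = (
    "GATEWAY_ENTERPRISE_LICENSE_KEY",
    "GATEWAY_AUDIT_HMAC_SECRET",
    "GATEWAY_SECURITY_ENABLED",
    "GATEWAY_OIDC_ISSUER_URI",
    "GATEWAY_OIDC_AUDIENCE",
    "GATEWAY_VAULT_BACKEND",
    "GATEWAY_IP_ACCESS_ENABLED",
    "GATEWAY_PII_DEFAULT_ACTION",
    "GATEWAY_TLS_ENFORCE_TLS13",
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "TRACING_SAMPLING_PROBABILITY",
)

# Assumption: values are compared case-insensitively.
EXPECTED_VALUES = {
    "GATEWAY_SECURITY_ENABLED": {"true"},
    "GATEWAY_IP_ACCESS_ENABLED": {"true"},
    "GATEWAY_TLS_ENFORCE_TLS13": {"true"},
    "GATEWAY_PII_DEFAULT_ACTION": {"redact", "block"},
    "GATEWAY_VAULT_BACKEND": {"hashicorp", "aws-secrets-manager", "azure-key-vault"},
}


def preflight(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the minimum set looks sane."""
    problems = []
    for name in REQUIRED_VARIABLES:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif name in EXPECTED_VALUES and value.lower() not in EXPECTED_VALUES[name]:
            problems.append(f"{name} has unexpected value {value!r}")
    if env.get("GATEWAY_AUDIT_HMAC_SECRET", "").strip() == DEFAULT_HMAC_SECRET:
        problems.append("GATEWAY_AUDIT_HMAC_SECRET still uses the development default")
    return problems
```

Calling preflight(os.environ) in a deployment pipeline and failing the rollout on a non-empty result turns this checklist into an enforced gate.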
