Skip to main content

Platform architecture

DVARA is an AI governance platform delivered as three services plus a PostgreSQL. Two of those services — the LLM Gateway and the MCP Proxy — are governed data planes; the third is DVARA Flightdeck, the control plane and management UI. Every request, whether from an application's OpenAI SDK or an AI agent's MCP client, walks the same governance pipeline, backed by the same policy engine and the same immutable audit trail.

This page has two parts:

  1. System architecture — the services that make up the platform and how they relate.
  2. Request lifecycle — the in-order pipeline every request runs through before and after the single upstream hop.

System architecture

DVARA Architecture

Figure 1. DVARA platform components. The LLM Gateway (port 8080) and MCP Proxy (port 8070) are the governed data planes; DVARA Flightdeck (port 8090) is the shared control plane. All three connect to a single PostgreSQL instance for configuration and audit.

The platform ships as three services plus PostgreSQL. The DVARA LLM Gateway and the DVARA MCP Proxy are components of the platform — each is a governed data plane, each lives in the network zone where it belongs, and both flow through the same control plane for policy, audit, RBAC, and cost attribution.

  • DVARA LLM Gateway — the data plane for LLM traffic. Applications send OpenAI-format requests; the gateway applies the full governance pipeline (policy, PII, guardrails, budget, routing, audit) and returns a provider-agnostic response in OpenAI format. Runs at the network edge with outbound access to public AI providers.
  • DVARA MCP Proxy — the data plane for MCP tool traffic. AI agents send tool calls through the proxy; it enforces policy, scans for PII on arguments and responses, checks approval gates, and forwards to your internal MCP servers. Runs inside the perimeter where the MCP servers live.
  • DVARA Flightdeck — the control plane, serving three audiences from one process: a platform dashboard for operators, a tenant self-service portal where tenant users manage their own API keys, BYOK credentials, and usage, and a REST API for CI/CD and automation.
  • PostgreSQL — the single source of truth for configuration (tenants, routes, policies, budgets, credentials) and the tamper-evident audit trail.

Data plane vs control plane

The separation is deliberate.

The data plane — the LLM Gateway and the MCP Proxy — is on the request path. It has to be fast, horizontally scalable, and survive a control-plane outage. It holds no write state of its own; configuration reads (policies, routes, credentials) and audit writes share a tight two-connection PostgreSQL pool — one connection for config reads, one for audit writes — and configuration is cached locally for sub-millisecond lookup. Changes propagate via PostgreSQL NOTIFY, not a message broker.

The control plane — the DVARA Flightdeck — is where operators change the governance. Creating a policy, rotating a credential, updating a budget cap, importing a GitOps config bundle, running a compliance report — all of that happens through the DVARA Flightdeck's REST API and UI. It writes to PostgreSQL; the data plane picks up the change on the next notification without a restart.

Treating these as separate concerns keeps the hot path lean (no Admin SDK on the gateway, no live UI code in the request path) and makes the governance auditable (every configuration change leaves a signed audit event before it reaches any data plane).

Request lifecycle

The diagram below shows the six request stages and five response stages. The stages run in order; each can reject, redact, or annotate the request before it reaches the provider. The response traverses the mirrored stages back to the caller.

REQUEST ↓RESPONSE ↑ClientBudgetPolicyPII scanGuardrailsRoutingClientAuditPII scanGuardrailsGroundingSchemaPROVIDERsingle upstream hop — every stage applies uniformly

Figure 2. DVARA request and response pipeline. The request descends the left column, every stage runs in order, and the single amber Provider block is the only upstream hop regardless of which model answers the call. The response ascends the right column back to the client — every guardrail on either side applies uniformly.

The filter chain is ordered and composable. The request chain is Budget → Policy → PII → Guardrails → Routing; the response chain is Schema → Grounding → Guardrails → PII → Audit. A stage early in the chain can reject, redact, or annotate the request before anything downstream sees it, so expensive operations (like the upstream call) never fire for requests that would have been denied. Several additional stages — template resolution, priority admission, model downgrade, context-window governance — run alongside; the deeper Routes & Policies doc has the full list.

What each stage enforces

If you've read about a specific governance feature and want to know where in the pipeline it runs, here's the map. Each row links to the deeper doc.

StageWhat it enforcesDeep dive
BudgetHard / soft limits at global / tenant / API-key scope. Hard breaches reject the request immediately; soft breaches fire webhook alerts and trigger automatic model downgrade.Cost Management
PolicyYAML DSL — DENY / WARN_AGENT rules using closed-form conditions: matchers or a CEL expression: for patterns the matchers can't compose.Routes & Policies
PII scanChecksum-validated regex (Luhn, DEA) + optional Microsoft Presidio NER, covering PII, PHI (MRN / DEA / NPI), and PCI (credit card, bank routing) entity types. Per-tenant BLOCK / REDACT / LOG. Runs on the request and again on the response.PII Detection
GuardrailsPattern + ML scanning across prompt injection, jailbreaks, system-prompt leakage (OWASP LLM07), output sanitization (LLM05 — XSS / SQLi / cmd injection / SSRF in responses), and input size limits (LLM10). Content filters: profanity / violence / sexual / competitor / topic restrictions. Per-tenant BLOCK / FLAG / LOG. BYO scanner via HMAC-SHA256 signed webhook plugin; optional Lakera or Google Shield-Gemini ML classifier hooks.Guardrails
RoutingEight strategies (model-prefix, round-robin, weighted, latency-aware, cost-aware, canary, geo-aware, intelligent), composed with capability-aware filtering — providers without the requested json_schema / vision / tool-calls capability are excluded before selection. Failover on circuit-open with capability-aware fallback. Per-tenant priority admission (premium / standard / bulk). Hard DATA_RESIDENCY_VIOLATION gate on region mismatch. Shadow traffic to a comparison provider on virtual threads.Routing
Provider14 first-class providers (OpenAI, Anthropic, Gemini, Bedrock, Azure OpenAI, Mistral, Cohere, Groq, Ollama, Qwen, DeepSeek, Moonshot, ChatGLM, Grok) + mock. Single upstream hop with OpenAI-compatible translation. Per-tenant BYOK chain: tenant credential → platform default → Vault → env var; optional strict-BYOK refuses the platform fallback. Per-provider mTLS; TLS 1.3 default. Vault: HashiCorp / AWS SM / Azure KV.Provider Setup
Schemaresponse_format: json_object / json_schema translation per provider — native passthrough (OpenAI, Azure, Mistral, Gemini), tool-use rewrite (Anthropic, Bedrock) with X-Gateway-Strict-Downgraded signal, UNSUPPORTED_RESPONSE_FORMAT on providers without JSON-mode support. Optional server-side schema registry (per route or model pattern) revalidates responses with networknt/json-schema-validator; failures → HTTP 422 SCHEMA_VALIDATION_FAILED with up to 2 corrective retries.Structured Outputs
GroundingPer-claim embedding similarity between the LLM response and source documents passed in request.metadata["grounding.sources"]. Response split into sentence-level claims; claims below the configurable cosine threshold (default 0.7) are flagged. Per-tenant BLOCK (HTTP 403 HALLUCINATION_DETECTED) / FLAG / LOG. Streaming responses checked at stream end after SSE-chunk accumulation.Hallucination Detection
AuditAppend-only event for every request, response, and admin action — HMAC-SHA256 signed + hash-chained to the previous event; scheduled AuditChainVerificationJob verifies the chain end-to-end. Per-tenant scoped. Fans out to webhook subscribers and SIEM exporters (Splunk HEC, CloudWatch Logs, Kafka). Optional cold-storage archive to S3 / DigitalOcean Spaces. SOC 2 / HIPAA / GDPR compliance reports generated on demand from the same chain.Observability

Governance for MCP is the mirror image

Everything on this page describes the LLM Gateway. The MCP Proxy is the same model applied to tool calls: the proxy runs the same governance pipeline (policy, loop detection, approval gates, PII, injection, audit) before the tool executes, and runs PII on the tool's response on the way back. The filter stages and the policy engine are the same code as the LLM path. See Agentic governance for the MCP-specific filter order.

Next steps

Depth:

  • Providers — per-provider setup and the capability matrix the routing stage uses
  • Routing — every routing strategy in detail, including canary and shadow traffic
  • Multi-tenancy — how tenant isolation is enforced at every layer of this pipeline
  • Credentials & BYOK — how the provider stage resolves a credential at request time

Go deeper on governance: