Version: 1.3.0

Platform architecture

DVARA is an AI governance platform delivered as three core services plus a PostgreSQL, with an opt-in fourth data plane for agent-to-agent traffic. Two of the core services — the LLM Gateway and the MCP Proxy — are governed data planes; a third governed data plane, the opt-in A2A Proxy, governs agent-to-agent hops; and DVARA Flightdeck is the control plane and management UI. Every request — whether from an application's OpenAI SDK, an AI agent's MCP client, or one agent delegating to another — walks the same governance model, backed by the same policy discipline and immutable audit trail.

This page has two parts:

System architecture — the services that make up the platform and how they relate.
Request lifecycle — the in-order pipeline every request runs through before and after the single upstream hop.

System architecture

DVARA Architecture — **Figure 1.** DVARA platform components. The LLM Gateway (port 8080) and MCP Proxy (port 8070) are the governed data planes, with the opt-in A2A Proxy (port 8075) as a third; DVARA Flightdeck (port 8090) is the shared control plane. All connect to a single PostgreSQL instance for configuration and audit.

The platform ships as three core services plus PostgreSQL, with the A2A Proxy as an opt-in fourth. Each data plane lives in the network zone where it belongs, and all of them flow through the same control plane for policy, audit, RBAC, and cost attribution.

DVARA LLM Gateway — the data plane for LLM traffic. Applications send OpenAI-format requests; the gateway applies the full governance pipeline (policy, PII, guardrails, budget, routing, audit) and returns a provider-agnostic response in OpenAI format. Runs at the network edge with outbound access to public AI providers.
DVARA MCP Proxy — the data plane for MCP tool traffic. AI agents send tool calls through the proxy; it enforces policy, scans for PII on arguments and responses, checks approval gates, and forwards to your internal MCP servers. Runs inside the perimeter where the MCP servers live.
DVARA A2A Proxy (opt-in, port 8075) — the data plane for agent-to-agent traffic. One agent delegates to another through the proxy; it authenticates the caller, checks per-hop policy, scans the message for PII, enforces approval and loop detection, and forwards to the registered peer agent — governing exactly one hop with its own tamper-evident audit chain. Off by default; see A2A Governance Plane.
DVARA Flightdeck — the control plane, serving three audiences from one process: a platform dashboard for operators, a tenant self-service portal where tenant users manage their own API keys, BYOK credentials, and usage, and a REST API for CI/CD and automation.
PostgreSQL — the single source of truth for configuration (tenants, routes, policies, budgets, credentials) and the tamper-evident audit trail.

Data plane vs control plane

The separation is deliberate.

The data plane — the LLM Gateway, the MCP Proxy, and (when enabled) the A2A Proxy — is on the request path. It has to be fast, horizontally scalable, and survive a control-plane outage. It holds no write state of its own; configuration reads (policies, routes, credentials) and audit writes share a tight two-connection PostgreSQL pool — one connection for config reads, one for audit writes — and configuration is cached locally for sub-millisecond lookup. Changes propagate via a lightweight config_versions version-table poll (a few-second interval), not a message broker — gap-immune and pooler-agnostic.

The control plane — the DVARA Flightdeck — is where operators change the governance. Creating a policy, rotating a credential, updating a budget cap, importing a GitOps config bundle, running a compliance report — all of that happens through the DVARA Flightdeck's REST API and UI. It writes to PostgreSQL; the data plane picks up the change on its next config poll without a restart.

Treating these as separate concerns keeps the hot path lean (no Admin SDK on the gateway, no live UI code in the request path) and makes the governance auditable (every configuration change leaves a signed audit event before it reaches any data plane).

Request lifecycle

The diagram below shows the six request stages and five response stages. The stages run in order; each can reject, redact, or annotate the request before it reaches the provider. The response traverses the mirrored stages back to the caller.

Figure 2. DVARA request and response pipeline. The request descends the left column, every stage runs in order, and the single amber Provider block is the only upstream hop regardless of which model answers the call. The response ascends the right column back to the client — every guardrail on either side applies uniformly.

The filter chain is ordered and composable. The request chain is Budget → Policy → PII → Guardrails → Routing; the response chain is Schema → Grounding → Guardrails → PII → Audit. A stage early in the chain can reject, redact, or annotate the request before anything downstream sees it, so expensive operations (like the upstream call) never fire for requests that would have been denied. Several additional stages — template resolution, priority admission, model downgrade, context-window governance — run alongside; the deeper Routes & Policies doc has the full list.

What each stage enforces

If you've read about a specific governance feature and want to know where in the pipeline it runs, here's the map. Each row links to the deeper doc.

Stage	What it enforces	Deep dive
Budget	Hard / soft limits at global / tenant / API-key scope. Hard breaches reject the request immediately; soft breaches fire webhook alerts and trigger automatic model downgrade.	Cost Management
Policy	YAML DSL — `DENY` / `WARN_AGENT` rules using closed-form `conditions:` matchers or a CEL `expression:` for patterns the matchers can't compose.	Routes & Policies
PII scan	Checksum-validated regex (Luhn, DEA) + optional Microsoft Presidio NER, covering PII, PHI (MRN / DEA / NPI), and PCI (credit card, bank routing) entity types. Per-tenant `BLOCK` / `REDACT` / `LOG`. Runs on the request and again on the response.	PII Detection
Guardrails	Pattern + ML scanning across prompt injection, jailbreaks, system-prompt leakage (OWASP LLM07), output sanitization (LLM05 — XSS / SQLi / cmd injection / SSRF in responses), and input size limits (LLM10). Content filters: profanity / violence / sexual / competitor / topic restrictions. Per-tenant `BLOCK` / `FLAG` / `LOG`. BYO scanner via HMAC-SHA256 signed webhook plugin; optional Lakera or Google Shield-Gemini ML classifier hooks.	Guardrails
Routing	Eight strategies (model-prefix, round-robin, weighted, latency-aware, cost-aware, canary, geo-aware, intelligent), composed with capability-aware filtering — providers without the requested `json_schema` / vision / tool-calls capability are excluded before selection. Failover on circuit-open with capability-aware fallback. Per-tenant priority admission (premium / standard / bulk). Hard `DATA_RESIDENCY_VIOLATION` gate on region mismatch. Shadow traffic to a comparison provider on virtual threads.	Routing
Provider	14 first-class providers (OpenAI, Anthropic, Gemini, Bedrock, Azure OpenAI, Mistral, Cohere, Groq, Ollama, Qwen, DeepSeek, Moonshot, ChatGLM, Grok) + mock. Single upstream hop with OpenAI-compatible translation. Per-tenant BYOK chain: tenant credential → platform default → Vault → env var; optional strict-BYOK refuses the platform fallback. Per-provider mTLS; TLS 1.3 default. Vault: HashiCorp / AWS SM / Azure KV.	Provider Setup
Schema	`response_format: json_object` / `json_schema` translation per provider — native passthrough (OpenAI, Azure, Mistral, Gemini), tool-use rewrite (Anthropic, Bedrock) with `X-Gateway-Strict-Downgraded` signal, `UNSUPPORTED_RESPONSE_FORMAT` on providers without JSON-mode support. Optional server-side schema registry (per route or model pattern) revalidates responses against the registered JSON schema; failures → HTTP 422 `SCHEMA_VALIDATION_FAILED` with up to 2 corrective retries.	Structured Outputs
Grounding	Per-claim embedding similarity between the LLM response and source documents passed in `request.metadata["grounding.sources"]`. Response split into sentence-level claims; claims below the configurable cosine threshold (default 0.7) are flagged. Per-tenant `BLOCK` (HTTP 403 `HALLUCINATION_DETECTED`) / `FLAG` / `LOG`. Streaming responses checked at stream end after SSE-chunk accumulation.	Hallucination Detection
Audit	Append-only event for every request, response, and admin action — HMAC-SHA256 signed + hash-chained to the previous event; a scheduled job verifies the chain end-to-end. Per-tenant scoped. Fans out to webhook subscribers and SIEM exporters (Splunk HEC, CloudWatch Logs, Kafka). Optional cold-storage archive to S3 / DigitalOcean Spaces. SOC 2 / HIPAA / GDPR compliance reports generated on demand from the same chain.	Observability

Governance for MCP is the mirror image

Everything on this page describes the LLM Gateway. The MCP Proxy is the same model applied to tool calls: the proxy runs the same governance pipeline (policy, loop detection, approval gates, PII, injection, audit) before the tool executes, and runs PII on the tool's response on the way back. The filter stages and the policy engine are the same code as the LLM path. See Agentic governance for the MCP-specific filter order.

Next steps

Depth:

Providers — per-provider setup and the capability matrix the routing stage uses
Routing — every routing strategy in detail, including canary and shadow traffic
Multi-tenancy — how tenant isolation is enforced at every layer of this pipeline
Credentials & BYOK — how the provider stage resolves a credential at request time

Go deeper on governance:

PII Detection & Redaction
Guardrails
Agentic Governance — MCP Proxy, approval gates, loop detection
Observability — the signed audit trail, SIEM exports, and distributed tracing

System architecture​

Data plane vs control plane​

Request lifecycle​

What each stage enforces​

Governance for MCP is the mirror image​

Next steps​