Overview
The Data Plane API is the OpenAI-compatible HTTP surface served by the DVARA LLM Gateway component. Every LLM call passes through a single governance pipeline (policy, PII scanning, guardrails, budget enforcement, routing, and audit) before the gateway hands the request to an upstream provider. Applications and agents call it exactly as they would call OpenAI, so you can move traffic behind DVARA by changing only the base URL and API key in your SDK configuration. Governance, observability, and multi-provider routing apply from the first call.
You can use this API to:
- Run chat completions against any configured provider (OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Mistral, Cohere, Groq, Qwen, DeepSeek, Moonshot/Kimi, ChatGLM/Zhipu, Grok/xAI, Ollama, or the built-in Mock provider) using the familiar POST /v1/chat/completions shape, streaming or non-streaming, with tool calls, structured outputs, and multimodal input where the upstream supports it.
- Generate embeddings for retrieval, clustering, and semantic cache warming via POST /v1/embeddings, routed through the same policy and budget pipeline as chat traffic.
- Discover available models across every registered provider in one call with GET /v1/models, and read back each model's capabilities (streaming, vision, tool calls, JSON mode, structured outputs, max context tokens) without hard-coding a client-side matrix.
- Health-check the gateway from liveness probes and uptime monitors with the unauthenticated GET /actuator/health endpoint. For the rich gateway status (providers, routes, license), use GET /actuator/gateway-status with Authorization: Bearer $DVARA_ACTUATOR_API_KEY; that surface is for operator tooling, not client SDKs.
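Model discovery is a plain GET that clients can parse once at startup. The sketch below assumes GET /v1/models returns OpenAI's list envelope ({"object": "list", "data": [{"id": ...}]}). This page does not spell out the names of the capability fields, so the sketch only extracts model IDs. The helper name `model_ids` and the ID "mock-model" in the sample are hypothetical.

```python
import json

def model_ids(models_body: str) -> list:
    """Return the model IDs listed in a GET /v1/models response body.

    Assumes the OpenAI list envelope: {"object": "list", "data": [{"id": ...}]}.
    """
    payload = json.loads(models_body)
    return [entry["id"] for entry in payload.get("data", [])]

# Hypothetical sample body; "mock-model" is an illustrative ID.
sample = (
    '{"object": "list", "data": ['
    '{"id": "gpt-4o-mini", "object": "model"}, '
    '{"id": "mock-model", "object": "model"}]}'
)
print(model_ids(sample))  # → ['gpt-4o-mini', 'mock-model']
```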
Key features
Every Data Plane operation inherits the same guarantees, so you can pick a provider per request without giving up any of them:
- Uniform request pipeline. Authentication, policy evaluation, PII scanning, guardrail enforcement, budget caps, routing, and audit log emission apply in the same order to every call, on every provider, with no carve-outs. Flip a policy or budget in the DVARA Flightdeck and the change is live across every model inside a second or two.
- Provider-agnostic contract. The request and response shapes are the OpenAI chat and embeddings schemas. The gateway handles provider-specific quirks internally (Anthropic's system-prompt split, Gemini's generationConfig, Bedrock's SigV4, Azure's deployment names), so client code never sees them.
- Structured outputs with automatic downgrade. When a request sets response_format to json_schema or json_object, the gateway filters the route's provider pool down to the providers that natively support the requested format before the routing strategy runs. Where needed, it also rewrites the request into each provider's native structured-output dialect.
- Streaming as a first-class path. stream: true returns SSE chunks, and the same governance pipeline is enforced on the way out. PII and content filters scan each chunk in a rolling window, and a violation can terminate the stream with a content_filter finish reason before the client renders the leak.
- Observability out of the box. Every request emits an access log line with a trace ID, records Prometheus metrics (request count, latency percentiles, token counts, cost, fallbacks), and writes a signed audit event. Agent traffic can be correlated across providers and tool calls by passing a consistent X-Session-Id header.
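The request body alone drives the structured-outputs path. The sketch below builds a chat body using OpenAI's json_schema response_format shape. The schema, the name "location", and `strict: true` are illustrative choices, and `structured_chat_body` is a hypothetical helper. This page does not say whether every provider in a filtered pool honours `strict`.

```python
import json

def structured_chat_body(model: str, prompt: str, schema: dict, name: str) -> str:
    """Build an OpenAI-style chat body that asks for JSON matching `schema`."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # OpenAI's json_schema response_format shape; the gateway uses the
        # requested format to narrow the provider pool before routing.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "schema": schema, "strict": True},
        },
    }
    return json.dumps(body)

city_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "country": {"type": "string"}},
    "required": ["city", "country"],
    "additionalProperties": False,
}
print(structured_chat_body("gpt-4o-mini", "Where is the Eiffel Tower?", city_schema, "location"))
```

For json_object instead, the response_format would be just {"type": "json_object"}. A json_object request constrains only the syntax, not the shape, so json_schema is the stricter choice when the client parses fields by name.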
Getting started
Prerequisites
- A running DVARA LLM Gateway reachable on port 8080 (default). See the Quickstart for a five-minute Docker Compose setup. DVARA requires a valid license key and a PostgreSQL instance.
- At least one provider credential configured, either as an environment variable on the gateway process (OPENAI_API_KEY, ANTHROPIC_API_KEY, ...) or as a tenant-scoped credential created from the DVARA Flightdeck. The data plane returns NO_PROVIDER (HTTP 400) for any model whose provider has no credentials.
- A tenant API key issued from the DVARA Flightdeck. This is the gw_... Bearer token the client sends on every request. See Authentication for how to create and rotate keys.
Authentication
Every /v1/* call requires an API key in the Authorization header:
```
Authorization: Bearer gw_<your-key>
```
The key resolves to a tenant server-side, so you never pass a tenant ID explicitly. Every downstream stage — rate limits, policies, budgets, PII scans, audit events — applies under that tenant's configuration. See the dedicated Authentication page for details on creating, rotating, revoking, and scoping API keys.
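Because the key alone identifies the tenant, a client only needs one header per call, plus an optional session header for agent correlation. The sketch below centralises that. `gateway_headers` is a hypothetical helper, and "gw_example" and "agent-run-42" are placeholders, not real values.

```python
from typing import Dict, Optional

def gateway_headers(api_key: str, session_id: Optional[str] = None) -> Dict[str, str]:
    """Build headers for a /v1/* call.

    The tenant is resolved from the key on the server, so no tenant ID is sent.
    """
    if not api_key:
        raise ValueError("a gw_... API key is required")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if session_id:
        # Optional: a stable X-Session-Id lets the gateway correlate agent traffic.
        headers["X-Session-Id"] = session_id
    return headers

# Placeholder values for illustration only.
print(gateway_headers("gw_example", session_id="agent-run-42"))
```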
Test your connection
The /actuator/health endpoint is unauthenticated and returns gateway liveness. Use it as a liveness probe and as a smoke test before making your first real call:
```bash
curl -s http://localhost:8080/actuator/health | jq
```
A healthy gateway returns:
```json
{
  "status": "UP"
}
```
For the rich gateway status (providers, routes, license, warnings), use GET /actuator/gateway-status with Authorization: Bearer $DVARA_ACTUATOR_API_KEY. If that endpoint reports license.status: DEGRADED, the license has expired beyond its 14-day grace window and data plane calls will start returning LICENSE_EXPIRED (HTTP 402). Renew from the DVARA Flightdeck.
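A deployment script or smoke test can turn both checks into a single go/no-go decision. The sketch below works on already-parsed JSON bodies. It assumes the gateway-status body exposes the license as a nested {"license": {"status": ...}} object, inferred from the license.status path above. `gateway_problems` is a hypothetical helper, and fetching gateway-status with DVARA_ACTUATOR_API_KEY is left to the caller.

```python
from typing import List, Optional

def gateway_problems(health: dict, gateway_status: Optional[dict] = None) -> List[str]:
    """List reasons not to send traffic yet; an empty list means "looks OK".

    `health` is the parsed /actuator/health body. `gateway_status` is the parsed
    /actuator/gateway-status body (assumed shape: {"license": {"status": ...}}).
    """
    problems = []
    if health.get("status") != "UP":
        problems.append(f"health status is {health.get('status')!r}, expected 'UP'")
    if gateway_status is not None:
        license_status = gateway_status.get("license", {}).get("status")
        if license_status == "DEGRADED":
            problems.append(
                "license is past its 14-day grace window; "
                "expect LICENSE_EXPIRED (HTTP 402) on data plane calls"
            )
    return problems

print(gateway_problems({"status": "UP"}))  # → []
```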
Try your first chat completion
```bash
curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $DVARA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Summarize what a gateway does in one sentence."}
    ]
  }'
```
The response is the standard OpenAI chat-completions envelope with id, object, created, model, choices, and usage. The gateway also adds an X-Trace-ID response header that correlates the call with structured access logs, Prometheus time series, and audit events.
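The same call can be made from Python without any SDK. The sketch below uses only the standard library to send the request and to read the X-Trace-ID header for log correlation. `chat_request` and `send` are hypothetical helpers. The network call runs only when DVARA_API_KEY is set, and a non-2xx response raises urllib.error.HTTPError, whose body holds the error envelope.

```python
import json
import os
import urllib.request

def chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST /v1/chat/completions call as the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def send(req: urllib.request.Request):
    """Send the request; return the parsed body and the X-Trace-ID header."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp), resp.headers.get("X-Trace-ID")

# Only hit the network when a key is configured.
if os.environ.get("DVARA_API_KEY"):
    req = chat_request(
        "http://localhost:8080",
        os.environ["DVARA_API_KEY"],
        "gpt-4o-mini",
        "Summarize what a gateway does in one sentence.",
    )
    body, trace_id = send(req)
    print(trace_id, body["choices"][0]["message"]["content"])
```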
Explore the API
Every operation has its own reference page in the Specifications section. Each one shows the request shape, response shape, example payloads, and a Try it panel that sends a real request to whichever gateway you point at — see Try the API for how to change the target.
API reference
Base URL
The default base URL is http://localhost:8080 for a local Docker Compose or spring-boot:run gateway. Production deployments typically sit behind a TLS terminator at a hostname like https://gateway.example.com. The servers[] block on every operation page lets you override protocol, host, and port from the Try-it panel without editing the spec.
Versioning
Data plane endpoints live under /v1/*. The v1 prefix is stable — breaking changes will ship under /v2/* and both versions will run in parallel for at least one release. Within v1, additive changes (new optional fields, new response headers, new values on existing enums) ship without a version bump.
Authentication
HTTP Bearer token: Authorization: Bearer gw_<key>. Missing, malformed, or revoked keys return HTTP 401. Keys are tenant-scoped — a key that belongs to tenant A can never read tenant B's data even via the same endpoint.
Content type
Requests: application/json. Responses: application/json for non-streaming calls, text/event-stream for stream: true chat completions. SSE chunks carry data: {...} lines with the same delta envelope OpenAI emits, followed by a terminating data: [DONE] line.
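Clients that do not use an SDK need to parse the event stream themselves. The sketch below reads SSE lines, stops at the [DONE] sentinel, and surfaces finish_reason, so a gateway-terminated stream (content_filter) is visible to the caller. It assumes one JSON chunk per data: line, as in OpenAI streams. `iter_sse_deltas` and `collect` are hypothetical helpers, and the sample stream is illustrative.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

def iter_sse_deltas(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (content_delta, finish_reason) pairs from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, ":" comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # terminating sentinel; nothing follows it
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            yield delta.get("content") or "", choice.get("finish_reason")

def collect(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate the streamed text and keep the last non-null finish_reason."""
    parts, finish = [], None
    for piece, reason in iter_sse_deltas(lines):
        parts.append(piece)
        if reason:
            finish = reason
    return "".join(parts), finish

# Illustrative stream that the gateway cuts off with a content_filter finish.
stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"content_filter"}]}',
    "",
    "data: [DONE]",
]
print(collect(stream))  # → ('Hello', 'content_filter')
```

A client that sees content_filter should treat the partial text as truncated by policy rather than as a complete answer.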
Request size limits
Every tenant has configurable guardrail caps on input tokens, message count, and per-message length. The defaults (32,000 input tokens, 100 messages per request, 50,000 characters per message) are set in line with the OWASP LLM10 (Unbounded Consumption) guidance on bounding resource use. Requests that exceed any cap return HTTP 413 with error code INPUT_TOO_LARGE.
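A client can reject oversized requests locally and skip a round trip that would end in HTTP 413. The sketch below checks the default message-count and per-message caps. It does not check the 32,000-token cap, because counting tokens requires the model's tokenizer. It measures only plain-string content, in Python code points, which may not match how the gateway counts characters. `size_problems` is a hypothetical helper, and a tenant's configured caps may differ from these defaults.

```python
from typing import List

# Documented defaults; a tenant's configured caps may differ.
MAX_MESSAGES = 100
MAX_CHARS_PER_MESSAGE = 50_000

def size_problems(
    messages: List[dict],
    max_messages: int = MAX_MESSAGES,
    max_chars: int = MAX_CHARS_PER_MESSAGE,
) -> List[str]:
    """Client-side pre-check against the message-count and per-message caps."""
    problems = []
    if len(messages) > max_messages:
        problems.append(f"{len(messages)} messages exceeds the cap of {max_messages}")
    for i, message in enumerate(messages):
        content = message.get("content")
        # Multimodal (list) content is skipped; only strings are measured.
        if isinstance(content, str) and len(content) > max_chars:
            problems.append(f"message {i} has {len(content)} characters; cap is {max_chars}")
    return problems

print(size_problems([{"role": "user", "content": "hi"}]))  # → []
```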
HTTP status codes
| Code | Meaning | Typical error codes |
|---|---|---|
| 200 | Success | (none) |
| 400 | Request was rejected before dispatch | NO_PROVIDER, NO_CAPABLE_PROVIDER, INVALID_REQUEST, CONTEXT_WINDOW_EXCEEDED |
| 401 | Missing or invalid API key | UNAUTHORIZED |
| 402 | Hard budget cap hit or license expired | BUDGET_CAP_HARD, LICENSE_EXPIRED |
| 403 | Request blocked by policy, PII, guardrail, IP access, or data residency rules | POLICY_DENIED, PII_DETECTED, GUARDRAIL_BLOCKED, IP_ACCESS_DENIED, DATA_RESIDENCY_VIOLATION |
| 413 | Request exceeds input size limits | INPUT_TOO_LARGE |
| 422 | Response failed schema validation after retries | SCHEMA_VALIDATION_FAILED |
| 429 | Rate limit or priority admission control rejected the request | RATE_LIMIT_EXCEEDED, PRIORITY_THROTTLED |
| 502 | Every provider on the route (and every fallback) failed | PROVIDER_ERROR |
| 503 | Primary provider failed and no capable fallback exists | FAILOVER_CAPABILITY_MISMATCH |
Every non-2xx response body is a structured error envelope of the form {"error": {"message": ..., "type": ..., "code": ..., "param": ...}}, modeled on OpenAI's error shape, so existing SDKs surface DVARA errors without special handling.
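Since every error shares one envelope, a client can parse it once and decide whether a retry makes sense. In the sketch below, the retryable set {429, 502, 503} is an assumption rather than documented gateway guidance. Governance rejections and validation errors are treated as final because the same request would be rejected again. `GatewayError` and `parse_error` are hypothetical names.

```python
import json
from typing import NamedTuple, Optional

# Assumption: only rate limiting and upstream/failover failures are worth
# retrying. Governance rejections (401/402/403/413) and 400/422 errors are
# expected to fail again for the same request.
RETRYABLE_STATUSES = {429, 502, 503}

class GatewayError(NamedTuple):
    status: int
    code: Optional[str]
    message: str
    retryable: bool

def parse_error(status: int, body: str) -> GatewayError:
    """Read the {"error": {...}} envelope, tolerating non-JSON bodies."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}
    return GatewayError(
        status=status,
        code=err.get("code"),
        message=err.get("message") or body,
        retryable=status in RETRYABLE_STATUSES,
    )

print(parse_error(429, '{"error": {"message": "slow down", "code": "RATE_LIMIT_EXCEEDED"}}'))
```

Retries for 429 should normally use backoff so they do not add to the pressure that caused the rejection.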
Date formats
Timestamps in request and response bodies are ISO-8601 in UTC with a Z suffix (2026-01-01T00:00:00Z). The X-Request-Start response header, when present, carries a Unix millisecond epoch for clock skew diagnostics.
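Both timestamp formats convert to timezone-aware UTC values with the standard library. The sketch below assumes X-Request-Start is stamped when the gateway receives the request; this page documents only that it is a Unix millisecond epoch. The skew estimate therefore mixes clock offset with one-way network latency. The helper names are hypothetical.

```python
from datetime import datetime, timezone

def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp with a trailing 'Z' into an aware datetime."""
    if not ts.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending in 'Z': {ts!r}")
    # datetime.fromisoformat() accepts 'Z' only from Python 3.11, so normalise it.
    return datetime.fromisoformat(ts[:-1] + "+00:00")

def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert a Unix millisecond epoch (e.g. X-Request-Start) to aware UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

def apparent_skew_ms(request_start_header: str, local_send_ms: int) -> int:
    """Gateway epoch minus the local send time, both in milliseconds.

    The result mixes clock skew with one-way network latency, so treat it as
    a rough diagnostic rather than a precise offset.
    """
    return int(request_start_header) - local_send_ms

print(parse_utc("2026-01-01T00:00:00Z"))  # → 2026-01-01 00:00:00+00:00
```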
What's next?
- Routing & Load Balancing — pick a provider per request with prefix match, round-robin, weighted, latency-aware, cost-aware, canary, geo-aware, or intelligent strategies.
- Governance & Safety — configure policies, PII enforcement, guardrails, and budget caps that apply to every call on this API automatically.
- DVARA Flightdeck — create API keys, rotate provider credentials, and view live traffic, latency, and cost dashboards.