Overview

The Data Plane API is the OpenAI-compatible HTTP surface served by the DVARA LLM Gateway. Every LLM call passes through a single governance pipeline (policy, PII scanning, guardrails, budget enforcement, routing, and audit) before the gateway hands the request to an upstream provider. Applications and agents call it the same way they would call OpenAI directly, so moving traffic behind DVARA means pointing the existing SDK at the gateway's base URL with a DVARA API key rather than rewriting call sites. Governance, observability, and multi-provider routing apply from the first call.

You can use this API to

  • Run chat completions against any configured provider (OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Mistral, Cohere, Groq, Qwen, DeepSeek, Moonshot/Kimi, ChatGLM/Zhipu, Grok/xAI, Ollama, or the built-in Mock provider) using the familiar POST /v1/chat/completions shape — streaming or non-streaming, with tool calls, structured outputs, and multimodal input where the upstream supports it.
  • Generate embeddings for retrieval, clustering, and semantic cache warming via POST /v1/embeddings, routed through the same policy and budget pipeline as chat traffic.
  • Discover available models across every registered provider in one call with GET /v1/models, and read back each model's capabilities (streaming, vision, tool calls, JSON mode, structured outputs, max context tokens) without hard-coding a client-side matrix.
  • Health-check the gateway from liveness probes and uptime monitors with the unauthenticated GET /actuator/health endpoint. For the rich gateway status (providers, routes, license), use GET /actuator/gateway-status with Authorization: Bearer $DVARA_ACTUATOR_API_KEY — that surface is for operator tooling, not client SDKs.
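As a rough sketch of the embeddings call listed above, the following Python builds (but does not send) a POST /v1/embeddings request with the standard library only. The base URL is the local default, while the gw_example key and the text-embedding-3-small model name are placeholders. The body assumes the OpenAI embeddings schema (model, input) that the gateway is described as accepting.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # local default; adjust for your deployment


def build_embeddings_request(api_key: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST /v1/embeddings request in the OpenAI embeddings shape."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/embeddings",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder key and model name, for illustration only.
req = build_embeddings_request("gw_example", "text-embedding-3-small", ["hello", "world"])
# Sending it would be urllib.request.urlopen(req) against a running gateway.
```

Keeping request construction separate from sending makes the payload easy to inspect before any traffic reaches the gateway.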

Key features

Every Data Plane operation inherits the same guarantees, so you can pick a provider per request without giving up any of them:

  • Uniform request pipeline. Authentication, policy evaluation, PII scanning, guardrail enforcement, budget caps, routing, and audit log emission apply in the same order to every call, on every provider, with no carve-outs. Change a policy or budget in the DVARA Flightdeck and the change takes effect across every model within a second or two.
  • Provider-agnostic contract. The request and response shapes are the OpenAI chat and embeddings schemas. Provider-specific quirks (Anthropic's system-prompt split, Gemini's generationConfig, Bedrock's SigV4, Azure's deployment names) are handled inside the gateway — client code never sees them.
  • Structured outputs with automatic downgrade. When a request sets response_format: json_schema or json_object, the gateway filters the route's provider pool to only the providers that natively support the requested format before the routing strategy runs, and rewrites the request into each provider's native structured-output dialect where needed.
  • Streaming as a first-class path. stream: true returns SSE chunks with the same governance pipeline enforced on the way out — PII and content filters scan each chunk in a rolling window, and a violation can terminate the stream with a content_filter finish reason before the client renders the leak.
  • Observability out of the box. Every request emits an access log line with a trace ID, records Prometheus metrics (request count, latency percentiles, token counts, cost, fallbacks), and writes a signed audit event. Agent traffic can correlate across providers and tool calls by passing a consistent X-Session-Id header.
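The session correlation mentioned in the last bullet could look roughly like the sketch below. The helper name session_headers is hypothetical, and using a UUID as the session value is an assumption: the text above only asks that the X-Session-Id value stay consistent across calls.

```python
import uuid


def session_headers(api_key: str, session_id: str) -> dict[str, str]:
    """Headers for one call in an agent session; reuse session_id on every call."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-Session-Id": session_id,
    }


# One ID per agent run. The UUID format is an assumption, not a gateway requirement.
session_id = str(uuid.uuid4())
first_call = session_headers("gw_example", session_id)   # e.g. the planning step
second_call = session_headers("gw_example", session_id)  # e.g. a follow-up tool call
```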

Getting started

Prerequisites

  • A running DVARA LLM Gateway reachable on port 8080 (default). See the Quickstart for a five-minute Docker Compose setup. DVARA requires a valid license key and a PostgreSQL instance.
  • At least one provider credential configured, either as an environment variable on the gateway process (OPENAI_API_KEY, ANTHROPIC_API_KEY, ...) or as a tenant-scoped credential created from the DVARA Flightdeck. The data plane returns NO_PROVIDER (HTTP 400) for any model whose provider has no credentials.
  • A tenant API key issued from the DVARA Flightdeck — this is the gw_... Bearer token the client sends on every request. See Authentication for how to create and rotate keys.

Authentication

Every /v1/* call requires an API key in the Authorization header:

Authorization: Bearer gw_<your-key>

The key resolves to a tenant server-side, so you never pass a tenant ID explicitly. Every downstream stage — rate limits, policies, budgets, PII scans, audit events — applies under that tenant's configuration. See the dedicated Authentication page for details on creating, rotating, revoking, and scoping API keys.

Test your connection

The /actuator/health endpoint is unauthenticated and returns gateway liveness. Use it as a liveness probe and as a smoke test before making your first real call:

curl -s http://localhost:8080/actuator/health | jq

A healthy gateway returns:

{
  "status": "UP"
}

For the rich gateway status (providers, routes, license, warnings), use GET /actuator/gateway-status with Authorization: Bearer $DVARA_ACTUATOR_API_KEY. If that endpoint reports license.status: DEGRADED, the license has expired beyond its 14-day grace window and data plane calls will start returning LICENSE_EXPIRED (HTTP 402). Renew from the DVARA Flightdeck.
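Operator tooling might check for the degraded license state along these lines. This is only a sketch: the nested JSON shape {"license": {"status": ...}} is inferred from the license.status path mentioned above, the function name is hypothetical, and "VALID" is a made-up healthy value used purely for the sample.

```python
import json


def license_degraded(status_body: str) -> bool:
    """Return True when a gateway-status payload reports a DEGRADED license."""
    payload = json.loads(status_body)
    # Assumed shape: {"license": {"status": "..."}}; all other fields are ignored.
    license_info = payload.get("license") or {}
    return license_info.get("status") == "DEGRADED"


sample_ok = '{"license": {"status": "VALID"}}'      # "VALID" is a placeholder value
sample_bad = '{"license": {"status": "DEGRADED"}}'
```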

Try your first chat completion

curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $DVARA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Summarize what a gateway does in one sentence."}
    ]
  }'

The response is the standard OpenAI chat-completions envelope with id, object, created, model, choices, and usage. The gateway also adds an X-Trace-ID response header that correlates the call to structured access logs, Prometheus time series, and audit events.
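For readers who prefer Python to curl, a minimal standard-library equivalent of the request above might look like this. The helper names are hypothetical, gw_example is a placeholder fallback key, and the send step is left commented out because it needs a running gateway.

```python
import json
import os
import urllib.request


def build_chat_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build the same POST /v1/chat/completions call as the curl example."""
    body = {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )


def send(req: urllib.request.Request) -> tuple:
    """Send the request and return (X-Trace-ID header value, first choice text)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        trace_id = resp.headers.get("X-Trace-ID")
        payload = json.load(resp)
    return trace_id, payload["choices"][0]["message"]["content"]


req = build_chat_request(
    "http://localhost:8080",
    os.environ.get("DVARA_API_KEY", "gw_example"),  # placeholder fallback key
    "Summarize what a gateway does in one sentence.",
)
# trace_id, text = send(req)  # uncomment with a running gateway
```

Logging the returned trace ID next to application logs makes it easier to find the matching gateway access log line later.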

Explore the API

Every operation has its own reference page in the Specifications section. Each one shows the request shape, response shape, example payloads, and a Try it panel that sends a real request to whichever gateway you point at — see Try the API for how to change the target.

API reference

Base URL

The default base URL is http://localhost:8080 for a local Docker Compose or spring-boot:run gateway. Production deployments typically sit behind a TLS terminator at a hostname like https://gateway.example.com. The servers[] block on every operation page lets you override protocol, host, and port from the Try-it panel without editing the spec.

Versioning

Data plane endpoints live under /v1/*. The v1 prefix is stable — breaking changes will ship under /v2/* and both versions will run in parallel for at least one release. Within v1, additive changes (new optional fields, new response headers, new values on existing enums) ship without a version bump.

Authentication

HTTP Bearer token: Authorization: Bearer gw_<key>. Missing, malformed, or revoked keys return HTTP 401. Keys are tenant-scoped — a key that belongs to tenant A can never read tenant B's data even via the same endpoint.

Content type

Requests: application/json. Responses: application/json for non-streaming calls, text/event-stream for stream: true chat completions. SSE chunks carry data: {...} lines in the same delta envelope OpenAI emits, followed by a terminating data: [DONE].
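A client reading the stream could handle those data: lines roughly as follows. This sketch works on an iterable of already-decoded text lines and assumes the OpenAI delta layout (choices[0].delta.content and choices[0].finish_reason). The sample lines are hand-written for illustration, not captured gateway output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE data: lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> tuple:
    """Concatenate delta content and return (text, last finish_reason seen)."""
    parts, finish_reason = [], None
    for chunk in iter_sse_chunks(lines):
        if not chunk.get("choices"):
            continue  # tolerate chunks that carry no choices
        choice = chunk["choices"][0]
        parts.append(choice.get("delta", {}).get("content") or "")
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]
text, reason = collect_text(sample)
```

A caller would typically compare the returned finish reason against content_filter to detect a stream that the gateway terminated early.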

Request size limits

Every tenant has a configurable guardrail cap on input tokens, message count, and per-message length. The defaults match the OWASP LLM10 unbounded-consumption recommendations: 32,000 input tokens, 100 messages per request, 50,000 characters per message. Requests that exceed these caps return HTTP 413 with error code INPUT_TOO_LARGE.
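A client can cheaply pre-check two of these defaults before sending, as in the sketch below. It covers only the message-count and per-message character caps. The 32,000 input-token cap is left out because counting tokens needs a model-specific tokenizer, and a tenant's configured limits may differ from the defaults used here.

```python
MAX_MESSAGES = 100               # default messages per request
MAX_CHARS_PER_MESSAGE = 50_000   # default characters per message


def precheck(messages: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the defaults are met."""
    problems = []
    if len(messages) > MAX_MESSAGES:
        problems.append(f"{len(messages)} messages exceeds the limit of {MAX_MESSAGES}")
    for i, msg in enumerate(messages):
        content = msg.get("content")
        # Only plain string content is measured; multimodal parts are skipped.
        if isinstance(content, str) and len(content) > MAX_CHARS_PER_MESSAGE:
            problems.append(
                f"message {i} has {len(content)} characters, limit is {MAX_CHARS_PER_MESSAGE}"
            )
    return problems
```

The gateway remains the authority here: this check only avoids an obvious INPUT_TOO_LARGE round trip.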

HTTP status codes

| Code | Meaning | Typical error codes |
| --- | --- | --- |
| 200 | Success | |
| 400 | Request was rejected before dispatch | NO_PROVIDER, NO_CAPABLE_PROVIDER, INVALID_REQUEST, CONTEXT_WINDOW_EXCEEDED |
| 401 | Missing or invalid API key | UNAUTHORIZED |
| 402 | Hard budget cap hit or license expired | BUDGET_CAP_HARD, LICENSE_EXPIRED |
| 403 | Request blocked by policy, PII, guardrail, IP access, data residency | POLICY_DENIED, PII_DETECTED, GUARDRAIL_BLOCKED, IP_ACCESS_DENIED, DATA_RESIDENCY_VIOLATION |
| 413 | Request exceeds input size limits | INPUT_TOO_LARGE |
| 422 | Response failed schema validation after retries | SCHEMA_VALIDATION_FAILED |
| 429 | Rate limit or priority admission control rejected the request | RATE_LIMIT_EXCEEDED, PRIORITY_THROTTLED |
| 502 | Every provider on the route (and every fallback) failed | PROVIDER_ERROR |
| 503 | Primary provider failed and no capable fallback exists | FAILOVER_CAPABILITY_MISMATCH |

Every non-2xx response body is a structured {"error": {"message", "type", "code", "param"}} envelope modeled on OpenAI's error shape, so existing SDKs surface DVARA errors without special handling.
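Client code that does not go through an OpenAI SDK could unpack that envelope along these lines. The GatewayError type and the set of statuses treated as retryable (429, 502, 503) are choices made for this sketch, not behavior defined by the gateway. The sample body is hand-written, including its type value.

```python
import json
from dataclasses import dataclass
from typing import Optional

RETRYABLE_STATUSES = {429, 502, 503}  # a client-side policy choice, not gateway-defined


@dataclass
class GatewayError:
    status: int
    code: Optional[str]
    message: str
    retryable: bool


def parse_error(status: int, body: str) -> GatewayError:
    """Parse the {"error": {...}} envelope, falling back to the raw body text."""
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        err = {}  # body was not JSON, or not a JSON object
    if not isinstance(err, dict):
        err = {}
    return GatewayError(
        status=status,
        code=err.get("code"),
        message=err.get("message", body),
        retryable=status in RETRYABLE_STATUSES,
    )


sample_body = (
    '{"error": {"message": "slow down", "type": "rate_limit", '
    '"code": "RATE_LIMIT_EXCEEDED", "param": null}}'
)
rate_limited = parse_error(429, sample_body)
```

Branching on the code field rather than the message text keeps handlers stable if the wording of a message changes.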

Date formats

Timestamps in request and response bodies are ISO-8601 in UTC with a Z suffix (2026-01-01T00:00:00Z). The X-Request-Start response header, when present, carries a Unix millisecond epoch for clock skew diagnostics.
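Both formats convert to timezone-aware datetimes with the standard library, as sketched below. The function names are hypothetical, and the skew helper simply compares the header value with the caller's own clock reading.

```python
from datetime import datetime, timedelta, timezone


def parse_body_timestamp(value: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp such as 2026-01-01T00:00:00Z."""
    # Swap the Z suffix for an explicit offset so this also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_request_start(header_value: str) -> datetime:
    """Parse an X-Request-Start value given as Unix epoch milliseconds."""
    return datetime.fromtimestamp(int(header_value) / 1000, tz=timezone.utc)


def clock_skew(header_value: str, local_now: datetime) -> timedelta:
    """Local clock minus gateway clock; positive means the local clock is ahead."""
    return local_now - parse_request_start(header_value)
```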

What's next?

  • Routing & Load Balancing — pick a provider per request with prefix match, round-robin, weighted, latency-aware, cost-aware, canary, geo-aware, or intelligent strategies.
  • Governance & Safety — configure policies, PII enforcement, guardrails, and budget caps that apply to every call on this API automatically.
  • DVARA Flightdeck — create API keys, rotate provider credentials, and view live traffic, latency, and cost dashboards.