Structured Outputs
DVARA supports OpenAI-compatible response_format on POST /v1/chat/completions. Clients send a single format specification and the gateway transparently translates it to each provider's native mechanism.
Supported Formats
| type | Description |
|---|---|
| text | Default. No constraint on the response format. |
| json_object | Instructs the model to return valid JSON. |
| json_schema | Instructs the model to return JSON conforming to a specific JSON Schema. |
json_object Mode
Request the model to return valid JSON:
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "List 3 colors as a JSON array."}
],
"response_format": {"type": "json_object"}
}'
json_schema Mode
Request the model to return JSON conforming to a specific schema:
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Extract the person name and age from: John is 30 years old."}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "person",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "age"]
},
"strict": true
}
}
}'
Inline schemas vs. server-side validation
DVARA has two independent mechanisms for working with structured outputs. They solve different problems and you can use them together — they are not alternatives:
| | Inline response_format | Server-side schema validation |
|---|---|---|
| What it does | Sent to the upstream provider as a generation constraint (constrained decoding on OpenAI / Azure / Mistral, tool-use rewrite on Anthropic / Bedrock, responseSchema on Gemini). | Validates the provider's response against a registered JSON schema after the call returns. Independent of how the request was sent. |
| Where the schema lives | In the request body. The caller decides per-call. | Registered with the gateway via the Admin API or the Schemas page in the DVARA Flightdeck. Auto-applied to every request whose model matches the schema's modelPattern glob. |
| What happens on mismatch | Provider's own behaviour — typically the model retries internally or returns a closest-match string. | Gateway returns HTTP 422 schema_validation_failed; the response body lists the JSON-schema validation errors. |
| When it runs | Before the upstream call (request rewrite). | After the upstream call (response gate). |
| Best for | Single call sites where the schema is part of the application code. | Centrally-enforced contracts the gateway should police regardless of caller — e.g. ensuring every response from a customer-facing model conforms to a published schema. |
The inline response_format example earlier in this page covers the first mechanism. The rest of this section describes the second.
Server-side validation: register a schema once, gateway validates every matching response
# 1. Register the schema. Scope it with `modelPattern` (a glob against the
# request's `model` field) OR `routeId` (bind to a specific configured
# route), or both. At least one is required — the API rejects a schema
# with neither scope with HTTP 400 `invalid_output_schema_scope`.
curl -s -X POST http://localhost:8090/v1/admin/schemas \
-H "Authorization: Bearer <admin-pat>" \
-H "Content-Type: application/json" \
-d '{
"id": "person-v1",
"modelPattern": "gpt-4o*",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "age"]
},
"maxRetries": 2,
"enabled": true
}'
# 2. Make a normal chat completion. No special syntax in the request body.
# The gateway sees `model: gpt-4o`, finds the matching schema, and
# validates the response after the upstream call returns.
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "John is 30."}]
}'
A response that conforms to the schema returns normally. A response that doesn't returns HTTP 422:
{
"error": {
"code": "schema_validation_failed",
"message": "Output schema validation failed: $.age: integer expected, found string"
}
}
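A caller might turn that 422 into an explicit exception rather than letting a missing `choices` key fail later. The following is a minimal sketch: `parse_validated_content` and `SchemaValidationFailed` are hypothetical names, not part of any DVARA SDK, and it assumes a successful body is a standard chat completion envelope whose content is a JSON string.

```python
import json


class SchemaValidationFailed(Exception):
    """Raised when the gateway rejects a response with schema_validation_failed."""


def parse_validated_content(status_code: int, body: dict) -> dict:
    """Return the parsed JSON content, or raise if the gateway returned 422.

    `status_code` and `body` are the HTTP status and decoded JSON body of a
    chat completion response (hypothetical helper for illustration).
    """
    error = body.get("error") or {}
    if status_code == 422 and error.get("code") == "schema_validation_failed":
        # The message lists the JSON-schema validation errors. It is written
        # for humans, so surface it instead of trying to parse it.
        raise SchemaValidationFailed(error.get("message", ""))
    return json.loads(body["choices"][0]["message"]["content"])
```

Switching on the `code` field rather than on the message text keeps the handler stable if the wording of the validation errors changes.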
| Field | Required | Description |
|---|---|---|
| id | yes | Stable identifier; referenced in error messages and used as the cache key for the compiled schema. |
| modelPattern | one of modelPattern / routeId is required | Glob against the request's model (e.g. gpt-4o*, claude-*). The schema applies to every response whose model matches. |
| routeId | one of modelPattern / routeId is required | Bind the schema to a specific configured route (matched against routes.id). Use this when the policy is "every response coming out of route X must validate against this schema" regardless of which model the request landed on. Combinable with modelPattern for both-must-match scoping. |
| schema | yes | A JSON Schema Draft 7 document. Compiled once per schema id and cached. |
| maxRetries | no (default 2) | Reserved for a forthcoming retry-on-mismatch flow; currently the gateway returns 422 immediately on validation failure. |
| correctionPrompt | no | Reserved for the same retry flow. |
| enabled | no (default true) | Soft-disable a schema without deleting it. |
Creating a schema with neither routeId nor modelPattern set fails fast with HTTP 400 invalid_output_schema_scope. A schema with no scope could never match a request, so the API rejects it at registration time instead of letting it sit in the registry and silently never fire.
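The scoping rules can be sketched as a small predicate. This is an illustration, not the gateway's code: `schema_applies` is a hypothetical name, and the glob is assumed to behave like Python's case-sensitive `fnmatch` patterns, which the documentation does not confirm.

```python
from fnmatch import fnmatchcase
from typing import Optional


def schema_applies(schema: dict, model: str, route_id: Optional[str]) -> bool:
    """Sketch of the scoping rules: every scope that is set must match."""
    pattern = schema.get("modelPattern")
    bound_route = schema.get("routeId")
    if pattern is None and bound_route is None:
        # Mirrors the invalid_output_schema_scope rule: a schema needs a scope.
        raise ValueError("invalid_output_schema_scope")
    if not schema.get("enabled", True):
        return False  # soft-disabled schemas never fire
    if pattern is not None and not fnmatchcase(model, pattern):
        return False  # ASSUMPTION: fnmatch-style, case-sensitive glob
    if bound_route is not None and bound_route != route_id:
        return False
    return True
```

With both fields set, the predicate requires both to match, which is the "both-must-match" scoping described in the table.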
Registered schemas can be managed from the Schemas page in the DVARA Flightdeck or through the Admin API. Both edit the same registry — pick whichever fits your workflow.
When to use which:
- Most callers should use inline response_format: the schema lives next to the call site, with no extra config to keep in sync.
- Reach for server-side validation when the policy needs to be enforced regardless of caller (e.g. a published API contract you want the gateway to police), or when you want JSON-schema validation on responses from providers that don't support response_format natively (Ollama, Cohere). The validation runs after the upstream call, so it works on any provider.
Provider Translation
The gateway translates response_format to each provider's native mechanism. Your application always sends the same OpenAI-format request.
| Provider | json_object | json_schema | strict Support |
|---|---|---|---|
| OpenAI | Native passthrough | Native passthrough | Native |
| Azure OpenAI | Native passthrough | Native passthrough | Native |
| Mistral | Native passthrough | Native passthrough | Native |
| Grok | Native passthrough | Native passthrough | Native |
| Anthropic | System prompt injection | Tool-use rewrite (structured_output tool) | Downgraded |
| Gemini | generationConfig.responseMimeType set to "application/json" | generationConfig.responseMimeType + responseSchema | Native * |
| Bedrock | System message injection | toolConfig rewrite (structured_output tool) | Downgraded |
| DeepSeek | Native passthrough | Filtered out by capability-aware routing | N/A |
| Moonshot | Native passthrough | Filtered out by capability-aware routing | N/A |
| ChatGLM | Native passthrough | Filtered out by capability-aware routing | N/A |
| Groq | Native passthrough | Filtered out by capability-aware routing | N/A |
| Qwen | Filtered out by capability-aware routing | Filtered out by capability-aware routing | N/A |
| Cohere | Filtered out by capability-aware routing | Filtered out by capability-aware routing | N/A |
| Ollama | Filtered out by capability-aware routing | Filtered out by capability-aware routing | N/A |
| Mock | Wraps response in {"result": …} | Wraps response in {"result": …} | N/A |
* strict flag: Gemini's API doesn't accept a strict field directly, but responseSchema always enforces structure when json_schema is used. So although the gateway doesn't forward strict: true to Gemini, the net behaviour is the same as a natively-strict provider: schema conformance is enforced by the upstream.
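The Gemini row of the table can be pictured as the mapping below. It is a sketch under assumptions: `to_gemini_generation_config` is a hypothetical helper, and it passes the client's schema through unchanged, whereas the gateway may need to adapt it to the schema subset Gemini accepts.

```python
def to_gemini_generation_config(response_format: dict) -> dict:
    """Map an OpenAI-style response_format to Gemini generationConfig fields."""
    fmt_type = response_format.get("type", "text")
    if fmt_type == "text":
        return {}  # no constraint requested, nothing to set
    config = {"responseMimeType": "application/json"}
    if fmt_type == "json_schema":
        # `strict` is deliberately not forwarded: Gemini has no such field,
        # and responseSchema enforces structure on its own.
        config["responseSchema"] = response_format["json_schema"]["schema"]
    return config
```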
OpenAI-compatible group (OpenAI, Azure OpenAI, Mistral, Grok) shares the same pass-through code path: every field of the client's response_format — including the name, schema, and strict fields of json_schema — is forwarded verbatim in the upstream request body. Use any of these four without special handling on the client side.
Mechanism-rewrite group (Anthropic, Bedrock) does not speak response_format natively, so the gateway rewrites the request into each provider's own structured-output dialect. For json_object the gateway injects the following instruction into the system prompt for both providers:
Respond with valid JSON only. Do not include any text outside the JSON object.
Anthropic appends this instruction (with a leading blank-line break) to the existing system prompt; Bedrock adds the trimmed text as a fresh entry in the system messages list. Either way, the client never sees the rewrite — the assistant response arrives with a normal content field.
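The two injection styles might look like this. The helper names are hypothetical, the blank-line break is modelled as `"\n\n"`, and placing the Bedrock entry at the end of the system list is an assumption (the text only says a fresh entry is added). The `{"text": ...}` shape follows Bedrock's Converse-style system blocks.

```python
from typing import List, Optional

JSON_ONLY_INSTRUCTION = (
    "Respond with valid JSON only. "
    "Do not include any text outside the JSON object."
)


def inject_anthropic(system: Optional[str]) -> str:
    """Append the instruction to an existing system prompt after a blank line."""
    if not system:
        return JSON_ONLY_INSTRUCTION
    return f"{system}\n\n{JSON_ONLY_INSTRUCTION}"


def inject_bedrock(system_blocks: List[dict]) -> List[dict]:
    """Add the trimmed instruction as a fresh system entry (position assumed)."""
    return [*system_blocks, {"text": JSON_ONLY_INSTRUCTION.strip()}]
```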
json_schema rewrite on Anthropic and Bedrock — the gateway registers a tool named structured_output with input_schema set to the client's schema, sets tool_choice to force the model to call that specific tool, and extracts the tool-use response block back into a plain content text field. The client still receives a standard chat completion envelope; the tool-use round-trip is internal.
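For Anthropic, the round-trip described above could be sketched as follows. The `tools`, `input_schema`, `tool_choice`, and `tool_use` shapes follow Anthropic's Messages API; the function names and the tool description text are assumptions made for illustration, not the gateway's implementation.

```python
import json


def rewrite_for_anthropic(response_format: dict) -> dict:
    """Build the tool-related fields of an Anthropic Messages request."""
    schema = response_format["json_schema"]["schema"]
    return {
        "tools": [{
            "name": "structured_output",
            # ASSUMPTION: the description wording is made up for this sketch.
            "description": "Return the answer as structured JSON.",
            "input_schema": schema,
        }],
        # Force the model to call this specific tool.
        "tool_choice": {"type": "tool", "name": "structured_output"},
    }


def extract_content(anthropic_response: dict) -> str:
    """Turn the forced tool_use block back into a plain content string."""
    for block in anthropic_response.get("content", []):
        if block.get("type") == "tool_use" and block.get("name") == "structured_output":
            return json.dumps(block["input"])
    raise ValueError("no structured_output tool_use block in response")
```

Nothing in this path validates the tool input against the schema, which is why the table marks strict as downgraded for these providers.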
Partial-support group (DeepSeek, Moonshot, ChatGLM, Groq) accepts json_object natively but doesn't support json_schema. A request with response_format: json_schema is filtered out of the candidate pool before any upstream call.
Unsupported providers (Qwen, Cohere, Ollama) are filtered out of the candidate pool for both json_object and json_schema. On a route that contains only unsupported providers, the request fails fast with no_capable_provider (HTTP 400). On a mixed route, a capable provider is selected automatically. The filtering is deterministic, so you won't accidentally hit a provider that can't honor the format.
Strict Mode Downgrade
strict: true on a json_schema request asks the provider to use constrained decoding — output is guaranteed to conform to the schema. Not every provider supports it. When the request lands on one that doesn't, the gateway downgrades to best-effort and tells you about it.
| Provider | What happens with strict: true |
|---|---|
| OpenAI, Azure OpenAI, Mistral, Grok | Forwarded verbatim. Constrained decoding is on — the upstream guarantees the shape. |
| Gemini | responseSchema always enforces structure. Net behaviour matches strict mode (see the note above). |
| Anthropic, Bedrock | Downgraded. No native constrained decoding; the gateway uses the structured_output tool-use rewrite, which is best-effort. The model usually obeys the schema, but a loosely coerced type or a missing required field is possible. |
How to detect a downgrade
When the request lands on Anthropic or Bedrock with strict: true, the response carries:
X-Gateway-Strict-Downgraded: true
The header is added by the upstream provider's response handler and lifted onto the HTTP response by the gateway, so you check it the same way no matter which provider actually handled the call. If the header is absent, the upstream guaranteed strict and no client-side check is needed.
Example: validate client-side when downgraded
import json
import os
import jsonschema
import requests
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}
r = requests.post(
    "http://localhost:8080/v1/chat/completions",
    # Read the key from the environment; a literal "$DVARA_API_KEY" inside a
    # Python string is not expanded and would be sent as-is.
    headers={"Authorization": f"Bearer {os.environ['DVARA_API_KEY']}"},
    json={
        "model": "claude-sonnet-4-5",
        "messages": [{"role": "user", "content": "John is 30 years old."}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "person", "schema": schema, "strict": True},
        },
    },
)
# Error responses carry an "error" object instead of "choices"; stop here on 4xx/5xx.
r.raise_for_status()
content = r.json()["choices"][0]["message"]["content"]
# If the gateway downgraded strict, validate the body ourselves so we
# fail loudly instead of letting a malformed response leak downstream.
if r.headers.get("X-Gateway-Strict-Downgraded") == "true":
    jsonschema.validate(json.loads(content), schema)
# Otherwise the upstream guaranteed schema conformance, so no extra check is needed.
If the response feeds a downstream pipeline that breaks on bad shape (a structured ETL step, a regulated audit record, a contract with a partner), don't rely on the downgraded best-effort path. Pin the route to a natively-strict provider:
- id: contract-strict
  model-pattern: "claude-strict"
  pinned-model-version: "gpt-4o-2024-08-06"
  providers:
    - provider: openai
The strict flag is then honored by a provider that natively guarantees it.
Capability-Aware Routing
When response_format is present, the gateway filters the provider pool before routing to ensure only capable providers are considered:
| response_format type | Required capability |
|---|---|
| json_schema | supportsStructuredOutputs = true |
| json_object | supportsJsonMode = true |
| text / absent | No filtering applied |
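The filter in the table above amounts to a lookup followed by a list comprehension. In this sketch, representing each provider as a dict carrying the two capability flags is an assumption, and `capable_providers` is a hypothetical name.

```python
from typing import List, Optional

# Capability flag required for each response_format type (from the table above).
REQUIRED_CAPABILITY = {
    "json_schema": "supportsStructuredOutputs",
    "json_object": "supportsJsonMode",
}


def capable_providers(providers: List[dict], response_format: Optional[dict]) -> List[dict]:
    """Keep only providers that can honor the requested response_format."""
    fmt_type = (response_format or {}).get("type", "text")
    flag = REQUIRED_CAPABILITY.get(fmt_type)
    if flag is None:
        # text or absent: no filtering. Unknown types are assumed to have been
        # rejected by request validation before this point.
        return list(providers)
    return [p for p in providers if p.get(flag, False)]
```

An empty result corresponds to the no_capable_provider case described in the next section.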
No Capable Provider
If no provider on the route supports the requested format, the gateway returns HTTP 400:
{
"error": {
"code": "no_capable_provider",
"type": "invalid_request_error",
"message": "No provider supports response_format: json_schema. Providers on route: [ollama]. Capable providers: []",
"trace_id": "a1b2c3d4e5f6789012345678abcdef01"
}
}
Failover Blocked
If the primary provider fails and no capable fallback exists, the gateway returns HTTP 503 with an X-Gateway-Failover-Blocked: capability_mismatch header:
{
"error": {
"code": "failover_capability_mismatch",
"type": "provider_unavailable",
"message": "Failover blocked: no fallback provider supports response_format: json_schema. Primary provider [openai] failed: HTTP 500 from upstream",
"trace_id": "a1b2c3d4e5f6789012345678abcdef01"
}
}
The text after failed: echoes the primary provider's own error message, so it varies per outage ("HTTP 500 from upstream", "Connection timeout after 30s", "Circuit breaker open", etc.). Don't grep for an exact string; check the code field for routing logic.
The X-Gateway-Failover-Blocked header lets clients distinguish a normal provider outage (retry later) from a capability gap (reconfigure the route or drop the response_format requirement).
Every error response carries the same four fields under error:
- code: machine-readable identifier (lowercase snake_case). Stable across releases; the field to switch on for routing logic.
- type: broad category (invalid_request_error, provider_unavailable, policy_violation, etc.). Useful when grouping errors at the call site.
- message: human-readable description. Free-form, not stable across releases.
- trace_id: request trace id (hex). Echoes the X-Trace-ID response header. Use it to find the request in logs.
A param field also appears on validation errors that point at a specific request field; it's omitted otherwise.
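A client could centralise its reaction to these errors in one function that switches on `code`. This is a sketch: the function name and the returned action labels ("ok", "reconfigure", "reject", "retry", "fail") are hypothetical. `headers` is a plain dict here, so the lookup is case-sensitive, unlike the headers object returned by `requests`.

```python
def next_action(status_code: int, headers: dict, body: dict) -> str:
    """Decide what a caller should do with a gateway response."""
    if status_code < 400:
        return "ok"
    code = (body.get("error") or {}).get("code")
    blocked = headers.get("X-Gateway-Failover-Blocked") == "capability_mismatch"
    if code in ("no_capable_provider", "failover_capability_mismatch") or blocked:
        # Capability gap: retrying will not help until the route or the
        # response_format requirement changes.
        return "reconfigure"
    if code == "schema_validation_failed":
        return "reject"
    if status_code >= 500:
        return "retry"  # ordinary provider outage
    return "fail"
```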
Order of operations
A request that uses both inline response_format and a registered server-side schema runs through these stages, in order:
1. Capability filter: providers that can't honor the requested response_format are dropped from the candidate pool. If the pool ends up empty, the request fails fast with no_capable_provider (HTTP 400).
2. Provider call: the gateway rewrites response_format per the translation table and dispatches to the chosen provider.
3. Failover: on provider_error or provider_circuit_open, an alternative on the same route is tried. The capability filter and any data-residency constraints re-apply; an incapable fallback is never picked. If no capable healthy fallback exists, the gateway returns failover_capability_mismatch (HTTP 503) with an X-Gateway-Failover-Blocked: capability_mismatch header.
4. Server-side schema validation: if a registered schema's modelPattern matches the request's model, the response body is parsed as JSON and validated against the schema. A mismatch returns schema_validation_failed (HTTP 422), and the upstream call is not retried (see the maxRetries note above).
5. Response delivered: including any X-Gateway-Strict-Downgraded: true header lifted from the provider response.
Stages 1–3 fire only when response_format is on the request. Stage 4 fires only when a registered schema matches the model. Stage 5 always runs.
Validation Errors
| Condition | HTTP | Error Code |
|---|---|---|
| Missing response_format.type | 400 | invalid_request |
| Unknown type (e.g., xml) | 400 | invalid_request |
| json_schema without schema | 400 | invalid_request |
| Unsupported provider (e.g., Ollama) | 400 | unsupported_response_format |
| No capable provider on route | 400 | no_capable_provider |
| Failover blocked by capability | 503 | failover_capability_mismatch |
| Registered schema mismatched the response body | 422 | schema_validation_failed |
| Schema registration with neither routeId nor modelPattern scope | 400 | invalid_output_schema_scope |
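The first three rows of the table can be checked on the client before a request is sent. In this sketch the function name and the returned message strings are hypothetical; the gateway's actual messages are not documented here.

```python
from typing import Optional

KNOWN_TYPES = {"text", "json_object", "json_schema"}


def validate_response_format(response_format: dict) -> Optional[str]:
    """Return None if valid, else a reason the gateway would answer 400 invalid_request."""
    fmt_type = response_format.get("type")
    if fmt_type is None:
        return "response_format.type is required"
    if fmt_type not in KNOWN_TYPES:
        return f"unknown response_format.type: {fmt_type}"
    if fmt_type == "json_schema":
        spec = response_format.get("json_schema") or {}
        if "schema" not in spec:
            return "json_schema requires a schema"
    return None
```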