Version: 1.3.0

Structured Outputs

DVARA supports OpenAI-compatible response_format on POST /v1/chat/completions (and on POST /v1/responses, the Responses API). Clients send a single format specification and the gateway transparently translates it to each provider's native mechanism.

Supported Formats

`type`	Description
`text`	Default. No constraint on the response format.
`json_object`	Instructs the model to return valid JSON.
`json_schema`	Instructs the model to return JSON conforming to a specific JSON Schema.

`json_object` Mode

Ask the model to return valid JSON:

curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "List 3 colors in a JSON object under the key colors."}
    ],
    "response_format": {"type": "json_object"}
  }'

`json_schema` Mode

Ask the model to return JSON conforming to a specific schema:

curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Extract the person name and age from: John is 30 years old."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "person",
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"}
          },
          "required": ["name", "age"]
        },
        "strict": true
      }
    }
  }'
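
The same request can be built from Python. The sketch below uses only the standard library; the helper names `build_request`, `extract_json`, and `send` are hypothetical, and it assumes the gateway returns the standard OpenAI chat completion envelope with the JSON document as a string in `choices[0].message.content`.

```python
import json
import urllib.request

# Same endpoint as the curl example above.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"


def build_request(model, prompt, name, schema, strict=True):
    """Build a chat completion body with an inline json_schema response_format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "schema": schema, "strict": strict},
        },
    }


def extract_json(completion):
    """Parse the assistant message content (a JSON string) into a Python object."""
    return json.loads(completion["choices"][0]["message"]["content"])


def send(body, url=GATEWAY_URL):
    """POST the body to the gateway and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a gateway running locally, `extract_json(send(build_request("gpt-4o", "John is 30 years old.", "person", schema)))` would yield the parsed person object.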

Inline schemas vs. server-side validation

DVARA has two independent mechanisms for working with structured outputs. They solve different problems and you can use them together — they are not alternatives:

	Inline `response_format`	Server-side schema validation
What it does	Sent to the upstream provider as a generation constraint (constrained decoding on OpenAI / Azure / Mistral, tool-use rewrite on Anthropic / Bedrock, `responseSchema` on Gemini).	Validates the provider's response against a registered JSON schema after the call returns. Independent of how the request was sent.
Where the schema lives	In the request body. The caller decides per-call.	Registered with the gateway via the Admin API or the Schemas page in the DVARA Flightdeck. Auto-applied to every request whose `model` matches the schema's `modelPattern` glob.
What happens on mismatch	Provider's own behaviour — typically the model retries internally or returns a closest-match string.	Gateway returns HTTP 422 `schema_validation_failed`; the response body lists the JSON-schema validation errors.
When it runs	Before the upstream call (request rewrite).	After the upstream call (response gate).
Best for	Single call sites where the schema is part of the application code.	Centrally-enforced contracts the gateway should police regardless of caller — e.g. ensuring every response from a customer-facing model conforms to a published schema.

The inline response_format example earlier in this page covers the first mechanism. The rest of this section describes the second.

Server-side validation: register a schema once, gateway validates every matching response

# 1. Register the schema. Scope it with `modelPattern` (a glob against the
#    request's `model` field) OR `routeId` (bind to a specific configured
#    route), or both. At least one is required — the API rejects a schema
#    with neither scope with HTTP 400 `invalid_output_schema_scope`.
curl -s -X POST http://localhost:8090/v1/admin/schemas \
  -H "Authorization: Bearer <admin-pat>" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "person-v1",
    "modelPattern": "gpt-4o*",
    "schema": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "age":  {"type": "integer"}
      },
      "required": ["name", "age"]
    },
    "maxRetries": 2,
    "enabled": true
  }'

# 2. Make a normal chat completion. No special syntax in the request body.
#    The gateway sees `model: gpt-4o`, finds the matching schema, and
#    validates the response after the upstream call returns.
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "John is 30."}]
  }'

A response that conforms to the schema returns normally. A response that doesn't returns HTTP 422:

{
  "error": {
    "code": "schema_validation_failed",
    "message": "Output schema validation failed: $.age: integer expected, found string"
  }
}

Field	Required	Description
`id`	yes	Stable identifier; referenced in error messages and used as the cache key for the compiled schema.
`modelPattern`	one of `modelPattern` / `routeId` is required	Glob against the request's `model` (e.g. `gpt-4o*`, `claude-*`). The schema applies to every response whose model matches.
`routeId`	one of `modelPattern` / `routeId` is required	Bind the schema to a specific configured route (matched against `routes.id`). Use this when the policy is "every response coming out of route X must validate against this schema" regardless of which model the request landed on. Combinable with `modelPattern` for both-must-match scoping.
`schema`	yes	A JSON Schema Draft 7 document. Compiled once per schema id and cached.
`maxRetries`	no (default `2`)	Accepted on the schema config, but the gateway does not retry on validation failure — a mismatch returns HTTP 422 immediately.
`correctionPrompt`	no	Accepted on the schema config, but not applied — validation failure returns HTTP 422 immediately rather than re-prompting.
`enabled`	no (default `true`)	Soft-disable a schema without deleting it.

Creating a schema with neither routeId nor modelPattern set fails fast with HTTP 400 invalid_output_schema_scope, so a schema that could never match any request is rejected at registration instead of sitting in the registry and silently never firing.
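
The scoping rules in the table can be sketched as a small matcher. This is an illustration, not the gateway's code: it assumes `modelPattern` follows shell-style glob semantics as implemented by Python's `fnmatch`, and the names `check_scope` and `schema_applies` are hypothetical.

```python
from fnmatch import fnmatchcase


class InvalidOutputSchemaScope(ValueError):
    """Mirrors the HTTP 400 invalid_output_schema_scope rejection."""


def check_scope(cfg):
    """Reject a schema config that sets neither modelPattern nor routeId."""
    if not cfg.get("modelPattern") and not cfg.get("routeId"):
        raise InvalidOutputSchemaScope("invalid_output_schema_scope")


def schema_applies(cfg, model, route_id):
    """Return True when an enabled schema's scope matches this request."""
    check_scope(cfg)
    if not cfg.get("enabled", True):
        return False  # soft-disabled schemas never fire
    pattern = cfg.get("modelPattern")
    if pattern and not fnmatchcase(model, pattern):
        return False
    bound_route = cfg.get("routeId")
    if bound_route and bound_route != route_id:
        return False
    # When both scopes are set, both had to match to reach this point.
    return True
```

For example, a schema with `"modelPattern": "gpt-4o*"` applies to `gpt-4o` and `gpt-4o-mini` but not to `claude-sonnet-4-5`.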

Registered schemas can be managed from the Schemas page in the DVARA Flightdeck or through the Admin API. Both edit the same registry — pick whichever fits your workflow.

When to use which:

Most callers should use inline response_format — the schema lives next to the call site, no extra config to keep in sync.
Reach for server-side validation when the policy needs to be enforced regardless of caller (e.g. a published API contract you want the gateway to police), or when you want JSON-schema validation on responses from providers that don't support response_format natively (Ollama, Cohere) — the validation runs after the upstream call, so it works on any provider.

Provider Translation

The gateway translates response_format to each provider's native mechanism. Your application always sends the same OpenAI-format request.

Provider	`json_object`	`json_schema`	`strict` Support
OpenAI	Native passthrough	Native passthrough	Native
Azure OpenAI	Native passthrough	Native passthrough	Native
Mistral	Native passthrough	Native passthrough	Native
Grok	Native passthrough	Native passthrough	Native
Anthropic	System prompt injection	Tool-use rewrite (`structured_output` tool)	Downgraded
Gemini	`generationConfig.responseMimeType` set to `"application/json"`	`generationConfig.responseMimeType` + `responseSchema`	Native *
Bedrock	System message injection	`toolConfig` rewrite (`structured_output` tool)	Downgraded
DeepSeek	Native passthrough	Filtered out by capability-aware routing	N/A
Moonshot	Native passthrough	Filtered out by capability-aware routing	N/A
ChatGLM	Native passthrough	Filtered out by capability-aware routing	N/A
Groq	Native passthrough	Filtered out by capability-aware routing	N/A
Qwen	Filtered out by capability-aware routing	Filtered out by capability-aware routing	N/A
Cohere	Filtered out by capability-aware routing	Filtered out by capability-aware routing	N/A
Ollama	Filtered out by capability-aware routing	Filtered out by capability-aware routing	N/A
Mock	Wraps response in `{"result": …}`	Wraps response in `{"result": …}`	N/A

Gemini and the strict flag

Gemini's API doesn't accept a strict field directly, but responseSchema always enforces structure when json_schema is used. So although the gateway doesn't forward strict: true to Gemini, the net behaviour is the same as a natively-strict provider — schema conformance is enforced by the upstream.
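
As a rough illustration of the Gemini translation, the mapping could look like the sketch below. The function name `to_gemini_generation_config` is hypothetical, and dropping `name` along with `strict` is an assumption of this sketch; only the `responseMimeType` and `responseSchema` targets come from the table above.

```python
def to_gemini_generation_config(response_format):
    """Map an OpenAI-style response_format to a Gemini generationConfig fragment."""
    fmt_type = response_format.get("type", "text")
    if fmt_type == "text":
        return {}  # no constraint, nothing to add
    config = {"responseMimeType": "application/json"}
    if fmt_type == "json_schema":
        # `strict` is not forwarded; responseSchema enforces structure on its own.
        config["responseSchema"] = response_format["json_schema"]["schema"]
    return config
```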

The OpenAI-compatible group (OpenAI, Azure OpenAI, Mistral, Grok) shares the same pass-through code path: every field of the client's response_format, including the name, schema, and strict fields of json_schema, is forwarded verbatim in the upstream request body. Use any of these four without special handling on the client side.

The mechanism-rewrite group (Anthropic, Bedrock) does not speak response_format natively, so the gateway rewrites the request into each provider's own structured-output dialect. For json_object, the gateway adds this instruction to the system prompt on both providers:

Respond with valid JSON only. Do not include any text outside the JSON object.

Anthropic appends this instruction (with a leading blank-line break) to the existing system prompt; Bedrock adds the trimmed text as a fresh entry in the system messages list. Either way, the client never sees the rewrite — the assistant response arrives with a normal content field.
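
The two injection styles can be sketched as follows. The helper names are hypothetical, the Bedrock shape assumes a Converse-style `system` list of `{"text": ...}` entries, and the behaviour when no system prompt exists is an assumption.

```python
JSON_ONLY = "Respond with valid JSON only. Do not include any text outside the JSON object."


def inject_anthropic(system_prompt):
    """Append the instruction to an Anthropic-style system string after a blank line."""
    if not system_prompt:
        # Assumption: with no existing prompt, the instruction stands alone.
        return JSON_ONLY
    return system_prompt + "\n\n" + JSON_ONLY


def inject_bedrock(system_blocks):
    """Add the trimmed instruction as a new entry in a Bedrock-style system list."""
    return list(system_blocks) + [{"text": JSON_ONLY.strip()}]
```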

json_schema rewrite on Anthropic and Bedrock — the gateway registers a tool named structured_output with input_schema set to the client's schema, sets tool_choice to force the model to call that specific tool, and extracts the tool-use response block back into a plain content text field. The client still receives a standard chat completion envelope; the tool-use round-trip is internal.
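
A minimal sketch of that round-trip, using the Anthropic Messages API field names (`tools`, `input_schema`, `tool_choice`, `tool_use` content blocks). The function names and the tool description text are hypothetical; this shows the shape of the rewrite, not the gateway's actual code.

```python
import json

TOOL_NAME = "structured_output"


def to_anthropic_tool_request(schema):
    """Build the tool fields that force the model to answer via one tool call."""
    return {
        "tools": [{
            "name": TOOL_NAME,
            "description": "Return the answer as structured JSON.",  # hypothetical wording
            "input_schema": schema,
        }],
        # Forcing this specific tool means the reply carries a tool_use block.
        "tool_choice": {"type": "tool", "name": TOOL_NAME},
    }


def tool_use_to_content(anthropic_response):
    """Serialize the forced tool call's input back into a plain content string."""
    for block in anthropic_response.get("content", []):
        if block.get("type") == "tool_use" and block.get("name") == TOOL_NAME:
            return json.dumps(block["input"])
    raise ValueError("no structured_output tool_use block in response")
```

The string returned by `tool_use_to_content` is what would end up in the `content` field of the chat completion envelope.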

The partial-support group (DeepSeek, Moonshot, ChatGLM, Groq) accepts json_object natively but doesn't support json_schema. For a request with response_format: json_schema, these providers are filtered out of the candidate pool before any upstream call.

Unsupported providers (Qwen, Cohere, Ollama) are filtered out of the candidate pool for both json_object and json_schema. On a route that contains only unsupported providers, the request fails fast with no_capable_provider (HTTP 400). On a mixed route, a capable provider is selected automatically. The filtering is deterministic, so a request never lands on a provider that can't honor the format.

Strict Mode Downgrade

strict: true on a json_schema request asks the provider to use constrained decoding — output is guaranteed to conform to the schema. Not every provider supports it. When the request lands on one that doesn't, the gateway downgrades to best-effort and tells you about it.

Provider	What happens with `strict: true`
OpenAI, Azure OpenAI, Mistral, Grok	Forwarded verbatim. Constrained decoding is on — the upstream guarantees the shape.
Gemini	`responseSchema` always enforces structure. Net behaviour matches strict mode (see the note above).
Anthropic, Bedrock	Downgraded. No native constrained decoding; the gateway uses the `structured_output` tool-use rewrite, which is best-effort. The model usually obeys the schema, but a loosely coerced type (for example a number returned as a string) or a missing `required` field is possible.

How to detect a downgrade

When the request lands on Anthropic or Bedrock with strict: true, the response carries:

X-Gateway-Strict-Downgraded: true

The header is set by the gateway's response handler for that provider and lifted onto the HTTP response, so you check it the same way no matter which provider actually handled the call. If the header is absent, the upstream guaranteed strict and no client-side check is needed.

Example: validate client-side when downgraded

import json
import os

import jsonschema
import requests

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age":  {"type": "integer"}
    },
    "required": ["name", "age"]
}

r = requests.post(
    "http://localhost:8080/v1/chat/completions",
    # Read the key from the environment; "$DVARA_API_KEY" is not expanded inside a Python string.
    headers={"Authorization": f"Bearer {os.environ['DVARA_API_KEY']}"},
    json={
        "model": "claude-sonnet-4-5",
        "messages": [{"role": "user", "content": "John is 30 years old."}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "person", "schema": schema, "strict": True},
        },
    },
)

content = r.json()["choices"][0]["message"]["content"]

# If the gateway downgraded strict, validate the body ourselves so we
# fail loudly instead of letting a malformed response leak downstream.
if r.headers.get("X-Gateway-Strict-Downgraded") == "true":
    jsonschema.validate(json.loads(content), schema)

# Otherwise the upstream guaranteed schema conformance — no extra check.

When strict really matters, route around the downgrade

If the response feeds a downstream pipeline that breaks on bad shape (a structured ETL step, a regulated audit record, a contract with a partner), don't rely on the downgraded best-effort path. Pin the route to a natively-strict provider:

- id: contract-strict
  model-pattern: "claude-strict"
  pinned-model-version: "gpt-4o-2024-08-06"
  providers:
    - provider: openai

The strict flag is then handled by a provider that guarantees it.

Capability-Aware Routing

When response_format is present, the gateway filters the provider pool before routing to ensure only capable providers are considered:

`response_format` type	Required capability
`json_schema`	`supportsStructuredOutputs = true`
`json_object`	`supportsJsonMode = true`
`text` / absent	No filtering applied
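
The filter can be pictured as a lookup against per-provider capability flags. The sketch below transcribes a subset of the Provider Translation table; the lowercase provider ids, the `CAPABILITIES` dict, and the function name `filter_capable` are assumptions for illustration, not the gateway's data model.

```python
# Capability flags for a few providers, transcribed from the translation table.
CAPABILITIES = {
    "openai":    {"supportsJsonMode": True,  "supportsStructuredOutputs": True},
    "anthropic": {"supportsJsonMode": True,  "supportsStructuredOutputs": True},
    "deepseek":  {"supportsJsonMode": True,  "supportsStructuredOutputs": False},
    "ollama":    {"supportsJsonMode": False, "supportsStructuredOutputs": False},
}

# Which flag each response_format type requires; `text` requires none.
REQUIRED_FLAG = {
    "json_schema": "supportsStructuredOutputs",
    "json_object": "supportsJsonMode",
}


class NoCapableProvider(Exception):
    """Mirrors the HTTP 400 no_capable_provider error."""


def filter_capable(route_providers, response_format=None):
    """Return the providers on the route that can honor the requested format."""
    fmt_type = (response_format or {}).get("type", "text")
    flag = REQUIRED_FLAG.get(fmt_type)
    if flag is None:
        return list(route_providers)  # text or absent: no filtering applied
    capable = [p for p in route_providers if CAPABILITIES.get(p, {}).get(flag, False)]
    if not capable:
        raise NoCapableProvider(
            f"No provider supports response_format: {fmt_type}. "
            f"Providers on route: [{', '.join(route_providers)}]. Capable providers: []"
        )
    return capable
```

On a mixed route such as `["ollama", "openai"]` with `json_schema`, only `openai` survives the filter.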

No Capable Provider

If no provider on the route supports the requested format, the gateway returns HTTP 400:

{
  "error": {
    "code": "no_capable_provider",
    "type": "invalid_request_error",
    "message": "No provider supports response_format: json_schema. Providers on route: [ollama]. Capable providers: []",
    "trace_id": "a1b2c3d4e5f6789012345678abcdef01"
  }
}

Failover Blocked

If the primary provider fails and no capable fallback exists, the gateway returns HTTP 503 with an X-Gateway-Failover-Blocked: capability_mismatch header:

{
  "error": {
    "code": "failover_capability_mismatch",
    "type": "provider_unavailable",
    "message": "Failover blocked: no fallback provider supports response_format: json_schema. Primary provider [openai] failed: HTTP 500 from upstream",
    "trace_id": "a1b2c3d4e5f6789012345678abcdef01"
  }
}

The text after failed: echoes the primary provider's own error message, so it varies per outage ("HTTP 500 from upstream", "Connection timeout after 30s", "Circuit breaker open", etc.). Don't grep for an exact string; check the code field for routing logic.

The X-Gateway-Failover-Blocked header lets clients distinguish a normal provider outage (retry later) from a capability gap (reconfigure the route or drop the response_format requirement).

Error envelope fields

Every error response carries the same four fields under error:

code — machine-readable identifier (lowercase snake_case). Stable across releases; the field to switch on for routing logic.
type — broad category (invalid_request_error, provider_unavailable, policy_violation, etc.). Useful when grouping errors at the call-site.
message — human-readable description. Free-form, not stable across releases.
trace_id — request trace id (hex). Echoes the X-Trace-ID response header. Use it to find the request in logs.

A param field also appears on validation errors that point at a specific request field; it's omitted otherwise.
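
Since `code` is the stable field, client-side handling can switch on it rather than on `message`. The sketch below is one possible mapping; the function name `classify_error` and the action labels it returns are hypothetical, while the codes and the header come from this page.

```python
def classify_error(status, headers, body):
    """Map a gateway error response to a coarse client action, keyed on error.code."""
    error = (body or {}).get("error", {})
    code = error.get("code")
    # HTTP header names are case-insensitive, so normalise before looking up.
    lowered = {k.lower(): v for k, v in headers.items()}
    blocked = lowered.get("x-gateway-failover-blocked") == "capability_mismatch"

    if code == "schema_validation_failed":
        return "reject_response"    # HTTP 422: response broke the registered schema
    if code in ("no_capable_provider", "failover_capability_mismatch") or blocked:
        return "reconfigure_route"  # capability gap, retrying will not help
    if status >= 500:
        return "retry_later"        # ordinary provider outage
    return "fix_request"            # remaining 4xx: the request itself is invalid
```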

Order of operations

A request that uses both inline response_format and a registered server-side schema runs through these stages, in order:

1. Capability filter: providers that can't honor the requested response_format are dropped from the candidate pool. If the pool ends up empty, the request fails fast with no_capable_provider (HTTP 400).
2. Provider call: the gateway rewrites response_format per the translation table and dispatches to the chosen provider.
3. Failover: on provider_error or provider_circuit_open, an alternative on the same route is tried. The capability filter and any data-residency constraints re-apply; an incapable fallback is never picked. If no capable healthy fallback exists, the gateway returns failover_capability_mismatch (HTTP 503) with an X-Gateway-Failover-Blocked: capability_mismatch header.
4. Server-side schema validation: if a registered schema's scope (modelPattern, routeId, or both) matches the request, the response body is parsed as JSON and validated against the schema. A mismatch returns schema_validation_failed (HTTP 422); the upstream call is not retried (see the maxRetries note above).
5. Response delivered, including any X-Gateway-Strict-Downgraded: true header lifted from the provider response.

The response_format handling in stages 1–3 applies only when response_format is on the request. Stage 4 fires only when a registered schema's scope matches the request. Stage 5 always runs.

Validation Errors

Condition	HTTP	Error Code
Missing `response_format.type`	400	`invalid_request`
Unknown type (e.g., `xml`)	400	`invalid_request`
`json_schema` without `schema`	400	`invalid_request`
Unsupported provider (e.g., Ollama)	400	`unsupported_response_format`
No capable provider on route	400	`no_capable_provider`
Failover blocked by capability	503	`failover_capability_mismatch`
Registered schema mismatched the response body	422	`schema_validation_failed`
Schema registration with neither `routeId` nor `modelPattern` scope	400	`invalid_output_schema_scope`
