Version: 1.3.0

Provider Setup

DVARA ships with fourteen first-class providers plus a built-in Mock provider for CI and local development: OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Mistral, Cohere, Groq, Ollama, Alibaba Qwen, DeepSeek, Moonshot (Kimi), Zhipu ChatGLM, and xAI Grok. It also routes any other OpenAI-compatible endpoint — Fireworks AI, Together AI, Perplexity, internal corporate proxies — through the same governance pipeline. See Additional OpenAI-compatible providers for that setup.

Each first-class provider is activated conditionally at startup. The activation trigger depends on the provider:

Credential-gated (most providers): registers when its *_API_KEY env var is non-blank (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, MISTRAL_API_KEY, COHERE_API_KEY, GROQ_API_KEY, QWEN_API_KEY, DEEPSEEK_API_KEY, MOONSHOT_API_KEY, ZHIPU_API_KEY, XAI_API_KEY). Azure OpenAI requires both AZURE_OPENAI_API_KEY and AZURE_OPENAI_BASE_URL.
Flag-gated: registers when its *_ENABLED boolean is true (BEDROCK_ENABLED, OLLAMA_ENABLED, MOCK_PROVIDER_ENABLED). Bedrock registers on the flag alone, because AWS credentials are resolved per request by the SigV4 signer. If you set BEDROCK_ENABLED=true without AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (or tenant-scoped AWS credentials registered in DVARA Flightdeck), the provider still activates at startup and every request fails at credential resolution. EKS / EC2 instance roles are not picked up; see the AWS Bedrock section. Ollama is unauthenticated and Mock makes no upstream calls, so neither needs a credential at all.

If a provider is not activated, requests that target it return a clear NO_PROVIDER error (HTTP 400) instead of failing silently.
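The activation rules above can be sketched as a small function. This is an illustration of the documented rules, not the gateway's code: the provider ids used as dictionary keys are hypothetical, and the way the gateway parses the *_ENABLED booleans is assumed to be a case-insensitive match on "true".

```python
# Hypothetical provider ids mapped to the env vars named on this page.
CREDENTIAL_GATED = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "azure-openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_BASE_URL"],
    "mistral": ["MISTRAL_API_KEY"],
    "cohere": ["COHERE_API_KEY"],
    "groq": ["GROQ_API_KEY"],
    "qwen": ["QWEN_API_KEY"],
    "deepseek": ["DEEPSEEK_API_KEY"],
    "moonshot": ["MOONSHOT_API_KEY"],
    "chatglm": ["ZHIPU_API_KEY"],
    "grok": ["XAI_API_KEY"],
}
FLAG_GATED = {
    "bedrock": "BEDROCK_ENABLED",
    "ollama": "OLLAMA_ENABLED",
    "mock": "MOCK_PROVIDER_ENABLED",
}


def activated_providers(env):
    """Return the set of provider ids that would register for this environment."""
    active = set()
    for provider, required in CREDENTIAL_GATED.items():
        # Every required variable must be present and non-blank.
        if all(env.get(name, "").strip() for name in required):
            active.add(provider)
    for provider, flag in FLAG_GATED.items():
        # Assumption: the flag counts as set only when it reads "true".
        if env.get(flag, "").strip().lower() == "true":
            active.add(provider)
    return active
```

Passing a copy of the process environment to activated_providers gives a quick pre-flight check of which providers a deployment should expect to see.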

Environment variables configure a single shared credential

Setting OPENAI_API_KEY (or any provider key) via an environment variable registers one platform-wide credential that all tenants share. This is fine for single-tenant deployments and local development.

For multi-tenant deployments where each tenant must use their own provider key, create per-tenant credentials in DVARA Flightdeck under Credentials — no environment variable needed. Per-tenant credentials take precedence over the environment variable in the resolution chain. See Credentials & BYOK.
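As a rough sketch of that precedence, the lookup below checks a per-tenant credential first and falls back to the shared environment variable. The function name, the dictionary keyed by (tenant, provider), and the two-step chain are assumptions for illustration; the real resolution chain (including vault-backed credentials) is described in Credentials & BYOK.

```python
def resolve_api_key(tenant_id, provider, tenant_credentials, env, env_var):
    """Pick the key for a request: tenant credential first, shared env var second."""
    tenant_key = tenant_credentials.get((tenant_id, provider))
    if tenant_key:
        return tenant_key
    shared = env.get(env_var, "").strip()
    # None models "no credential available" in this sketch.
    return shared or None
```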

Every provider passes through the same governance pipeline before and after the upstream hop — see Platform architecture → Request lifecycle for the full pipeline diagram. Providers differ only in how they talk to the upstream API; everything else is uniform.

This page documents how to enable each provider, what features it supports, and the provider-specific quirks that affect what you can send through it. Credentials can come from environment variables, the DVARA Flightdeck (BYOK per tenant), or a vault — see Credentials & BYOK for the full picture.

All supported providers

Provider	Model prefix	Example models	Notes
OpenAI	`gpt`, `o1`, `o3`, `o4`, `chatgpt`, `text-embedding`	`gpt-4o`, `gpt-4.1`, `o3-mini`, `chatgpt-4o-latest`, `text-embedding-3-small`	Native structured outputs and JSON mode; reasoning + alias model families
Azure OpenAI	`azure/`	`azure/gpt-4o`	Requires both `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_BASE_URL`
Anthropic	`claude`	`claude-sonnet-4-5`, `claude-3-5-haiku-20241022`	`json_schema` via tool-use rewrite; 200K context
Google Gemini	`gemini`	`gemini-2.0-flash`, `gemini-1.5-pro`	1M token context window; API key in URI
AWS Bedrock	`bedrock/`	`bedrock/anthropic.claude-3-sonnet-…`	SigV4 signing; explicit AWS key pair required (IRSA / instance roles not supported)
Mistral	`mistral`	`mistral-large-latest`, `mistral-small-latest`	OpenAI-compatible API; no vision
Cohere	`command`	`command-r-plus`, `command-r`	No structured output or JSON mode
Groq	`groq/`	`groq/llama-3.3-70b-versatile`	Ultra-low latency; no `json_schema`
Ollama	`ollama/`	`ollama/llama3.2`	Local inference; no vision, tools, or structured output
Qwen	`qwen`	`qwen2.5-72b-instruct`, `qwen-max`	Alibaba DashScope; conservative v1 capability declaration
DeepSeek	`deepseek`	`deepseek-chat`, `deepseek-reasoner`	Streaming + tools + JSON mode; `json_schema` declared off (reasoning models differ)
Moonshot	`moonshot`	`moonshot-v1-128k`, `moonshot-v1-32k`	200K context, the largest among the OpenAI-compatible providers
ChatGLM	`glm`	`glm-4`, `glm-4-air`, `glm-4v-plus`	Vision is model-specific (`glm-4v-*` only); declared off at the provider level
Grok	`grok`	`grok-2-1212`, `grok-2-vision-1212`, `grok-3-latest`	Native structured outputs; vision over-declared at provider level (model-specific)
Mock	`mock/`	`mock/test`	Dev and CI only — no upstream calls
Any other OpenAI-compatible	custom prefix	Fireworks AI, Together AI, Perplexity, internal corporate proxies	Register a custom provider →
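The prefix column can be read as a lookup table. The sketch below mirrors it, assuming longest-prefix matching and hypothetical provider ids; prefix stripping is applied only to the four prefixes this page says are stripped (azure/, bedrock/, groq/, ollama/). How the gateway actually matches prefixes, and what it does with an unknown model name, is not specified here.

```python
# Prefixes from the table above, mapped to hypothetical provider ids.
PREFIXES = {
    "gpt": "openai", "o1": "openai", "o3": "openai", "o4": "openai",
    "chatgpt": "openai", "text-embedding": "openai",
    "azure/": "azure-openai", "claude": "anthropic", "gemini": "gemini",
    "bedrock/": "bedrock", "mistral": "mistral", "command": "cohere",
    "groq/": "groq", "ollama/": "ollama", "qwen": "qwen",
    "deepseek": "deepseek", "moonshot": "moonshot", "glm": "chatglm",
    "grok": "grok", "mock/": "mock",
}
# Prefixes documented as removed before the upstream call.
STRIPPED = {"azure/", "bedrock/", "groq/", "ollama/"}


def resolve(model):
    """Return (provider_id, upstream_model), or None when no prefix matches."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if model.startswith(prefix):
            upstream = model[len(prefix):] if prefix in STRIPPED else model
            return PREFIXES[prefix], upstream
    return None
```

For example, resolve("groq/llama-3.3-70b-versatile") yields the Groq provider with the upstream model name llama-3.3-70b-versatile.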

OpenAI

Model prefix: gpt, o1, o3, o4, and chatgpt for chat models; text-embedding for embeddings.

export OPENAI_API_KEY=sk-your-key

Example models: gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, o1-preview, o1-mini, o3-mini, o4-mini, chatgpt-4o-latest, text-embedding-3-small, text-embedding-3-large. OpenAI's chat catalogue spans three namespaces — classic GPT models (gpt-*), reasoning models (o1-*, o3-*, o4-*), and alias models (chatgpt-*); all three route to OpenAI under the same OPENAI_API_KEY. Embeddings route separately under the text-embedding-* prefix. Call GET /v1/models for the live list of models your deployment can see.

Features: Chat completions, embeddings, streaming, vision, tool calls, structured outputs (native json_schema and json_object).

Notes:

Credentials are read per request, so rotating your OpenAI key in the DVARA Flightdeck takes effect on the next request — no restart.
Chat vs embedding routing is by prefix: gpt*, o1*, o3*, o4*, and chatgpt* → chat; text-embedding* → embeddings.
response_format: json_schema and response_format: json_object pass through natively — no rewrite.

Azure OpenAI

Azure OpenAI is a separate provider from OpenAI because it uses deployment-based routing and a different auth header. Both api-key and base-url are required — the provider only registers when both are set.

export AZURE_OPENAI_API_KEY=your-azure-key
export AZURE_OPENAI_BASE_URL=https://my-resource.openai.azure.com/openai

Model prefix: azure/. The prefix is stripped before the call, so azure/gpt-4o routes to the deployment named gpt-4o on your Azure resource.

Features: Chat completions, streaming, vision, tool calls, structured outputs (native json_schema and json_object). 128K context.

Notes:

Azure uses the api-key header (not Authorization: Bearer). DVARA handles this automatically.
The base URL typically ends in /openai for the Microsoft-hosted path. Example: https://acme.openai.azure.com/openai.
Endpoint template used under the hood: /deployments/{deployment}/chat/completions?api-version=2024-10-21.
For tenant-scoped BYOK credentials, the platform-wide DVARA_ENCRYPTION_MASTER_PASSWORD env var is required when any credential is stored in ENCRYPTED mode (the default) — it protects every encrypted credential, not just Azure. Zero-trust deployments that only ever create credentials in REFERENCE mode (vault-pointer-only, no secret material at rest) can omit it. See Credentials & BYOK.
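To make the deployment-based routing concrete, here is a minimal sketch of how the outgoing URL and auth header could be assembled from the pieces listed above. The function name is hypothetical; the endpoint template, API version, and api-key header are the ones documented in these notes.

```python
API_VERSION = "2024-10-21"


def azure_chat_request(model, base_url, api_key):
    """Build the upstream URL and headers for an azure/<deployment> model."""
    deployment = model.removeprefix("azure/")  # prefix is stripped before the call
    url = (
        f"{base_url.rstrip('/')}/deployments/{deployment}"
        f"/chat/completions?api-version={API_VERSION}"
    )
    # Azure expects the key in an api-key header, not Authorization: Bearer.
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return url, headers
```

With base_url set to https://acme.openai.azure.com/openai, the model azure/gpt-4o resolves to the gpt-4o deployment under that resource.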

Anthropic

Model prefix: claude

export ANTHROPIC_API_KEY=sk-ant-your-key

Example models: claude-sonnet-4-5, claude-3-5-haiku-20241022, claude-3-opus-20240229. Any model starting with claude is routed to Anthropic.

Features: Chat completions, streaming, vision, tool calls, structured outputs (emulated — see notes).

Notes:

system role messages are automatically extracted and passed as Anthropic's separate system field.
If you do not set max_tokens, DVARA defaults it to 1024 — Anthropic requires the field.
response_format: json_object is implemented by injecting a system prompt instruction. The response is still standard text; you parse it as JSON.
response_format: json_schema is implemented by rewriting the call into Anthropic's tool-use API with a synthetic structured_output tool. The tool's input is extracted and returned as the assistant message. This works transparently to the caller — you send OpenAI-format json_schema, you get OpenAI-format output.
If your client sends json_schema with strict: true, the response includes an X-Gateway-Strict-Downgraded: true header — Anthropic cannot natively enforce strict JSON Schema, and DVARA flags that for observability.
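The notes above can be pictured as a request translation. The sketch below is an assumption-laden illustration, not DVARA's implementation: the function name and the tool description are invented, and json_object handling is left out because the injected prompt wording is not documented here. The Anthropic field names (system, max_tokens, tools with input_schema, tool_choice) are those of the public Messages API.

```python
DEFAULT_MAX_TOKENS = 1024          # default stated in the notes above
TOOL_NAME = "structured_output"    # synthetic tool name stated in the notes above


def to_anthropic(openai_request):
    """Translate an OpenAI-style chat request into an Anthropic-style body.

    Returns (body, strict_downgraded).
    """
    messages = openai_request["messages"]
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    body = {
        "model": openai_request["model"],
        "max_tokens": openai_request.get("max_tokens", DEFAULT_MAX_TOKENS),
        "messages": [m for m in messages if m["role"] != "system"],
    }
    if system_parts:
        # System messages move to Anthropic's separate system field.
        body["system"] = "\n\n".join(system_parts)

    strict_downgraded = False
    fmt = openai_request.get("response_format") or {}
    if fmt.get("type") == "json_schema":
        spec = fmt["json_schema"]
        body["tools"] = [{
            "name": TOOL_NAME,
            "description": "Return the answer as structured JSON.",  # hypothetical text
            "input_schema": spec["schema"],
        }]
        # Force the model to answer through the synthetic tool.
        body["tool_choice"] = {"type": "tool", "name": TOOL_NAME}
        strict_downgraded = bool(spec.get("strict"))
    return body, strict_downgraded
```

The second return value corresponds to the condition under which the X-Gateway-Strict-Downgraded header is set.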

Google Gemini

Model prefix: gemini

export GEMINI_API_KEY=AIza-your-key

Example models: gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash.

Features: Chat completions, streaming, vision, tool calls, structured outputs (native responseMimeType and responseSchema). 1 million token context window — the largest of any supported provider.

Notes:

The Gemini REST API puts its API key in the URI query string instead of a header. DVARA resolves the key per request and appends it to the outgoing URL.
json_schema translates to generationConfig.responseMimeType: "application/json" plus generationConfig.responseSchema.
json_object translates to generationConfig.responseMimeType: "application/json" without a schema.
Because the context is 1M tokens, DVARA's input-size guardrails rarely trigger for Gemini with default settings. Tune guardrail.max-input-tokens per tenant if you want tighter limits for a specific workload.
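The two response_format translations described above amount to a small mapping. The sketch below assumes the JSON Schema can be passed through unchanged; Gemini's responseSchema accepts only a subset of JSON Schema, so a real translation may need to adapt it. The function name is hypothetical.

```python
def gemini_generation_config(response_format):
    """Map an OpenAI-style response_format to Gemini generationConfig fields."""
    if not response_format:
        return {}
    kind = response_format.get("type")
    if kind == "json_object":
        return {"responseMimeType": "application/json"}
    if kind == "json_schema":
        return {
            "responseMimeType": "application/json",
            # Assumption: schema forwarded as-is.
            "responseSchema": response_format["json_schema"]["schema"],
        }
    return {}
```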

AWS Bedrock

Model prefix: bedrock/. The prefix is stripped before sending the model ID to Bedrock, so any Bedrock-hosted model works.

export BEDROCK_ENABLED=true
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1          # defaults to us-east-1 if unset

Example model IDs:

bedrock/anthropic.claude-3-sonnet-20240229-v1:0
bedrock/amazon.titan-text-express-v1
bedrock/meta.llama3-70b-instruct-v1:0
bedrock/mistral.mistral-7b-instruct-v0:2

Features: Chat completions, streaming, vision, tool calls, structured outputs (emulated via tool-use rewrite for Claude models hosted on Bedrock).

Authentication: AWS SigV4 request signing, computed per request. Credentials are resolved per request, so rotating them in the DVARA Flightdeck or in your vault takes effect on the next call.

Notes:

AWS credentials are required explicitly today — either as the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars (shown above), or as tenant-scoped credentials registered in the DVARA Flightdeck under Credentials. Setting BEDROCK_ENABLED=true and leaving the access / secret keys blank activates the provider at startup but every request fails with a credential error from the SigV4 signer. The AWS SDK's default credential provider chain (IRSA on EKS, EC2 instance metadata, container credentials) is not supported; an explicit key pair is mandatory on every deployment that uses Bedrock.
response_format: json_schema rewrites the call into Bedrock's toolConfig with a synthetic tool; the tool's input is extracted as the response text.
response_format: json_object injects a system message instructing the model to reply in JSON.
When the client sends json_schema with strict: true, the response includes X-Gateway-Strict-Downgraded: true — Bedrock does not natively enforce strict schemas.
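Clients that rely on strict schemas may want to detect the downgrade. A minimal client-side check, assuming the response headers are available as a plain dictionary (the helper name is hypothetical):

```python
def strict_was_downgraded(headers):
    """Return True when the gateway reports that strict json_schema was downgraded."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-gateway-strict-downgraded", "").strip().lower() == "true"
```

When this returns True, validate the returned JSON against your schema yourself rather than assuming the provider enforced it.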

Mistral

Model prefix: mistral

export MISTRAL_API_KEY=your-mistral-key

Example models: mistral-large-latest, mistral-small-latest, mistral-medium-latest. Any model starting with mistral is routed here.

Features: Chat completions, streaming, tool calls, structured outputs (native json_schema and json_object). 128K context.

Notes:

Mistral's public API is OpenAI-compatible, so DVARA talks to it using the standard OpenAI request and response shapes. You do not need a different client.

Limitations: Pixtral vision models (pixtral-12b, pixtral-large) are not supported in this release. See Provider capability gaps for the full list and the safe routing workaround.

Cohere

Model prefix: command

export COHERE_API_KEY=your-cohere-key

Example models: command-r-plus, command-r, command-light. Any model starting with command is routed here.

Features: Chat completions, streaming. 128K context.

Notes:

Cohere uses its v2 chat API at api.cohere.com/v2. DVARA translates OpenAI-style requests to Cohere's format and maps responses back — clients see standard OpenAI format.
Finish reason mapping: COMPLETE → stop, MAX_TOKENS → length.
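The finish-reason mapping is a direct lookup. The sketch below covers only the two values documented above; returning None for anything else is a placeholder, since the handling of other Cohere finish reasons is not described on this page.

```python
FINISH_REASONS = {"COMPLETE": "stop", "MAX_TOKENS": "length"}


def map_finish_reason(cohere_reason):
    """Map a Cohere finish reason to its OpenAI equivalent, or None if undocumented."""
    return FINISH_REASONS.get(cohere_reason)
```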

Limitations: Vision, tool calls, and structured outputs are not supported in this release. See Provider capability gaps for the error codes you'll see and the routing workaround.

Groq

Model prefix: groq/. The prefix is stripped before sending the model name to Groq — groq/llama-3.3-70b-versatile sends llama-3.3-70b-versatile to the API.

export GROQ_API_KEY=your-groq-key

Example models: groq/llama-3.3-70b-versatile, groq/mixtral-8x7b-32768, groq/gemma2-9b-it.

Features: Chat completions, streaming, JSON mode (json_object). 131K context. Groq is known for very low inference latency on supported models, which makes it a good choice for interactive UX where wall-clock latency matters more than cost.

Notes:

Groq's API is OpenAI-compatible. DVARA reuses the standard OpenAI request/response shapes.
On a mixed route, json_schema requests are automatically routed to a capable provider; json_object requests can still go to Groq.

Limitations: Vision, tool calls, and json_schema structured outputs are not supported in this release; json_object mode works. See Provider capability gaps for the full list and the routing workaround.
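Because json_object guarantees only syntactically valid JSON, callers that stay on Groq need to check the shape of the reply themselves. The helper below is a deliberately small stand-in for a real schema validator: the function name and the "required keys with expected types" format are illustrative assumptions.

```python
import json


def parse_json_reply(content, required):
    """Parse a json_object reply and check required top-level keys and types."""
    data = json.loads(content)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in required.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for key: {key}")
    return data
```

A call such as parse_json_reply(reply_text, {"city": str, "temp_c": int}) either returns the parsed object or raises ValueError, which the caller can treat as a signal to retry or reroute.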

Ollama

Model prefix: ollama/. The prefix is stripped before sending the model name to the Ollama API.

export OLLAMA_ENABLED=true
export OLLAMA_BASE_URL=http://localhost:11434   # optional, defaults to localhost:11434

Prerequisites: an Ollama server running at OLLAMA_BASE_URL with at least one model pulled:

ollama serve
ollama pull llama3.2

Example request:

curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/llama3.2", "messages": [{"role": "user", "content": "Hello!"}]}'

Features: Chat completions, streaming. 32K context.

Notes:

Ollama's endpoints are unauthenticated by design. Do not expose them to the public internet — run the Ollama server on localhost, inside a private network, or behind a separate authenticated reverse proxy.
The provider only registers when OLLAMA_ENABLED=true, so Ollama won't show up in GET /v1/models for deployments that haven't opted in.

Limitations: Vision, tool calls, structured outputs, JSON mode, and embeddings are not supported in this release. See Provider capability gaps for the error codes and the routing workaround. Local-only deployments that need any of those capabilities should pair Ollama with a hosted provider on the same route.

Alibaba Qwen

Model prefix: qwen

export QWEN_API_KEY=your-dashscope-key

Example models: qwen2.5-72b-instruct, qwen2.5-coder-32b-instruct, qwen-max, qwen-plus. Any model starting with qwen routes to the DashScope compatible-mode endpoint at https://dashscope.aliyuncs.com/compatible-mode/v1.

Features: Chat completions, streaming. 32K context.

Notes:

Qwen's catalogue spans text, vision (Qwen-VL family), and reasoning variants. Capabilities are declared conservatively at the provider level — streaming on, everything else off — so capability-aware routing won't dispatch a json_schema or vision request to Qwen and have it fail upstream. Use a route that explicitly targets a Qwen vision or tools-capable model when you need that capability.
The compatible-mode endpoint speaks the OpenAI wire format. The native DashScope API uses a different shape; DVARA doesn't talk to that one.

Limitations: Vision, tool calls, structured outputs, and JSON mode are conservatively declared off in this release. See Provider capability gaps.

DeepSeek

Model prefix: deepseek

export DEEPSEEK_API_KEY=your-deepseek-key

Example models: deepseek-chat, deepseek-reasoner. Any model starting with deepseek is routed here.

Features: Chat completions, streaming, tool calls, JSON mode. 64K context.

Notes:

DeepSeek's chat models support tools and JSON mode natively; structured outputs (json_schema enforcement) are declared off at the provider level because the reasoning model family (deepseek-reasoner) handles structured output differently from the chat family. Operators who need strict json_schema should route to OpenAI, Gemini, Mistral, or Azure OpenAI.
API base URL is https://api.deepseek.com/v1 — fully OpenAI wire-format compatible.

Limitations: Strict json_schema structured outputs and vision are not declared at the provider level. See Provider capability gaps.

Moonshot AI (Kimi)

Model prefix: moonshot

export MOONSHOT_API_KEY=your-moonshot-key

Example models: moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k. The brand is "Kimi" but the API uses the moonshot- prefix.

Features: Chat completions, streaming, tool calls, JSON mode. 200K context — the largest in the OpenAI-compatible provider set.

Notes:

Long context is the headline differentiator. Use Moonshot when the workload needs to feed the model long documents or long conversation histories that exceed the 128K of GPT-4o or the 131K of Groq.
API base URL is https://api.moonshot.cn/v1.

Limitations: Vision and strict structured outputs are not declared at the provider level. See Provider capability gaps.

Zhipu AI (ChatGLM)

Model prefix: glm

export ZHIPU_API_KEY=your-zhipu-key

Example models: glm-4, glm-4-air, glm-4-flash, glm-4v-plus. The brand is "ChatGLM" but the API uses the glm- prefix.

Features: Chat completions, streaming, tool calls, JSON mode. 128K context.

Notes:

API base URL is https://open.bigmodel.cn/api/paas/v4.

Capabilities by model:

glm-4, glm-4-air, glm-4-flash — text only; chat + tools + streaming + JSON mode.
glm-4v-plus, glm-4v — vision-capable. The provider-level supportsVision is declared off in this release (conservative), so capability-aware routing won't auto-dispatch vision requests to ChatGLM. To use a glm-4v-* model for vision, configure an explicit route that targets it.

Limitations: Vision is conservatively declared off at the provider level. See Provider capability gaps.

Grok (xAI)

Model prefix: grok

export XAI_API_KEY=your-xai-key

Example models: grok-2-1212, grok-2-vision-1212, grok-3-latest, grok-beta.

Features: Chat completions, streaming, vision (model-specific), tool calls, structured outputs (native), JSON mode. 131K context.

Notes:

API base URL is https://api.x.ai/v1. xAI's API is OpenAI-compatible at the wire format; DVARA reuses the standard request/response shapes.

Capabilities by model:

grok-2-1212, grok-3-latest, grok-beta — text only; chat + tools + streaming + structured outputs + JSON mode.
grok-2-vision-1212: vision-capable. The provider-level supportsVision is declared on because advertising that Grok supports vision matters for route configuration, even though most Grok models don't accept image input. Sending a vision request to a non-vision Grok model gets a PROVIDER_ERROR from the upstream rather than UNSUPPORTED_CAPABILITY from DVARA. Route vision traffic explicitly to grok-2-vision-1212 to avoid the noise.

Limitations: Vision is over-declared at the provider level — the per-model variance is the trade-off. See Provider capability gaps.

Mock Provider

The Mock provider returns configurable fake completions without calling any upstream API — ideal for integration tests, CI pipelines, load tests, and local development without real API keys. Model prefix is mock/.

Full setup, wiremock-style matchers, file-backed scenarios with hot reload, the DVARA Flightdeck authoring UI, Prometheus counters, and audit events are documented on the dedicated Mock Provider page.

Advanced: overriding the base URL

For the common case, you only need environment variables — the gateway's built-in configuration already maps each *_API_KEY env var to the right provider and points at the official upstream endpoint. You never write YAML for a standard OpenAI, Anthropic, Gemini, Mistral, Cohere, Groq, or Azure deployment.

The one time you do need YAML is when you want to override the upstream base URL, for example:

An OpenAI-compatible proxy (corporate LLM gateway, regional endpoint, OpenAI-protocol fork) that you want the OpenAI provider to talk to.
A Mistral / Groq fork hosted on your own infrastructure.
A non-default Ollama address (ollama-servers.internal:11434).

In those cases, create a tiny overrides.yml with just the provider base URL you want to change:

# overrides.yml
dvara:
  llm-gateway:
    providers:
      openai:
        base-url: https://llm.corp.internal/v1    # override — api-key still comes from env
      ollama:
        base-url: http://ollama-servers.internal:11434

Then tell the gateway to load it on top of the bundled config (not in place of it). The three recipes below show how to do that in each deployment topology. Apart from the drop-in ./config option for a bare Java process, they use the SPRING_CONFIG_ADDITIONAL_LOCATION environment variable so that your override layers on top of the defaults and every other property (credentials, activation, routing, capabilities) keeps working unchanged.
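One way to picture the layering is as a recursive merge in which override values win and untouched keys keep their defaults. This is a mental model only: Spring Boot resolves properties across ordered property sources rather than merging dictionaries, and the bundled default values shown in the usage note are assumptions, not the shipped configuration.

```python
def layer(defaults, override):
    """Return defaults with override applied on top; neither input is modified."""
    merged = dict(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = layer(merged[key], value)  # descend into nested sections
        else:
            merged[key] = value  # override wins for scalar values
    return merged
```

Layering an override that sets only providers.openai.base-url over defaults that also define providers.ollama.base-url changes the OpenAI URL and leaves the Ollama one as it was.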

VM / Desktop — bare Java process

Two equivalent options. Either drop the override next to the jar or pass its location on the command line.

Option 1 — drop ./config/application.yml next to the jar:

/opt/dvara/
├── gateway.jar
└── config/
    └── application.yml   ← your overrides.yml contents, renamed

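Spring Boot picks up ./config/application.yml automatically, relative to the working directory the process is started from, so no flags are needed as long as you launch from /opt/dvara.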

cd /opt/dvara
export OPENAI_API_KEY=sk-your-key
java -jar gateway.jar

Option 2 — pass an explicit additional location:

Useful when the override file lives outside the jar directory (e.g. managed by a config-management tool).

export OPENAI_API_KEY=sk-your-key
export SPRING_CONFIG_ADDITIONAL_LOCATION=file:/etc/dvara/overrides.yml

java -jar /opt/dvara/gateway.jar

Or as a command-line flag:

java -jar /opt/dvara/gateway.jar \
  --spring.config.additional-location=file:/etc/dvara/overrides.yml

Either form layers the override on top of the bundled application.yml — values in the override win; everything else uses the default.

Docker Compose

Bind-mount the override into the container and point Spring at it.

# docker-compose.yml
services:
  llm-gateway:
    image: ghcr.io/dvarahq/dvara/dvara-llm-gateway:1.3.0
    environment:
      SPRING_CONFIG_ADDITIONAL_LOCATION: file:/config/
      OPENAI_API_KEY: ${OPENAI_API_KEY}
    volumes:
      - ./overrides.yml:/config/application.yml:ro
    ports:
      - "8080:8080"

Create overrides.yml alongside the compose file with the YAML from the top of this section, then bring the stack up:

export OPENAI_API_KEY=sk-your-key
docker compose up -d

The trailing slash on SPRING_CONFIG_ADDITIONAL_LOCATION=file:/config/ tells Spring to treat the location as a directory and load application.yml (plus any profile-specific application-{profile}.yml) from it, which is why the mount target is /config/application.yml. The bundled application.yml inside the image keeps loading; your file layers on top.

Kubernetes

Store the override in a ConfigMap and mount it as a volume. The Deployment-level env var points Spring at the mount path.

# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dvara-overrides
data:
  application.yml: |
    dvara:
      llm-gateway:
        providers:
          openai:
            base-url: https://llm.corp.internal/v1
          ollama:
            base-url: http://ollama-servers.internal:11434
---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dvara-gateway
spec:
  replicas: 2
  selector:
    matchLabels: {app: dvara-gateway}
  template:
    metadata:
      labels: {app: dvara-gateway}
    spec:
      containers:
        - name: gateway
          image: ghcr.io/dvarahq/dvara/dvara-llm-gateway:1.3.0
          ports:
            - {containerPort: 8080}
          env:
            - name: SPRING_CONFIG_ADDITIONAL_LOCATION
              value: file:/config/
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: dvara-provider-keys
                  key: openai
          volumeMounts:
            - name: overrides
              mountPath: /config
              readOnly: true
      volumes:
        - name: overrides
          configMap:
            name: dvara-overrides

Apply:

kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml

Updating the ConfigMap does not change running pods: the gateway reads the override at startup, so trigger a rollout with kubectl rollout restart deployment/dvara-gateway after you change it. If you use the DVARA Helm chart, set the equivalent value under extraConfig: and the chart wires the ConfigMap and volume mount for you. See the chart docs for the exact key name.

Common mount mistake

Don't mount your override over /workspace/config/application.yml (or wherever the image keeps its bundled application.yml). That replaces the defaults entirely — you lose every property the gateway expected to find, and startup fails with cryptic configuration-resolution errors.

Always mount into a separate directory (/config, /etc/dvara, or similar) and set SPRING_CONFIG_ADDITIONAL_LOCATION=file:/that-directory/. The overrides layer on top of the bundled defaults instead of replacing them. If startup logs show Spring loading zero application config files when you expected one, the mount path is wrong.

See Configuration Reference for the full set of properties you can override from YAML, and the Deployment pages for full topology-specific install guides.

Capabilities Matrix

Provider	Streaming	Vision	Tool Calls	Structured Outputs	JSON Mode	Max Context
OpenAI	yes	yes	yes	yes (native)	yes (native)	128,000
Azure OpenAI	yes	yes	yes	yes (native)	yes (native)	128,000
Anthropic	yes	yes	yes	yes (tool rewrite)	yes (prompt)	200,000
Google Gemini	yes	yes	yes	yes (native)	yes (native)	1,000,000
AWS Bedrock	yes	yes	yes	yes (tool rewrite)	yes (prompt)	200,000
Mistral	yes	no	yes	yes (native)	yes (native)	128,000
Cohere	yes	no	no	no	no	128,000
Groq	yes	no	no	no	yes (native)	131,072
Ollama	yes	no	no	no	no	32,000
Qwen	yes	no¹	no¹	no¹	no¹	32,768
DeepSeek	yes	no	yes	no¹	yes (native)	64,000
Moonshot	yes	no	yes	no	yes (native)	200,000
ChatGLM	yes	no¹	yes	no	yes (native)	128,000
Grok	yes	yes¹	yes	yes (native)	yes (native)	131,072
Mock	yes	no	no	yes (wraps JSON)	yes (wraps JSON)	128,000

¹ Conservative or model-specific declaration. Qwen is declared "no" across the board in this release — per-model verification deferred to follow-up issues. DeepSeek's json_schema is "no" because reasoning models differ from chat models. ChatGLM's vision is "no" at the provider level even though glm-4v-* exists; route to it explicitly. Grok's vision is "yes" at the provider level even though only grok-2-vision-1212 accepts images; route around the non-vision models or accept upstream PROVIDER_ERROR on the rest. See per-provider sections for the per-model breakdown.

Only the Structured Outputs and JSON Mode columns drive capability-aware routing. When a request carries response_format: json_schema or json_object, providers on the route that lack the capability are excluded from the candidate pool before dispatch. If no provider remains, the call fails fast with NO_CAPABLE_PROVIDER (HTTP 400).
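The pre-dispatch filter can be sketched as follows. The capability flags are copied from the Structured Outputs and JSON Mode columns of the matrix above; the provider ids, function name, and exception class are hypothetical, and the sketch does not model failover or the rest of route selection.

```python
# (structured_outputs, json_mode) per provider, from the matrix above.
CAPS = {
    "openai": (True, True), "azure-openai": (True, True),
    "anthropic": (True, True), "gemini": (True, True),
    "bedrock": (True, True), "mistral": (True, True),
    "cohere": (False, False), "groq": (False, True),
    "ollama": (False, False), "qwen": (False, False),
    "deepseek": (False, True), "moonshot": (False, True),
    "chatglm": (False, True), "grok": (True, True),
    "mock": (True, True),
}


class NoCapableProvider(Exception):
    """Stands in for the NO_CAPABLE_PROVIDER (HTTP 400) error."""


def candidate_pool(route, response_format_type):
    """Drop providers that lack the capability the request asks for."""
    if response_format_type == "json_schema":
        column = 0
    elif response_format_type == "json_object":
        column = 1
    else:
        return list(route)  # nothing to filter on
    pool = [provider for provider in route if CAPS[provider][column]]
    if not pool:
        raise NoCapableProvider("NO_CAPABLE_PROVIDER")
    return pool
```

On a route of Groq plus OpenAI, a json_schema request leaves only OpenAI in the pool, while a json_object request keeps both.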

Vision and tool calls are not pre-filtered at the routing layer. The request is dispatched to whichever provider the route selects, and the provider rejects unsupported content with UNSUPPORTED_CAPABILITY (HTTP 400) or silently degrades it. For reliable behavior, configure routes that contain only providers capable of the traffic they serve — a vision-only route for vision requests, a tool-capable route for tool traffic.

One caveat: an explicit model prefix overrides routing. If a client sends model: "mistral-large-latest" and the route has Mistral + OpenAI, DVARA routes to Mistral regardless of what the request contains — the explicit prefix wins over capability selection.

For OpenAI-compatible long-tail providers wired via base-URL override (Fireworks AI, Together AI, Perplexity, vLLM hosts, LiteLLM proxies, internal corporate gateways), capabilities depend on the upstream — see Additional OpenAI-compatible providers.

See Routing for how this interacts with failover, latency-aware routing, and canary splits. See Multimodal & Vision for request examples and a worked route setup for vision traffic.

Provider capability gaps in this release

Some upstream provider features are not implemented in this release. The table below is the single source of truth for what's missing per provider; the per-provider sections above link here rather than restating it. The governance pipeline (policy, PII, audit, budget, guardrails) applies to every request regardless of which capability the upstream supports — gaps are about how DVARA talks to the upstream, not about what governance covers.

The columns:

Capability — the upstream feature.
Status — what DVARA does today when the request reaches that provider.
Workaround — the safe configuration to use until the gap is closed.

Provider	Capability	Status in this release	Workaround
Mistral	Vision (Pixtral 12B, Pixtral Large)	Not supported. Mistral's text models are wired up; Pixtral is not. A vision request dispatched to Mistral has its image blocks serialized as garbage text — the model responds to the text portion only, with no error surfaced.	Use a vision-capable provider (OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI) for vision traffic. See Multimodal & Vision.
Cohere	Vision	Not supported. Sending image content returns `UNSUPPORTED_CAPABILITY` (HTTP 400).	Use a vision-capable provider (see above).
Cohere	Tool calls	Not supported. Sending tool-use or tool-result content blocks returns `UNSUPPORTED_CAPABILITY` (HTTP 400).	Use a tool-capable provider (OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Mistral).
Cohere	Structured outputs / JSON mode	Not supported. Sending `response_format: json_schema` or `json_object` to a Cohere-only route returns `NO_CAPABLE_PROVIDER` (HTTP 400) before dispatch.	Configure a route that includes a structured-outputs-capable provider; capability-aware routing automatically excludes Cohere from the candidate pool.
Groq	Vision	Not supported. Sending image content returns `UNSUPPORTED_CAPABILITY` (HTTP 400).	Use a vision-capable provider.
Groq	Tool calls	Not supported. Sending tool-use or tool-result content blocks returns `UNSUPPORTED_CAPABILITY` (HTTP 400).	Use a tool-capable provider.
Groq	Structured outputs (`json_schema`)	Not supported. Sending `response_format: json_schema` to a Groq-only route returns `NO_CAPABLE_PROVIDER` (HTTP 400). `json_object` mode works.	Use `response_format: json_object` if Groq's ultra-low latency matters and the schema is enforced client-side, or route to OpenAI / Gemini / Mistral / Azure for `json_schema`.
Ollama	Vision	Not supported. Sending image content returns `UNSUPPORTED_CAPABILITY` (HTTP 400).	Use a vision-capable provider.
Ollama	Tool calls	Not supported. Sending tool-use or tool-result content blocks returns `UNSUPPORTED_CAPABILITY` (HTTP 400).	Use a tool-capable provider.
Ollama	Structured outputs / JSON mode	Not supported. Sending `response_format: json_schema` or `json_object` returns `NO_CAPABLE_PROVIDER` (HTTP 400).	Route structured-output traffic to a capable provider. Local-only deployments that require structured outputs should pair Ollama with a hosted structured-outputs-capable provider on the same route.
Ollama	Embeddings	Not supported. Embedding requests with an `ollama/` model return `NO_PROVIDER` (HTTP 400).	Use OpenAI's `text-embedding-3-small` / `text-embedding-3-large` models, which DVARA routes through the OpenAI provider.

The fail-safe configuration is always a route that contains only providers capable of the traffic it serves. A vision-only route never dispatches a vision request to a non-vision provider; a tool-capable-only route never dispatches a tool call to Cohere. See Routing for how to compose these routes and Multimodal & Vision for a worked vision-route example.

Next steps

Routing — route by model, latency, cost, weight, or region, with canary splits and shadow traffic.
Credentials & BYOK — tenant-scoped provider keys, rotation, and vault integration.
Multi-Tenancy — how DVARA resolves the tenant from an API key and cascades per-tenant configuration.
Multimodal & Vision — sending images through the gateway, supported providers, and route configuration for vision traffic.
API — interactive OpenAPI reference for every /v1/* endpoint, with a configurable Try it panel.
