Provider Setup
DVARA ships with fourteen first-class providers plus a built-in Mock provider for CI and local development: OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Mistral, Cohere, Groq, Ollama, Alibaba Qwen, DeepSeek, Moonshot (Kimi), Zhipu ChatGLM, and xAI Grok. It also routes any other OpenAI-compatible endpoint — Fireworks AI, Together AI, Perplexity, internal corporate proxies — through the same governance pipeline. See Additional OpenAI-compatible providers for that setup.
Each first-class provider is activated lazily at startup. The activation trigger depends on the provider:
- Credential-gated (most providers): registers when its *_API_KEY env var is non-blank (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, MISTRAL_API_KEY, COHERE_API_KEY, GROQ_API_KEY, QWEN_API_KEY, DEEPSEEK_API_KEY, MOONSHOT_API_KEY, ZHIPU_API_KEY, XAI_API_KEY). Azure requires AZURE_OPENAI_API_KEY and, additionally, AZURE_OPENAI_BASE_URL.
- Flag-gated: registers when its *_ENABLED boolean is true: BEDROCK_ENABLED, OLLAMA_ENABLED, MOCK_PROVIDER_ENABLED.
Note: Bedrock registers on the flag alone; AWS credentials are resolved per request by the SigV4 signer. If you set BEDROCK_ENABLED=true without AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (or tenant-scoped AWS credentials in the DVARA Flightdeck), the provider activates at startup and credential resolution fails per request, not at startup. Ollama is unauthenticated and Mock makes no upstream calls, so neither needs a credential at all.
If a provider is not activated, requests that target it return a clear NO_PROVIDER error (HTTP 400) instead of failing silently.
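The activation rules above can be approximated with a small preflight script that reports which providers would register for the current environment. This is only a sketch and not part of DVARA: the function names is_non_blank and would_activate are hypothetical, and the exact-match test against true is an assumption about how the gateway parses its boolean flags.

```shell
# Preflight sketch (hypothetical helper, not shipped with DVARA).

# Succeeds when the named variable holds something other than whitespace.
is_non_blank() {
  eval "_value=\${$1-}"
  [ -n "$(printf '%s' "$_value" | tr -d '[:space:]')" ]
}

# Prints one line per provider that would register at startup.
would_activate() {
  # Credential-gated: a non-blank *_API_KEY is enough.
  for p in OPENAI ANTHROPIC GEMINI MISTRAL COHERE GROQ QWEN DEEPSEEK MOONSHOT ZHIPU XAI; do
    if is_non_blank "${p}_API_KEY"; then echo "$p"; fi
  done
  # Azure needs both the key and the base URL.
  if is_non_blank AZURE_OPENAI_API_KEY && is_non_blank AZURE_OPENAI_BASE_URL; then
    echo "AZURE_OPENAI"
  fi
  # Flag-gated: the *_ENABLED boolean must be true (exact match assumed).
  for p in BEDROCK OLLAMA MOCK_PROVIDER; do
    eval "_flag=\${${p}_ENABLED-}"
    if [ "$_flag" = "true" ]; then echo "$p"; fi
  done
}
```

Calling would_activate in the shell that launches the gateway gives a quick view of which model prefixes should resolve and which would return NO_PROVIDER. It cannot detect the Bedrock case described in the note, where the provider registers but requests fail later for lack of AWS credentials.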
Setting OPENAI_API_KEY (or any provider key) via an environment variable registers one platform-wide credential that all tenants share. This is fine for single-tenant deployments and local development.
For multi-tenant deployments where each tenant must use their own provider key, create per-tenant credentials in DVARA Flightdeck under Credentials — no environment variable needed. Per-tenant credentials take precedence over the environment variable in the resolution chain. See Credentials & BYOK.
Every provider passes through the same governance pipeline before and after the upstream hop — see Platform architecture → Request lifecycle for the full pipeline diagram. Providers differ only in how they talk to the upstream API; everything else is uniform.
This page documents how to enable each provider, what features it supports, and the provider-specific quirks that affect what you can send through it. Credentials can come from environment variables, the DVARA Flightdeck (BYOK per tenant), or a vault — see Credentials & BYOK for the full picture.
All supported providers
| Provider | Model prefix | Example models | Notes |
|---|---|---|---|
| OpenAI | gpt, o1, o3, o4, chatgpt, text-embedding | gpt-4o, gpt-4.1, o3-mini, chatgpt-4o-latest, text-embedding-3-small | Native structured outputs and JSON mode; reasoning + alias model families |
| Azure OpenAI | azure/ | azure/gpt-4o | Requires both AZURE_OPENAI_API_KEY and AZURE_OPENAI_BASE_URL |
| Anthropic | claude | claude-sonnet-4-5, claude-3-5-haiku-20241022 | json_schema via tool-use rewrite; 200K context |
| Google Gemini | gemini | gemini-2.0-flash, gemini-1.5-pro | 1M token context window; API key in URI |
| AWS Bedrock | bedrock/ | bedrock/anthropic.claude-3-sonnet-… | SigV4 signing; explicit AWS key pair required (IRSA / instance roles not yet supported) |
| Mistral | mistral | mistral-large-latest, mistral-small-latest | OpenAI-compatible API; no vision |
| Cohere | command | command-r-plus, command-r | No structured output or JSON mode |
| Groq | groq/ | groq/llama-3.3-70b-versatile | Ultra-low latency; no json_schema |
| Ollama | ollama/ | ollama/llama3.2 | Local inference; no vision, tools, or structured output |
| Qwen | qwen | qwen2.5-72b-instruct, qwen-max | Alibaba DashScope; conservative v1 capability declaration |
| DeepSeek | deepseek | deepseek-chat, deepseek-reasoner | Streaming + tools + JSON mode; json_schema declared off (reasoning models differ) |
| Moonshot | moonshot | moonshot-v1-128k, moonshot-v1-32k | 200K context, the largest among the OpenAI-compatible providers |
| ChatGLM | glm | glm-4, glm-4-air, glm-4v-plus | Vision is model-specific (glm-4v-* only); declared off at the provider level |
| Grok | grok | grok-2-1212, grok-2-vision-1212, grok-3-latest | Native structured outputs; vision over-declared at provider level (model-specific) |
| Mock | mock/ | mock/test | Dev and CI only — no upstream calls |
| Any other OpenAI-compatible | custom prefix | Fireworks AI, Together AI, Perplexity, internal corporate proxies | Register a custom provider → |
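As a rough illustration of how the Model prefix column maps a model name to a provider, the table can be written as a single case statement. The function name provider_for_model is hypothetical and this sketch only mirrors the table; DVARA's actual resolver may match or order prefixes differently, and custom OpenAI-compatible prefixes are not modelled here.

```shell
# Sketch of prefix-based provider resolution (hypothetical, mirrors the table above).
provider_for_model() {
  case "$1" in
    # Slash-delimited prefixes are checked first.
    azure/*)   echo "Azure OpenAI" ;;
    bedrock/*) echo "AWS Bedrock" ;;
    groq/*)    echo "Groq" ;;
    ollama/*)  echo "Ollama" ;;
    mock/*)    echo "Mock" ;;
    # Bare prefixes.
    gpt*|o1*|o3*|o4*|chatgpt*|text-embedding*) echo "OpenAI" ;;
    claude*)   echo "Anthropic" ;;
    gemini*)   echo "Google Gemini" ;;
    mistral*)  echo "Mistral" ;;
    command*)  echo "Cohere" ;;
    qwen*)     echo "Qwen" ;;
    deepseek*) echo "DeepSeek" ;;
    moonshot*) echo "Moonshot" ;;
    glm*)      echo "ChatGLM" ;;
    grok*)     echo "Grok" ;;
    # Anything else would need a registered custom provider.
    *)         echo "unknown" ;;
  esac
}
```

For example, provider_for_model groq/llama-3.3-70b-versatile prints Groq, while provider_for_model grok-3-latest prints Grok: the two similar-looking prefixes differ by the trailing slash and one letter.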
OpenAI
Model prefixes: gpt, o1, o3, o4, and chatgpt for chat models; text-embedding for embeddings.
export OPENAI_API_KEY=sk-your-key
Example models: gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, o1-preview, o1-mini, o3-mini, o4-mini, chatgpt-4o-latest, text-embedding-3-small, text-embedding-3-large. OpenAI's chat catalogue spans three namespaces — classic GPT models (gpt-*), reasoning models (o1-*, o3-*, o4-*), and alias models (chatgpt-*); all three route to OpenAI under the same OPENAI_API_KEY. Embeddings route separately under the text-embedding-* prefix. Call GET /v1/models for the live list of models your deployment can see.
Features: Chat completions, embeddings, streaming, vision, tool calls, structured outputs (native json_schema and json_object).
Notes:
- Credentials are read per request, so rotating your OpenAI key in the DVARA Flightdeck takes effect on the next request — no restart.
- Chat vs embedding routing is by prefix: gpt*, o1*, o3*, o4*, and chatgpt* → chat; text-embedding* → embeddings.
- response_format: json_schema and response_format: json_object pass through natively, with no rewrite.
Azure OpenAI
Azure OpenAI is a separate provider from OpenAI because it uses deployment-based routing and a different auth header. Both api-key and base-url are required — the provider only registers when both are set.
export AZURE_OPENAI_API_KEY=your-azure-key
export AZURE_OPENAI_BASE_URL=https://my-resource.openai.azure.com/openai
Model prefix: azure/. The prefix is stripped before the call, so azure/gpt-4o routes to the deployment named gpt-4o on your Azure resource.
Features: Chat completions, streaming, vision, tool calls, structured outputs (native json_schema and json_object). 128K context.
Notes:
- Azure uses the api-key header (not Authorization: Bearer). DVARA handles this automatically.
- The base URL typically ends in /openai for the Microsoft-hosted path. Example: https://acme.openai.azure.com/openai.
- Endpoint template used under the hood: /deployments/{deployment}/chat/completions?api-version=2024-10-21.
- For tenant-scoped BYOK credentials, the platform-wide DVARA_ENCRYPTION_MASTER_PASSWORD env var is required when any credential is stored in ENCRYPTED mode (the default). It protects every encrypted credential, not just Azure. Zero-trust deployments that only ever create credentials in REFERENCE mode (vault-pointer-only, no secret material at rest) can omit it. See Credentials & BYOK.
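To make the deployment-based routing concrete, the sketch below composes the upstream URL from the three documented pieces: the base URL, the model name with its azure/ prefix stripped, and the endpoint template. The helper name azure_chat_url is hypothetical, and the gateway's own URL builder is not shown in this page, so treat this as an illustration of the documented behaviour rather than the implementation.

```shell
# azure_chat_url <base-url> <model>
# Hypothetical helper: shows how azure/gpt-4o resolves to a deployment URL.
azure_chat_url() {
  _deployment="${2#azure/}"   # strip the azure/ prefix to get the deployment name
  _base="${1%/}"              # tolerate a trailing slash on the base URL
  printf '%s/deployments/%s/chat/completions?api-version=2024-10-21\n' "$_base" "$_deployment"
}
```

With the example base URL from the notes, azure_chat_url https://acme.openai.azure.com/openai azure/gpt-4o prints https://acme.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-10-21.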
Anthropic
Model prefix: claude
export ANTHROPIC_API_KEY=sk-ant-your-key
Example models: claude-sonnet-4-5, claude-3-5-haiku-20241022, claude-3-opus-20240229. Any model starting with claude is routed to Anthropic.
Features: Chat completions, streaming, vision, tool calls, structured outputs (emulated — see notes).
Notes:
- system role messages are automatically extracted and passed as Anthropic's separate system field.
- If you do not set max_tokens, DVARA defaults it to 1024, because Anthropic requires the field.
- response_format: json_object is implemented by injecting a system prompt instruction. The response is still standard text; you parse it as JSON.
- response_format: json_schema is implemented by rewriting the call into Anthropic's tool-use API with a synthetic structured_output tool. The tool's input is extracted and returned as the assistant message. This works transparently to the caller: you send OpenAI-format json_schema, you get OpenAI-format output.
- If your client sends json_schema with strict: true, the response includes an X-Gateway-Strict-Downgraded: true header. Anthropic cannot natively enforce strict JSON Schema, and DVARA flags that for observability.
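The strict-downgrade behaviour in the last note can be observed from the client side by inspecting response headers. The sketch below is illustrative only: the gateway address is borrowed from the Ollama example elsewhere on this page, the schema and prompt are made up, the function names are hypothetical, and any authentication header your deployment requires is omitted.

```shell
# Gateway address is an assumption; override GATEWAY_URL for your deployment.
GATEWAY_URL="${GATEWAY_URL:-http://localhost:8080}"

# Send an OpenAI-format json_schema request with strict: true and print
# only the response headers (-D - dumps headers, -o /dev/null drops the body).
send_strict_request() {
  curl -s -D - -o /dev/null -X POST "$GATEWAY_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "claude-sonnet-4-5",
      "messages": [{"role": "user", "content": "Name one city in France."}],
      "response_format": {
        "type": "json_schema",
        "json_schema": {
          "name": "city",
          "strict": true,
          "schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": false
          }
        }
      }
    }'
}

# Read response headers on stdin; print "yes" when the downgrade header is set.
strict_downgraded() {
  if grep -qi '^x-gateway-strict-downgraded:[[:space:]]*true'; then
    echo "yes"
  else
    echo "no"
  fi
}

# Usage (requires a running gateway with Anthropic activated):
#   send_strict_request | strict_downgraded
```

A client that depends on strict schema enforcement could use a check like this to decide whether to validate the returned JSON itself.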
Google Gemini
Model prefix: gemini
export GEMINI_API_KEY=AIza-your-key
Example models: gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash.
Features: Chat completions, streaming, vision, tool calls, structured outputs (native responseMimeType and responseSchema). 1 million token context window — the largest of any supported provider.
Notes:
- The Gemini REST API puts its API key in the URI query string instead of a header. DVARA resolves the key per request and appends it to the outgoing URL.
- json_schema translates to generationConfig.responseMimeType: "application/json" plus generationConfig.responseSchema.
- json_object translates to generationConfig.responseMimeType: "application/json" without a schema.
- Because the context is 1M tokens, DVARA's input-size guardrails rarely trigger for Gemini with default settings. Tune guardrail.max-input-tokens per tenant if you want tighter limits for a specific workload.
AWS Bedrock
Model prefix: bedrock/. The prefix is stripped before sending the model ID to Bedrock, so any Bedrock-hosted model works.
export BEDROCK_ENABLED=true
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1 # defaults to us-east-1 if unset
Example model IDs:
bedrock/anthropic.claude-3-sonnet-20240229-v1:0
bedrock/amazon.titan-text-express-v1
bedrock/meta.llama3-70b-instruct-v1:0
bedrock/mistral.mistral-7b-instruct-v0:2
Features: Chat completions, streaming, vision, tool calls, structured outputs (emulated via tool-use rewrite for Claude models hosted on Bedrock).
Authentication: AWS SigV4 request signing, computed per request. Credentials are resolved per request, so rotating them in the DVARA Flightdeck or in your vault takes effect on the next call.
Notes:
- AWS credentials are required explicitly today, either as the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars (shown above) or as tenant-scoped credentials registered in the DVARA Flightdeck under Credentials. Setting BEDROCK_ENABLED=true and leaving the access / secret keys blank activates the provider at startup, but every request fails with a credential error from the SigV4 signer. Support for the AWS SDK's default credential provider chain (IRSA on EKS, EC2 instance metadata, container credentials) is on the roadmap; until it lands, an explicit key pair is mandatory on every deployment that uses Bedrock.
- response_format: json_schema rewrites the call into Bedrock's toolConfig with a synthetic tool; the tool's input is extracted as the response text.
- response_format: json_object injects a system message instructing the model to reply in JSON.
- When the client sends json_schema with strict: true, the response includes X-Gateway-Strict-Downgraded: true, because Bedrock does not natively enforce strict schemas.
Mistral
Model prefix: mistral
export MISTRAL_API_KEY=your-mistral-key
Example models: mistral-large-latest, mistral-small-latest, mistral-medium-latest. Any model starting with mistral is routed here.
Features: Chat completions, streaming, tool calls, structured outputs (native json_schema and json_object). 128K context.
Notes:
- Mistral's public API is OpenAI-compatible, so DVARA talks to it using the standard OpenAI request and response shapes. You do not need a different client.
Limitations: Pixtral vision models (pixtral-12b, pixtral-large) are not supported in this release. See Provider capability gaps for the full list and the safe routing workaround.
Cohere
Model prefix: command
export COHERE_API_KEY=your-cohere-key
Example models: command-r-plus, command-r, command-light. Any model starting with command is routed here.
Features: Chat completions, streaming. 128K context.
Notes:
- Cohere uses its v2 chat API at api.cohere.com/v2. DVARA translates OpenAI-style requests to Cohere's format and maps responses back, so clients see standard OpenAI format.
- Finish reason mapping: COMPLETE → stop, MAX_TOKENS → length.
Limitations: Vision, tool calls, and structured outputs are not supported in this release. See Provider capability gaps for the error codes you'll see and the routing workaround.
Groq
Model prefix: groq/. The prefix is stripped before sending the model name to Groq — groq/llama-3.3-70b-versatile sends llama-3.3-70b-versatile to the API.
export GROQ_API_KEY=your-groq-key
Example models: groq/llama-3.3-70b-versatile, groq/mixtral-8x7b-32768, groq/gemma2-9b-it.
Features: Chat completions, streaming, JSON mode (json_object). 131K context. Groq is well known for ultra-low inference latency on supported models — it's a good choice for interactive UX where wall-clock matters more than cost.
Notes:
- Groq's API is OpenAI-compatible. DVARA reuses the standard OpenAI request/response shapes.
- On a mixed route, json_schema requests are automatically routed to a capable provider; json_object requests can still go to Groq.
Limitations: Vision, tool calls, and json_schema structured outputs are not supported in this release; json_object mode works. See Provider capability gaps for the full list and the routing workaround.
Ollama
Model prefix: ollama/. The prefix is stripped before sending the model name to the Ollama API.
export OLLAMA_ENABLED=true
export OLLAMA_BASE_URL=http://localhost:11434 # optional, defaults to localhost:11434
Prerequisites: an Ollama server running at OLLAMA_BASE_URL with at least one model pulled:
ollama serve
ollama pull llama3.2
Example request:
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/llama3.2", "messages": [{"role": "user", "content": "Hello!"}]}'
Features: Chat completions, streaming. 32K context.
Notes:
- Ollama's endpoints are unauthenticated by design. Do not expose them to the public internet — run the Ollama server on localhost, inside a private network, or behind a separate authenticated reverse proxy.
- The provider only registers when OLLAMA_ENABLED=true, so Ollama won't show up in GET /v1/models for deployments that haven't opted in.
Limitations: Vision, tool calls, structured outputs, JSON mode, and embeddings are not supported in this release. See Provider capability gaps for the error codes and the routing workaround. Local-only deployments that need any of those capabilities should pair Ollama with a hosted provider on the same route.
Alibaba Qwen
Model prefix: qwen
export QWEN_API_KEY=your-dashscope-key
Example models: qwen2.5-72b-instruct, qwen2.5-coder-32b-instruct, qwen-max, qwen-plus. Any model starting with qwen routes to the DashScope compatible-mode endpoint at https://dashscope.aliyuncs.com/compatible-mode/v1.
Features: Chat completions, streaming. 32K context.
Notes:
- Qwen's catalogue spans text, vision (Qwen-VL family), and reasoning variants. Capabilities are declared conservatively at the provider level (streaming on, everything else off) so capability-aware routing won't dispatch a json_schema or vision request to Qwen and have it fail upstream. Use a route that explicitly targets a Qwen vision or tools-capable model when you need that capability.
- The compatible-mode endpoint speaks the OpenAI wire format. The native DashScope API uses a different shape; DVARA doesn't talk to that one.
Limitations: Vision, tool calls, structured outputs, and JSON mode are conservatively declared off in this release. See Provider capability gaps.
DeepSeek
Model prefix: deepseek
export DEEPSEEK_API_KEY=your-deepseek-key
Example models: deepseek-chat, deepseek-reasoner. Any model starting with deepseek is routed here.
Features: Chat completions, streaming, tool calls, JSON mode. 64K context.
Notes:
- DeepSeek's chat models support tools and JSON mode natively; structured outputs (json_schema enforcement) are declared off at the provider level because the reasoning model family (deepseek-reasoner) handles structured output differently from the chat family. Operators who need strict json_schema should route to OpenAI, Gemini, Mistral, or Azure OpenAI.
- API base URL is https://api.deepseek.com/v1, fully OpenAI wire-format compatible.
Limitations: Strict json_schema structured outputs and vision are not declared at the provider level. See Provider capability gaps.
Moonshot AI (Kimi)
Model prefix: moonshot
export MOONSHOT_API_KEY=your-moonshot-key
Example models: moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k. The brand is "Kimi" but the API uses the moonshot- prefix.
Features: Chat completions, streaming, tool calls, JSON mode. 200K context — the largest in the OpenAI-compatible provider set.
Notes:
- Long context is the headline differentiator. Use Moonshot when the workload needs to feed the model long documents or long conversation histories that exceed the 128K of GPT-4o or the 131K of Groq.
- API base URL is https://api.moonshot.cn/v1.
Limitations: Vision and strict json_schema structured outputs are not declared at the provider level in this release. See Provider capability gaps.
Zhipu AI (ChatGLM)
Model prefix: glm
export ZHIPU_API_KEY=your-zhipu-key
Example models: glm-4, glm-4-air, glm-4-flash, glm-4v-plus. The brand is "ChatGLM" but the API uses the glm- prefix.
Features: Chat completions, streaming, tool calls, JSON mode. 128K context.
Notes:
- API base URL is https://open.bigmodel.cn/api/paas/v4.
Capabilities by model:
- glm-4, glm-4-air, glm-4-flash: text only; chat + tools + streaming + JSON mode.
- glm-4v-plus, glm-4v: vision-capable. The provider-level supportsVision is declared off in this release (conservative), so capability-aware routing won't auto-dispatch vision requests to ChatGLM. To use a glm-4v-* model for vision, configure an explicit route that targets it.
Limitations: Vision is conservatively declared off at the provider level. See Provider capability gaps.
Grok (xAI)
Model prefix: grok
export XAI_API_KEY=your-xai-key
Example models: grok-2-1212, grok-2-vision-1212, grok-3-latest, grok-beta.
Features: Chat completions, streaming, vision (model-specific), tool calls, structured outputs (native), JSON mode. 131K context.
Notes:
- API base URL is https://api.x.ai/v1. xAI's API is OpenAI-compatible at the wire format; DVARA reuses the standard request/response shapes.
Capabilities by model:
- grok-2-1212, grok-3-latest, grok-beta: text only; chat + tools + streaming + structured outputs + JSON mode.
- grok-2-vision-1212: vision-capable. The provider-level supportsVision is declared on because the buyer-signal value of "Grok supports vision" matters for route configuration even though most Grok models don't accept image input. Sending a vision request to a non-vision Grok model gets a PROVIDER_ERROR from the upstream rather than UNSUPPORTED_CAPABILITY from DVARA. Route vision traffic explicitly to grok-2-vision-1212 to avoid the noise.
Limitations: Vision is over-declared at the provider level — the per-model variance is the trade-off. See Provider capability gaps.
Mock Provider
The Mock provider returns configurable fake completions without calling any upstream API — ideal for integration tests, CI pipelines, load tests, and local development without real API keys. Model prefix is mock/.
Full setup, wiremock-style matchers, file-backed scenarios with hot reload, the DVARA Flightdeck authoring UI, Prometheus counters, and audit events are documented on the dedicated Mock Provider page.
Advanced: overriding the base URL
For the common case, you only need environment variables — the gateway's built-in configuration already maps each *_API_KEY env var to the right provider and points at the official upstream endpoint. You never write YAML for a standard OpenAI, Anthropic, Gemini, Mistral, Cohere, Groq, or Azure deployment.
The one time you do need YAML is when you want to override the upstream base URL, for example:
- An OpenAI-compatible proxy (corporate LLM gateway, regional endpoint, OpenAI-protocol fork) that you want the OpenAI provider to talk to.
- A Mistral / Groq fork hosted on your own infrastructure.
- A non-default Ollama address (ollama-servers.internal:11434).
In those cases, create a tiny overrides.yml with just the provider base URL you want to change:
# overrides.yml
dvara:
  llm-gateway:
    providers:
      openai:
        base-url: https://llm.corp.internal/v1   # override; api-key still comes from env
      ollama:
        base-url: http://ollama-servers.internal:11434
Then tell the gateway to load it on top of the bundled config (not in place of it). The three recipes below show how to do that in each deployment topology. Apart from the drop-in ./config/application.yml option for a bare Java process, they all use the SPRING_CONFIG_ADDITIONAL_LOCATION environment variable so your override layers on the defaults; every other property (credentials, activation, routing, capabilities) keeps working unchanged.
VM / Desktop — bare Java process
Two equivalent options. Either drop the override next to the jar or pass its location on the command line.
Option 1 — drop ./config/application.yml next to the jar:
/opt/dvara/
├── gateway.jar
└── config/
    └── application.yml    ← your overrides.yml contents, renamed
Spring Boot picks up ./config/application.yml automatically — no flags needed.
cd /opt/dvara
export OPENAI_API_KEY=sk-your-key
java -jar gateway.jar
Option 2 — pass an explicit additional location:
Useful when the override file lives outside the jar directory (e.g. managed by a config-management tool).
export OPENAI_API_KEY=sk-your-key
export SPRING_CONFIG_ADDITIONAL_LOCATION=file:/etc/dvara/overrides.yml
java -jar /opt/dvara/gateway.jar
Or as a command-line flag:
java -jar /opt/dvara/gateway.jar \
  --spring.config.additional-location=file:/etc/dvara/overrides.yml
Either form layers the override on top of the bundled application.yml — values in the override win; everything else uses the default.
Docker Compose
Bind-mount the override into the container and point Spring at it.
# docker-compose.yml
services:
  gateway:
    image: ghcr.io/dvarahq/dvara/dvara-llm-gateway:latest
    environment:
      SPRING_CONFIG_ADDITIONAL_LOCATION: file:/config/
      OPENAI_API_KEY: ${OPENAI_API_KEY}
    volumes:
      - ./overrides.yml:/config/application.yml:ro
    ports:
      - "8080:8080"
Create overrides.yml alongside the compose file with the YAML from the top of this section, then bring the stack up:
export OPENAI_API_KEY=sk-your-key
docker compose up -d
The trailing slash on SPRING_CONFIG_ADDITIONAL_LOCATION=file:/config/ tells Spring to scan that directory for any application*.yml — which is why the mount target is /config/application.yml. The bundled application.yml inside the image keeps loading; your file layers on top.
Kubernetes
Store the override in a ConfigMap and mount it as a volume. The Deployment-level env var points Spring at the mount path.
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dvara-overrides
data:
  application.yml: |
    dvara:
      llm-gateway:
        providers:
          openai:
            base-url: https://llm.corp.internal/v1
          ollama:
            base-url: http://ollama-servers.internal:11434
---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dvara-gateway
spec:
  replicas: 2
  selector:
    matchLabels: {app: dvara-gateway}
  template:
    metadata:
      labels: {app: dvara-gateway}
    spec:
      containers:
        - name: gateway
          image: ghcr.io/dvarahq/dvara/dvara-llm-gateway:latest
          ports:
            - {containerPort: 8080}
          env:
            - name: SPRING_CONFIG_ADDITIONAL_LOCATION
              value: file:/config/
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: dvara-provider-keys
                  key: openai
          volumeMounts:
            - name: overrides
              mountPath: /config
              readOnly: true
      volumes:
        - name: overrides
          configMap:
            name: dvara-overrides
Apply:
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
A changed ConfigMap reaches the gateway only on the next pod restart: the kubelet eventually refreshes the mounted file, but the running process read its configuration at startup and does not pick up the change. Trigger a rollout with kubectl rollout restart deployment/dvara-gateway whenever you change the override. If you use the DVARA Helm chart, set the equivalent value under extraConfig: and the chart wires the ConfigMap and volume mount for you; see the chart docs for the exact key name.
Don't mount your override over /workspace/config/application.yml (or wherever the image keeps its bundled application.yml). That replaces the defaults entirely — you lose every property the gateway expected to find, and startup fails with cryptic configuration-resolution errors.
Always mount into a separate directory (/config, /etc/dvara, or similar) and set SPRING_CONFIG_ADDITIONAL_LOCATION=file:/that-directory/. The overrides layer on top of the bundled defaults instead of replacing them. If startup logs show Spring loading zero application config files when you expected one, the mount path is wrong.
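A quick way to catch a wrong mount path before starting the gateway is to check that the configured location actually contains the override. The helper below is a hypothetical sketch, not a DVARA or Spring tool. It is deliberately simplified: for the directory form it only looks for application.yml (the name used in the recipes above), even though Spring Boot also recognizes other application file names.

```shell
# check_override_location <value of SPRING_CONFIG_ADDITIONAL_LOCATION>
# Prints "ok" when the override file is where Spring will look for it,
# otherwise prints which path is missing.
check_override_location() {
  _path="${1#file:}"          # drop the file: scheme used in the recipes
  case "$_path" in
    */)
      # Directory form (Docker Compose / Kubernetes recipes).
      if [ -f "${_path}application.yml" ]; then
        echo "ok"
      else
        echo "missing ${_path}application.yml"
      fi
      ;;
    *)
      # Single-file form (VM option 2).
      if [ -f "$_path" ]; then
        echo "ok"
      else
        echo "missing $_path"
      fi
      ;;
  esac
}
```

Inside a container, running check_override_location "$SPRING_CONFIG_ADDITIONAL_LOCATION" from an entrypoint wrapper or a debug shell would show whether the bind mount or ConfigMap volume landed where expected.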
See Configuration Reference for the full set of properties you can override from YAML, and the Deployment pages for full topology-specific install guides.
Capabilities Matrix
| Provider | Streaming | Vision | Tool Calls | Structured Outputs | JSON Mode | Max Context |
|---|---|---|---|---|---|---|
| OpenAI | yes | yes | yes | yes (native) | yes (native) | 128,000 |
| Azure OpenAI | yes | yes | yes | yes (native) | yes (native) | 128,000 |
| Anthropic | yes | yes | yes | yes (tool rewrite) | yes (prompt) | 200,000 |
| Google Gemini | yes | yes | yes | yes (native) | yes (native) | 1,000,000 |
| AWS Bedrock | yes | yes | yes | yes (tool rewrite) | yes (prompt) | 200,000 |
| Mistral | yes | no | yes | yes (native) | yes (native) | 128,000 |
| Cohere | yes | no | no | no | no | 128,000 |
| Groq | yes | no | no | no | yes (native) | 131,072 |
| Ollama | yes | no | no | no | no | 32,000 |
| Qwen | yes | no¹ | no¹ | no¹ | no¹ | 32,768 |
| DeepSeek | yes | no | yes | no¹ | yes (native) | 64,000 |
| Moonshot | yes | no | yes | no | yes (native) | 200,000 |
| ChatGLM | yes | no¹ | yes | no | yes (native) | 128,000 |
| Grok | yes | yes¹ | yes | yes (native) | yes (native) | 131,072 |
| Mock | yes | no | no | yes (wraps JSON) | yes (wraps JSON) | 128,000 |
¹ Conservative or model-specific declaration. Qwen is declared "no" across the board in this release — per-model verification deferred to follow-up issues. DeepSeek's json_schema is "no" because reasoning models differ from chat models. ChatGLM's vision is "no" at the provider level even though glm-4v-* exists; route to it explicitly. Grok's vision is "yes" at the provider level even though only grok-2-vision-1212 accepts images; route around the non-vision models or accept upstream PROVIDER_ERROR on the rest. See per-provider sections for the per-model breakdown.
Only the Structured Outputs and JSON Mode columns drive capability-aware routing. When a request carries response_format: json_schema or json_object, providers on the route that lack the capability are excluded from the candidate pool before dispatch. If no provider remains, the call fails fast with NO_CAPABLE_PROVIDER (HTTP 400).
Vision and tool calls are not pre-filtered at the routing layer. The request is dispatched to whichever provider the route selects, and the provider rejects unsupported content with UNSUPPORTED_CAPABILITY (HTTP 400) or silently degrades it. For reliable behavior, configure routes that contain only providers capable of the traffic they serve — a vision-only route for vision requests, a tool-capable route for tool traffic.
One caveat: an explicit model prefix overrides routing. If a client sends model: "mistral-large-latest" and the route has Mistral + OpenAI, DVARA routes to Mistral regardless of what the request contains — the explicit prefix wins over capability selection.
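The pre-dispatch filter for response_format requests can be sketched as a lookup against the Structured Outputs and JSON Mode columns of the matrix. The provider identifiers and function names below are hypothetical, and the sketch ignores the explicit-prefix caveat, failover, and every other routing concern; it only shows how the candidate pool shrinks.

```shell
# supports <provider> <json_schema|json_object>
# Capability data transcribed from the matrix above.
supports() {
  case "$1" in
    openai|azure|anthropic|gemini|bedrock|mistral|grok|mock)
      return 0 ;;                       # structured outputs and JSON mode
    groq|deepseek|moonshot|chatglm)
      [ "$2" = "json_object" ] ;;       # JSON mode only
    *)
      return 1 ;;                       # cohere, ollama, qwen: neither
  esac
}

# capable_candidates <format> <provider>...
# Prints the providers left in the pool, or NO_CAPABLE_PROVIDER if none remain.
capable_candidates() {
  _format="$1"
  shift
  _found=""
  for _provider in "$@"; do
    if supports "$_provider" "$_format"; then
      _found="$_found $_provider"
    fi
  done
  if [ -z "$_found" ]; then
    echo "NO_CAPABLE_PROVIDER"
  else
    echo "${_found# }"
  fi
}
```

For a route containing Groq and OpenAI, capable_candidates json_schema groq openai prints openai, while capable_candidates json_object groq openai prints groq openai. A Cohere-plus-Ollama route yields NO_CAPABLE_PROVIDER for either format.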
For OpenAI-compatible long-tail providers wired via base-URL override (Fireworks AI, Together AI, Perplexity, vLLM hosts, LiteLLM proxies, internal corporate gateways), capabilities depend on the upstream — see Additional OpenAI-compatible providers.
See Routing for how this interacts with failover, latency-aware routing, and canary splits. See Multimodal & Vision for request examples and a worked route setup for vision traffic.
Provider capability gaps in this release
Some upstream provider features are not implemented in this release. The table below is the single source of truth for what's missing per provider; the per-provider sections above link here rather than restating it. The governance pipeline (policy, PII, audit, budget, guardrails) applies to every request regardless of which capability the upstream supports — gaps are about how DVARA talks to the upstream, not about what governance covers.
The columns:
- Capability — the upstream feature.
- Status — what DVARA does today when the request reaches that provider.
- Workaround — the safe configuration to use until the gap is closed.
| Provider | Capability | Status in this release | Workaround |
|---|---|---|---|
| Mistral | Vision (Pixtral 12B, Pixtral Large) | Not supported. Mistral's text models are wired up; Pixtral is not. A vision request dispatched to Mistral has its image blocks serialized as garbage text — the model responds to the text portion only, with no error surfaced. | Use a vision-capable provider (OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI) for vision traffic. See Multimodal & Vision. |
| Cohere | Vision | Not supported. Sending image content returns UNSUPPORTED_CAPABILITY (HTTP 400). | Use a vision-capable provider (see above). |
| Cohere | Tool calls | Not supported. Sending tool-use or tool-result content blocks returns UNSUPPORTED_CAPABILITY (HTTP 400). | Use a tool-capable provider (OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Mistral). |
| Cohere | Structured outputs / JSON mode | Not supported. Sending response_format: json_schema or json_object to a Cohere-only route returns NO_CAPABLE_PROVIDER (HTTP 400) before dispatch. | Configure a route that includes a structured-outputs-capable provider; capability-aware routing automatically excludes Cohere from the candidate pool. |
| Groq | Vision | Not supported. Sending image content returns UNSUPPORTED_CAPABILITY (HTTP 400). | Use a vision-capable provider. |
| Groq | Tool calls | Not supported. Sending tool-use or tool-result content blocks returns UNSUPPORTED_CAPABILITY (HTTP 400). | Use a tool-capable provider. |
| Groq | Structured outputs (json_schema) | Not supported. Sending response_format: json_schema to a Groq-only route returns NO_CAPABLE_PROVIDER (HTTP 400). json_object mode works. | Use response_format: json_object if Groq's ultra-low latency matters and the schema is enforced client-side, or route to OpenAI / Gemini / Mistral / Azure for json_schema. |
| Ollama | Vision | Not supported. Sending image content returns UNSUPPORTED_CAPABILITY (HTTP 400). | Use a vision-capable provider. |
| Ollama | Tool calls | Not supported. Sending tool-use or tool-result content blocks returns UNSUPPORTED_CAPABILITY (HTTP 400). | Use a tool-capable provider. |
| Ollama | Structured outputs / JSON mode | Not supported. Sending response_format: json_schema or json_object returns NO_CAPABLE_PROVIDER (HTTP 400). | Route structured-output traffic to a capable provider. Local-only deployments that require structured outputs should pair Ollama with a hosted structured-outputs-capable provider on the same route. |
| Ollama | Embeddings | Not supported. Embedding requests with an ollama/ model return NO_PROVIDER (HTTP 400). | Use OpenAI's text-embedding-3-small / text-embedding-3-large models, which DVARA routes through the OpenAI provider. |
For OpenAI-compatible long-tail providers wired via base-URL override (Fireworks AI, Together AI, Perplexity, vLLM hosts, LiteLLM proxies, internal corporate gateways), capabilities depend on the upstream and are not tracked here — see Additional OpenAI-compatible providers for that surface.
The fail-safe configuration is always a route that contains only providers capable of the traffic it serves. A vision-only route never dispatches a vision request to a non-vision provider; a tool-capable-only route never dispatches a tool call to Cohere. See Routing for how to compose these routes and Multimodal & Vision for a worked vision-route example.
Next steps
- Routing — route by model, latency, cost, weight, or region, with canary splits and shadow traffic.
- Credentials & BYOK — tenant-scoped provider keys, rotation, and vault integration.
- Multi-Tenancy — how DVARA resolves the tenant from an API key and cascades per-tenant configuration.
- Multimodal & Vision — sending images through the gateway, supported providers, and route configuration for vision traffic.
- API: interactive OpenAPI reference for every /v1/* endpoint, with a configurable Try it panel.