Provider Setup
Dvara supports six LLM providers. Each provider is activated by setting the appropriate environment variable or configuration property. Only activated providers are registered — requests to unconfigured providers return HTTP 400.
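The activation-gated routing described above can be sketched as follows. This is an illustrative model, not Dvara's actual internals — the function and map names are made up; only the prefixes and the 400-on-unconfigured behavior come from this document:

```python
# Illustrative sketch of prefix-based provider routing: only providers whose
# activation condition is met appear in `registered`, and any model whose
# provider is absent yields an HTTP 400-style error.

PREFIX_TO_PROVIDER = {
    "gpt": "openai",
    "text-embedding": "openai",
    "claude": "anthropic",
    "gemini": "gemini",
    "bedrock/": "bedrock",
    "ollama/": "ollama",
    "mock/": "mock",
}

def route(model: str, registered: set[str]) -> tuple[int, str]:
    """Return (status, provider-or-error-message) for a requested model."""
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model.startswith(prefix):
            if provider in registered:
                return 200, provider
            return 400, f"provider '{provider}' is not configured"
    return 400, f"no provider matches model '{model}'"
```

For example, `route("claude-3-opus", {"openai"})` yields a 400 because Anthropic was never activated.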
OpenAI
Model prefix: gpt (also text-embedding for embeddings)
Required:
```bash
export OPENAI_API_KEY=sk-your-key
```
Configuration:
```yaml
gateway:
  providers:
    openai:
      api-key: ${OPENAI_API_KEY:}
      base-url: https://api.openai.com/v1  # optional override
```
Supported models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
Features: Chat completions, embeddings, streaming, vision, tool calls, structured outputs (native json_schema and json_object).
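As an illustration of the native structured-output path, a json_schema request routed to OpenAI carries its response_format through unchanged. The "person" schema below is a made-up example:

```python
# Example request body using OpenAI's native json_schema response_format.
# The "person" schema is purely illustrative.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Extract: Ada Lovelace, born 1815"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "birth_year": {"type": "integer"},
                },
                "required": ["name", "birth_year"],
            },
        },
    },
}
```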
Azure OpenAI
Use the base-url override to point at your Azure deployment:
```yaml
gateway:
  providers:
    openai:
      api-key: ${AZURE_OPENAI_API_KEY}
      base-url: https://my-resource.openai.azure.com/openai/deployments/my-deployment
```
Anthropic
Model prefix: claude
Required:
```bash
export ANTHROPIC_API_KEY=sk-ant-your-key
```
Configuration:
```yaml
gateway:
  providers:
    anthropic:
      api-key: ${ANTHROPIC_API_KEY:}
```
Supported models: claude-sonnet-4-5, claude-3-haiku, claude-3-opus
Features: Chat completions, streaming, vision, tool calls, structured outputs (via tool-use rewrite).
Implementation notes:
- `system`-role messages are extracted and passed as a separate `system` field in the Anthropic API.
- Default `max_tokens` is set to 1024 if not specified in the request.
- `json_object` mode injects a system prompt instruction.
- `json_schema` mode rewrites the request to use Anthropic's tool-use API with a `structured_output` tool.
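The first two notes above can be sketched as a request transform. This is a simplified illustration under the stated behavior, not Dvara's actual code:

```python
# Simplified sketch of the OpenAI-to-Anthropic request rewrite: system
# messages move to a top-level "system" field, and max_tokens defaults to
# 1024 when the caller omits it.

def to_anthropic(request: dict) -> dict:
    system_parts = [m["content"] for m in request["messages"] if m["role"] == "system"]
    body = {
        "model": request["model"],
        "messages": [m for m in request["messages"] if m["role"] != "system"],
        "max_tokens": request.get("max_tokens", 1024),
    }
    if system_parts:
        body["system"] = "\n".join(system_parts)
    return body
```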
Google Gemini
Model prefix: gemini
Required:
```bash
export GEMINI_API_KEY=AIza-your-key
```
Configuration:
```yaml
gateway:
  providers:
    gemini:
      api-key: ${GEMINI_API_KEY:}
```
Supported models: gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash
Features: Chat completions, streaming, vision, tool calls, structured outputs (native responseMimeType and responseSchema).
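For comparison with OpenAI's json_schema path, Gemini's native mechanism uses generationConfig fields. The "recipe" schema below is a made-up example of the shape:

```python
# generationConfig fields Gemini uses for native structured outputs
# (responseMimeType + responseSchema). The schema itself is illustrative.
gemini_body = {
    "contents": [{"role": "user", "parts": [{"text": "Give me a cookie recipe as JSON"}]}],
    "generationConfig": {
        "responseMimeType": "application/json",
        "responseSchema": {
            "type": "OBJECT",
            "properties": {
                "name": {"type": "STRING"},
                "ingredients": {"type": "ARRAY", "items": {"type": "STRING"}},
            },
        },
    },
}
```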
AWS Bedrock
Model prefix: bedrock/
The bedrock/ prefix is stripped before sending the model ID to the Bedrock API, allowing any Bedrock-hosted model to be used.
Required:
```bash
export BEDROCK_ENABLED=true
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1  # optional, defaults to us-east-1
```
Configuration:
```yaml
gateway:
  providers:
    bedrock:
      enabled: ${BEDROCK_ENABLED:false}
      access-key: ${AWS_ACCESS_KEY_ID:}
      secret-key: ${AWS_SECRET_ACCESS_KEY:}
      region: ${AWS_REGION:us-east-1}
```
Example model IDs:
```
bedrock/anthropic.claude-3-sonnet-20240229-v1:0
bedrock/amazon.titan-text-express-v1
bedrock/meta.llama3-70b-instruct-v1:0
bedrock/mistral.mistral-7b-instruct-v0:2
```
Features: Chat completions, streaming, vision, tool calls, structured outputs (via tool-use rewrite for Claude models on Bedrock).
Authentication: Uses AWS SigV4 request signing.
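SigV4 derives a scoped signing key through a chain of HMACs over the date, region, and service. A minimal stdlib sketch of that derivation (illustrative of the standard algorithm, not of the gateway's HTTP client):

```python
import hashlib
import hmac

def sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the AWS SigV4 signing key: HMAC chain over date, region, service."""
    k_date = hmac.new(("AWS4" + secret_key).encode(), date_stamp.encode(), hashlib.sha256).digest()
    k_region = hmac.new(k_date, region.encode(), hashlib.sha256).digest()
    k_service = hmac.new(k_region, service.encode(), hashlib.sha256).digest()
    return hmac.new(k_service, b"aws4_request", hashlib.sha256).digest()
```

The resulting key signs the canonical request string; because the date is in the chain, keys rotate daily without changing the long-term secret.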
Ollama
Model prefix: ollama/
The ollama/ prefix is stripped before sending the model name to the Ollama API.
Required:
```bash
export OLLAMA_ENABLED=true
```
Configuration:
```yaml
gateway:
  providers:
    ollama:
      enabled: ${OLLAMA_ENABLED:false}
      base-url: ${OLLAMA_BASE_URL:http://localhost:11434}
```
Prerequisites: Ollama must be running locally (or at the configured base URL) with at least one model pulled:
```bash
ollama serve
ollama pull llama3.2
```
Example usage:
```bash
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/llama3.2", "messages": [{"role": "user", "content": "Hello!"}]}'
```
Limitations: Ollama does not support structured outputs (json_object or json_schema). Sending response_format to an Ollama model returns HTTP 400 with error code unsupported_response_format.
Mock Provider
Model prefix: mock/
The mock provider returns configurable fake completions without calling any upstream API. Useful for integration testing, CI pipelines, and local development without API keys.
Required:
```bash
export MOCK_PROVIDER_ENABLED=true
```
Configuration:
```yaml
gateway:
  providers:
    mock:
      enabled: ${MOCK_PROVIDER_ENABLED:false}
      response: "This is a mock response"  # static text (default)
      latency-ms: 100                      # simulated delay (ms)
      stream-token-delay-ms: 20            # inter-token delay for SSE
      error-rate: 0.0                      # 0.0–1.0 failure fraction
      response-overrides:                  # per-model response overrides
        "[mock/fast]": "Fast model response"
        "[mock/error-test]": "groovy: throw new RuntimeException('test')"
```
Per-model response overrides: The response-overrides map allows different mock/ models to return different responses. If no override matches the requested model, the default response is used. Override values support the same static text and scripting syntax as the default response. Use bracket notation ([mock/model-name]) to preserve the / in YAML keys.
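The lookup described above behaves roughly like this sketch — bracket stripping and fallback mirror the description, but the function is illustrative, not the actual implementation:

```python
# Illustrative resolution of per-model response overrides: bracketed YAML keys
# ("[mock/fast]") are normalized, matched against the requested model, and the
# default response is used when nothing matches.

def resolve_response(model: str, default: str, overrides: dict[str, str]) -> str:
    for key, value in overrides.items():
        if key.strip("[]") == model:
            return value
    return default
```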
Scripting: The response field (and override values) support dynamic responses via scripting engines:
```yaml
# Groovy
response: "groovy: 'Hello from ' + request.model"
# JavaScript (GraalJS)
response: "js: 'Hello from ' + request.model"
# Python (GraalPy)
response: "python: 'Hello from ' + request.model"
```
Scripts receive a request binding with access to the full ChatRequest object. Scripting engines are optional classpath dependencies — a clear error is returned if the engine is not available.
Error simulation: Set error-rate to inject random failures (e.g., 0.1 = 10% of requests return PROVIDER_ERROR).
Circuit breaker: The mock provider is excluded from circuit breaker wrapping. Simulated errors (via error-rate) never trip the circuit breaker, ensuring the mock provider is always available.
CI usage: Enable the mock provider in CI test profiles to run integration tests without real API keys:
```yaml
# application-ci.yml
gateway:
  providers:
    mock:
      enabled: true
      latency-ms: 0
      stream-token-delay-ms: 0
```
Capabilities Matrix
| Provider | Streaming | Vision | Tool Calls | Structured Outputs | JSON Mode | Max Context |
|---|---|---|---|---|---|---|
| OpenAI | yes | yes | yes | yes (native) | yes (native) | 128,000 |
| Anthropic | yes | yes | yes | yes (tool rewrite) | yes (prompt) | 200,000 |
| Gemini | yes | yes | yes | yes (native) | yes (native) | 1,000,000 |
| Bedrock | yes | yes | yes | yes (tool rewrite) | yes (prompt) | 200,000 |
| Ollama | yes | no | no | no | no | 32,000 |
| Mock | yes | no | no | yes (wraps JSON) | yes (wraps JSON) | 128,000 |