
Provider Setup

Dvara supports six LLM providers. Each provider is activated by setting the appropriate environment variable or configuration property. Only activated providers are registered — requests to unconfigured providers return HTTP 400.
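The routing rules can be pictured as a prefix match followed by an activation check. This is an illustrative sketch only (the function name, error shape, and `ACTIVE_PROVIDERS` set are hypothetical, not Dvara's actual classes); the prefixes mirror the per-provider sections below.

```python
# Hypothetical sketch of prefix-based provider routing with activation checks.
# Only providers whose env vars / config properties are set are "active";
# everything else is rejected, mirroring the HTTP 400 behavior described above.
ACTIVE_PROVIDERS = {"openai", "mock"}  # e.g. only these were configured

def resolve_provider(model: str) -> str:
    if model.startswith("bedrock/"):
        provider = "bedrock"
    elif model.startswith("ollama/"):
        provider = "ollama"
    elif model.startswith("mock/"):
        provider = "mock"
    elif model.startswith(("gpt", "text-embedding")):
        provider = "openai"
    elif model.startswith("claude"):
        provider = "anthropic"
    elif model.startswith("gemini"):
        provider = "gemini"
    else:
        raise ValueError("400: unknown model prefix")
    if provider not in ACTIVE_PROVIDERS:
        raise ValueError("400: provider not configured")
    return provider
```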

OpenAI

Model prefix: gpt (also text-embedding for embeddings)

Required:

export OPENAI_API_KEY=sk-your-key

Configuration:

gateway:
  providers:
    openai:
      api-key: ${OPENAI_API_KEY:}
      base-url: https://api.openai.com/v1   # optional override

Supported models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large

Features: Chat completions, embeddings, streaming, vision, tool calls, structured outputs (native json_schema and json_object).
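For the native structured-output path, a request body might look like the following (standard OpenAI-compatible shape; the schema itself is just an example):

```python
import json

# Example chat request using OpenAI's native json_schema response_format,
# as it would be POSTed to the gateway's /v1/chat/completions endpoint.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Extract: Alice is 30."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
            },
        },
    },
}
body = json.dumps(payload)
```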

Azure OpenAI

Use the base-url override to point at your Azure deployment:

gateway:
  providers:
    openai:
      api-key: ${AZURE_OPENAI_API_KEY}
      base-url: https://my-resource.openai.azure.com/openai/deployments/my-deployment

Anthropic

Model prefix: claude

Required:

export ANTHROPIC_API_KEY=sk-ant-your-key

Configuration:

gateway:
  providers:
    anthropic:
      api-key: ${ANTHROPIC_API_KEY:}

Supported models: claude-sonnet-4-5, claude-3-haiku, claude-3-opus

Features: Chat completions, streaming, vision, tool calls, structured outputs (via tool-use rewrite).

Implementation notes:

  • system role messages are extracted and passed as a separate system field in the Anthropic API
  • Default max_tokens is set to 1024 if not specified in the request
  • json_object mode injects a system prompt instruction
  • json_schema mode rewrites the request to use Anthropic's tool-use API with a structured_output tool
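The first two notes amount to a small request rewrite. A minimal sketch, assuming an OpenAI-style request dict as input (the helper name is illustrative, not Dvara's actual code):

```python
# Sketch of the OpenAI -> Anthropic request rewrite described above:
# system-role messages are hoisted into a top-level "system" field, and
# max_tokens defaults to 1024 when the caller omits it.
def to_anthropic(request: dict) -> dict:
    system_parts = [m["content"] for m in request["messages"] if m["role"] == "system"]
    messages = [m for m in request["messages"] if m["role"] != "system"]
    out = {
        "model": request["model"],
        "messages": messages,
        "max_tokens": request.get("max_tokens", 1024),
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    return out
```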

Google Gemini

Model prefix: gemini

Required:

export GEMINI_API_KEY=AIza-your-key

Configuration:

gateway:
  providers:
    gemini:
      api-key: ${GEMINI_API_KEY:}

Supported models: gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash

Features: Chat completions, streaming, vision, tool calls, structured outputs (native responseMimeType and responseSchema).
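On the Gemini side, the native structured-output fields live under generationConfig. An example request body in the Gemini REST shape (the schema content is illustrative):

```python
import json

# Example Gemini generateContent body using the native responseMimeType and
# responseSchema fields for structured output. The schema follows the
# OpenAPI-subset Schema type used by the Gemini REST API.
payload = {
    "contents": [{"role": "user", "parts": [{"text": "Extract: Alice is 30."}]}],
    "generationConfig": {
        "responseMimeType": "application/json",
        "responseSchema": {
            "type": "OBJECT",
            "properties": {
                "name": {"type": "STRING"},
                "age": {"type": "INTEGER"},
            },
            "required": ["name", "age"],
        },
    },
}
body = json.dumps(payload)
```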


AWS Bedrock

Model prefix: bedrock/

The bedrock/ prefix is stripped before sending the model ID to the Bedrock API, allowing any Bedrock-hosted model to be used.
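The prefix acts purely as a routing hint; conceptually the strip is just:

```python
# Sketch of the routing-prefix strip: "bedrock/" (and likewise "ollama/")
# is removed before the model ID is sent upstream.
def strip_prefix(model: str, prefix: str) -> str:
    return model[len(prefix):] if model.startswith(prefix) else model
```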

Required:

export BEDROCK_ENABLED=true
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1 # optional, defaults to us-east-1

Configuration:

gateway:
  providers:
    bedrock:
      enabled: ${BEDROCK_ENABLED:false}
      access-key: ${AWS_ACCESS_KEY_ID:}
      secret-key: ${AWS_SECRET_ACCESS_KEY:}
      region: ${AWS_REGION:us-east-1}

Example model IDs:

bedrock/anthropic.claude-3-sonnet-20240229-v1:0
bedrock/amazon.titan-text-express-v1
bedrock/meta.llama3-70b-instruct-v1:0
bedrock/mistral.mistral-7b-instruct-v0:2

Features: Chat completions, streaming, vision, tool calls, structured outputs (via tool-use rewrite for Claude models on Bedrock).

Authentication: Uses AWS SigV4 request signing.
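For reference, the SigV4 signing-key derivation is the standard chained-HMAC algorithm from the AWS specification (the full flow also builds a canonical request and string-to-sign, omitted here):

```python
import hashlib
import hmac

# Standard AWS SigV4 signing-key derivation: a chain of HMAC-SHA256
# operations over the secret key, date, region, and service.
def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    def h(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()
    k_date = h(("AWS4" + secret_key).encode(), date)  # date as YYYYMMDD
    k_region = h(k_date, region)
    k_service = h(k_region, service)
    return h(k_service, "aws4_request")
```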


Ollama

Model prefix: ollama/

The ollama/ prefix is stripped before sending the model name to the Ollama API.

Required:

export OLLAMA_ENABLED=true

Configuration:

gateway:
  providers:
    ollama:
      enabled: ${OLLAMA_ENABLED:false}
      base-url: ${OLLAMA_BASE_URL:http://localhost:11434}

Prerequisites: Ollama must be running locally (or at the configured base URL) with at least one model pulled:

ollama serve
ollama pull llama3.2

Example usage:

curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "ollama/llama3.2", "messages": [{"role": "user", "content": "Hello!"}]}'

Limitations: Ollama does not support structured outputs (json_object or json_schema). Sending response_format to an Ollama model returns HTTP 400 with error code unsupported_response_format.
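That capability check amounts to a simple guard before the request is forwarded; a hypothetical sketch (the function name and error representation are illustrative):

```python
# Sketch of the capability check described above: any response_format on
# an ollama/ model is rejected with HTTP 400 / unsupported_response_format.
def validate_response_format(model: str, request: dict) -> None:
    if model.startswith("ollama/") and "response_format" in request:
        raise ValueError("400 unsupported_response_format")
```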


Mock Provider

Model prefix: mock/

The mock provider returns configurable fake completions without calling any upstream API. Useful for integration testing, CI pipelines, and local development without API keys.

Required:

export MOCK_PROVIDER_ENABLED=true

Configuration:

gateway:
  providers:
    mock:
      enabled: ${MOCK_PROVIDER_ENABLED:false}
      response: "This is a mock response"   # static text (default)
      latency-ms: 100                       # simulated delay (ms)
      stream-token-delay-ms: 20             # inter-token delay for SSE
      error-rate: 0.0                       # 0.0–1.0 failure fraction
      response-overrides:                   # per-model response overrides
        "[mock/fast]": "Fast model response"
        "[mock/error-test]": "groovy: throw new RuntimeException('test')"

Per-model response overrides: The response-overrides map allows different mock/ models to return different responses. If no override matches the requested model, the default response is used. Override values support the same static text and scripting syntax as the default response. Use bracket notation ([mock/model-name]) to preserve the / in YAML keys.
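The lookup itself is an exact-match-or-default map resolution; a minimal sketch (keys shown already normalized — Spring's "[mock/fast]" bracket notation binds to the map key "mock/fast"):

```python
# Sketch of per-model mock response resolution: an exact match in
# response-overrides wins, otherwise the default response applies.
def resolve_response(model: str, overrides: dict[str, str], default: str) -> str:
    return overrides.get(model, default)
```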

Scripting: The response field (and override values) support dynamic responses via scripting engines:

# Groovy
response: "groovy: 'Hello from ' + request.model"

# JavaScript (GraalJS)
response: "js: 'Hello from ' + request.model"

# Python (GraalPy)
response: "python: 'Hello from ' + request.model"

Scripts receive a request binding with access to the full ChatRequest object. Scripting engines are optional classpath dependencies — a clear error is returned if the engine is not available.

Error simulation: Set error-rate to inject random failures (e.g., 0.1 = 10% of requests return PROVIDER_ERROR).
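The scripting and error-rate behaviors above can be pictured with two small sketches (both hypothetical, not Dvara's actual code). First, dispatch on the "engine: script" prefix, falling back to static text; second, per-request failure injection driven by error-rate:

```python
import random

# Sketch of the "<engine>: <script>" prefix dispatch for mock responses.
# Real engines (Groovy, GraalJS, GraalPy) are optional classpath
# dependencies; only the parsing step is shown here.
ENGINES = {"groovy", "js", "python"}

def parse_response_spec(spec: str):
    engine, sep, script = spec.partition(": ")
    if sep and engine in ENGINES:
        return engine, script   # dynamic: evaluate script with the engine
    return None, spec           # static text

# Sketch of error-rate injection: each request fails with probability
# error_rate, simulating a PROVIDER_ERROR response.
def maybe_fail(error_rate: float, rng: random.Random) -> bool:
    return rng.random() < error_rate
```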

Circuit breaker: The mock provider is excluded from circuit breaker wrapping. Simulated errors (via error-rate) never trip the circuit breaker, ensuring the mock provider is always available.

CI usage: Enable the mock provider in CI test profiles to run integration tests without real API keys:

# application-ci.yml
gateway:
  providers:
    mock:
      enabled: true
      latency-ms: 0
      stream-token-delay-ms: 0

Capabilities Matrix

Provider    Streaming  Vision  Tool Calls  Structured Outputs  JSON Mode         Max Context
OpenAI      yes        yes     yes         yes (native)        yes (native)      128,000
Anthropic   yes        yes     yes         yes (tool rewrite)  yes (prompt)      200,000
Gemini      yes        yes     yes         yes (native)        yes (native)      1,000,000
Bedrock     yes        yes     yes         yes (tool rewrite)  yes (prompt)      200,000
Ollama      yes        no      no          no                  no                32,000
Mock        yes        no      no          yes (wraps JSON)    yes (wraps JSON)  128,000