Additional OpenAI-compatible providers
DVARA ships first-class providers for the most-asked-for upstreams — OpenAI, Anthropic, Gemini, Bedrock, Azure OpenAI, Mistral, Cohere, Groq, Ollama, Qwen, DeepSeek, Moonshot, ChatGLM, and Grok. For each of those, just set the credential env var and the provider activates; see Provider Setup. This page is for everything else — providers that aren't first-class but speak the OpenAI wire format and can be routed through DVARA via base-URL override or a custom route.
The long tail this page covers: Fireworks AI, Together AI, Perplexity, vLLM hosts, LiteLLM proxies, internal corporate gateways, and any other deployment that exposes an OpenAI-compatible chat endpoint. Governance (policy, audit, PII, cost attribution) applies identically to these requests once routed.
A long-tail provider must be wired up in DVARA before it can be used — either by overriding an existing first-class provider's base URL, or by configuring a DVARA route that targets a registered provider. See the Providers guide for the override flow.
dvara.llm-gateway.providers.openai.base-url is global to the OpenAI provider. A deployment that overrides it to point at a long-tail provider can no longer use OpenAI in the same install. For mixed deployments that need OpenAI and a long-tail upstream side by side, promote the long-tail upstream to a first-class provider, the same shape Qwen, DeepSeek, Moonshot, ChatGLM, and Grok ship as today. Don't try to make the override path scale to multiple upstreams at once; there is no per-route base-URL override.
All examples assume DVARA is running at http://localhost:8080 and you've created a tenant + API key via the Quickstart.
Any OpenAI-compatible provider — end-to-end
The pattern is always the same:
- Register the provider in DVARA by overriding the OpenAI provider's base URL (see Provider Setup → overriding the base URL for the per-platform recipes — VM, Docker Compose, Kubernetes).
- Point your OpenAI SDK at DVARA's base URL (http://localhost:8080/v1 in dev, your gateway hostname in production).
- Set model to whatever the upstream exposes; DVARA forwards the request to whichever upstream the OpenAI provider's base-url now points at.
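At the wire level, the last two steps amount to an ordinary OpenAI-style chat completion request addressed to DVARA rather than to the upstream. The sketch below builds such a request with the Python standard library only. `build_chat_request` is a hypothetical helper, the model name is a placeholder, and the `/chat/completions` path and Bearer header are assumed to match what the OpenAI SDK sends for the base URL above.

```python
import json
import urllib.request

DVARA_BASE_URL = "http://localhost:8080/v1"  # dev default used on this page


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request to DVARA."""
    body = {
        "model": model,  # upstream model name, forwarded by DVARA
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{DVARA_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # the DVARA key, not the upstream key
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("your-dvara-api-key", "your-upstream-model", "Hello")
    print(req.get_method(), req.full_url)
```

Once DVARA is running, the request can be sent with `urllib.request.urlopen(req)`; an OpenAI SDK does the equivalent when its base URL points at DVARA.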
For example, to route through Fireworks AI, create this overrides.yml:
# overrides.yml — apply with SPRING_CONFIG_ADDITIONAL_LOCATION (Spring Boot's standard
# layered-config flag). See ../core-concepts/01-providers.md#advanced-overriding-the-base-url
# for the per-platform mount recipes.
dvara:
  llm-gateway:
    providers:
      openai:
        base-url: https://api.fireworks.ai/inference/v1
Set the upstream's API key as OPENAI_API_KEY. Because you've re-pointed the OpenAI provider, DVARA sends that value as the upstream credential:
export OPENAI_API_KEY=fw_your-fireworks-key
Then call from any OpenAI SDK, using a Fireworks model name:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-dvara-api-key",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3-70b",
    messages=[{"role": "user", "content": "Hello from an additional provider!"}],
)
The same shape works for Together AI, Perplexity, vLLM hosts, LiteLLM proxies, and internal corporate gateways — only the base-url and the model name change.
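Since only the base URL differs between upstreams, the override file can be generated rather than hand-edited. The following is a minimal sketch, not part of DVARA: `render_overrides` is a hypothetical helper, and `https://llm.internal.example/v1` is a made-up internal gateway address used purely for illustration.

```python
def render_overrides(base_url: str) -> str:
    """Return overrides.yml text that re-points DVARA's OpenAI provider."""
    if not base_url.startswith(("http://", "https://")):
        raise ValueError(f"base URL must start with http:// or https://: {base_url!r}")
    # Same key path as the Fireworks example above; only the value changes.
    return (
        "dvara:\n"
        "  llm-gateway:\n"
        "    providers:\n"
        "      openai:\n"
        f"        base-url: {base_url}\n"
    )


if __name__ == "__main__":
    # Hypothetical internal gateway; substitute your upstream's real base URL.
    print(render_overrides("https://llm.internal.example/v1"), end="")
```

Write the result to overrides.yml and apply it with SPRING_CONFIG_ADDITIONAL_LOCATION as described above. Because the override is global, only one upstream can be active per install.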
Docker Model Runner
Docker Model Runner (DMR) is Docker's built-in local LLM runtime, bundled with recent Docker Desktop and available on Linux with Docker Engine. It exposes an OpenAI-compatible REST API and is not a first-class DVARA provider in this release — wire it up through the standard OpenAI-compatible path described above.
DMR's REST endpoint depends on whether DVARA runs on the host or inside a container on the same Docker host:
| Where DVARA runs | DMR base URL |
|---|---|
| On the host | http://localhost:12434/engines/v1 |
| In a container | http://model-runner.docker.internal/engines/v1 |
DMR ignores the Authorization header — the SDK still has to send something, so use any non-empty placeholder.
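The table above can be captured as a small lookup so that deployment scripts pick the right DMR address. This is only an illustrative sketch; `dmr_base_url` and its flag are hypothetical names, and the two URLs are the ones listed in the table.

```python
# DMR base URLs, keyed by where DVARA itself is running.
DMR_BASE_URLS = {
    "host": "http://localhost:12434/engines/v1",
    "container": "http://model-runner.docker.internal/engines/v1",
}


def dmr_base_url(dvara_in_container: bool) -> str:
    """Return the DMR base URL to use for DVARA's OpenAI provider override."""
    return DMR_BASE_URLS["container" if dvara_in_container else "host"]


if __name__ == "__main__":
    print(dmr_base_url(dvara_in_container=False))
```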
1. Pull a model:
docker model pull ai/llama3.2
docker model list
2. Register DMR as an OpenAI-compatible provider in DVARA:
# overrides.yml — apply with SPRING_CONFIG_ADDITIONAL_LOCATION
dvara:
  llm-gateway:
    providers:
      openai:
        base-url: http://localhost:12434/engines/v1  # or model-runner.docker.internal in-container
This re-points DVARA's OpenAI provider at DMR. Because the base-URL override is global to the OpenAI provider, hosted OpenAI is not available in the same install while the override is in place; see Provider Setup → overriding the base URL.
3. Call it from any OpenAI SDK:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="<your-dvara-api-key>",
)

response = client.chat.completions.create(
    model="ai/llama3.2",  # whatever you pulled with `docker model pull`
    messages=[{"role": "user", "content": "Hello from Docker Model Runner!"}],
)
print(response.choices[0].message.content)
DMR supports chat completions and embeddings; capability surfaces (vision, tools, structured outputs) depend on the model and runtime backend (llama.cpp by default; vLLM / Diffusers on NVIDIA GPUs on Linux/WSL2). Where DMR's response shape diverges from the OpenAI canonical response, the upstream's behaviour passes through to the SDK — DVARA does not normalise.
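Because responses are passed through without normalisation, client code may want to read them defensively. The sketch below shows one way to do that on the plain-dict form of a response. `first_message_text` is a hypothetical helper, and which fields (if any) DMR actually omits is not specified here, so the missing-field cases are assumptions.

```python
from typing import Any, Optional


def first_message_text(payload: dict[str, Any]) -> Optional[str]:
    """Return the first choice's message text, or None if the shape is unexpected."""
    choices = payload.get("choices") or []
    if not choices or not isinstance(choices[0], dict):
        return None
    message = choices[0].get("message") or {}
    content = message.get("content") if isinstance(message, dict) else None
    # Content may be absent or non-string (for example on tool-call responses).
    return content if isinstance(content, str) else None


if __name__ == "__main__":
    sample = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
    print(first_message_text(sample))
```

With the OpenAI Python SDK, the dict form of a response is available via `response.model_dump()`.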
DMR is localhost-only and unauthenticated by design. Treat it the same way you would Ollama — never expose the runtime port to the public internet; run it inside a private network, on the operator's machine, or behind a separate authenticated reverse proxy.
Governance still applies
Additional providers aren't second-class — every request still goes through:
- Policy evaluation — the same YAML policies that gate OpenAI requests gate these too
- Immutable audit — every request and response writes a signed audit event, including the upstream provider name
- PII redaction — sensitive spans stripped or tokenized before leaving the perimeter
- Cost attribution — per-tenant, per-model spend tracked for chargeback, provided the additional provider has a pricing entry in DVARA (see Cost management)
- Rate limiting and budget caps — same knobs as any other provider
For data-residency-sensitive deployments (PRC only, offshore only, mixed by tenant), pair this with DVARA's policy engine so different tenants can be routed to different upstream providers based on region tags. See Routes & Policies.
Next steps
- Back to Integrations overview
- Routing models from Python, JavaScript / TypeScript, or Java? The same base-URL swap applies — no SDK change needed
- Need correlation across SDKs? See DVARA headers