Canary a new model version and roll back on error rate
The problem
You're considering moving a workload from one provider to another — gpt-4o on OpenAI to the same model on Azure OpenAI for HA, claude-sonnet-4-5 on Anthropic to the same model via AWS Bedrock for data-residency, or any first-class provider you've added recently. You want to send a small percentage of traffic to the candidate while the rest stays on the production provider, get side-by-side metrics (error rate, latency, cost per request), and make the promote or roll back call in minutes — not days, and without a redeploy.
The approach
DVARA's canary routing strategy splits a route between a baseline provider and a candidate provider with a configurable percentage. Both sides write metrics tagged with the variant so the canary report shows them side-by-side under the same route ID. A single API call flips the split back to 0% to roll back instantly.
Note that the canary strategy splits traffic between two providers, not two models. Canarying a new model on the same provider (e.g. gpt-4o → gpt-5-preview, both on OpenAI) is not supported by route configuration alone: DVARA routes pick a provider, and the upstream model name comes from the request or from pinned-model-version. To test a same-provider model upgrade, gate the client side with a feature flag and have flagged clients send the new model name to a separately pinned route. The canary recipe below covers the more common case of swapping the provider underneath the same workload.
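For the same-provider case, the client-side gate could look like the following minimal sketch. It is an illustration only: `pick_model`, `bucket`, and `ROLLOUT_PCT` are hypothetical names, the rollout percentage stands in for whatever feature-flag system you already use, and the model names are the examples from the paragraph above.

```python
import hashlib

BASELINE_MODEL = "gpt-4o"          # current model, served by the existing route
CANDIDATE_MODEL = "gpt-5-preview"  # new model, matched by a separately pinned route
ROLLOUT_PCT = 10                   # hypothetical flag value: % of users on the candidate


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def pick_model(user_id: str, rollout_pct: int = ROLLOUT_PCT) -> str:
    """Return the model name this user's requests should carry."""
    if bucket(user_id) < rollout_pct:
        return CANDIDATE_MODEL
    return BASELINE_MODEL
```

Hashing the user ID rather than drawing a random number per request keeps each user on one model for the whole test, which makes quality complaints easier to attribute. Setting the percentage back to 0 is the client-side equivalent of a rollback.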
Prerequisites
- A running DVARA instance with a tenant and API key (Quickstart)
- Both providers registered (e.g. OPENAI_API_KEY for the baseline and AZURE_OPENAI_API_KEY + AZURE_OPENAI_BASE_URL for the candidate); see Provider Setup
- An admin / owner account on DVARA Flightdeck or an admin PAT for the Automation API
The steps
1. Create a canary route
Canary configuration is API-only — the YAML gateway.routes block doesn't bind canary configs at startup. Use the Automation API:
ROUTE_ID=$(curl -s -X POST https://dvara.internal.example.com/v1/admin/routes \
-H "Authorization: Bearer <admin-pat>" \
-H "Content-Type: application/json" \
-d '{
"model_pattern": "gpt*",
"strategy": "canary",
"canary_config": {
"baseline_provider": "openai",
"candidate_provider": "azure-openai",
"split_pct": 10,
"test_name": "azure-eval"
},
"providers": [
{"provider": "openai"},
{"provider": "azure-openai"}
]
}' | jq -r .id)
echo "Route ID: $ROUTE_ID"
The Automation API generates a server-side UUID for the route — capture it from the response (any client-supplied id is ignored) and use $ROUTE_ID in every URL below.
This route matches any gpt* model the client sends, routes 90% to OpenAI (baseline) and 10% to Azure OpenAI (candidate). DVARA tags each request with the variant so canary metrics stay separable from baseline metrics.
Alternatively, set this up in DVARA Flightdeck under Routing → Routes → New Route — see Routes and Policies.
2. Send traffic
No client change needed. Applications continue sending model: "gpt-4o" (or any gpt* model — the route matches the pattern). DVARA handles the split server-side.
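from openai import OpenAI
# Assumption: DVARA exposes an OpenAI-compatible endpoint under /v1 on this host.
client = OpenAI(base_url="https://dvara.internal.example.com/v1", api_key="<dvara-api-key>")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
)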
3. Monitor canary metrics
Pull the canary report at any time:
curl -s https://dvara.internal.example.com/v1/admin/routes/$ROUTE_ID/canary/report \
-H "Authorization: Bearer <admin-pat>"
The response carries per-variant counts, error rates, latency percentiles, and cost. In DVARA Flightdeck, open the route's Canary dashboard (/routes/<route-id>/canary) for the same data with auto-refresh. See Routes and Policies for the dashboard walkthrough.
Typical promotion criteria for a canary:
| Metric | Canary must be … |
|---|---|
| Error rate | Not meaningfully higher than baseline (within 1σ, or a fixed delta you set) |
| p95 latency | Within an acceptable budget of baseline (e.g. +20%) |
| Cost per request | Within budget, or the quality gain justifies the delta |
| Sample size | At least N thousand requests so the comparison is not noise |
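One way to turn the error-rate, latency, and sample-size rows into a repeatable check is sketched below. It is not part of DVARA: `VariantStats` and its field names are hypothetical (map them from whatever the canary report actually returns), the thresholds are placeholder defaults, and the error-rate comparison uses a standard one-sided two-proportion z-test in place of the informal "within 1σ" rule. Cost per request is left out and would need its own check.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class VariantStats:
    """Per-variant numbers copied from the canary report (hypothetical field names)."""
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def error_rate_p_value(baseline: VariantStats, candidate: VariantStats) -> float:
    """One-sided two-proportion z-test: p-value for 'candidate errors more often'."""
    n1, n2 = baseline.requests, candidate.requests
    if min(n1, n2) == 0:
        return 1.0  # no data on one side: no evidence either way
    pooled = (baseline.errors + candidate.errors) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # identical all-success or all-failure samples
    z = (candidate.error_rate - baseline.error_rate) / se
    return 1 - NormalDist().cdf(z)


def decide(baseline: VariantStats, candidate: VariantStats,
           min_requests: int = 2000, alpha: float = 0.05,
           latency_budget: float = 1.20) -> str:
    """Return 'wait', 'rollback', or 'promote' from the criteria in the table."""
    if candidate.requests < min_requests:
        return "wait"  # sample too small to separate signal from noise
    if error_rate_p_value(baseline, candidate) < alpha:
        return "rollback"  # candidate error rate is significantly higher
    if candidate.p95_latency_ms > baseline.p95_latency_ms * latency_budget:
        return "rollback"  # p95 latency exceeds the +20% budget
    return "promote"
```

For example, `decide(VariantStats(18000, 180, 800.0), VariantStats(2000, 60, 850.0))` compares a 1% baseline error rate against a 3% candidate error rate on a 90/10 split.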
4a. Promote (if candidate is healthy)
Update the route to send all traffic to the candidate provider as the new baseline. DVARA keeps the old route version so you can roll back if something surprises you in production:
curl -s -X PUT https://dvara.internal.example.com/v1/admin/routes/$ROUTE_ID \
-H "Authorization: Bearer <admin-pat>" \
-H "Content-Type: application/json" \
-d '{
"model_pattern": "gpt*",
"strategy": "model-prefix",
"providers": [
{"provider": "azure-openai"}
]
}'
4b. Roll back (if candidate is bad)
Flip the split to 0% — instantaneous across the fleet via PG NOTIFY, no pod restart:
curl -s -X PUT https://dvara.internal.example.com/v1/admin/routes/$ROUTE_ID/canary \
-H "Authorization: Bearer <admin-pat>" \
-H "Content-Type: application/json" \
-d '{"split_pct": 0}'
Or restore the previous route version directly:
curl -s -X POST https://dvara.internal.example.com/v1/admin/routes/$ROUTE_ID/rollback \
-H "Authorization: Bearer <admin-pat>" \
-H "Content-Type: application/json" \
-d '{"version": 1}'
All traffic goes back to OpenAI immediately. No redeploy, no app change.
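If you want the rollback to fire automatically, a small watchdog can issue the same `PUT` as the curl call in step 4b. The sketch below uses only the Python standard library; the endpoint and payload are the ones shown above, while `should_roll_back`, `guard`, and the `max_delta` threshold are hypothetical names and values chosen for illustration. How you obtain the two error rates (for instance from the canary report) is left to the caller.

```python
import json
import urllib.request

BASE_URL = "https://dvara.internal.example.com"  # host used in the examples above


def rollback_request(route_id: str, admin_pat: str) -> urllib.request.Request:
    """Build the PUT that sets the canary split to 0% (same call as the curl in 4b)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/admin/routes/{route_id}/canary",
        data=json.dumps({"split_pct": 0}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_pat}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def should_roll_back(baseline_error_rate: float, candidate_error_rate: float,
                     max_delta: float = 0.01) -> bool:
    """True when the candidate's error rate exceeds baseline by more than max_delta."""
    return candidate_error_rate - baseline_error_rate > max_delta


def _send(request: urllib.request.Request) -> None:
    """Default sender: perform the HTTP call and drain the response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


def guard(route_id: str, admin_pat: str, baseline_error_rate: float,
          candidate_error_rate: float, send=_send) -> bool:
    """Roll the canary back if it breaches the threshold; return whether it did."""
    if not should_roll_back(baseline_error_rate, candidate_error_rate):
        return False
    send(rollback_request(route_id, admin_pat))
    return True
```

The `send` parameter is injectable so the guard can be exercised in tests without touching the network. A fixed delta is the simplest trigger; pair it with a minimum sample size so a handful of early failures does not trip it.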
Why this works
- Split is server-side: applications keep sending model: "gpt-4o"; the weight is enforced inside DVARA. No need to ship a feature flag to every service.
- Variants are separable in metrics: each canary request is tagged with the variant so gateway_canary_requests_total and the per-route error and latency histograms don't mix the two.
- Rollback is atomic: a single PUT updates the route version, a PG NOTIFY propagates the change to every data plane pod in milliseconds (see Architecture → Config propagation), and the next request on every pod uses the new split.
- Route history is preserved: the old version is retained, so POST /v1/admin/routes/{id}/rollback restores without you having to remember the old config.
Common mistakes
- Sampling too small a window before deciding: a 100-request canary shows noise, not signal. Pick a traffic volume and elapsed time that makes the delta detectable for your baseline error rate.
- Picking providers that serve different model classes for the same workload: if the baseline is OpenAI's gpt-4o and the candidate is Anthropic's claude-sonnet-4-5, the canary report mixes model-quality variance with provider-infrastructure variance. Pick a candidate that exposes the same model name (e.g. azure-openai for gpt-4o, bedrock for claude-*) to keep the comparison clean.
- Forgetting to clean up the canary route after promotion: once the candidate is the new baseline, swap the route's strategy to model-prefix with a single-provider list. A stale canary config keeps the canary metrics endpoint live but reports a stagnant 100/0 split.
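To put a number on "too small a window", the standard sample-size formula for comparing two proportions gives a rough lower bound. The sketch below is a planning aid, not a DVARA feature: the function names are hypothetical, and the defaults (5% one-sided significance, 80% power) are conventional choices you may want to change. It assumes equal-sized samples, so with a 90/10 split it is slightly conservative for the candidate side.

```python
from math import ceil
from statistics import NormalDist


def requests_per_variant(baseline_rate: float, candidate_rate: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Requests needed per variant to detect the given error-rate increase."""
    if candidate_rate == baseline_rate:
        raise ValueError("rates must differ to have a detectable delta")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    variance = (baseline_rate * (1 - baseline_rate)
                + candidate_rate * (1 - candidate_rate))
    delta = candidate_rate - baseline_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


def total_requests(per_variant: int, split_pct: float) -> int:
    """Total route traffic needed for the candidate side to reach per_variant."""
    return ceil(per_variant * 100 / split_pct)


# Example: detect a rise from a 1% to a 2% error rate on a 10% canary split.
needed = requests_per_variant(0.01, 0.02)
print(needed, total_requests(needed, 10))
```

Smaller deltas need many more requests, because the required sample grows with the inverse square of the difference. Divide the total by your request rate to estimate how long the canary has to run before the report is worth acting on.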
Next steps
- Routing — all the routing strategies (round-robin, weighted, latency-aware, cost-aware, geo-aware)
- Resilience — circuit breakers and failover that compose with canary splits
- Observability — the metrics and audit events that back the canary decision
- Routes and Policies (Flightdeck) — canary dashboard walkthrough in the UI