Version: 1.3.0

Canary a new model version and roll back on error rate

The problem

You're considering moving a workload from one provider to another: for example, gpt-4o from OpenAI to the same model on Azure OpenAI for high availability, claude-sonnet-4-5 from Anthropic to the same model via AWS Bedrock for data residency, or any first-class provider you've added recently. You want to send a small percentage of traffic to the candidate while the rest stays on the production provider, compare error rate, latency, and cost per request side by side, and make the promote-or-roll-back call in minutes rather than days, without a redeploy.

The approach

DVARA's canary routing strategy splits a route between a baseline provider and a candidate provider with a configurable percentage. Both sides write metrics tagged with the variant so the canary report shows them side-by-side under the same route ID. A single API call flips the split back to 0% to roll back instantly.
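
To make the percentage split concrete, here is a minimal sketch of a weighted per-request choice. It is an illustration only, not DVARA's implementation: the function name choose_variant and the use of a uniform random draw are assumptions, and the real gateway may pick variants differently.

```python
import random


def choose_variant(split_pct: float, rng: random.Random) -> str:
    """Return "candidate" for roughly split_pct percent of calls, else "baseline"."""
    if not 0 <= split_pct <= 100:
        raise ValueError("split_pct must be between 0 and 100")
    # rng.random() is uniform on [0, 1), so a split of 0 never selects the
    # candidate and a split of 100 always does.
    return "candidate" if rng.random() * 100 < split_pct else "baseline"


# Simulate 10,000 requests at a 10% split with a fixed seed.
rng = random.Random(42)
picks = [choose_variant(10, rng) for _ in range(10_000)]
candidate_share = picks.count("candidate") / len(picks)
```

Because a split of 0 can never select the candidate, setting the percentage to 0 is enough to take the candidate out of rotation.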

Canary scope: provider-to-provider, not model-to-model

The canary strategy splits between two providers. Canarying a new model on the same provider (e.g. gpt-4o → gpt-5-preview both on OpenAI) is not supported by route configuration alone — DVARA routes pick a provider, and the upstream model name comes from the request or pinned-model-version. To test a same-provider model upgrade, gate the client side with a feature flag and let it send the new model name to a separately-pinned route. The canary recipe below covers the more common case of swapping the provider underneath the same workload.
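
For the same-provider case, the client-side gate could look like the sketch below. It assumes a hypothetical helper, model_for_user, that buckets users by a hash of their ID so each user consistently sees one model; the model names are the ones from the example above, and a real deployment would more likely read the rollout percentage from its feature-flag system.

```python
import hashlib


def model_for_user(user_id: str, rollout_pct: int,
                   current_model: str = "gpt-4o",
                   new_model: str = "gpt-5-preview") -> str:
    """Deterministically bucket a user into 0..99 and pick a model name."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    # Users whose bucket falls below the rollout percentage get the new model.
    return new_model if bucket < rollout_pct else current_model
```

The application then sends the returned name as the model field, and the separately-pinned route handles the new model name.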

Prerequisites

A running DVARA instance with a tenant and API key (Quickstart)
Both providers registered (e.g. OPENAI_API_KEY for the baseline and AZURE_OPENAI_API_KEY + AZURE_OPENAI_BASE_URL for the candidate) — see Provider Setup
An admin / owner account on DVARA Flightdeck or an admin PAT for the Automation API

The steps

1. Create a canary route

Canary configuration cannot be set in YAML: the gateway.routes block doesn't bind canary configs at startup. Use the Automation API:

ROUTE_ID=$(curl -s -X POST https://dvara.internal.example.com/v1/admin/routes \
  -H "Authorization: Bearer <admin-pat>" \
  -H "Content-Type: application/json" \
  -d '{
    "model_pattern": "gpt*",
    "strategy": "canary",
    "canary_config": {
      "baseline_provider": "openai",
      "candidate_provider": "azure-openai",
      "split_pct": 10,
      "test_name": "azure-eval"
    },
    "providers": [
      {"provider": "openai"},
      {"provider": "azure-openai"}
    ]
  }' | jq -r .id)

echo "Route ID: $ROUTE_ID"

The Automation API generates a server-side UUID for the route — capture it from the response (any client-supplied id is ignored) and use $ROUTE_ID in every URL below.

This route matches any gpt* model the client sends and routes 90% of requests to OpenAI (baseline) and 10% to Azure OpenAI (candidate). DVARA tags each request with the variant so canary metrics stay separable from baseline metrics.

Alternatively, set this up in DVARA Flightdeck under Routing → Routes → New Route — see Routes and Policies.
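
For scripted setups, the same request can be built in Python. The sketch below mirrors the curl call from this step using only the standard library; the helper names build_create_route_request and route_id_from_response are hypothetical, and the host is the placeholder from the curl example.

```python
import json
import urllib.request

DVARA_URL = "https://dvara.internal.example.com"  # placeholder host from the curl example


def build_create_route_request(admin_pat: str, split_pct: int) -> urllib.request.Request:
    """Build (but do not send) the POST that creates the canary route."""
    payload = {
        "model_pattern": "gpt*",
        "strategy": "canary",
        "canary_config": {
            "baseline_provider": "openai",
            "candidate_provider": "azure-openai",
            "split_pct": split_pct,
            "test_name": "azure-eval",
        },
        "providers": [{"provider": "openai"}, {"provider": "azure-openai"}],
    }
    return urllib.request.Request(
        f"{DVARA_URL}/v1/admin/routes",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_pat}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def route_id_from_response(body: bytes) -> str:
    """Extract the server-generated route ID, mirroring `jq -r .id`."""
    return json.loads(body)["id"]
```

To send it, pass the request to urllib.request.urlopen and hand the response body to route_id_from_response.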

2. Send traffic

No client change needed. Applications continue sending model: "gpt-4o" (or any gpt* model — the route matches the pattern). DVARA handles the split server-side.

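from openai import OpenAI

# Assumed setup: an OpenAI SDK client pointed at the DVARA gateway (adjust URL and key).
client = OpenAI(base_url="https://dvara.internal.example.com/v1", api_key="<dvara-api-key>")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
)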

3. Monitor canary metrics

Pull the canary report at any time:

curl -s https://dvara.internal.example.com/v1/admin/routes/$ROUTE_ID/canary/report \
  -H "Authorization: Bearer <admin-pat>"

The response carries per-variant counts, error rates, latency percentiles, and cost. In DVARA Flightdeck, open the route's Canary dashboard (/routes/<route-id>/canary) for the same data with auto-refresh. See Routes and Policies for the dashboard walkthrough.

A typical decision threshold for promoting a canary:

Metric	Canary must be …
Error rate	Not meaningfully higher than baseline (within 1σ, or a fixed delta you set)
p95 latency	Within an acceptable budget of baseline (e.g. +20%)
Cost per request	Within budget, or the quality gain justifies the delta
Sample size	At least N thousand requests so the comparison is not noise
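
The thresholds in the table can be turned into an explicit check. The sketch below is one possible reading, not a DVARA feature: the VariantStats fields are hypothetical (map them from whatever the canary report returns), "within 1σ" is interpreted as the standard error of the difference between the two error rates, and the cost criterion is left out because it depends on your budget.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantStats:
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def should_promote(baseline: VariantStats, candidate: VariantStats,
                   min_requests: int, max_latency_ratio: float = 1.20) -> bool:
    """Apply the sample-size, error-rate, and p95-latency checks from the table."""
    if min(baseline.requests, candidate.requests) < min_requests:
        return False  # not enough traffic on one side to compare
    # Standard error of the difference between two observed error rates.
    sigma = math.sqrt(
        baseline.error_rate * (1 - baseline.error_rate) / baseline.requests
        + candidate.error_rate * (1 - candidate.error_rate) / candidate.requests
    )
    error_ok = candidate.error_rate - baseline.error_rate <= sigma
    latency_ok = candidate.p95_latency_ms <= baseline.p95_latency_ms * max_latency_ratio
    return error_ok and latency_ok
```

A fixed error-rate delta can replace the sigma comparison if you prefer a threshold that does not shift with traffic volume.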

4a. Promote (if candidate is healthy)

Update the route to send all traffic to the candidate provider as the new baseline. DVARA keeps the old route version so you can roll back if something surprises you in production:

curl -s -X PUT https://dvara.internal.example.com/v1/admin/routes/$ROUTE_ID \
  -H "Authorization: Bearer <admin-pat>" \
  -H "Content-Type: application/json" \
  -d '{
    "model_pattern": "gpt*",
    "strategy": "model-prefix",
    "providers": [
      {"provider": "azure-openai"}
    ]
  }'

4b. Roll back (if candidate is bad)

Flip the split to 0%. The config-version poll propagates the change across the fleet within the poll interval (a few seconds), with no pod restart:

curl -s -X PUT https://dvara.internal.example.com/v1/admin/routes/$ROUTE_ID/canary \
  -H "Authorization: Bearer <admin-pat>" \
  -H "Content-Type: application/json" \
  -d '{"split_pct": 0}'

Or restore the previous route version directly:

curl -s -X POST https://dvara.internal.example.com/v1/admin/routes/$ROUTE_ID/rollback \
  -H "Authorization: Bearer <admin-pat>" \
  -H "Content-Type: application/json" \
  -d '{"version": 1}'

All traffic returns to OpenAI as soon as the change propagates (within the poll interval). No redeploy, no app change.
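
If you want the rollback to trigger on error rate without a human in the loop, a small watchdog can wrap the PUT from this step. The sketch below is an assumption-laden outline: build_rollback_request and check_and_roll_back are hypothetical names, the error rates are passed in as plain numbers because the canary report's field names are not shown here, and max_delta is a threshold you choose.

```python
import json
import urllib.request
from typing import Callable

DVARA_URL = "https://dvara.internal.example.com"  # placeholder host from the curl examples


def build_rollback_request(route_id: str, admin_pat: str) -> urllib.request.Request:
    """Build the PUT from step 4b that sets the canary split to 0%."""
    return urllib.request.Request(
        f"{DVARA_URL}/v1/admin/routes/{route_id}/canary",
        data=json.dumps({"split_pct": 0}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_pat}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def check_and_roll_back(candidate_error_rate: float, baseline_error_rate: float,
                        max_delta: float, roll_back: Callable[[], None]) -> bool:
    """Call roll_back() when the candidate exceeds baseline by more than max_delta."""
    if candidate_error_rate - baseline_error_rate > max_delta:
        roll_back()
        return True
    return False
```

In a real watchdog, roll_back would send the request with urllib.request.urlopen(build_rollback_request(route_id, admin_pat)), and the check would run on a timer against fresh values from the canary report.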

Why this works

Split is server-side — applications keep sending model: "gpt-4o"; the weight is enforced inside DVARA. No need to ship a feature flag to every service.
Variants are separable in metrics — each canary request is tagged with the variant so gateway_canary_requests_total and the per-route error and latency histograms don't mix the two.
Rollback is atomic — a single PUT updates the route version, the config-version poll propagates the change to every data plane pod within the poll interval (a few seconds; see Architecture → Config propagation), and the next request on every pod uses the new split.
Route history is preserved — the old version is retained, so POST /v1/admin/routes/{id}/rollback restores without you having to remember the old config.
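
The propagation model behind the rollback can be pictured with a small sketch. This is a simplified illustration of version-based polling, not DVARA's actual data plane code: RouteConfig, RouteConfigCache, and the fetch_latest callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RouteConfig:
    version: int
    split_pct: int


class RouteConfigCache:
    """Holds the config a pod routes with; refreshed by a periodic poll."""

    def __init__(self, initial: RouteConfig) -> None:
        self.current = initial

    def poll(self, fetch_latest: Callable[[], RouteConfig]) -> bool:
        """Swap in the latest config if its version differs; return True on change."""
        latest = fetch_latest()
        if latest.version != self.current.version:
            # One reference assignment: a request reads either the old config
            # or the new one, never a mix of the two.
            self.current = latest
            return True
        return False
```

Each pod would call poll on its own timer, so the worst-case delay before a rollback takes effect on a given pod is one poll interval.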

Common mistakes

Sampling too small a window before deciding — a 100-request canary shows noise, not signal. Pick a traffic volume and elapsed time that makes the delta detectable for your baseline error rate.
Picking providers that serve different model classes for the same workload — if the baseline is OpenAI's gpt-4o and the candidate is Anthropic's claude-sonnet-4-5, the canary report mixes model-quality variance with provider-infrastructure variance. Pick a candidate that exposes the same model name (e.g. azure-openai for gpt-4o, bedrock for claude-*) to keep the comparison clean.
Forgetting to clean up the canary route after promotion — once the candidate is the new baseline, swap the route's strategy to model-prefix with a single-provider list. A stale canary config keeps the canary metrics endpoint live but reports a stagnant 100/0 split.
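
To size the sampling window from the first mistake above, a rough power calculation helps. The sketch below uses the standard normal approximation for a one-sided comparison of two proportions with unequal group sizes; the function name, the 5% significance level, and the 80% power default are assumptions to adjust for your own risk tolerance.

```python
import math
from statistics import NormalDist


def candidate_requests_needed(baseline_rate: float, worst_acceptable_rate: float,
                              split_pct: float, alpha: float = 0.05,
                              power: float = 0.80) -> int:
    """Approximate candidate-side requests needed to detect a rise in error rate.

    The baseline side is assumed to receive (100 - split_pct) / split_pct
    times the candidate's traffic over the same window.
    """
    if not 0 < split_pct < 100:
        raise ValueError("split_pct must be strictly between 0 and 100")
    delta = worst_acceptable_rate - baseline_rate
    if delta <= 0:
        raise ValueError("worst_acceptable_rate must exceed baseline_rate")
    ratio = (100 - split_pct) / split_pct
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    # Variance of the difference in error rates, scaled to one candidate request.
    variance = (baseline_rate * (1 - baseline_rate) / ratio
                + worst_acceptable_rate * (1 - worst_acceptable_rate))
    return math.ceil(z ** 2 * variance / delta ** 2)


needed = candidate_requests_needed(0.01, 0.02, split_pct=10)
```

Under these assumptions, telling a 1% baseline error rate apart from a 2% candidate at a 10% split takes more than a thousand candidate requests, which is why a 100-request window shows noise rather than signal.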

Next steps

Routing — all the routing strategies (round-robin, weighted, latency-aware, cost-aware, geo-aware)
Resilience — circuit breakers and failover that compose with canary splits
Observability — the metrics and audit events that back the canary decision
Routes and Policies (Flightdeck) — canary dashboard walkthrough in the UI
