Chat completion
POST/v1/chat/completions
OpenAI-compatible chat completions. Accepts any model the gateway
has a provider for — gpt-*, claude-*, gemini-*, ollama/*,
mock/*, and so on. Set stream: true to receive SSE deltas.
The gateway runs the full pre-dispatch pipeline (template resolution, budget enforcement, policy evaluation, PII scan, guardrails, priority admission, model downgrade, context window check) before any upstream call, and the post-dispatch pipeline (schema validation, grounding detection, content filters, PII output scan) on the response.
Request
Responses
- 200
- 400
- 402
- 403
- 429
- 502
Chat completion
Request was rejected before dispatch — missing credentials, unknown model, invalid response_format, or capability mismatch.
The tenant's hard budget cap is exhausted (BUDGET_CAP_HARD).
Request was blocked by an active policy rule (POLICY_DENIED, PII_DETECTED, GUARDRAIL_BLOCKED, IP_ACCESS_DENIED).
Per-key rate limit reached, or priority admission control throttled the request (PRIORITY_THROTTLED).
Every configured provider on the route (and every fallback) has failed.