Version: 1.3.0

Chat completion

POST /v1/chat/completions

OpenAI-compatible chat completions. Accepts any model the gateway has a provider for — gpt-*, claude-*, gemini-*, ollama/*, mock/*, and so on. Set stream: true to receive SSE deltas.

The gateway runs the full pre-dispatch pipeline (template resolution, budget enforcement, policy evaluation, PII scan, guardrails, priority admission, model downgrade, context window check) before any upstream call, and the post-dispatch pipeline (schema validation, grounding detection, content filters, PII output scan) on the response.

Request

Responses

Chat completion

The tenant's hard budget cap is exhausted (BUDGET_CAP_HARD).

Request was blocked by an active policy rule (POLICY_DENIED, PII_DETECTED, GUARDRAIL_BLOCKED, IP_ACCESS_DENIED).

Per-key rate limit reached, or priority admission control throttled the request (PRIORITY_THROTTLED).

Chat completion

/v1/chat/completions

Request​

Responses​

Request

Responses