
API Reference

Base URL: http://localhost:8080

Unless noted otherwise, endpoints return Content-Type: application/json. Every response includes an X-Trace-ID header.

OpenAPI Specification

Machine-readable OpenAPI 3.x specs are generated at startup by SpringDoc.

| Spec | URL | Contents |
|---|---|---|
| Public (JSON) | GET /v3/api-docs/public | All /v1/**, /admin/v1/**, and /status endpoints |
| Public (YAML) | GET /v3/api-docs/public.yaml | Same as above in YAML format |
| Internal (JSON) | GET /v3/api-docs/internal | Internal /internal/** endpoints only |
| Internal (YAML) | GET /v3/api-docs/internal.yaml | Same as above in YAML format |

Use the spec to generate clients (OpenAPI Generator), mock servers, or detect breaking changes in CI.


Common Headers

| Header | Direction | Description |
|---|---|---|
| X-Trace-ID | Request | Optional. If provided, echoed back in the response. |
| X-Trace-ID | Response | Always present. Inbound value or generated 32-character hex UUID. |
| X-Cache | Response | HIT or MISS (only on non-streaming requests when cache is enabled) |
| X-Cache-Control | Request | no-cache to bypass the response cache |
| Authorization | Request | Bearer <api-key>. Identifies the API key for rate limiting. |
| X-Budget-Remaining-Tokens | Response | Estimated tokens remaining in the active budget period. Omitted when no budget is configured. (Enterprise) |
| X-Budget-Remaining-Pct | Response | Percentage of budget remaining (0–100). Omitted when no budget is configured. (Enterprise) |
| X-Budget-Warning | Response | true when a WARN_AGENT policy rule has fired (budget utilization exceeded threshold). Omitted otherwise. (Enterprise) |
| X-Context-Window-Warning | Response | true when estimated tokens exceed the context window warning threshold. (Enterprise) |
| X-Context-Window-Utilization | Response | Context window utilization percentage (e.g., 87%). Present when the warning threshold is breached. (Enterprise) |
| X-Gateway-Strict-Downgraded | Response | true when json_schema strict mode is downgraded (Anthropic/Bedrock). |
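
As a rough illustration of the request headers above, a client can generate its own trace ID and send it so that gateway responses can be correlated with client logs. The helper name new_trace_headers is hypothetical, and using uuid4 on the client side is an assumption; the gateway only documents the 32-character hex format.

```python
import uuid


def new_trace_headers(api_key=None, bypass_cache=False):
    """Build request headers from the Common Headers table (hypothetical helper)."""
    # uuid4().hex is 32 lowercase hex characters with no dashes.
    headers = {"X-Trace-ID": uuid.uuid4().hex}
    if api_key is not None:
        headers["Authorization"] = f"Bearer {api_key}"
    if bypass_cache:
        # Asks the gateway to skip its response cache for this request.
        headers["X-Cache-Control"] = "no-cache"
    return headers
```

Because the inbound X-Trace-ID is echoed back, the value generated here can be compared with the response header to confirm the round trip.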

POST /v1/chat/completions

OpenAI-compatible chat completions. Supports all configured providers.

Request

POST /v1/chat/completions
Content-Type: application/json
{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
],
"temperature": 0.7,
"max_tokens": 512
}

Request Fields

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Model ID. Must match a configured provider's prefix. |
| messages | array | yes | Non-empty list of {role, content} objects. |
| temperature | number | no | Sampling temperature (0-2). |
| max_tokens | integer | no | Maximum tokens to generate. |
| stream | boolean | no | If true, returns Server-Sent Events stream. |
| response_format | object | no | Response format constraint. See Structured Outputs. |
| tools | array | no | Tool definitions (passed through to provider). |
| tool_choice | string/object | no | Tool choice control. |
| top_p | number | no | Nucleus sampling parameter. |
| frequency_penalty | number | no | Frequency penalty (-2.0 to 2.0). |
| presence_penalty | number | no | Presence penalty (-2.0 to 2.0). |
| n | integer | no | Number of completions to generate. |
| user | string | no | End-user identifier. |

Message Object

| Field | Type | Required | Description |
|---|---|---|---|
| role | string | yes | system, user, or assistant |
| content | string or array | yes | Text string or array of content parts (for multimodal) |
| tool_calls | array | no | Tool calls from assistant |
| tool_call_id | string | no | ID referencing a tool call |
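
The field constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not the gateway's own validation: build_chat_request is a hypothetical helper, and it enforces only the rules stated in the tables (non-empty messages, role and content present, temperature between 0 and 2).

```python
import json


def build_chat_request(model, messages, temperature=None, max_tokens=None, stream=False):
    """Assemble a /v1/chat/completions body, checking the documented constraints."""
    if not messages:
        raise ValueError("messages must be a non-empty list")
    for msg in messages:
        if "role" not in msg or "content" not in msg:
            raise ValueError("each message needs role and content")
    body = {"model": model, "messages": messages}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if stream:
        body["stream"] = True
    # Optional fields that were not supplied are left out of the body.
    return json.dumps(body)
```

The returned string can be sent as the body of POST /v1/chat/completions with Content-Type: application/json.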

Response (200 OK)

{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1771667816,
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 9,
"total_tokens": 33
}
}

Streaming Response

When "stream": true, the response is sent as Server-Sent Events:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","model":"gpt-4o","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","model":"gpt-4o","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: [DONE]
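
A client consuming this stream typically reads the data: lines, stops at [DONE], and concatenates each chunk's delta.content. The sketch below works on an iterable of already-decoded lines; the function name collect_stream_text is hypothetical, and fetching the lines over HTTP is left to whichever client library is in use.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta.content from SSE 'data:' lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


sample = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","model":"gpt-4o","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","model":"gpt-4o","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample))  # The capital
```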

POST /v1/embeddings

Returns embedding vectors for the given input. Requires an OpenAI-compatible embedding provider (OPENAI_API_KEY).

Request

POST /v1/embeddings
Content-Type: application/json
{
"model": "text-embedding-ada-002",
"input": "The quick brown fox"
}

Request Fields

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Embedding model ID (must start with text-embedding) |
| input | string or array | yes | Text to embed. String or array of strings. |

Response (200 OK)

{
"object": "list",
"model": "text-embedding-ada-002",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0023, -0.0091, 0.0152, ...]
}
],
"usage": {
"prompt_tokens": 5,
"total_tokens": 5
}
}

GET /v1/models

Lists all available models from registered providers, each with a capabilities object.

Request

GET /v1/models

Response (200 OK)

{
"object": "list",
"data": [
{
"id": "gpt-4o",
"object": "model",
"created": 1704067200,
"owned_by": "openai",
"capabilities": {
"supports_streaming": true,
"supports_vision": true,
"supports_tool_calls": true,
"supports_structured_outputs": true,
"supports_json_mode": true,
"max_context_tokens": 128000
}
},
{
"id": "claude-sonnet-4-5",
"object": "model",
"created": 1704067200,
"owned_by": "anthropic",
"capabilities": {
"supports_streaming": true,
"supports_vision": true,
"supports_tool_calls": true,
"supports_structured_outputs": true,
"supports_json_mode": true,
"max_context_tokens": 200000
}
}
]
}

Capability Fields

| Field | Type | Description |
|---|---|---|
| supports_streaming | boolean | Provider supports SSE streaming |
| supports_vision | boolean | Provider supports image/vision inputs |
| supports_tool_calls | boolean | Provider supports function/tool calling |
| supports_structured_outputs | boolean | Provider supports structured output schemas |
| supports_json_mode | boolean | Provider supports JSON-mode responses |
| max_context_tokens | integer | Maximum context window size in tokens |
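
One use of the capabilities object is to pick a model that meets a request's needs before sending it. The sketch below filters a parsed /v1/models response; models_with is a hypothetical helper, and treating max_context_tokens as a minimum is a choice made for this example.

```python
def models_with(models_response, **required):
    """Return IDs of models whose capabilities satisfy every requirement.

    Boolean requirements must match exactly; max_context_tokens is treated
    as a minimum.
    """
    matches = []
    for model in models_response.get("data", []):
        caps = model.get("capabilities", {})
        ok = True
        for name, wanted in required.items():
            have = caps.get(name)
            if name == "max_context_tokens":
                ok = have is not None and have >= wanted
            else:
                ok = have == wanted
            if not ok:
                break
        if ok:
            matches.append(model["id"])
    return matches
```

For the sample response above, models_with(response, supports_vision=True, max_context_tokens=150000) would keep only claude-sonnet-4-5.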

POST /v1/completions (Legacy)

Legacy text-completion endpoint. Internally converts the prompt into a chat message and routes through the same pipeline.

Request

POST /v1/completions
Content-Type: application/json
{
"model": "gpt-3.5-turbo-instruct",
"prompt": "Say this is a test",
"max_tokens": 64
}

Request Fields

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Model ID |
| prompt | string or array | yes | Text prompt |
| max_tokens | integer | no | Maximum tokens to generate |
| temperature | number | no | Sampling temperature |

Response (200 OK)

{
"id": "cmpl-xyz789",
"object": "text_completion",
"created": 1771667816,
"model": "gpt-3.5-turbo-instruct",
"choices": [
{
"text": " This is indeed a test.",
"index": 0,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 6,
"completion_tokens": 7,
"total_tokens": 13
}
}

GET /

Redirects to /try.

GET /
302 Found
Location: /try

GET /try

Built-in browser-based chat test panel. Provides a model selector, message input, SSE streaming, response metadata (model, provider, latency, token count), and a "Copy as curl" button.

GET /try

Returns 200 OK with Content-Type: text/html.


GET /status

Returns gateway status as JSON with provider, route, and configuration information.

GET /status
Accept: application/json
{
"status": "running",
"mode": "standalone",
"version": "dev",
"uptimeSeconds": 120,
"configVersion": 0,
"providers": [
{
"name": "mock",
"type": "MockProvider",
"health": "HEALTHY",
"capabilities": {
"streaming": true,
"vision": false,
"toolCalls": false,
"structuredOutputs": true,
"jsonMode": true,
"maxContextTokens": 128000
}
}
],
"routes": [],
"region": {
"id": null,
"name": null,
"regionAware": false
},
"rateLimits": {
"enabled": false,
"globalRequestsPerSecond": 100,
"defaultPerKeyRequestsPerSecond": 10,
"defaultPerKeyTokensPerMinute": 100000
},
"warnings": ["No routes configured (using default model-prefix routing)"]
}

GET /actuator/gateway-status

Custom Actuator endpoint returning the same structured status as /status. Exposed via Spring Boot Actuator at /actuator/gateway-status.

GET /actuator/gateway-status

Returns the same JSON structure as GET /status.


POST /admin/v1/tenants

Creates a new tenant.

Request

POST /admin/v1/tenants
Content-Type: application/json
{
"name": "Acme Corp",
"status": "active",
"region": "us-east-1",
"metadata": {"plan": "enterprise"}
}

Request Fields

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Tenant display name |
| status | string | no | active (default) or suspended |
| region | string | no | Deployment region (e.g., us-east-1) |
| metadata | object | no | Arbitrary key-value metadata |

Response (201 Created)

Includes a Location header with the new tenant's URL.

Location: /admin/v1/tenants/{id}
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"name": "Acme Corp",
"status": "active",
"region": "us-east-1",
"metadata": {"plan": "enterprise"},
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}

GET /admin/v1/tenants

Lists all tenants.

Request

GET /admin/v1/tenants

Response (200 OK)

{
"object": "list",
"data": [
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"name": "Acme Corp",
"status": "active",
"region": "us-east-1",
"metadata": {"plan": "enterprise"},
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
]
}

GET /admin/v1/tenants/{id}

Retrieves a single tenant by ID.

Request

GET /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890

Response (200 OK)

{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"name": "Acme Corp",
"status": "active",
"region": "us-east-1",
"metadata": {"plan": "enterprise"},
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}

Response (404 Not Found)

Returned when the tenant ID does not exist.


PUT /admin/v1/tenants/{id}

Updates an existing tenant. Only provided fields are updated.

Request

PUT /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Content-Type: application/json
{
"name": "Acme Inc",
"status": "suspended"
}

Request Fields

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | no | Updated tenant name |
| status | string | no | active or suspended |
| region | string | no | Updated region |
| metadata | object | no | Replacement metadata |

Response (200 OK)

Returns the updated tenant with a refreshed updated_at timestamp.

Response (404 Not Found)

Returned when the tenant ID does not exist.


DELETE /admin/v1/tenants/{id}

Deletes a tenant.

Request

DELETE /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890

Response (204 No Content)

Returned on successful deletion. No response body.

Response (404 Not Found)

Returned when the tenant ID does not exist.


POST /admin/v1/tenants/{tenantId}/keys

Creates a new API key for the specified tenant.

Request

POST /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890/keys
Content-Type: application/json
{
"name": "production-key",
"scopes": ["completions:write"],
"expires_at": "2027-01-01T00:00:00Z"
}

Request Fields

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Display name for the API key |
| scopes | array | no | Permission scopes. Defaults to ["completions:write"] |
| expires_at | string | no | ISO-8601 expiration timestamp. null for non-expiring |

Response (201 Created)

Includes a Location header with the new key's URL. The key field contains the plaintext API key — this is the only time it is returned.

Location: /admin/v1/tenants/{tenantId}/keys/{id}
{
"id": "f7e6d5c4-b3a2-1098-7654-321fedcba098",
"key": "gw_a3b4c5d6e7f8091011121314151617181920212223",
"key_prefix": "gw_a3b4c5d6e",
"name": "production-key",
"scopes": ["completions:write"],
"status": "active",
"created_at": "2026-01-15T10:30:00Z"
}

Response (404 Not Found)

Returned when the tenant ID does not exist.


GET /admin/v1/tenants/{tenantId}/keys

Lists all API keys for a tenant. Plaintext keys are never returned.

Request

GET /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890/keys

Response (200 OK)

{
"object": "list",
"data": [
{
"id": "f7e6d5c4-b3a2-1098-7654-321fedcba098",
"name": "production-key",
"key_prefix": "gw_a3b4c5d6e",
"scopes": ["completions:write"],
"status": "active",
"expires_at": "2027-01-01T00:00:00Z",
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
]
}

GET /admin/v1/tenants/{tenantId}/keys/{keyId}

Retrieves a single API key by ID.

Request

GET /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890/keys/f7e6d5c4-b3a2-1098-7654-321fedcba098

Response (200 OK)

{
"id": "f7e6d5c4-b3a2-1098-7654-321fedcba098",
"name": "production-key",
"key_prefix": "gw_a3b4c5d6e",
"scopes": ["completions:write"],
"status": "active",
"expires_at": "2027-01-01T00:00:00Z",
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}

Response (404 Not Found)

Returned when the tenant or key ID does not exist, or the key belongs to a different tenant.


DELETE /admin/v1/tenants/{tenantId}/keys/{keyId}

Revokes an API key (soft delete). The key status is set to revoked and it can no longer be used for authentication.

Request

DELETE /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890/keys/f7e6d5c4-b3a2-1098-7654-321fedcba098

Response (204 No Content)

Returned on successful revocation. No response body.

Response (404 Not Found)

Returned when the tenant or key ID does not exist.


POST /admin/v1/tenants/{tenantId}/keys/{keyId}/rotate

Rotates an API key. The old key is marked as rotated (still valid during a grace period) and a new key is generated with the same name and scopes.

Request

POST /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890/keys/f7e6d5c4-b3a2-1098-7654-321fedcba098/rotate

Response (201 Created)

Returns the new key with plaintext (same as create). The old key's status changes to rotated.

{
"id": "new-key-uuid",
"key": "gw_newplaintextkey...",
"key_prefix": "gw_newplaint",
"name": "production-key",
"scopes": ["completions:write"],
"status": "active",
"created_at": "2026-02-01T12:00:00Z"
}

Response (404 Not Found)

Returned when the tenant or key ID does not exist.


GET /admin/v1/providers/{id}/capabilities

Returns the capabilities of a specific registered provider. Returns 404 if the provider is not configured.

Request

GET /admin/v1/providers/openai/capabilities

Response (200 OK)

{
"supports_streaming": true,
"supports_vision": true,
"supports_tool_calls": true,
"supports_structured_outputs": true,
"supports_json_mode": true,
"max_context_tokens": 128000
}

Response (404 Not Found)

Returned when the provider ID is unknown or not registered.

Valid provider IDs: openai, anthropic, gemini, bedrock, ollama, mock


POST /admin/v1/routes

Creates a new versioned route configuration. Changes propagate live without restart.

Request

POST /admin/v1/routes
Content-Type: application/json
{
"model_pattern": "gpt*",
"strategy": "model-prefix",
"providers": [
{"provider": "openai", "weight": 1}
],
"pinned_model_version": null,
"latency_sla_ms": 0
}

Request Fields

| Field | Type | Required | Description |
|---|---|---|---|
| model_pattern | string | yes | Glob pattern to match model names (e.g., gpt*) |
| strategy | string | no | Routing strategy: model-prefix, round-robin, weighted. Default: model-prefix |
| providers | array | yes | List of provider entries with provider, weight, model_override, region |
| pinned_model_version | string | no | Pin all requests on this route to a specific model version |
| latency_sla_ms | long | no | Latency SLA in milliseconds for cost-aware routing. Default: 0 (disabled). When set, cost-aware routing selects the cheapest provider meeting this latency target |

Response (201 Created)

{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"model_pattern": "gpt*",
"strategy": "model-prefix",
"providers": [{"provider": "openai", "weight": 1}],
"latency_sla_ms": 0,
"version": 1,
"created_at": "2026-01-01T00:00:00Z",
"updated_at": "2026-01-01T00:00:00Z"
}
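
To see how a model_pattern glob selects a route, the sketch below matches a model name against a list of route objects using Python's fnmatch module. This is an approximation under stated assumptions: the gateway's exact glob semantics and its precedence when several routes match are not documented here, so first-match ordering and the helper name match_route are hypothetical.

```python
from fnmatch import fnmatchcase


def match_route(model, routes):
    """Return the first route whose model_pattern glob matches the model name."""
    for route in routes:
        # fnmatchcase is case-sensitive and supports *, ?, and [seq].
        if fnmatchcase(model, route["model_pattern"]):
            return route
    return None
```

With routes for gpt* and claude*, a request for gpt-4o would resolve to the first route, and a model that matches nothing returns None.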

GET /admin/v1/routes

Lists all route configurations.

Request

GET /admin/v1/routes

Response (200 OK)

{
"object": "list",
"data": [
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"model_pattern": "gpt*",
"strategy": "model-prefix",
"providers": [{"provider": "openai", "weight": 1}],
"latency_sla_ms": 0,
"version": 1,
"created_at": "2026-01-01T00:00:00Z",
"updated_at": "2026-01-01T00:00:00Z"
}
]
}

GET /admin/v1/routes/{id}

Returns a specific route configuration.

Request

GET /admin/v1/routes/a1b2c3d4-e5f6-7890-abcd-ef1234567890

Response (200 OK)

Same format as the route object in the list response.

Response (404 Not Found)

Returned when the route ID does not exist.


PUT /admin/v1/routes/{id}

Updates a route configuration. Each update increments the version number. Changes propagate live.

Request

PUT /admin/v1/routes/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Content-Type: application/json
{
"model_pattern": "claude*",
"strategy": "round-robin",
"providers": [
{"provider": "anthropic", "weight": 1},
{"provider": "bedrock", "weight": 1}
]
}

All fields are optional — only provided fields are updated.

Response (200 OK)

Returns the updated route with incremented version.


DELETE /admin/v1/routes/{id}

Deletes a route and its version history. Changes propagate live.

Request

DELETE /admin/v1/routes/a1b2c3d4-e5f6-7890-abcd-ef1234567890

Response (204 No Content)

Response (404 Not Found)

Returned when the route ID does not exist.


GET /admin/v1/routes/{id}/versions

Returns the version history for a route. At least the last 10 versions are retained.

Request

GET /admin/v1/routes/a1b2c3d4-e5f6-7890-abcd-ef1234567890/versions

Response (200 OK)

{
"object": "list",
"data": [
{
"route_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"version": 2,
"model_pattern": "claude*",
"strategy": "round-robin",
"providers": [{"provider": "anthropic", "weight": 1}],
"created_at": "2026-01-02T00:00:00Z"
},
{
"route_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"version": 1,
"model_pattern": "gpt*",
"strategy": "model-prefix",
"providers": [{"provider": "openai", "weight": 1}],
"created_at": "2026-01-01T00:00:00Z"
}
]
}

POST /admin/v1/routes/{id}/rollback

Rolls back a route to a specified version. Creates a new version with the rolled-back configuration. Changes propagate live.

Request

POST /admin/v1/routes/a1b2c3d4-e5f6-7890-abcd-ef1234567890/rollback
Content-Type: application/json
{
"version": 1
}

Response (200 OK)

Returns the route with rolled-back configuration and incremented version.

Response (404 Not Found)

Returned when the route ID or the specified version does not exist.
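
The versioning behavior described for update and rollback (each change appends a new version, and a rollback copies an old configuration forward instead of rewinding) can be modeled in a few lines. This is a toy in-memory illustration, not the gateway's storage code; the class RouteHistory and its methods are hypothetical.

```python
import copy


class RouteHistory:
    """Toy in-memory model of versioned route configs (illustration only)."""

    def __init__(self, config):
        self.versions = [copy.deepcopy(config)]  # index 0 holds version 1

    @property
    def current_version(self):
        return len(self.versions)

    def update(self, **changes):
        # Partial update: only provided fields change, version increments.
        new = copy.deepcopy(self.versions[-1])
        new.update(changes)
        self.versions.append(new)
        return self.current_version

    def rollback(self, version):
        # Rolling back appends a copy of the old config as a new version.
        if not 1 <= version <= len(self.versions):
            raise KeyError(f"version {version} not found")
        self.versions.append(copy.deepcopy(self.versions[version - 1]))
        return self.current_version
```

Starting from version 1, one update followed by a rollback to version 1 yields version 3, whose configuration equals version 1. History is never rewritten, which keeps the version counter monotonic.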


GET /admin/v1/latency

Returns the current EWMA latency for all tracked provider+model pairs. Available when the enterprise LatencyTracker is active; without an enterprise license, it returns an empty list.

Request

GET /admin/v1/latency

Response (200 OK)

{
"object": "list",
"data": [
{
"provider": "openai",
"model": "gpt-4o",
"ewmaLatencyMs": 120.5,
"rawLatencyMs": 115.0,
"sampleCount": 42,
"lastUpdated": "2026-01-01T00:00:00Z"
},
{
"provider": "anthropic",
"model": "claude-sonnet-4-5-20250514",
"ewmaLatencyMs": 95.3,
"rawLatencyMs": 88.0,
"sampleCount": 128,
"lastUpdated": "2026-01-01T00:01:00Z"
}
]
}

| Field | Type | Description |
|---|---|---|
| provider | string | Provider name |
| model | string | Model name |
| ewmaLatencyMs | double | Exponentially weighted moving average latency in milliseconds |
| rawLatencyMs | double | Most recent raw latency sample in milliseconds |
| sampleCount | long | Total number of latency samples recorded |
| lastUpdated | string | ISO-8601 timestamp of the last recorded sample |
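
For readers unfamiliar with EWMA, the sketch below shows the standard update rule that relates a raw sample to the running average. The smoothing factor alpha is an assumption; the value used by the gateway's LatencyTracker is not documented here, and ewma_update is a hypothetical name.

```python
def ewma_update(previous, sample, alpha=0.2):
    """Blend a new latency sample into the running average.

    alpha is an assumed smoothing factor; higher values weight recent
    samples more heavily.
    """
    if previous is None:
        return float(sample)  # first sample seeds the average
    return alpha * sample + (1 - alpha) * previous


avg = None
for latency_ms in [100.0, 120.0, 80.0]:
    avg = ewma_update(avg, latency_ms, alpha=0.5)
print(avg)  # 95.0
```

This is why ewmaLatencyMs and rawLatencyMs differ in the response: the first smooths over history, the second is only the latest sample.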

GET /admin/v1/priority/stats

Returns current priority admission control stats including per-tier concurrency and threshold configuration.

Request

GET /admin/v1/priority/stats

Response (200 OK)

{
"object": "priority_stats",
"data": {
"currentConcurrent": 17,
"maxConcurrent": 1000,
"perTierConcurrent": {
"premium": 5,
"standard": 10,
"bulk": 2
},
"perTierThresholdPct": {
"premium": 100,
"standard": 80,
"bulk": 50
}
}
}

| Field | Type | Description |
|---|---|---|
| currentConcurrent | int | Total in-flight requests across all tiers |
| maxConcurrent | int | Configured maximum concurrent requests (0 without license) |
| perTierConcurrent | object | Current in-flight count per priority tier |
| perTierThresholdPct | object | Configured throttle threshold percentage per tier |

POST /admin/v1/policies

Creates a new policy in DRAFT status.

Request

POST /admin/v1/policies
Content-Type: application/json

{
"name": "block-legacy-models",
"tenant_id": "acme-corp",
"description": "Block legacy models for production tenants",
"dsl": "version: \"1\"\nrules:\n - id: block-legacy\n conditions:\n model:\n denylist: [gpt-3.5-turbo]\n action: DENY\n deny_message: \"Legacy model not allowed\""
}

Response (201 Created)

{
"id": "pol-abc123",
"tenant_id": "acme-corp",
"name": "block-legacy-models",
"description": "Block legacy models for production tenants",
"dsl": "...",
"status": "DRAFT",
"version": 1,
"created_at": "2026-01-15T10:00:00Z",
"updated_at": "2026-01-15T10:00:00Z"
}

GET /admin/v1/policies

Lists all policies. Optional ?tenant_id= filter.


GET /admin/v1/policies/{id}

Returns a single policy by ID. Returns 404 if not found.


PUT /admin/v1/policies/{id}

Updates a policy. Each update increments the version. Only provided fields are changed.


DELETE /admin/v1/policies/{id}

Deletes a policy and its version history. Returns 204.


POST /admin/v1/policies/{id}/status

Changes policy status. Valid values: DRAFT, ACTIVE, SHADOW, ARCHIVED. When transitioning to ACTIVE, conflict detection runs against other active policies and returns warnings if conflicts are found.

Request

POST /admin/v1/policies/pol-abc123/status
Content-Type: application/json

{"status": "SHADOW"}

GET /admin/v1/policies/{id}/versions

Returns the version history for a policy. At least the last 10 versions are retained.


POST /admin/v1/policies/{id}/rollback

Rolls back a policy to a specified version. Creates a new version with the restored configuration.

Request

{"version": 1}

POST /admin/v1/policies/dry-run

Simulates a policy evaluation. Validates YAML syntax and evaluates through the PolicyEngine.

Request

{
"dsl": "version: \"1\"\nrules:\n - id: test\n conditions:\n model:\n denylist: [gpt-3.5-turbo]\n action: DENY",
"tenant_id": "acme-corp",
"model": "gpt-3.5-turbo"
}

Response (200 OK)

{
"status": "DENIED",
"reason": "Request blocked by policy rule: test",
"rule_id": "test",
"evaluation_time_ms": 2
}

POST /admin/v1/policies/{id}/promote

Promotes a SHADOW policy to ACTIVE status. Runs conflict detection, cleans up shadow events, and publishes a policy mutation event.

Requires org-admin or policy-admin role.

Request

POST /admin/v1/policies/pol-abc123/promote

Response (200 OK)

{
"id": "pol-abc123",
"name": "block-legacy-models",
"status": "ACTIVE",
"version": 3,
"warnings": ["Potential conflict with policy pol-xyz: overlapping model denylist"]
}

Response (400 Bad Request)

Returned when the policy is not in SHADOW status.


GET /admin/v1/policies/{id}/shadow/stats

Returns shadow mode divergence statistics for a policy.

Requires org-admin, policy-admin, developer, or viewer role.

Query Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| period | No | 24h | Time window: 1h, 24h, or 7d |

Response (200 OK)

{
"policy_id": "pol-abc123",
"period": "24h",
"total_evaluations": 1250,
"divergent_count": 47,
"divergence_rate": 0.0376,
"by_rule": [
{
"rule_id": "block-legacy",
"count": 35,
"divergence_type": "shadow_deny_active_allow"
},
{
"rule_id": "max-tokens-limit",
"count": 12,
"divergence_type": "shadow_deny_active_allow"
}
]
}
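
The divergence_rate in this response is consistent with divergent_count divided by total_evaluations (47 / 1250 = 0.0376), and the by_rule counts sum to divergent_count (35 + 12 = 47). The sketch below recomputes the rate and adds a promotion gate; the gate, its 1% default, and both function names are hypothetical and not part of the API.

```python
def divergence_rate(stats):
    """Recompute divergence_rate as divergent_count / total_evaluations."""
    total = stats["total_evaluations"]
    if total == 0:
        return 0.0
    return stats["divergent_count"] / total


def ready_to_promote(stats, max_rate=0.01):
    """Hypothetical promotion gate; the 1% default is an arbitrary example."""
    return divergence_rate(stats) <= max_rate


stats = {"total_evaluations": 1250, "divergent_count": 47}
print(round(divergence_rate(stats), 4))  # 0.0376
```

A check like this could run before calling the promote endpoint, so a shadow policy that would change many decisions is reviewed first.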

GET /admin/v1/policies/{id}/shadow/events

Returns recent shadow policy events, optionally filtered to diverged-only.

Requires org-admin, policy-admin, developer, or viewer role.

Query Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| diverged | No | true | Filter to diverged events only |
| limit | No | 50 | Maximum events to return |

Response (200 OK)

{
"object": "list",
"data": [
{
"id": "evt-uuid",
"policy_id": "pol-abc123",
"rule_id": "block-legacy",
"request_id": "trace-xyz",
"tenant_id": "acme-corp",
"model": "gpt-3.5-turbo",
"active_decision": "ALLOW",
"shadow_decision": "DENY",
"diverged": true,
"divergence_type": "shadow_deny_active_allow",
"timestamp": "2026-02-28T14:30:00Z"
}
]
}

Internal Config Distribution API

Internal endpoints for data plane pods to pull configuration from the control plane without direct database access. Protected by the X-Internal-Secret header when gateway.internal.secret is configured.

GET /internal/v1/config/version

Returns the current config version (monotonic counter).

Request

GET /internal/v1/config/version
X-Internal-Secret: <secret>

Response (200 OK)

{
"version": 42
}

GET /internal/v1/config/full

Returns a full config snapshot (tenants, routes, API keys without key hashes).

Request

GET /internal/v1/config/full?since_version=0
X-Internal-Secret: <secret>

| Parameter | Type | Required | Description |
|---|---|---|---|
| since_version | integer | no | Advisory version hint (full snapshot always returned in core). Default: 0 |

Response (200 OK)

{
"version": 42,
"tenants": [
{
"id": "a1b2c3d4-...",
"name": "Acme Corp",
"status": "active",
"region": "us-east-1",
"metadata": {"plan": "enterprise"},
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
],
"routes": [
{
"id": "r1b2c3d4-...",
"model_pattern": "gpt*",
"strategy": "model-prefix",
"providers": [{"provider": "openai", "weight": 1}],
"version": 1,
"created_at": "2026-01-01T00:00:00Z",
"updated_at": "2026-01-01T00:00:00Z"
}
],
"api_keys": [
{
"id": "k1b2c3d4-...",
"tenant_id": "a1b2c3d4-...",
"name": "production-key",
"key_prefix": "gw_a3b4c5d6e",
"scopes": ["completions:write"],
"status": "active",
"expires_at": "2027-01-01T00:00:00Z",
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
]
}

Note: key_hash is deliberately omitted from API key entries for security.

Response (401 Unauthorized)

Returned when gateway.internal.secret is configured and the request is missing or has an invalid X-Internal-Secret header.
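
A data plane pod would plausibly combine the two endpoints by polling the cheap version endpoint and fetching the full snapshot only when the version advances. The sketch below shows that loop body with the HTTP calls injected as callables, so it runs without a network. The class ConfigPoller and the callables fetch_version and fetch_full are hypothetical; in practice they would wrap the two GET requests and send X-Internal-Secret.

```python
class ConfigPoller:
    """Pull a full snapshot only when the control plane's version has advanced."""

    def __init__(self, fetch_version, fetch_full):
        # fetch_version() returns the parsed /internal/v1/config/version body.
        # fetch_full(since_version) returns the parsed /internal/v1/config/full body.
        self._fetch_version = fetch_version
        self._fetch_full = fetch_full
        self.version = 0
        self.snapshot = None

    def poll(self):
        remote = self._fetch_version()["version"]
        if remote <= self.version:
            return False  # nothing new, keep the current snapshot
        snapshot = self._fetch_full(self.version)
        self.snapshot = snapshot
        self.version = snapshot["version"]
        return True
```

Because the version is a monotonic counter, comparing integers is enough to decide whether a refresh is needed.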


POST /admin/v1/pricing

Creates a new model pricing entry. Pricing entries define cost-per-million-tokens for input and output. Model names support glob patterns (e.g. gpt-4o* matches gpt-4o, gpt-4o-mini).

Request

POST /admin/v1/pricing
Content-Type: application/json
{
"model": "gpt-4o",
"provider": "openai",
"inputPricePerMillion": 2.50,
"outputPricePerMillion": 10.00
}

Request Fields

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Model name or glob pattern (e.g. gpt-4o*) |
| provider | string | yes | Provider name (e.g. openai, anthropic) |
| inputPricePerMillion | BigDecimal | yes | USD cost per 1M input tokens |
| outputPricePerMillion | BigDecimal | yes | USD cost per 1M output tokens |

Response (201 Created)

{
"id": "f47ac10b-...",
"model": "gpt-4o",
"provider": "openai",
"inputPricePerMillion": 2.50,
"outputPricePerMillion": 10.00,
"effectiveDate": "2026-03-04T10:00:00Z",
"createdAt": "2026-03-04T10:00:00Z",
"updatedAt": "2026-03-04T10:00:00Z"
}

Includes Location: /admin/v1/pricing/{id} header.


GET /admin/v1/pricing

Lists all pricing entries. Optional ?model= and ?provider= filters.

Response (200 OK)

{
"object": "list",
"data": [
{
"id": "f47ac10b-...",
"model": "gpt-4o",
"provider": "openai",
"inputPricePerMillion": 2.50,
"outputPricePerMillion": 10.00,
"effectiveDate": "2026-03-04T10:00:00Z",
"createdAt": "2026-03-04T10:00:00Z",
"updatedAt": "2026-03-04T10:00:00Z"
}
]
}

GET /admin/v1/pricing/{id}

Returns a specific pricing entry. Returns 404 with pricing_not_found error if not found.


PUT /admin/v1/pricing/{id}

Updates a pricing entry. Returns 404 if not found. Increments ConfigVersionTracker.


DELETE /admin/v1/pricing/{id}

Deletes a pricing entry. Returns 204. Returns 404 if not found.


GET /admin/v1/costs

Lists cost records. Supports filtering by tenantId, apiKey, model, and provider query parameters.

Request

GET /admin/v1/costs?tenantId=acme-corp&model=gpt-4o

Response (200 OK)

[
{
"id": "cost-abc123",
"tenantId": "acme-corp",
"apiKey": "sk-test",
"model": "gpt-4o",
"provider": "openai",
"inputTokens": 1000,
"outputTokens": 500,
"inputCost": 0.003,
"outputCost": 0.0075,
"totalCost": 0.0105,
"currency": "USD",
"pricingId": "f47ac10b-...",
"timestamp": "2026-03-04T10:05:00Z"
}
]
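
The cost fields in a record follow from token counts and per-million prices. The sketch below shows that arithmetic with Decimal to avoid floating-point drift; compute_cost is a hypothetical helper. Note that the sample record above corresponds to prices of 3.00 and 15.00 USD per million tokens, not the 2.50 and 10.00 used in the pricing example.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def compute_cost(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Return (inputCost, outputCost, totalCost) in USD from per-million prices."""
    input_cost = Decimal(input_tokens) * input_price_per_million / MILLION
    output_cost = Decimal(output_tokens) * output_price_per_million / MILLION
    return input_cost, output_cost, input_cost + output_cost


# 1000 input and 500 output tokens at 3.00 / 15.00 USD per million tokens.
costs = compute_cost(1000, 500, Decimal("3.00"), Decimal("15.00"))
```

Here costs equals 0.003, 0.0075, and 0.0105 USD, matching inputCost, outputCost, and totalCost in the sample.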

GET /admin/v1/costs/summary

Returns aggregated cost summary. Supports filtering by tenantId, model, provider, from, and to (ISO 8601 timestamps).

Request

GET /admin/v1/costs/summary?tenantId=acme-corp&model=gpt-4o

Response (200 OK)

{
"tenantId": "acme-corp",
"model": "gpt-4o",
"provider": null,
"totalInputCost": 0.15,
"totalOutputCost": 0.375,
"totalCost": 0.525,
"requestCount": 50,
"totalInputTokens": 50000,
"totalOutputTokens": 25000
}

POST /admin/v1/budgets

Creates a budget cap with soft and hard limits.

Request

POST /admin/v1/budgets
Content-Type: application/json

{
"name": "Production Monthly",
"tenantId": "acme-corp",
"apiKeyId": null,
"period": "MONTHLY",
"limitUsd": 1000.00,
"softLimitPct": 80,
"enabled": true
}

Response (201 Created)

{
"id": "b-123",
"tenantId": "acme-corp",
"apiKeyId": null,
"name": "Production Monthly",
"period": "MONTHLY",
"limitUsd": 1000.00,
"softLimitPct": 80,
"enabled": true,
"version": 1,
"createdAt": "2026-03-04T12:00:00Z",
"updatedAt": "2026-03-04T12:00:00Z"
}

GET /admin/v1/budgets

Lists budget caps. Optional filters: tenantId, apiKeyId.


GET /admin/v1/budgets/{id}

Returns a specific budget cap. Returns 404 (BUDGET_NOT_FOUND) if not found.


PUT /admin/v1/budgets/{id}

Updates a budget cap. Increments version. Supports partial updates (name, period, limitUsd, softLimitPct, enabled).


DELETE /admin/v1/budgets/{id}

Deletes a budget cap. Returns 204 on success, 404 if not found.


GET /admin/v1/budgets/{id}/usage

Returns current period spend vs budget limit.

Response (200 OK)

{
"budgetId": "b-123",
"name": "Production Monthly",
"period": "MONTHLY",
"limitUsd": 1000.00,
"softLimitUsd": 800.00,
"currentSpend": 450.00,
"remainingUsd": 550.00,
"utilizationPct": 45.0,
"periodStart": "2026-03-01T00:00:00Z",
"periodEnd": "2026-04-01T00:00:00Z"
}
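
The derived fields in this response follow from limitUsd, softLimitPct, and the current spend. A minimal sketch of that arithmetic, inferred from the sample values and not taken from the gateway's code; budget_usage and the softLimitReached flag are hypothetical additions for illustration.

```python
def budget_usage(limit_usd, soft_limit_pct, current_spend):
    """Derive the usage fields shown in the response from a budget's settings."""
    soft_limit_usd = limit_usd * soft_limit_pct / 100
    return {
        "softLimitUsd": soft_limit_usd,
        "remainingUsd": limit_usd - current_spend,
        "utilizationPct": current_spend * 100 / limit_usd if limit_usd else 0.0,
        "softLimitReached": current_spend >= soft_limit_usd,  # hypothetical extra field
    }


usage = budget_usage(1000.00, 80, 450.00)
```

With the sample budget, this gives a soft limit of 800.00, 550.00 remaining, and 45.0 percent utilization, so the soft limit has not yet been reached.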

POST /admin/v1/chargeback

Generates a chargeback report for the specified time period. Requires enterprise license.

Request

{
"tenant_id": "acme-corp",
"from": "2026-02-01T00:00:00Z",
"to": "2026-03-01T00:00:00Z"
}

| Field | Type | Required | Description |
|---|---|---|---|
| tenant_id | string | no | Tenant to report on (null = all) |
| from | ISO 8601 | yes | Start of reporting period |
| to | ISO 8601 | yes | End of reporting period |

Response (201 Created)

{
"id": "cb-a1b2c3d4",
"tenant_id": null,
"from": "2026-02-01T00:00:00Z",
"to": "2026-03-01T00:00:00Z",
"generated_at": "2026-03-01T12:00:00Z",
"generated_by": "admin",
"sections": {
"tenantSummary": [...],
"apiKeySummary": [...],
"modelSummary": [...],
"providerSummary": [...],
"dailyBreakdown": [...],
"forecast": { "trailing7d": [...], "trailing30d": [...] },
"anomalies": [...]
}
}

Response (400 Bad Request)

Returned when no enterprise license is available (CHARGEBACK_NOT_AVAILABLE).


GET /admin/v1/chargeback

Lists all chargeback reports.

Query Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| tenant_id | string | no | Filter by tenant ID |

Response (200 OK)

{
"object": "list",
"data": [...]
}

GET /admin/v1/chargeback/{id}

Returns a specific chargeback report (without PDF bytes).

Response (404 Not Found)

Returned when the report ID does not exist (CHARGEBACK_NOT_FOUND).


GET /admin/v1/chargeback/{id}/pdf

Downloads a chargeback report as PDF.

Response (200 OK)

Returns Content-Type: application/pdf with Content-Disposition: attachment; filename=chargeback-report-{id}.pdf.


GET /admin/v1/chargeback/{id}/csv

Downloads a chargeback report as CSV.

Response (200 OK)

Returns Content-Type: text/csv with Content-Disposition: attachment; filename=chargeback-report-{id}.csv.


DELETE /admin/v1/chargeback/{id}

Deletes a chargeback report. Returns 204.


POST /admin/v1/schemas

Creates an output schema configuration for automatic response JSON validation. Requires enterprise license.

Request

POST /admin/v1/schemas
Content-Type: application/json
{
"modelPattern": "gpt-4o*",
"routeId": "gpt-route",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer" }
},
"required": ["name"]
},
"maxRetries": 2,
"correctionPrompt": "The response must be valid JSON matching the schema. Fix the following errors:",
"enabled": true
}

Response: 201 Created


GET /admin/v1/schemas

Lists all output schema configurations.

Response: 200 OK — Array of schema config objects.


GET /admin/v1/schemas/{id}

Gets a specific output schema configuration.

Response: 200 OK — Schema config object.


PUT /admin/v1/schemas/{id}

Updates an output schema configuration.

Response: 200 OK — Updated schema config object.


DELETE /admin/v1/schemas/{id}

Deletes an output schema configuration. Returns 204 No Content.


GET /admin/v1/costs/forecast

Returns cost forecasts based on trailing spend trends.

Query Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| tenant_id | string | no | Filter by tenant ID |
| trailing_days | int | no | Trailing window in days (default: 7) |

Response (200 OK)

[
{
"tenantId": "acme-corp",
"model": "gpt-4o",
"trailingDays": 7,
"dailyAverage": 12.50,
"projectedMonthEnd": 375.00,
"trend": "increasing",
"computedAt": "2026-03-01T12:00:00Z"
}
]

GET /admin/v1/costs/anomalies

Returns active cost anomalies where current daily spend rate exceeds the configured threshold relative to the 30-day baseline.

Query Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| tenant_id | string | no | Filter by tenant ID |

Response (200 OK)

[
{
"tenantId": "acme-corp",
"model": "gpt-4o",
"currentDailyRate": 5.00,
"baselineDailyRate": 1.00,
"deviationPct": 500.0,
"thresholdPct": 200.0,
"detectedAt": "2026-03-01T12:00:00Z"
}
]
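The sample values above are consistent with deviationPct being the current daily rate expressed as a percentage of the baseline (5.00 / 1.00 × 100 = 500.0), with an anomaly reported once that figure exceeds thresholdPct. The sketch below reproduces that arithmetic under this assumption; the exact formula the gateway uses is not stated here, and the function names are hypothetical.

```python
def deviation_pct(current_daily_rate: float, baseline_daily_rate: float) -> float:
    """Current daily spend as a percentage of the baseline (assumed formula)."""
    if baseline_daily_rate <= 0:
        raise ValueError("baseline must be positive to compute a deviation")
    return current_daily_rate / baseline_daily_rate * 100.0


def is_anomaly(current_daily_rate: float, baseline_daily_rate: float,
               threshold_pct: float) -> bool:
    """True when the deviation exceeds the configured threshold percentage."""
    return deviation_pct(current_daily_rate, baseline_daily_rate) > threshold_pct
```

With the values from the sample response, `deviation_pct(5.00, 1.00)` yields 500.0, which exceeds the 200.0 threshold.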

POST /v1/budget/estimate

Estimates the cost of a request before it is sent and reports the remaining budget. Agent orchestrators can call it to check budget before making the actual request.

Request

POST /v1/budget/estimate
Content-Type: application/json
{
"model": "gpt-4o",
"max_tokens": 1000
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | yes | Model name for cost estimation |
| max_tokens | integer | no | Max tokens for cost estimation |

Response (200 OK)

{
"estimated_cost_usd": 0.005,
"budget_remaining_usd": 70.0,
"budget_remaining_tokens": 500000,
"budget_remaining_pct": 70,
"would_exceed_budget": false
}

| Field | Type | Nullable | Description |
| --- | --- | --- | --- |
| estimated_cost_usd | double | no | Estimated cost of the request in USD |
| budget_remaining_usd | double | yes | Remaining budget in USD (null when no budget configured) |
| budget_remaining_tokens | integer | yes | Estimated tokens remaining in budget (null when no budget) |
| budget_remaining_pct | integer | yes | Percentage of budget remaining (null when no budget) |
| would_exceed_budget | boolean | no | Whether this request would exceed the remaining budget |

Error Responses

| Status | Error Code | Description |
| --- | --- | --- |
| 400 | invalid_request_error | Missing required model field |
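An orchestrator might use this endpoint as a gate before each model call. The sketch below builds the request body and interprets a parsed 200 response; the helper names and the `min_remaining_pct` margin are hypothetical client-side choices, not gateway settings.

```python
import json
from typing import Optional


def build_estimate_body(model: str, max_tokens: Optional[int] = None) -> bytes:
    """Serialize the request body; max_tokens is optional per the field table."""
    if not model:
        raise ValueError("model is required")
    body = {"model": model}
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return json.dumps(body).encode("utf-8")


def should_proceed(estimate: dict, min_remaining_pct: int = 10) -> bool:
    """Decide whether to send the real request, given a parsed 200 response.

    min_remaining_pct is a hypothetical client-side safety margin.
    """
    if estimate["would_exceed_budget"]:
        return False
    pct = estimate.get("budget_remaining_pct")
    # A null percentage means no budget is configured, so nothing to enforce.
    return pct is None or pct >= min_remaining_pct
```

Treating a null budget_remaining_pct as "proceed" follows from the nullable fields documented above: when no budget is configured there is no limit for the client to respect.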

GET /admin/v1/audit/events

Lists audit events, optionally filtered by tenant, event type, and date range. Returns newest first.

Request

GET /admin/v1/audit/events?tenant_id=acme-corp&event_type=GATEWAY_RESPONSE&from=2026-01-01T00:00:00Z&to=2026-01-02T00:00:00Z

Query Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| tenant_id | string | no | Filter by tenant ID |
| event_type | string | no | Filter by event type (GATEWAY_REQUEST, GATEWAY_RESPONSE) |
| from | ISO 8601 | no | Start of time range |
| to | ISO 8601 | no | End of time range |
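Because every filter is optional and the timestamps contain characters that need percent-encoding, it can help to build the query string programmatically. A minimal sketch, assuming timezone-aware datetimes on the client side; the function name and the `start`/`end` parameter names are illustrative only.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def audit_events_url(base_url: str,
                     tenant_id: Optional[str] = None,
                     event_type: Optional[str] = None,
                     start: Optional[datetime] = None,
                     end: Optional[datetime] = None) -> str:
    """Build the list URL, including only the filters that were supplied."""
    params = {}
    if tenant_id is not None:
        params["tenant_id"] = tenant_id
    if event_type is not None:
        params["event_type"] = event_type
    # Timestamps are sent as ISO 8601 in UTC, matching the request example.
    for name, value in (("from", start), ("to", end)):
        if value is not None:
            utc = value.astimezone(timezone.utc)
            params[name] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    url = base_url.rstrip("/") + "/admin/v1/audit/events"
    return url + ("?" + urlencode(params) if params else "")
```

The colons in the timestamps are percent-encoded as `%3A` by `urlencode`; this is an equivalent encoding of the raw form shown in the request example.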

Response (200 OK)

[
{
"eventId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"timestamp": "2026-01-15T10:30:00Z",
"tenantId": "acme-corp",
"eventType": "GATEWAY_RESPONSE",
"payload": {
"model": "gpt-4o",
"provider": "openai",
"status": 200,
"latency_ms": 342,
"api_key": "sk-prod-1...",
"tokens_total": "235"
}
}
]

GET /admin/v1/audit/events/export

Downloads audit events as a CSV file. Accepts the same filter parameters as the list endpoint.

Request

GET /admin/v1/audit/events/export?tenant_id=acme-corp

Response (200 OK)

Returns Content-Type: text/csv with Content-Disposition: attachment; filename=audit-events.csv.

event_id,timestamp,tenant_id,event_type,payload
a1b2c3d4-...,2026-01-15T10:30:00Z,acme-corp,GATEWAY_RESPONSE,"{model=gpt-4o, status=200}"

GET /admin/v1/audit/events/export/json

Downloads audit events as a JSON file. Accepts the same filter parameters as the list endpoint.

Request

GET /admin/v1/audit/events/export/json?tenant_id=acme-corp

Response (200 OK)

Returns Content-Type: application/json with Content-Disposition: attachment; filename=audit-events.json.

[
{
"eventId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"timestamp": "2026-01-15T10:30:00Z",
"tenantId": "acme-corp",
"eventType": "GATEWAY_RESPONSE",
"payload": {
"model": "gpt-4o",
"provider": "openai",
"status": 200
}
}
]

POST /admin/v1/reports

Generates a compliance report (SOC2, HIPAA, or GDPR). Requires enterprise license.

Request

POST /admin/v1/reports
Content-Type: application/json
{
"type": "SOC2",
"tenant_id": "acme-corp",
"from": "2026-02-01T00:00:00Z",
"to": "2026-03-01T00:00:00Z"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | yes | Report type: SOC2, HIPAA, or GDPR |
| tenant_id | string | no | Tenant ID to scope report (null = all tenants) |
| from | ISO 8601 | yes | Start of reporting period |
| to | ISO 8601 | yes | End of reporting period |

Response (201 Created)

{
"id": "rpt-a1b2c3d4",
"type": "SOC2",
"tenant_id": null,
"from": "2026-02-01T00:00:00Z",
"to": "2026-03-01T00:00:00Z",
"generated_at": "2026-03-01T12:00:00Z",
"generated_by": "admin@acme.com",
"sections": {
"auditChainIntegrity": { "valid": true, "verifiedCount": 1542 },
"accessControlSummary": { "authSuccessCount": 980, "authFailureCount": 12 },
"policyEnforcementSummary": { "policyDeniedCount": 5 },
"tokenUsageSummary": [ ... ],
"eventCountByType": { "GATEWAY_RESPONSE": 1200, "GATEWAY_REQUEST": 1200 }
}
}

Error (400 — unlicensed mode)

{
"error": {
"message": "Compliance reports require enterprise license",
"type": "invalid_request_error",
"code": "compliance_not_available"
}
}

GET /admin/v1/reports

Lists all generated compliance reports.

Request

GET /admin/v1/reports
GET /admin/v1/reports?tenant_id=acme-corp

Response (200 OK)

{
"object": "list",
"data": [
{
"id": "rpt-a1b2c3d4",
"type": "SOC2",
"tenant_id": null,
"from": "2026-02-01T00:00:00Z",
"to": "2026-03-01T00:00:00Z",
"generated_at": "2026-03-01T12:00:00Z",
"generated_by": "admin@acme.com"
}
]
}

GET /admin/v1/reports/{id}

Returns a specific compliance report with full section data (without PDF bytes).

Request

GET /admin/v1/reports/rpt-a1b2c3d4

Response (200 OK)

Same structure as the POST response above.

Error (404)

{
"error": {
"message": "Report not found: rpt-unknown",
"type": "not_found_error",
"code": "report_not_found"
}
}

GET /admin/v1/reports/{id}/pdf

Downloads the compliance report as a PDF file.

Request

GET /admin/v1/reports/rpt-a1b2c3d4/pdf

Response (200 OK)

Returns Content-Type: application/pdf with Content-Disposition: attachment; filename=soc2-report-rpt-a1b2c3d4.pdf.

If the stored report has no cached PDF bytes, the PDF is re-rendered on the fly.


DELETE /admin/v1/reports/{id}

Deletes a compliance report.

Request

DELETE /admin/v1/reports/rpt-a1b2c3d4

Response (204 No Content)

Empty body.


POST /admin/v1/pii/detokenize

Detokenizes PII tokens in a text string, replacing {{PII_<TYPE>_<hex>}} placeholders with the original values. Requires enterprise license.

Request

POST /admin/v1/pii/detokenize
Content-Type: application/json
{
"text": "Contact {{PII_EMAIL_a1b2c3d4}} regarding invoice {{PII_CREDIT_CARD_e5f6a7b8}}",
"tenant_id": "acme-corp"
}

Request Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | yes | Text containing PII tokens to detokenize |
| tenant_id | string | yes | Tenant ID for token lookup |

Response (200 OK)

{
"text": "Contact user@example.com regarding invoice 4111111111111111"
}

Tokens that cannot be resolved (expired or unknown) are left as-is in the output.
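Since unresolved tokens are returned unchanged, a caller can scan the response text to find out which placeholders were not detokenized. The sketch below assumes the placeholder grammar shown above, with an upper-case type (underscores allowed) and a lower-case hex suffix as in the example; the pattern and function name are illustrative, not part of the API.

```python
import re
from typing import List, Tuple

# Assumed shape: {{PII_<TYPE>_<hex>}}, e.g. {{PII_CREDIT_CARD_e5f6a7b8}}
PII_TOKEN = re.compile(r"\{\{PII_([A-Z_]+)_([0-9a-f]+)\}\}")


def find_pii_tokens(text: str) -> List[Tuple[str, str]]:
    """Return (type, hex) pairs for every PII placeholder found in text."""
    return [(m.group(1), m.group(2)) for m in PII_TOKEN.finditer(text)]
```

Running `find_pii_tokens` on the detokenized text returns an empty list when every token was resolved, and the leftover placeholders otherwise.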

RBAC

Requires org-admin or policy-admin role.


DELETE /admin/v1/pii/tokens/{tenantId}

Purges all stored PII tokens for a tenant. This is an irreversible operation — detokenization of previously redacted content will no longer be possible.

Request

DELETE /admin/v1/pii/tokens/acme-corp

Response (200 OK)

{
"tenant_id": "acme-corp",
"tokens_removed": 1542
}

RBAC

Requires org-admin role.


POST /admin/v1/webhooks

Creates a new webhook subscription. Webhooks receive HTTP POST callbacks when specified gateway events occur.

Request

POST /admin/v1/webhooks
Content-Type: application/json
{
"name": "slack-alerts",
"url": "https://hooks.slack.com/services/...",
"secret": "whsec_my_signing_secret",
"event_types": ["POLICY_DENIAL", "PII_DETECTED"],
"tenant_id": "tenant-1",
"description": "Slack alerts for governance events"
}

Request Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | yes | Display name for the webhook |
| url | string | yes | HTTPS endpoint to receive webhook payloads |
| secret | string | yes | Signing secret for HMAC-SHA256 payload verification |
| event_types | array | yes | List of event types to subscribe to |
| tenant_id | string | no | Scope webhook to a specific tenant (null = all tenants) |
| description | string | no | Human-readable description |

Supported Event Types

| Event Type | Description |
| --- | --- |
| POLICY_DENIAL | A request was denied by a policy rule |
| PII_DETECTED | PII was detected in a request or response |
| IP_ACCESS_DENIED | A request was blocked by IP access control |
| BUDGET_CAP_SOFT | Soft budget cap threshold reached |
| BUDGET_CAP_HARD | Hard budget cap exceeded |
| INJECTION_DETECTED | Prompt injection detected |
| AGENT_LOOP_DETECTED | Agentic loop detected |
| MCP_APPROVAL_REQUESTED | MCP tool invocation requires human approval |
| MCP_APPROVAL_TIMEOUT | MCP approval request timed out |

Response (201 Created)

The secret field is masked in all responses after creation.

{
"id": "wh-abc123",
"tenant_id": "tenant-1",
"name": "slack-alerts",
"description": "Slack alerts for governance events",
"url": "https://hooks.slack.com/services/...",
"secret": "whs***ret",
"event_types": ["POLICY_DENIAL", "PII_DETECTED"],
"status": "ACTIVE",
"version": 1,
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}

GET /admin/v1/webhooks

Lists all webhook subscriptions, optionally filtered by tenant.

Request

GET /admin/v1/webhooks
GET /admin/v1/webhooks?tenant_id=tenant-1

Query Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| tenant_id | string | no | Filter webhooks by tenant ID |

Response (200 OK)

{
"object": "list",
"data": [
{
"id": "wh-abc123",
"tenant_id": "tenant-1",
"name": "slack-alerts",
"description": "Slack alerts for governance events",
"url": "https://hooks.slack.com/services/...",
"secret": "whs***ret",
"event_types": ["POLICY_DENIAL", "PII_DETECTED"],
"status": "ACTIVE",
"version": 1,
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
]
}

GET /admin/v1/webhooks/{id}

Returns a specific webhook subscription.

Request

GET /admin/v1/webhooks/wh-abc123

Response (200 OK)

Same structure as the webhook object in the list response.

Response (404 Not Found)

{
"error": {
"message": "Webhook not found: wh-unknown",
"type": "not_found_error",
"code": "webhook_not_found"
}
}

PUT /admin/v1/webhooks/{id}

Updates a webhook subscription. Each update increments the version number.

Request

PUT /admin/v1/webhooks/wh-abc123
Content-Type: application/json
{
"name": "slack-alerts-v2",
"event_types": ["POLICY_DENIAL", "PII_DETECTED", "BUDGET_CAP_HARD"]
}

All fields are optional -- only provided fields are updated.

Response (200 OK)

Returns the updated webhook with incremented version.

Response (404 Not Found)

Returned when the webhook ID does not exist.


DELETE /admin/v1/webhooks/{id}

Deletes a webhook subscription.

Request

DELETE /admin/v1/webhooks/wh-abc123

Response (204 No Content)

Empty body.

Response (404 Not Found)

Returned when the webhook ID does not exist.


POST /admin/v1/webhooks/{id}/test

Sends a test event to the webhook endpoint to verify connectivity and configuration.

Request

POST /admin/v1/webhooks/wh-abc123/test
Content-Type: application/json
{
"event_type": "POLICY_DENIAL"
}

Request Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| event_type | string | yes | Event type to simulate for the test |

Response (200 OK)

{
"success": true,
"http_status": 200,
"error_message": null
}

If the target endpoint is unreachable or returns a non-2xx status:

{
"success": false,
"http_status": 503,
"error_message": "Connection refused"
}

Response (404 Not Found)

Returned when the webhook ID does not exist.


GET /admin/v1/webhooks/{id}/deliveries

Returns delivery logs for a webhook, showing the history of event dispatches and their outcomes.

Request

GET /admin/v1/webhooks/wh-abc123/deliveries

Response (200 OK)

{
"object": "list",
"data": [
{
"id": "dl-xyz789",
"webhook_id": "wh-abc123",
"event_id": "evt-456",
"event_type": "POLICY_DENIAL",
"status": "SUCCESS",
"http_status": 200,
"attempt_count": 1,
"error_message": null,
"created_at": "2026-01-15T10:31:00Z",
"last_attempt_at": "2026-01-15T10:31:00Z"
}
]
}

Delivery Status Values

| Status | Description |
| --- | --- |
| SUCCESS | Payload delivered and target returned 2xx |
| FAILED | All delivery attempts exhausted without success |
| PENDING | Delivery is queued or being retried |

Response (404 Not Found)

Returned when the webhook ID does not exist.


POST /v1/webhooks/actions/{action}

Processes an approval action for MCP tool invocations requiring human approval. This endpoint is unauthenticated -- the token query parameter is self-authenticating via HMAC signature.

Request

POST /v1/webhooks/actions/approve?token=<signed_token>
POST /v1/webhooks/actions/deny?token=<signed_token>

Path Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | yes | Action to take: approve or deny |

Query Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| token | string | yes | HMAC-signed token encoding the approval context |

Response (200 OK)

Returns a confirmation of the action taken.

Response (400 Bad Request)

Returned when the token is invalid, expired, or the action is not recognized.


Webhook Delivery Payload

When a subscribed event occurs, the gateway sends an HTTP POST to the webhook URL with the following structure.

Delivery Headers

| Header | Description |
| --- | --- |
| Content-Type | application/json |
| X-Gateway-Signature | sha256=<hex_hmac>, the HMAC-SHA256 of the payload body using the webhook secret |
| X-Gateway-Event | The audit event type (e.g., POLICY_DENIAL) |
| X-Gateway-Delivery | Unique delivery ID for idempotency tracking |

Payload Structure

{
"id": "<delivery-uuid>",
"webhook_id": "<webhook-id>",
"timestamp": "<ISO-8601>",
"type": "<event_type>",
"tenant_id": "<tenant-id-or-null>",
"data": { ... }
}

The data field contains the event-specific payload from the audit event.

For MCP_APPROVAL_REQUESTED events, the payload includes two additional fields:

{
"id": "<delivery-uuid>",
"webhook_id": "<webhook-id>",
"timestamp": "<ISO-8601>",
"type": "MCP_APPROVAL_REQUESTED",
"tenant_id": "<tenant-id>",
"data": { ... },
"approve_url": "https://gateway.example.com/v1/webhooks/actions/approve?token=<signed_token>",
"deny_url": "https://gateway.example.com/v1/webhooks/actions/deny?token=<signed_token>"
}
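A receiver that forwards approval requests to a human (for example, as buttons in a chat message) needs to pull the two URLs out of the payload. The following sketch does that and sanity-checks each URL against the documented action path; the function name is hypothetical, and it assumes the request signature has already been verified.

```python
import json
from typing import Dict, Optional
from urllib.parse import parse_qs, urlparse


def approval_links(raw_body: bytes) -> Optional[Dict[str, str]]:
    """Return {"approve": url, "deny": url} for approval events, else None."""
    payload = json.loads(raw_body)
    if payload.get("type") != "MCP_APPROVAL_REQUESTED":
        return None
    links = {}
    for action in ("approve", "deny"):
        url = payload[action + "_url"]
        parsed = urlparse(url)
        # Each link should target the documented action endpoint and carry a token.
        if not parsed.path.endswith("/v1/webhooks/actions/" + action):
            raise ValueError("unexpected path in " + action + "_url")
        if "token" not in parse_qs(parsed.query):
            raise ValueError("missing token in " + action + "_url")
        links[action] = url
    return links
```

Events of any other type return `None`, so the same handler can sit in front of a general-purpose dispatcher.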

Verifying Webhook Signatures

To verify the authenticity of a webhook delivery, compute the HMAC-SHA256 of the raw request body using your webhook secret and compare it to the value in the X-Gateway-Signature header:

expected = "sha256=" + hex(hmac_sha256(webhook_secret, request_body))
actual = request.headers["X-Gateway-Signature"]
secure_compare(expected, actual)

Always use constant-time comparison to prevent timing attacks.
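The pseudocode above translates directly to Python's standard library. This is a minimal sketch that assumes the secret is UTF-8 encoded and the hex digest is lower-case; the function names are illustrative. Pass the raw request bytes exactly as received, since re-serializing the JSON may change the byte sequence and invalidate the signature.

```python
import hashlib
import hmac


def compute_signature(secret: str, body: bytes) -> str:
    """Return the X-Gateway-Signature value expected for this raw body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Compare expected and received signatures in constant time."""
    expected = compute_signature(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"),
                               header_value.encode("utf-8"))
```

`hmac.compare_digest` provides the constant-time comparison; a plain `==` on the two strings would not.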


POST /admin/v1/mcp/servers

Registers a new MCP server in the gateway registry.

Request

POST /admin/v1/mcp/servers
Content-Type: application/json
{
"server_id": "code-search",
"tenant_id": "tenant-1",
"transport": "HTTP_SSE",
"url": "https://mcp.example.com/sse",
"credential_ref": "vault://secret/mcp/code-search",
"tags": ["search", "code"]
}

Request Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| server_id | string | Yes | Human-readable slug, unique per tenant |
| tenant_id | string | Yes | Tenant that owns this server |
| transport | string | Yes | Transport protocol: HTTP_SSE |
| url | string | Yes | MCP server endpoint URL |
| credential_ref | string | No | Vault path for server credentials |
| tags | string[] | No | Tags for categorization |

Response (201 Created)

Returns the created MCP server with a Location header pointing to the new resource.

{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"server_id": "code-search",
"tenant_id": "tenant-1",
"transport": "HTTP_SSE",
"url": "https://mcp.example.com/sse",
"credential_ref": "vault://secret/mcp/code-search",
"tags": ["search", "code"],
"status": "ACTIVE",
"tool_catalog": [],
"version": 1,
"created_at": "2026-01-15T10:00:00Z",
"updated_at": "2026-01-15T10:00:00Z"
}

Response (409 Conflict)

Returned when server_id already exists for the given tenant.


GET /admin/v1/mcp/servers

Lists all registered MCP servers. Supports an optional tenant_id query parameter to filter by tenant.

Request

GET /admin/v1/mcp/servers
GET /admin/v1/mcp/servers?tenant_id=tenant-1

Response (200 OK)

{
"object": "list",
"data": [
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"server_id": "code-search",
"tenant_id": "tenant-1",
"transport": "HTTP_SSE",
"url": "https://mcp.example.com/sse",
"status": "ACTIVE",
"tool_catalog": [],
"version": 1,
"created_at": "2026-01-15T10:00:00Z",
"updated_at": "2026-01-15T10:00:00Z"
}
]
}

GET /admin/v1/mcp/servers/{id}

Returns a specific MCP server by its internal ID.

Request

GET /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890

Response (200 OK)

Same format as the MCP server object in the list response.

Response (404 Not Found)

Returned when the MCP server ID does not exist.


PUT /admin/v1/mcp/servers/{id}

Updates an MCP server. Only provided fields are updated. Each update increments the version number.

Request

PUT /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Content-Type: application/json
{
"url": "https://mcp.new-url.com/sse",
"status": "SUSPENDED"
}

Response (200 OK)

Returns the updated MCP server with incremented version.

Response (404 Not Found)

Returned when the MCP server ID does not exist.


DELETE /admin/v1/mcp/servers/{id}

Deletes an MCP server registration.

Request

DELETE /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890

Response (204 No Content)

Response (404 Not Found)

Returned when the MCP server ID does not exist.


GET /admin/v1/mcp/servers/{id}/health

Checks live connectivity to the MCP server.

Request

GET /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890/health

Response (200 OK)

{
"reachable": true,
"http_status": 200,
"latency_ms": 42,
"error_message": null,
"checked_at": "2026-01-15T10:05:00Z"
}

Note: Without an enterprise license, health checks always return reachable: false with a message indicating that the feature requires an enterprise license.


POST /admin/v1/mcp/servers/{id}/tools/sync

Fetches the tool catalog from the MCP server and caches it locally.

Request

POST /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890/tools/sync

Response (200 OK)

{
"tools_count": 2,
"synced_at": "2026-01-15T10:05:00Z",
"tools": [
{
"name": "search",
"description": "Search code repositories"
},
{
"name": "read_file",
"description": "Read a file from the repository"
}
]
}

Response (400 Bad Request)

Without a license, tool sync returns a 400 with error code mcp_not_available.


POST /admin/v1/mcp/servers/{id}/credentials/rotate

Rotates the credential reference for an MCP server. Evicts the old credential from cache, optionally sets a new credential reference, and validates the new credential is resolvable. Requires enterprise license and org-admin role.

Request

POST /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890/credentials/rotate
Content-Type: application/json

{
"new_credential_ref": "vault://secret/mcp/new-path"
}

The new_credential_ref field is optional. If omitted, the existing credential reference is kept but the cache is evicted (forcing a fresh fetch from vault on the next request).

Response (200 OK)

{
"success": true,
"message": "Credential rotated successfully",
"old_credential_ref": "vault://secret/mcp/old-path",
"new_credential_ref": "vault://secret/mcp/new-path"
}

Response (400 Bad Request)

Without a license, returns 400 with error code mcp_not_available.


POST /admin/v1/mcp/servers/{id}/credentials/invalidate

Immediately evicts the cached credential for an MCP server. The next request will trigger a fresh fetch from the vault. Requires enterprise license and org-admin role.

Request

POST /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890/credentials/invalidate

Response (204 No Content)

No response body.

Response (404 Not Found)

Returns 404 with error code mcp_server_not_found if the server ID does not exist.


MCP Proxy Endpoints (port 8070)

The MCP Proxy is a standalone Spring Boot application (mcp-proxy-server) that runs on port 8070. It proxies requests from AI agents to registered MCP servers, injecting credentials from vault and applying a governance filter chain.

Enterprise license required

All MCP proxy endpoints require a valid signed JWT license key via GATEWAY_ENTERPRISE_LICENSE_KEY.

POST /mcp/{serverId}/tools/call

Invokes a tool on the specified MCP server. The proxy resolves the server from the registry, injects credentials, and forwards the request.

Request

POST /mcp/code-search/tools/call
Authorization: Bearer gw_<api-key>
Content-Type: application/json
X-Trace-Id: abc123
X-Session-Id: session-456
{
"name": "search_code",
"arguments": {
"query": "authentication handler",
"language": "java"
}
}

Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer token with a valid API key (same keys as LLM Gateway) |
| X-Trace-Id | No | Trace ID for distributed tracing; auto-generated if absent |
| X-Session-Id | No | Session ID for agent chain tracing; stored in audit context |

Response (200 OK)

Returns the upstream MCP server response body as-is:

{
"content": [
{
"type": "text",
"text": "Found 3 matching files..."
}
]
}

The response echoes the X-Trace-Id header back to the caller.

Error Responses

| Status | Code | Description |
| --- | --- | --- |
| 401 | mcp_auth_required | Missing Bearer token |
| 401 | mcp_auth_invalid | Invalid API key |
| 401 | mcp_auth_revoked | API key has been revoked |
| 403 | mcp_policy_denied | Request denied by policy (server/tool/arg rule) |
| 404 | mcp_server_not_found | Server ID not found for this tenant |
| 503 | mcp_server_unavailable | Server is suspended or disabled |
| 502 | mcp_upstream_error | Upstream MCP server returned an error |
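For reference, the tool-call request above could be assembled with Python's standard library as follows. This is a sketch only: the function name is hypothetical, and the proxy base URL (for example `http://localhost:8070`) is an assumption based on the port mentioned for the MCP proxy.

```python
import json
import uuid
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def build_tool_call(proxy_base: str, server_id: str, api_key: str,
                    tool_name: str, arguments: dict,
                    session_id: Optional[str] = None,
                    trace_id: Optional[str] = None) -> Request:
    """Build (but do not send) a POST /mcp/{serverId}/tools/call request."""
    url = proxy_base.rstrip("/") + "/mcp/" + quote(server_id, safe="") + "/tools/call"
    headers = {
        "Authorization": "Bearer " + api_key,
        "Content-Type": "application/json",
        # Optional header; a client-generated ID makes log correlation easier.
        "X-Trace-Id": trace_id or uuid.uuid4().hex,
    }
    if session_id is not None:
        headers["X-Session-Id"] = session_id
    body = json.dumps({"name": tool_name, "arguments": arguments}).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")
```

The returned object can be passed to `urllib.request.urlopen`; error statuses from the table above surface as `urllib.error.HTTPError`.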

POST /mcp/{serverId}/tools/list

Lists tools available on the specified MCP server by forwarding to the server's tools/list endpoint.

Request

POST /mcp/code-search/tools/list
Authorization: Bearer gw_<api-key>
Content-Type: application/json
{}

Response (200 OK)

Returns the upstream MCP server's tool listing:

{
"tools": [
{
"name": "search_code",
"description": "Search code repositories",
"inputSchema": { "type": "object", "properties": { "query": { "type": "string" } } }
}
]
}

POST /mcp/{serverId}/resources/{path}

Accesses a resource on the specified MCP server.

Request

POST /mcp/file-server/resources/read
Authorization: Bearer gw_<api-key>
Content-Type: application/json
{
"uri": "file:///data/report.csv"
}

POST /mcp/{serverId}/prompts/{path}

Accesses a prompt template on the specified MCP server.

Request

POST /mcp/prompt-server/prompts/summarize
Authorization: Bearer gw_<api-key>
Content-Type: application/json
{
"arguments": {
"text": "Long document text..."
}
}

MCP Proxy Error Envelope

All MCP proxy errors use a unified error envelope with the mcp_error type:

{
"error": {
"message": "MCP server not found: code-search",
"type": "mcp_error",
"code": "mcp_server_not_found",
"trace_id": "abc123"
}
}

All error codes are lowercased and prefixed with mcp_.
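On the client side, the envelope can be mapped to a single exception type so callers can branch on `code`. A minimal sketch follows; the class and function names are hypothetical and not part of any published SDK.

```python
import json
from typing import Optional


class McpProxyError(Exception):
    """Client-side representation of an MCP proxy error envelope."""

    def __init__(self, status: int, code: str, message: str,
                 trace_id: Optional[str] = None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.trace_id = trace_id


def parse_mcp_error(status: int, body: bytes) -> McpProxyError:
    """Parse an error response body; reject bodies that are not MCP envelopes."""
    err = json.loads(body).get("error", {})
    if err.get("type") != "mcp_error":
        raise ValueError("not an MCP proxy error envelope")
    return McpProxyError(status, err.get("code", ""),
                         err.get("message", ""), err.get("trace_id"))
```

Keeping `trace_id` on the exception makes it straightforward to include in logs or support requests.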


MCP Proxy Metrics

The MCP proxy exposes Prometheus metrics at GET /actuator/prometheus:

| Metric | Type | Labels |
| --- | --- | --- |
| mcp_requests_total | Counter | tenant, server_id, operation, status |
| mcp_latency_seconds | Timer (P50/P95/P99) | tenant, server_id, operation |
| mcp_errors_total | Counter | server_id, error_code |

Credential Encryption

Provider API keys can be encrypted at rest using AES-256-GCM. To use encrypted credentials:

  1. Set the master password via environment variable: GATEWAY_ENCRYPTION_MASTER_PASSWORD=<password>
  2. Encrypt your API key using the AesEncryptor utility
  3. Set the encrypted value with an ENC: prefix: OPENAI_API_KEY=ENC:<base64-ciphertext>

The gateway transparently decrypts ENC:-prefixed values at runtime using the configured master password. Enterprise deployments can replace the built-in SecretProvider with vault-backed implementations (HashiCorp Vault, AWS Secrets Manager, etc.) for zero-plaintext credential handling.