API Reference
Base URL: http://localhost:8080
Unless noted otherwise, endpoints return Content-Type: application/json. Every response includes an X-Trace-ID header.
OpenAPI Specification
Machine-readable OpenAPI 3.x specs are generated at startup by SpringDoc.
| Spec | URL | Contents |
|---|---|---|
| Public (JSON) | GET /v3/api-docs/public | All /v1/**, /admin/v1/**, and /status endpoints |
| Public (YAML) | GET /v3/api-docs/public.yaml | Same as above in YAML format |
| Internal (JSON) | GET /v3/api-docs/internal | Internal /internal/** endpoints only |
| Internal (YAML) | GET /v3/api-docs/internal.yaml | Same as above in YAML format |
Use the spec to generate clients (OpenAPI Generator), mock servers, or detect breaking changes in CI.
Common Headers
| Header | Direction | Description |
|---|---|---|
| X-Trace-ID | Request | Optional. If provided, echoed back in the response. |
| X-Trace-ID | Response | Always present. Inbound value, or a generated 32-character hex UUID. |
| X-Cache | Response | HIT or MISS (only on non-streaming requests when cache is enabled) |
| X-Cache-Control | Request | no-cache to bypass the response cache |
| Authorization | Request | Bearer <api-key>. Identifies the API key for rate limiting. |
| X-Budget-Remaining-Tokens | Response | Estimated tokens remaining in the active budget period. Omitted when no budget is configured. (Enterprise) |
| X-Budget-Remaining-Pct | Response | Percentage of budget remaining (0–100). Omitted when no budget is configured. (Enterprise) |
| X-Budget-Warning | Response | true when a WARN_AGENT policy rule has fired (budget utilization exceeded threshold). Omitted otherwise. (Enterprise) |
| X-Context-Window-Warning | Response | true when estimated tokens exceed the context window warning threshold. (Enterprise) |
| X-Context-Window-Utilization | Response | Context window utilization percentage (e.g., 87%). Present when the warning threshold is breached. (Enterprise) |
| X-Gateway-Strict-Downgraded | Response | true when json_schema strict mode is downgraded (Anthropic/Bedrock). |
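A minimal client-side sketch of reading the response headers listed above. The `GatewayHeaders` type and `parse_gateway_headers` function are hypothetical names, not part of the gateway; the parsing rules follow only the value formats shown in the table.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class GatewayHeaders:
    trace_id: str
    cache_hit: Optional[bool]  # None when X-Cache is absent
    budget_remaining_pct: Optional[float]
    budget_warning: bool
    context_window_utilization: Optional[float]


def parse_gateway_headers(headers: Mapping[str, str]) -> GatewayHeaders:
    """Extract the documented gateway headers from a response header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v.strip() for k, v in headers.items()}

    cache = h.get("x-cache")
    pct = h.get("x-budget-remaining-pct")
    util = h.get("x-context-window-utilization")

    return GatewayHeaders(
        trace_id=h["x-trace-id"],  # documented as always present
        cache_hit=None if cache is None else cache.upper() == "HIT",
        budget_remaining_pct=None if pct is None else float(pct),
        budget_warning=h.get("x-budget-warning", "").lower() == "true",
        # The table shows values such as "87%", so drop a trailing percent sign.
        context_window_utilization=None if util is None else float(util.rstrip("%")),
    )
```

Optional headers map to None (or False for the warning flag) so callers can tell "not configured" apart from a real value.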
POST /v1/chat/completions
OpenAI-compatible chat completions. Supports all configured providers.
Request
POST /v1/chat/completions
Content-Type: application/json
{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
],
"temperature": 0.7,
"max_tokens": 512
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Model ID. Must match a configured provider's prefix. |
| messages | array | yes | Non-empty list of {role, content} objects. |
| temperature | number | no | Sampling temperature (0-2). |
| max_tokens | integer | no | Maximum tokens to generate. |
| stream | boolean | no | If true, returns Server-Sent Events stream. |
| response_format | object | no | Response format constraint. See Structured Outputs. |
| tools | array | no | Tool definitions (passed through to provider). |
| tool_choice | string/object | no | Tool choice control. |
| top_p | number | no | Nucleus sampling parameter. |
| frequency_penalty | number | no | Frequency penalty (-2.0 to 2.0). |
| presence_penalty | number | no | Presence penalty (-2.0 to 2.0). |
| n | integer | no | Number of completions to generate. |
| user | string | no | End-user identifier. |
Message Object
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | yes | system, user, or assistant |
| content | string or array | yes | Text string or array of content parts (for multimodal) |
| tool_calls | array | no | Tool calls from assistant |
| tool_call_id | string | no | ID referencing a tool call |
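The request and message fields above can be assembled client-side as in the sketch below. `build_chat_request` is a hypothetical helper using only the Python standard library; it builds the request without sending it, and the validation mirrors only the constraints stated in the tables (non-empty messages, temperature between 0 and 2).

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_chat_request(
    model: str,
    messages: List[Dict[str, Any]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    api_key: Optional[str] = None,
    base_url: str = "http://localhost:8080",
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    if not messages:
        raise ValueError("messages must be a non-empty list")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    body: Dict[str, Any] = {"model": model, "messages": messages}
    # Optional fields are left out entirely rather than sent as null.
    if temperature is not None:
        body["temperature"] = temperature
    if max_tokens is not None:
        body["max_tokens"] = max_tokens

    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the call against a running gateway.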
Response (200 OK)
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1771667816,
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 9,
"total_tokens": 33
}
}
Streaming Response
When "stream": true, the response is sent as Server-Sent Events:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","model":"gpt-4o","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","model":"gpt-4o","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}
data: [DONE]
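A sketch of consuming the stream shown above, assuming the lines have already been read from the HTTP response. `iter_stream_content` is a hypothetical helper; it handles only `data:` lines and the `[DONE]` sentinel and ignores other SSE fields.

```python
import json
from typing import Iterable, Iterator


def iter_stream_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from chat.completion.chunk SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # delta may be missing or null on some chunks
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Joining the yielded pieces with `"".join(...)` reconstructs the assistant message as it arrives.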
POST /v1/embeddings
Returns embedding vectors for the given input. Requires an OpenAI-compatible embedding provider (OPENAI_API_KEY).
Request
POST /v1/embeddings
Content-Type: application/json
{
"model": "text-embedding-ada-002",
"input": "The quick brown fox"
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Embedding model ID (must start with text-embedding) |
| input | string or array | yes | Text to embed. String or array of strings. |
Response (200 OK)
{
"object": "list",
"model": "text-embedding-ada-002",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0023, -0.0091, 0.0152, ...]
}
],
"usage": {
"prompt_tokens": 5,
"total_tokens": 5
}
}
GET /v1/models
Lists all available models from registered providers, each with a capabilities object.
Request
GET /v1/models
Response (200 OK)
{
"object": "list",
"data": [
{
"id": "gpt-4o",
"object": "model",
"created": 1704067200,
"owned_by": "openai",
"capabilities": {
"supports_streaming": true,
"supports_vision": true,
"supports_tool_calls": true,
"supports_structured_outputs": true,
"supports_json_mode": true,
"max_context_tokens": 128000
}
},
{
"id": "claude-sonnet-4-5",
"object": "model",
"created": 1704067200,
"owned_by": "anthropic",
"capabilities": {
"supports_streaming": true,
"supports_vision": true,
"supports_tool_calls": true,
"supports_structured_outputs": true,
"supports_json_mode": true,
"max_context_tokens": 200000
}
}
]
}
Capability Fields
| Field | Type | Description |
|---|---|---|
| supports_streaming | boolean | Provider supports SSE streaming |
| supports_vision | boolean | Provider supports image/vision inputs |
| supports_tool_calls | boolean | Provider supports function/tool calling |
| supports_structured_outputs | boolean | Provider supports structured output schemas |
| supports_json_mode | boolean | Provider supports JSON-mode responses |
| max_context_tokens | integer | Maximum context window size in tokens |
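One way a client might use the capabilities object is to pick models before sending a request. The sketch below filters a parsed GET /v1/models response; `models_supporting` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def models_supporting(
    models_response: Dict[str, Any],
    *capabilities: str,
    min_context_tokens: int = 0,
) -> List[str]:
    """Return IDs of models that have every named capability flag set to true."""
    selected = []
    for model in models_response.get("data", []):
        caps = model.get("capabilities", {})
        has_flags = all(caps.get(name) is True for name in capabilities)
        big_enough = caps.get("max_context_tokens", 0) >= min_context_tokens
        if has_flags and big_enough:
            selected.append(model["id"])
    return selected
```

For example, `models_supporting(resp, "supports_vision", "supports_tool_calls")` keeps only models that advertise both flags.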
POST /v1/completions (Legacy)
Legacy text-completion endpoint. Internally converts the prompt into a chat message and routes through the same pipeline.
Request
POST /v1/completions
Content-Type: application/json
{
"model": "gpt-3.5-turbo-instruct",
"prompt": "Say this is a test",
"max_tokens": 64
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Model ID |
| prompt | string or array | yes | Text prompt |
| max_tokens | integer | no | Maximum tokens to generate |
| temperature | number | no | Sampling temperature |
Response (200 OK)
{
"id": "cmpl-xyz789",
"object": "text_completion",
"created": 1771667816,
"model": "gpt-3.5-turbo-instruct",
"choices": [
{
"text": " This is indeed a test.",
"index": 0,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 6,
"completion_tokens": 7,
"total_tokens": 13
}
}
GET /
Redirects to /try.
GET /
302 Found
Location: /try
GET /try
Built-in browser-based chat test panel. Provides a model selector, message input, SSE streaming, response metadata (model, provider, latency, token count), and a "Copy as curl" button.
GET /try
Returns 200 OK with Content-Type: text/html.
GET /status
Returns gateway status as JSON with provider, route, and configuration information.
GET /status
Accept: application/json
{
"status": "running",
"mode": "standalone",
"version": "dev",
"uptimeSeconds": 120,
"configVersion": 0,
"providers": [
{
"name": "mock",
"type": "MockProvider",
"health": "HEALTHY",
"capabilities": {
"streaming": true,
"vision": false,
"toolCalls": false,
"structuredOutputs": true,
"jsonMode": true,
"maxContextTokens": 128000
}
}
],
"routes": [],
"region": {
"id": null,
"name": null,
"regionAware": false
},
"rateLimits": {
"enabled": false,
"globalRequestsPerSecond": 100,
"defaultPerKeyRequestsPerSecond": 10,
"defaultPerKeyTokensPerMinute": 100000
},
"warnings": ["No routes configured (using default model-prefix routing)"]
}
GET /actuator/gateway-status
Custom Actuator endpoint returning the same structured status as /status. Exposed via Spring Boot Actuator at /actuator/gateway-status.
GET /actuator/gateway-status
Returns the same JSON structure as GET /status.
POST /admin/v1/tenants
Creates a new tenant.
Request
POST /admin/v1/tenants
Content-Type: application/json
{
"name": "Acme Corp",
"status": "active",
"region": "us-east-1",
"metadata": {"plan": "enterprise"}
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Tenant display name |
| status | string | no | active (default) or suspended |
| region | string | no | Deployment region (e.g., us-east-1) |
| metadata | object | no | Arbitrary key-value metadata |
Response (201 Created)
Includes a Location header with the new tenant's URL.
Location: /admin/v1/tenants/{id}
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"name": "Acme Corp",
"status": "active",
"region": "us-east-1",
"metadata": {"plan": "enterprise"},
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
GET /admin/v1/tenants
Lists all tenants.
Request
GET /admin/v1/tenants
Response (200 OK)
{
"object": "list",
"data": [
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"name": "Acme Corp",
"status": "active",
"region": "us-east-1",
"metadata": {"plan": "enterprise"},
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
]
}
GET /admin/v1/tenants/{id}
Retrieves a single tenant by ID.
Request
GET /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Response (200 OK)
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"name": "Acme Corp",
"status": "active",
"region": "us-east-1",
"metadata": {"plan": "enterprise"},
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
Response (404 Not Found)
Returned when the tenant ID does not exist.
PUT /admin/v1/tenants/{id}
Updates an existing tenant. Only provided fields are updated.
Request
PUT /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Content-Type: application/json
{
"name": "Acme Inc",
"status": "suspended"
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | no | Updated tenant name |
| status | string | no | active or suspended |
| region | string | no | Updated region |
| metadata | object | no | Replacement metadata |
Response (200 OK)
Returns the updated tenant with a refreshed updated_at timestamp.
Response (404 Not Found)
Returned when the tenant ID does not exist.
DELETE /admin/v1/tenants/{id}
Deletes a tenant.
Request
DELETE /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Response (204 No Content)
Returned on successful deletion. No response body.
Response (404 Not Found)
Returned when the tenant ID does not exist.
POST /admin/v1/tenants/{tenantId}/keys
Creates a new API key for the specified tenant.
Request
POST /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890/keys
Content-Type: application/json
{
"name": "production-key",
"scopes": ["completions:write"],
"expires_at": "2027-01-01T00:00:00Z"
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Display name for the API key |
| scopes | array | no | Permission scopes. Defaults to ["completions:write"] |
| expires_at | string | no | ISO-8601 expiration timestamp. Use null for a key that never expires. |
Response (201 Created)
Includes a Location header with the new key's URL. The key field contains the plaintext API key — this is the only time it is returned.
Location: /admin/v1/tenants/{tenantId}/keys/{id}
{
"id": "f7e6d5c4-b3a2-1098-7654-321fedcba098",
"key": "gw_a3b4c5d6e7f8091011121314151617181920212223",
"key_prefix": "gw_a3b4c5d6e",
"name": "production-key",
"scopes": ["completions:write"],
"status": "active",
"created_at": "2026-01-15T10:30:00Z"
}
Response (404 Not Found)
Returned when the tenant ID does not exist.
GET /admin/v1/tenants/{tenantId}/keys
Lists all API keys for a tenant. Plaintext keys are never returned.
Request
GET /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890/keys
Response (200 OK)
{
"object": "list",
"data": [
{
"id": "f7e6d5c4-b3a2-1098-7654-321fedcba098",
"name": "production-key",
"key_prefix": "gw_a3b4c5d6e",
"scopes": ["completions:write"],
"status": "active",
"expires_at": "2027-01-01T00:00:00Z",
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
]
}
GET /admin/v1/tenants/{tenantId}/keys/{keyId}
Retrieves a single API key by ID.
Request
GET /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890/keys/f7e6d5c4-b3a2-1098-7654-321fedcba098
Response (200 OK)
{
"id": "f7e6d5c4-b3a2-1098-7654-321fedcba098",
"name": "production-key",
"key_prefix": "gw_a3b4c5d6e",
"scopes": ["completions:write"],
"status": "active",
"expires_at": "2027-01-01T00:00:00Z",
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
Response (404 Not Found)
Returned when the tenant or key ID does not exist, or the key belongs to a different tenant.
DELETE /admin/v1/tenants/{tenantId}/keys/{keyId}
Revokes an API key (soft delete). The key status is set to revoked and it can no longer be used for authentication.
Request
DELETE /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890/keys/f7e6d5c4-b3a2-1098-7654-321fedcba098
Response (204 No Content)
Returned on successful revocation. No response body.
Response (404 Not Found)
Returned when the tenant or key ID does not exist.
POST /admin/v1/tenants/{tenantId}/keys/{keyId}/rotate
Rotates an API key. The old key is marked as rotated (still valid during a grace period) and a new key is generated with the same name and scopes.
Request
POST /admin/v1/tenants/a1b2c3d4-e5f6-7890-abcd-ef1234567890/keys/f7e6d5c4-b3a2-1098-7654-321fedcba098/rotate
Response (201 Created)
Returns the new key with plaintext (same as create). The old key's status changes to rotated.
{
"id": "new-key-uuid",
"key": "gw_newplaintextkey...",
"key_prefix": "gw_newplaint",
"name": "production-key",
"scopes": ["completions:write"],
"status": "active",
"created_at": "2026-02-01T12:00:00Z"
}
Response (404 Not Found)
Returned when the tenant or key ID does not exist.
GET /admin/v1/providers/{id}/capabilities
Returns the capabilities of a specific registered provider. Returns 404 if the provider is not configured.
Request
GET /admin/v1/providers/openai/capabilities
Response (200 OK)
{
"supports_streaming": true,
"supports_vision": true,
"supports_tool_calls": true,
"supports_structured_outputs": true,
"supports_json_mode": true,
"max_context_tokens": 128000
}
Response (404 Not Found)
Returned when the provider ID is unknown or not registered.
Valid provider IDs: openai, anthropic, gemini, bedrock, ollama, mock
POST /admin/v1/routes
Creates a new versioned route configuration. Changes propagate live without restart.
Request
POST /admin/v1/routes
Content-Type: application/json
{
"model_pattern": "gpt*",
"strategy": "model-prefix",
"providers": [
{"provider": "openai", "weight": 1}
],
"pinned_model_version": null,
"latency_sla_ms": 0
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| model_pattern | string | yes | Glob pattern to match model names (e.g., gpt*) |
| strategy | string | no | Routing strategy: model-prefix, round-robin, weighted. Default: model-prefix |
| providers | array | yes | List of provider entries with provider, weight, model_override, region |
| pinned_model_version | string | no | Pin all requests on this route to a specific model version |
| latency_sla_ms | long | no | Latency SLA in milliseconds for cost-aware routing. Default: 0 (disabled). When set, cost-aware routing selects the cheapest provider meeting this latency target. |
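To illustrate how a glob in model_pattern selects a route, the sketch below matches a model name against a list of route objects using Python's `fnmatch`. This is an illustration only: the gateway's actual glob dialect and its precedence when several routes match are not specified here, and first-match order is an assumption.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional


def find_route(routes: List[Dict[str, Any]], model: str) -> Optional[Dict[str, Any]]:
    """Return the first route whose model_pattern glob matches the model name."""
    for route in routes:
        # fnmatchcase is case-sensitive on every platform.
        if fnmatchcase(model, route["model_pattern"]):
            return route
    return None  # caller falls back to default routing
```

With routes for `gpt*` and `claude*`, a request for `gpt-4o` resolves to the first and a request for an unmatched model returns None.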
Response (201 Created)
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"model_pattern": "gpt*",
"strategy": "model-prefix",
"providers": [{"provider": "openai", "weight": 1}],
"latency_sla_ms": 0,
"version": 1,
"created_at": "2026-01-01T00:00:00Z",
"updated_at": "2026-01-01T00:00:00Z"
}
GET /admin/v1/routes
Lists all route configurations.
Request
GET /admin/v1/routes
Response (200 OK)
{
"object": "list",
"data": [
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"model_pattern": "gpt*",
"strategy": "model-prefix",
"providers": [{"provider": "openai", "weight": 1}],
"latency_sla_ms": 0,
"version": 1,
"created_at": "2026-01-01T00:00:00Z",
"updated_at": "2026-01-01T00:00:00Z"
}
]
}
GET /admin/v1/routes/{id}
Returns a specific route configuration.
Request
GET /admin/v1/routes/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Response (200 OK)
Same format as the route object in the list response.
Response (404 Not Found)
Returned when the route ID does not exist.
PUT /admin/v1/routes/{id}
Updates a route configuration. Each update increments the version number. Changes propagate live.
Request
PUT /admin/v1/routes/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Content-Type: application/json
{
"model_pattern": "claude*",
"strategy": "round-robin",
"providers": [
{"provider": "anthropic", "weight": 1},
{"provider": "bedrock", "weight": 1}
]
}
All fields are optional — only provided fields are updated.
Response (200 OK)
Returns the updated route with incremented version.
DELETE /admin/v1/routes/{id}
Deletes a route and its version history. Changes propagate live.
Request
DELETE /admin/v1/routes/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Response (204 No Content)
Response (404 Not Found)
Returned when the route ID does not exist.
GET /admin/v1/routes/{id}/versions
Returns the version history for a route. At least the 10 most recent versions are retained.
Request
GET /admin/v1/routes/a1b2c3d4-e5f6-7890-abcd-ef1234567890/versions
Response (200 OK)
{
"object": "list",
"data": [
{
"route_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"version": 2,
"model_pattern": "claude*",
"strategy": "round-robin",
"providers": [{"provider": "anthropic", "weight": 1}],
"created_at": "2026-01-02T00:00:00Z"
},
{
"route_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"version": 1,
"model_pattern": "gpt*",
"strategy": "model-prefix",
"providers": [{"provider": "openai", "weight": 1}],
"created_at": "2026-01-01T00:00:00Z"
}
]
}
POST /admin/v1/routes/{id}/rollback
Rolls back a route to a specified version. Creates a new version with the rolled-back configuration. Changes propagate live.
Request
POST /admin/v1/routes/a1b2c3d4-e5f6-7890-abcd-ef1234567890/rollback
Content-Type: application/json
{
"version": 1
}
Response (200 OK)
Returns the route with rolled-back configuration and incremented version.
Response (404 Not Found)
Returned when the route ID or the specified version does not exist.
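The versioning rules described for update and rollback can be modeled in a few lines. The sketch below is a toy in-memory model, not the gateway's implementation; `RouteHistory` is a hypothetical name. It shows that a rollback copies the old configuration forward as a new, higher-numbered version rather than rewinding history.

```python
from copy import deepcopy
from typing import Any, Dict, List


class RouteHistory:
    """Toy in-memory model of route versioning."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.versions: List[Dict[str, Any]] = [{**deepcopy(config), "version": 1}]

    @property
    def current(self) -> Dict[str, Any]:
        return self.versions[-1]

    def update(self, changes: Dict[str, Any]) -> Dict[str, Any]:
        # Only provided fields change; every update appends a new version.
        merged = {**deepcopy(self.current), **deepcopy(changes)}
        merged["version"] = self.current["version"] + 1
        self.versions.append(merged)
        return merged

    def rollback(self, version: int) -> Dict[str, Any]:
        for snapshot in self.versions:
            if snapshot["version"] == version:
                restored = {k: v for k, v in snapshot.items() if k != "version"}
                return self.update(restored)
        raise KeyError(f"version {version} not found")  # the API answers 404 here
```

Rolling a two-version route back to version 1 therefore produces version 3 with the version 1 configuration.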
GET /admin/v1/latency
Returns the current EWMA latency for every tracked provider and model pair. Data is available when the enterprise LatencyTracker is active; without an enterprise license the endpoint returns an empty list.
Request
GET /admin/v1/latency
Response (200 OK)
{
"object": "list",
"data": [
{
"provider": "openai",
"model": "gpt-4o",
"ewmaLatencyMs": 120.5,
"rawLatencyMs": 115.0,
"sampleCount": 42,
"lastUpdated": "2026-01-01T00:00:00Z"
},
{
"provider": "anthropic",
"model": "claude-sonnet-4-5-20250514",
"ewmaLatencyMs": 95.3,
"rawLatencyMs": 88.0,
"sampleCount": 128,
"lastUpdated": "2026-01-01T00:01:00Z"
}
]
}
| Field | Type | Description |
|---|---|---|
| provider | string | Provider name |
| model | string | Model name |
| ewmaLatencyMs | double | Exponentially weighted moving average latency in milliseconds |
| rawLatencyMs | double | Most recent raw latency sample in milliseconds |
| sampleCount | long | Total number of latency samples recorded |
| lastUpdated | string | ISO-8601 timestamp of the last recorded sample |
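For readers unfamiliar with EWMA, the sketch below shows the standard update rule, `ewma = alpha * sample + (1 - alpha) * previous`. The smoothing factor `alpha` and the `EwmaLatency` class are assumptions for illustration; the value the gateway's LatencyTracker actually uses is not documented here.

```python
from typing import Optional


class EwmaLatency:
    """Tracks an exponentially weighted moving average of latency samples."""

    def __init__(self, alpha: float = 0.2) -> None:
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # assumed smoothing factor
        self.ewma_latency_ms: Optional[float] = None
        self.raw_latency_ms: Optional[float] = None
        self.sample_count = 0

    def record(self, latency_ms: float) -> float:
        self.raw_latency_ms = latency_ms
        self.sample_count += 1
        if self.ewma_latency_ms is None:
            # The first sample seeds the average.
            self.ewma_latency_ms = latency_ms
        else:
            self.ewma_latency_ms = (
                self.alpha * latency_ms + (1 - self.alpha) * self.ewma_latency_ms
            )
        return self.ewma_latency_ms
```

A higher `alpha` reacts faster to recent samples; a lower one smooths out spikes.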
GET /admin/v1/priority/stats
Returns current priority admission control stats including per-tier concurrency and threshold configuration.
Request
GET /admin/v1/priority/stats
Response (200 OK)
{
"object": "priority_stats",
"data": {
"currentConcurrent": 17,
"maxConcurrent": 1000,
"perTierConcurrent": {
"premium": 5,
"standard": 10,
"bulk": 2
},
"perTierThresholdPct": {
"premium": 100,
"standard": 80,
"bulk": 50
}
}
}
| Field | Type | Description |
|---|---|---|
| currentConcurrent | int | Total in-flight requests across all tiers |
| maxConcurrent | int | Configured maximum concurrent requests (0 when no enterprise license is present) |
| perTierConcurrent | object | Current in-flight count per priority tier |
| perTierThresholdPct | object | Configured throttle threshold percentage per tier |
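One plausible reading of these fields is that a tier is throttled once total concurrency reaches its threshold percentage of maxConcurrent. The sketch below encodes that reading against the `data` object of the response; the rule itself is an assumption, and the gateway's real admission logic may differ. `tier_limit` and `would_admit` are hypothetical helpers.

```python
from typing import Any, Dict


def tier_limit(stats: Dict[str, Any], tier: str) -> int:
    """Concurrency ceiling for a tier: its threshold percentage of maxConcurrent."""
    return stats["maxConcurrent"] * stats["perTierThresholdPct"][tier] // 100


def would_admit(stats: Dict[str, Any], tier: str) -> bool:
    """Assumed rule: admit while total in-flight requests are below the tier's ceiling."""
    return stats["currentConcurrent"] < tier_limit(stats, tier)
```

Under this reading and the sample thresholds above, bulk traffic is shed first as load rises, then standard, with premium admitted up to the full maximum.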
POST /admin/v1/policies
Creates a new policy in DRAFT status.
Request
POST /admin/v1/policies
Content-Type: application/json
{
"name": "block-legacy-models",
"tenant_id": "acme-corp",
"description": "Block legacy models for production tenants",
"dsl": "version: \"1\"\nrules:\n  - id: block-legacy\n    conditions:\n      model:\n        denylist: [gpt-3.5-turbo]\n    action: DENY\n    deny_message: \"Legacy model not allowed\""
}
Response (201 Created)
{
"id": "pol-abc123",
"tenant_id": "acme-corp",
"name": "block-legacy-models",
"description": "Block legacy models for production tenants",
"dsl": "...",
"status": "DRAFT",
"version": 1,
"created_at": "2026-01-15T10:00:00Z",
"updated_at": "2026-01-15T10:00:00Z"
}
GET /admin/v1/policies
Lists all policies. Optional ?tenant_id= filter.
GET /admin/v1/policies/{id}
Returns a single policy by ID. Returns 404 if not found.
PUT /admin/v1/policies/{id}
Updates a policy. Each update increments the version. Only provided fields are changed.
DELETE /admin/v1/policies/{id}
Deletes a policy and its version history. Returns 204.
POST /admin/v1/policies/{id}/status
Changes policy status. Valid values: DRAFT, ACTIVE, SHADOW, ARCHIVED. When transitioning to ACTIVE, conflict detection runs against other active policies and returns warnings if conflicts are found.
Request
POST /admin/v1/policies/pol-abc123/status
Content-Type: application/json
{"status": "SHADOW"}
GET /admin/v1/policies/{id}/versions
Returns the version history for a policy. At least the 10 most recent versions are retained.
POST /admin/v1/policies/{id}/rollback
Rolls back a policy to a specified version. Creates a new version with the restored configuration.
Request
{"version": 1}
POST /admin/v1/policies/dry-run
Simulates a policy evaluation. Validates YAML syntax and evaluates through the PolicyEngine.
Request
{
"dsl": "version: \"1\"\nrules:\n  - id: test\n    conditions:\n      model:\n        denylist: [gpt-3.5-turbo]\n    action: DENY",
"tenant_id": "acme-corp",
"model": "gpt-3.5-turbo"
}
Response (200 OK)
{
"status": "DENIED",
"reason": "Request blocked by policy rule: test",
"rule_id": "test",
"evaluation_time_ms": 2
}
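The dsl field carries a YAML document as a JSON string, so quotes and newlines must be escaped. Hand-escaping is error-prone; the sketch below lets `json.dumps` do it. The YAML indentation shown is an assumption about the DSL layout, and `build_dry_run_body` is a hypothetical helper.

```python
import json

# Policy written as readable YAML. Indentation here is assumed, not confirmed.
POLICY_YAML = """\
version: "1"
rules:
  - id: test
    conditions:
      model:
        denylist: [gpt-3.5-turbo]
    action: DENY
"""


def build_dry_run_body(dsl: str, tenant_id: str, model: str) -> str:
    """Serialize a dry-run request body; json.dumps escapes quotes and newlines."""
    return json.dumps({"dsl": dsl, "tenant_id": tenant_id, "model": model})
```

The resulting string can be sent as the body of POST /admin/v1/policies/dry-run with Content-Type: application/json.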
POST /admin/v1/policies/{id}/promote
Promotes a SHADOW policy to ACTIVE status. Runs conflict detection, cleans up shadow events, and publishes a policy mutation event.
Requires org-admin or policy-admin role.
Request
POST /admin/v1/policies/pol-abc123/promote
Response (200 OK)
{
"id": "pol-abc123",
"name": "block-legacy-models",
"status": "ACTIVE",
"version": 3,
"warnings": ["Potential conflict with policy pol-xyz: overlapping model denylist"]
}
Response (400 Bad Request)
Returned when the policy is not in SHADOW status.
GET /admin/v1/policies/{id}/shadow/stats
Returns shadow mode divergence statistics for a policy.
Requires org-admin, policy-admin, developer, or viewer role.
Query Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| period | No | 24h | Time window: 1h, 24h, or 7d |
Response (200 OK)
{
"policy_id": "pol-abc123",
"period": "24h",
"total_evaluations": 1250,
"divergent_count": 47,
"divergence_rate": 0.0376,
"by_rule": [
{
"rule_id": "block-legacy",
"count": 35,
"divergence_type": "shadow_deny_active_allow"
},
{
"rule_id": "max-tokens-limit",
"count": 12,
"divergence_type": "shadow_deny_active_allow"
}
]
}
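In the sample above, divergence_rate equals divergent_count divided by total_evaluations (47 / 1250 = 0.0376). The sketch below recomputes that figure and finds the rule contributing most divergences, which can help when reviewing a shadow policy before promotion. `summarize_shadow_stats` is a hypothetical helper.

```python
from typing import Any, Dict


def summarize_shadow_stats(stats: Dict[str, Any]) -> Dict[str, Any]:
    """Recompute the divergence rate and identify the most divergent rule."""
    total = stats["total_evaluations"]
    rate = stats["divergent_count"] / total if total else 0.0
    by_rule = sorted(stats.get("by_rule", []), key=lambda r: r["count"], reverse=True)
    return {
        "divergence_rate": round(rate, 4),
        "top_rule": by_rule[0]["rule_id"] if by_rule else None,
        # Sum of per-rule counts, useful for checking against divergent_count.
        "accounted": sum(r["count"] for r in by_rule),
    }
```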
GET /admin/v1/policies/{id}/shadow/events
Returns recent shadow policy events, optionally filtered to diverged-only.
Requires org-admin, policy-admin, developer, or viewer role.
Query Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| diverged | No | true | Filter to diverged events only |
| limit | No | 50 | Maximum events to return |
Response (200 OK)
{
"object": "list",
"data": [
{
"id": "evt-uuid",
"policy_id": "pol-abc123",
"rule_id": "block-legacy",
"request_id": "trace-xyz",
"tenant_id": "acme-corp",
"model": "gpt-3.5-turbo",
"active_decision": "ALLOW",
"shadow_decision": "DENY",
"diverged": true,
"divergence_type": "shadow_deny_active_allow",
"timestamp": "2026-02-28T14:30:00Z"
}
]
}
Internal Config Distribution API
Internal endpoints for data plane pods to pull configuration from the control plane without direct database access. Protected by the X-Internal-Secret header when gateway.internal.secret is configured.
GET /internal/v1/config/version
Returns the current config version (monotonic counter).
Request
GET /internal/v1/config/version
X-Internal-Secret: <secret>
Response (200 OK)
{
"version": 42
}
GET /internal/v1/config/full
Returns a full config snapshot (tenants, routes, API keys without key hashes).
Request
GET /internal/v1/config/full?since_version=0
X-Internal-Secret: <secret>
| Parameter | Type | Required | Description |
|---|---|---|---|
| since_version | integer | no | Advisory version hint. Core always returns the full snapshot. Default: 0 |
Response (200 OK)
{
"version": 42,
"tenants": [
{
"id": "a1b2c3d4-...",
"name": "Acme Corp",
"status": "active",
"region": "us-east-1",
"metadata": {"plan": "enterprise"},
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
],
"routes": [
{
"id": "r1b2c3d4-...",
"model_pattern": "gpt*",
"strategy": "model-prefix",
"providers": [{"provider": "openai", "weight": 1}],
"version": 1,
"created_at": "2026-01-01T00:00:00Z",
"updated_at": "2026-01-01T00:00:00Z"
}
],
"api_keys": [
{
"id": "k1b2c3d4-...",
"tenant_id": "a1b2c3d4-...",
"name": "production-key",
"key_prefix": "gw_a3b4c5d6e",
"scopes": ["completions:write"],
"status": "active",
"expires_at": "2027-01-01T00:00:00Z",
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
]
}
Note: key_hash is deliberately omitted from API key entries for security.
Response (401 Unauthorized)
Returned when gateway.internal.secret is configured and the request is missing or has an invalid X-Internal-Secret header.
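A data plane pod could combine the two endpoints by checking the cheap version counter first and pulling the full snapshot only when it has advanced. The sketch below shows that loop body with the HTTP calls injected as callables, so transport and the X-Internal-Secret header stay outside the example. `ConfigPoller` is a hypothetical name and this polling strategy is an assumption, not the gateway's documented client.

```python
from typing import Any, Callable, Dict, Optional


class ConfigPoller:
    """Pull a full snapshot only when the control plane's version has moved."""

    def __init__(
        self,
        fetch_version: Callable[[], int],
        fetch_full: Callable[[int], Dict[str, Any]],
    ) -> None:
        self._fetch_version = fetch_version  # wraps GET /internal/v1/config/version
        self._fetch_full = fetch_full        # wraps GET /internal/v1/config/full
        self.version: Optional[int] = None   # None until the first snapshot
        self.snapshot: Optional[Dict[str, Any]] = None

    def poll_once(self) -> bool:
        """Return True when a new snapshot was applied."""
        remote = self._fetch_version()
        if self.version is not None and remote <= self.version:
            return False  # monotonic counter: nothing newer to pull
        # since_version is advisory, so pass the last version seen (0 initially).
        snapshot = self._fetch_full(self.version or 0)
        self.snapshot = snapshot
        self.version = snapshot["version"]
        return True
```

Calling `poll_once` on a timer keeps steady-state traffic to one small request per interval.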
POST /admin/v1/pricing
Creates a new model pricing entry. Pricing entries define cost-per-million-tokens for input and output. Model names support glob patterns (e.g. gpt-4o* matches gpt-4o, gpt-4o-mini).
Request
POST /admin/v1/pricing
Content-Type: application/json
{
"model": "gpt-4o",
"provider": "openai",
"inputPricePerMillion": 2.50,
"outputPricePerMillion": 10.00
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Model name or glob pattern (e.g. gpt-4o*) |
| provider | string | yes | Provider name (e.g. openai, anthropic) |
| inputPricePerMillion | BigDecimal | yes | USD cost per 1M input tokens |
| outputPricePerMillion | BigDecimal | yes | USD cost per 1M output tokens |
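As a worked illustration of per-million pricing, the sketch below resolves a pricing entry by glob and computes `tokens * price / 1,000,000` for each direction using `Decimal`. `find_pricing` and `estimate_cost` are hypothetical helpers; the gateway's own matching precedence and rounding are not specified here and may differ.

```python
from decimal import Decimal
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional

MILLION = Decimal(1_000_000)


def find_pricing(
    entries: List[Dict[str, Any]], model: str, provider: str
) -> Optional[Dict[str, Any]]:
    """Return the first entry whose provider matches and whose model glob matches."""
    for entry in entries:
        if entry["provider"] == provider and fnmatchcase(model, entry["model"]):
            return entry
    return None


def estimate_cost(pricing: Dict[str, Any], input_tokens: int, output_tokens: int) -> Decimal:
    """USD cost for one request under a pricing entry."""
    # Go through str() so float inputs such as 2.5 convert without binary noise.
    input_price = Decimal(str(pricing["inputPricePerMillion"]))
    output_price = Decimal(str(pricing["outputPricePerMillion"]))
    return input_price * input_tokens / MILLION + output_price * output_tokens / MILLION
```

At 2.50 and 10.00 USD per million, a request with 1000 input and 500 output tokens costs 0.0025 + 0.0050 = 0.0075 USD.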
Response (201 Created)
{
"id": "f47ac10b-...",
"model": "gpt-4o",
"provider": "openai",
"inputPricePerMillion": 2.50,
"outputPricePerMillion": 10.00,
"effectiveDate": "2026-03-04T10:00:00Z",
"createdAt": "2026-03-04T10:00:00Z",
"updatedAt": "2026-03-04T10:00:00Z"
}
Includes a Location: /admin/v1/pricing/{id} header.
GET /admin/v1/pricing
Lists all pricing entries. Optional ?model= and ?provider= filters.
Response (200 OK)
{
"object": "list",
"data": [
{
"id": "f47ac10b-...",
"model": "gpt-4o",
"provider": "openai",
"inputPricePerMillion": 2.50,
"outputPricePerMillion": 10.00,
"effectiveDate": "2026-03-04T10:00:00Z",
"createdAt": "2026-03-04T10:00:00Z",
"updatedAt": "2026-03-04T10:00:00Z"
}
]
}
GET /admin/v1/pricing/{id}
Returns a specific pricing entry. Returns 404 with pricing_not_found error if not found.
PUT /admin/v1/pricing/{id}
Updates a pricing entry. Returns 404 if not found. Increments ConfigVersionTracker.
DELETE /admin/v1/pricing/{id}
Deletes a pricing entry. Returns 204. Returns 404 if not found.
GET /admin/v1/costs
Lists cost records. Supports filtering by tenantId, apiKey, model, and provider query parameters.
Request
GET /admin/v1/costs?tenantId=acme-corp&model=gpt-4o
Response (200 OK)
[
{
"id": "cost-abc123",
"tenantId": "acme-corp",
"apiKey": "sk-test",
"model": "gpt-4o",
"provider": "openai",
"inputTokens": 1000,
"outputTokens": 500,
"inputCost": 0.003,
"outputCost": 0.0075,
"totalCost": 0.0105,
"currency": "USD",
"pricingId": "f47ac10b-...",
"timestamp": "2026-03-04T10:05:00Z"
}
]
GET /admin/v1/costs/summary
Returns aggregated cost summary. Supports filtering by tenantId, model, provider, from, and to (ISO 8601 timestamps).
Request
GET /admin/v1/costs/summary?tenantId=acme-corp&model=gpt-4o
Response (200 OK)
{
"tenantId": "acme-corp",
"model": "gpt-4o",
"provider": null,
"totalInputCost": 0.15,
"totalOutputCost": 0.375,
"totalCost": 0.525,
"requestCount": 50,
"totalInputTokens": 50000,
"totalOutputTokens": 25000
}
POST /admin/v1/budgets
Creates a budget cap with soft and hard limits.
Request
POST /admin/v1/budgets
Content-Type: application/json
{
"name": "Production Monthly",
"tenantId": "acme-corp",
"apiKeyId": null,
"period": "MONTHLY",
"limitUsd": 1000.00,
"softLimitPct": 80,
"enabled": true
}
Response (201 Created)
{
"id": "b-123",
"tenantId": "acme-corp",
"apiKeyId": null,
"name": "Production Monthly",
"period": "MONTHLY",
"limitUsd": 1000.00,
"softLimitPct": 80,
"enabled": true,
"version": 1,
"createdAt": "2026-03-04T12:00:00Z",
"updatedAt": "2026-03-04T12:00:00Z"
}
GET /admin/v1/budgets
Lists budget caps. Optional filters: tenantId, apiKeyId.
GET /admin/v1/budgets/{id}
Returns a specific budget cap. Returns 404 (BUDGET_NOT_FOUND) if not found.
PUT /admin/v1/budgets/{id}
Updates a budget cap. Increments version. Supports partial updates (name, period, limitUsd, softLimitPct, enabled).
DELETE /admin/v1/budgets/{id}
Deletes a budget cap. Returns 204 on success, 404 if not found.
GET /admin/v1/budgets/{id}/usage
Returns current period spend vs budget limit.
Response (200 OK)
{
"budgetId": "b-123",
"name": "Production Monthly",
"period": "MONTHLY",
"limitUsd": 1000.00,
"softLimitUsd": 800.00,
"currentSpend": 450.00,
"remainingUsd": 550.00,
"utilizationPct": 45.0,
"periodStart": "2026-03-01T00:00:00Z",
"periodEnd": "2026-04-01T00:00:00Z"
}
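The derived fields in the usage response follow from limitUsd, softLimitPct, and currentSpend. The sketch below reproduces the arithmetic; `derive_usage` is a hypothetical helper, the `softLimitReached` flag is added for illustration, and clamping the remaining amount at zero is an assumption about overspend handling.

```python
from decimal import Decimal
from typing import Any, Dict


def derive_usage(limit_usd: Decimal, soft_limit_pct: int, current_spend: Decimal) -> Dict[str, Any]:
    """Compute the derived budget usage fields from the configured limits."""
    soft_limit = limit_usd * soft_limit_pct / 100
    # Assumption: remaining never goes negative once the budget is exhausted.
    remaining = max(limit_usd - current_spend, Decimal(0))
    utilization = current_spend / limit_usd * 100 if limit_usd else Decimal(0)
    return {
        "softLimitUsd": soft_limit,
        "remainingUsd": remaining,
        "utilizationPct": utilization,
        "softLimitReached": current_spend >= soft_limit,  # illustrative extra field
    }
```

With a 1000.00 USD limit, an 80 percent soft limit, and 450.00 USD spent, this yields 800.00, 550.00, and 45 percent, matching the sample response.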
POST /admin/v1/chargeback
Generates a chargeback report for the specified time period. Requires enterprise license.
Request
{
"tenant_id": "acme-corp",
"from": "2026-02-01T00:00:00Z",
"to": "2026-03-01T00:00:00Z"
}
| Field | Type | Required | Description |
|---|---|---|---|
| tenant_id | string | no | Tenant to report on. Use null to report on all tenants. |
| from | ISO 8601 | yes | Start of reporting period |
| to | ISO 8601 | yes | End of reporting period |
Response (201 Created)
{
"id": "cb-a1b2c3d4",
"tenant_id": "acme-corp",
"from": "2026-02-01T00:00:00Z",
"to": "2026-03-01T00:00:00Z",
"generated_at": "2026-03-01T12:00:00Z",
"generated_by": "admin",
"sections": {
"tenantSummary": [...],
"apiKeySummary": [...],
"modelSummary": [...],
"providerSummary": [...],
"dailyBreakdown": [...],
"forecast": { "trailing7d": [...], "trailing30d": [...] },
"anomalies": [...]
}
}
Response (400 Bad Request)
Returned when no enterprise license is available (CHARGEBACK_NOT_AVAILABLE).
GET /admin/v1/chargeback
Lists all chargeback reports.
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| tenant_id | string | no | Filter by tenant ID |
Response (200 OK)
{
"object": "list",
"data": [...]
}
GET /admin/v1/chargeback/{id}
Returns a specific chargeback report (without PDF bytes).
Response (404 Not Found)
Returned when the report ID does not exist (CHARGEBACK_NOT_FOUND).
GET /admin/v1/chargeback/{id}/pdf
Downloads a chargeback report as PDF.
Response (200 OK)
Returns Content-Type: application/pdf with Content-Disposition: attachment; filename=chargeback-report-{id}.pdf.
GET /admin/v1/chargeback/{id}/csv
Downloads a chargeback report as CSV.
Response (200 OK)
Returns Content-Type: text/csv with Content-Disposition: attachment; filename=chargeback-report-{id}.csv.
DELETE /admin/v1/chargeback/{id}
Deletes a chargeback report. Returns 204 No Content.
POST /admin/v1/schemas
Creates an output schema configuration for automatic response JSON validation. Requires enterprise license.
Request
POST /admin/v1/schemas
Content-Type: application/json
{
"modelPattern": "gpt-4o*",
"routeId": "gpt-route",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer" }
},
"required": ["name"]
},
"maxRetries": 2,
"correctionPrompt": "The response must be valid JSON matching the schema. Fix the following errors:",
"enabled": true
}
Response: 201 Created
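To illustrate what validation against the sample schema involves, here is a deliberately small Python sketch. It handles only the keywords used in the sample (`type`, `properties`, `required`) and is not a full JSON Schema validator or the gateway's implementation. The function names are hypothetical, and appending the error list to correctionPrompt is an assumption suggested by the prompt's wording.

```python
import json

TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
}


def validate(raw: str, schema: dict) -> list:
    """Return a list of error strings; an empty list means the output passed."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(value, dict):
        return ["expected a JSON object"]
    errors = [f"missing required field '{key}'"
              for key in schema.get("required", []) if key not in value]
    for key, spec in schema.get("properties", {}).items():
        check = TYPE_CHECKS.get(spec.get("type"))
        if key in value and check and not check(value[key]):
            errors.append(f"field '{key}' must be of type {spec['type']}")
    return errors


def correction_message(correction_prompt: str, errors: list) -> str:
    """Assumed format: the configured prompt followed by the error list."""
    return correction_prompt + " " + "; ".join(errors)


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}
errors = validate('{"age": "36"}', schema)
print(correction_message("Fix the following errors:", errors))
```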
GET /admin/v1/schemas
Lists all output schema configurations.
Response: 200 OK — Array of schema config objects.
GET /admin/v1/schemas/{id}
Gets a specific output schema configuration.
Response: 200 OK — Schema config object.
PUT /admin/v1/schemas/{id}
Updates an output schema configuration.
Response: 200 OK — Updated schema config object.
DELETE /admin/v1/schemas/{id}
Deletes an output schema configuration. Returns 204 No Content.
GET /admin/v1/costs/forecast
Returns cost forecasts based on trailing spend trends.
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| tenant_id | string | no | Filter by tenant ID |
| trailing_days | int | no | Trailing window (default: 7) |
Response (200 OK)
[
{
"tenantId": "acme-corp",
"model": "gpt-4o",
"trailingDays": 7,
"dailyAverage": 12.50,
"projectedMonthEnd": 375.00,
"trend": "increasing",
"computedAt": "2026-03-01T12:00:00Z"
}
]
GET /admin/v1/costs/anomalies
Returns active cost anomalies where current daily spend rate exceeds the configured threshold relative to the 30-day baseline.
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
tenant_id | string | no | Filter by tenant ID |
Response (200 OK)
[
{
"tenantId": "acme-corp",
"model": "gpt-4o",
"currentDailyRate": 5.00,
"baselineDailyRate": 1.00,
"deviationPct": 500.0,
"thresholdPct": 200.0,
"detectedAt": "2026-03-01T12:00:00Z"
}
]
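In the sample, deviationPct equals currentDailyRate divided by baselineDailyRate, expressed as a percentage (5.00 / 1.00 = 500%), and the entry is reported because that value exceeds thresholdPct. The Python sketch below encodes that reading. The formula is inferred from this single sample and should be treated as an assumption. The function names are hypothetical.

```python
def deviation_pct(current_daily_rate: float, baseline_daily_rate: float) -> float:
    """Current rate as a percentage of the baseline rate (assumed formula)."""
    if baseline_daily_rate <= 0:
        raise ValueError("baseline must be positive")
    return current_daily_rate / baseline_daily_rate * 100.0


def is_anomaly(current: float, baseline: float, threshold_pct: float) -> bool:
    """True when the deviation is strictly above the configured threshold."""
    return deviation_pct(current, baseline) > threshold_pct


print(deviation_pct(5.00, 1.00), is_anomaly(5.00, 1.00, 200.0))
```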
POST /v1/budget/estimate
Pre-request budget estimation endpoint. Returns estimated cost for a request and remaining budget information. Useful for agent orchestrators to check budget before making a request.
Request
POST /v1/budget/estimate
Content-Type: application/json
{
"model": "gpt-4o",
"max_tokens": 1000
}
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Model name for cost estimation |
| max_tokens | integer | no | Max tokens for cost estimation |
Response (200 OK)
{
"estimated_cost_usd": 0.005,
"budget_remaining_usd": 70.0,
"budget_remaining_tokens": 500000,
"budget_remaining_pct": 70,
"would_exceed_budget": false
}
| Field | Type | Nullable | Description |
|---|---|---|---|
| estimated_cost_usd | double | no | Estimated cost of the request in USD |
| budget_remaining_usd | double | yes | Remaining budget in USD (null when no budget configured) |
| budget_remaining_tokens | integer | yes | Estimated tokens remaining in budget (null when no budget) |
| budget_remaining_pct | integer | yes | Percentage of budget remaining (null when no budget) |
| would_exceed_budget | boolean | no | Whether this request would exceed the remaining budget |
Error Responses
| Status | Error Code | Description |
|---|---|---|
| 400 | invalid_request_error | Missing required model field |
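An orchestrator might wrap this endpoint in a pre-flight check, as in the Python sketch below. The helper names and the `min_remaining_pct` policy are hypothetical. Sending an Authorization header on this call is an assumption based on the Common Headers table.

```python
import json
import urllib.request


def build_estimate_request(base_url, api_key, model, max_tokens=None):
    """Build (but do not send) a POST /v1/budget/estimate request."""
    body = {"model": model}
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return urllib.request.Request(
        f"{base_url}/v1/budget/estimate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},  # assumed to be accepted here
        method="POST",
    )


def should_proceed(estimate: dict, min_remaining_pct: int = 0) -> bool:
    """Decide whether to send the real request, given an estimate response."""
    if estimate["would_exceed_budget"]:
        return False
    pct = estimate.get("budget_remaining_pct")
    # None means no budget is configured, so there is nothing to enforce.
    return pct is None or pct >= min_remaining_pct


req = build_estimate_request("http://localhost:8080", "<api-key>", "gpt-4o", 1000)
# To send it: with urllib.request.urlopen(req) as resp: estimate = json.load(resp)
estimate = {"estimated_cost_usd": 0.005, "budget_remaining_usd": 70.0,
            "budget_remaining_tokens": 500000, "budget_remaining_pct": 70,
            "would_exceed_budget": False}
print(should_proceed(estimate, min_remaining_pct=10))
```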
GET /admin/v1/audit/events
Lists audit events, optionally filtered by tenant, event type, and date range. Returns newest first.
Request
GET /admin/v1/audit/events?tenant_id=acme-corp&event_type=GATEWAY_RESPONSE&from=2026-01-01T00:00:00Z&to=2026-01-02T00:00:00Z
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| tenant_id | string | no | Filter by tenant ID |
| event_type | string | no | Filter by event type (GATEWAY_REQUEST, GATEWAY_RESPONSE) |
| from | ISO 8601 | no | Start of time range |
| to | ISO 8601 | no | End of time range |
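Because every filter is optional, a client can assemble the query string from whichever values it has. The Python sketch below uses only the standard library. The function name is hypothetical. `urlencode` percent-encodes the colons in timestamps (`%3A`), which is equivalent to the unencoded form in the sample request.

```python
from urllib.parse import urlencode


def audit_events_url(base_url, path="/admin/v1/audit/events", tenant_id=None,
                     event_type=None, start=None, end=None):
    """Build a filtered audit-events URL, omitting filters that are not set."""
    params = {"tenant_id": tenant_id, "event_type": event_type,
              "from": start, "to": end}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}{path}?{query}" if query else f"{base_url}{path}"


url = audit_events_url(
    "http://localhost:8080",
    tenant_id="acme-corp",
    event_type="GATEWAY_RESPONSE",
    start="2026-01-01T00:00:00Z",
    end="2026-01-02T00:00:00Z",
)
print(url)
```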
Response (200 OK)
[
{
"eventId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"timestamp": "2026-01-15T10:30:00Z",
"tenantId": "acme-corp",
"eventType": "GATEWAY_RESPONSE",
"payload": {
"model": "gpt-4o",
"provider": "openai",
"status": 200,
"latency_ms": 342,
"api_key": "sk-prod-1...",
"tokens_total": "235"
}
}
]
GET /admin/v1/audit/events/export
Downloads audit events as a CSV file. Accepts the same filter parameters as the list endpoint.
Request
GET /admin/v1/audit/events/export?tenant_id=acme-corp
Response (200 OK)
Returns Content-Type: text/csv with Content-Disposition: attachment; filename=audit-events.csv.
event_id,timestamp,tenant_id,event_type,payload
a1b2c3d4-...,2026-01-15T10:30:00Z,acme-corp,GATEWAY_RESPONSE,"{model=gpt-4o, status=200}"
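The payload column is quoted because it contains commas, so a CSV-aware parser is needed; splitting on commas would break the row. The Python sketch below reads the sample with the standard `csv` module. Judging from the sample, the payload cell is a `key=value` rendering and not JSON, so the sketch leaves it as a string. The function name is hypothetical.

```python
import csv
import io


def read_audit_csv(text: str) -> list:
    """Parse an audit export into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


sample = (
    "event_id,timestamp,tenant_id,event_type,payload\n"
    'a1b2c3d4-...,2026-01-15T10:30:00Z,acme-corp,GATEWAY_RESPONSE,"{model=gpt-4o, status=200}"\n'
)
rows = read_audit_csv(sample)
print(rows[0]["event_type"], rows[0]["payload"])
```

When structured payloads are needed, the JSON export is likely the better fit.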
GET /admin/v1/audit/events/export/json
Downloads audit events as a JSON file. Accepts the same filter parameters as the list endpoint.
Request
GET /admin/v1/audit/events/export/json?tenant_id=acme-corp
Response (200 OK)
Returns Content-Type: application/json with Content-Disposition: attachment; filename=audit-events.json.
[
{
"eventId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"timestamp": "2026-01-15T10:30:00Z",
"tenantId": "acme-corp",
"eventType": "GATEWAY_RESPONSE",
"payload": {
"model": "gpt-4o",
"provider": "openai",
"status": 200
}
}
]
POST /admin/v1/reports
Generates a compliance report (SOC2, HIPAA, or GDPR). Requires enterprise license.
Request
POST /admin/v1/reports
Content-Type: application/json
{
"type": "SOC2",
"tenant_id": "acme-corp",
"from": "2026-02-01T00:00:00Z",
"to": "2026-03-01T00:00:00Z"
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | yes | Report type: SOC2, HIPAA, or GDPR |
| tenant_id | string | no | Tenant ID to scope report (null = all tenants) |
| from | ISO 8601 | yes | Start of reporting period |
| to | ISO 8601 | yes | End of reporting period |
Response (201 Created)
{
"id": "rpt-a1b2c3d4",
"type": "SOC2",
"tenant_id": null,
"from": "2026-02-01T00:00:00Z",
"to": "2026-03-01T00:00:00Z",
"generated_at": "2026-03-01T12:00:00Z",
"generated_by": "admin@acme.com",
"sections": {
"auditChainIntegrity": { "valid": true, "verifiedCount": 1542 },
"accessControlSummary": { "authSuccessCount": 980, "authFailureCount": 12 },
"policyEnforcementSummary": { "policyDeniedCount": 5 },
"tokenUsageSummary": [ ... ],
"eventCountByType": { "GATEWAY_RESPONSE": 1200, "GATEWAY_REQUEST": 1200 }
}
}
Error (400 — unlicensed mode)
{
"error": {
"message": "Compliance reports require enterprise license",
"type": "invalid_request_error",
"code": "compliance_not_available"
}
}
GET /admin/v1/reports
Lists all generated compliance reports.
Request
GET /admin/v1/reports
GET /admin/v1/reports?tenant_id=acme-corp
Response (200 OK)
{
"object": "list",
"data": [
{
"id": "rpt-a1b2c3d4",
"type": "SOC2",
"tenant_id": null,
"from": "2026-02-01T00:00:00Z",
"to": "2026-03-01T00:00:00Z",
"generated_at": "2026-03-01T12:00:00Z",
"generated_by": "admin@acme.com"
}
]
}
GET /admin/v1/reports/{id}
Returns a specific compliance report with full section data (without PDF bytes).
Request
GET /admin/v1/reports/rpt-a1b2c3d4
Response (200 OK)
Same structure as the POST response above.
Error (404)
{
"error": {
"message": "Report not found: rpt-unknown",
"type": "not_found_error",
"code": "report_not_found"
}
}
GET /admin/v1/reports/{id}/pdf
Downloads the compliance report as a PDF file.
Request
GET /admin/v1/reports/rpt-a1b2c3d4/pdf
Response (200 OK)
Returns Content-Type: application/pdf with Content-Disposition: attachment; filename=soc2-report-rpt-a1b2c3d4.pdf.
If the stored report has no cached PDF bytes, the PDF is re-rendered on the fly.
DELETE /admin/v1/reports/{id}
Deletes a compliance report.
Request
DELETE /admin/v1/reports/rpt-a1b2c3d4
Response (204 No Content)
Empty body.
POST /admin/v1/pii/detokenize
Detokenizes PII tokens in a text string, replacing {{PII_<TYPE>_<hex>}} placeholders with the original values. Requires enterprise license.
Request
POST /admin/v1/pii/detokenize
Content-Type: application/json
{
"text": "Contact {{PII_EMAIL_a1b2c3d4}} regarding invoice {{PII_CREDIT_CARD_e5f6a7b8}}",
"tenant_id": "acme-corp"
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | yes | Text containing PII tokens to detokenize |
| tenant_id | string | yes | Tenant ID for token lookup |
Response (200 OK)
{
"text": "Contact user@example.com regarding invoice 4111111111111111"
}
Tokens that cannot be resolved (expired or unknown) are left as-is in the output.
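Since unresolved tokens stay in the output, a caller may want to detect them before using the text. The Python sketch below scans for leftover placeholders. The regular expression is an assumption derived from the `{{PII_<TYPE>_<hex>}}` format and the sample tokens (uppercase type names, lowercase hex). The function name is hypothetical.

```python
import re

# Assumed token shape: {{PII_<UPPERCASE_TYPE>_<lowercase hex>}}
PII_TOKEN = re.compile(r"\{\{PII_([A-Z_]+)_([0-9a-f]+)\}\}")


def unresolved_tokens(text: str) -> list:
    """Return (type, hex_id) pairs for placeholders still present in the text."""
    return PII_TOKEN.findall(text)


partially_resolved = "Contact user@example.com regarding invoice {{PII_CREDIT_CARD_e5f6a7b8}}"
print(unresolved_tokens(partially_resolved))
```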
RBAC
Requires org-admin or policy-admin role.
DELETE /admin/v1/pii/tokens/{tenantId}
Purges all stored PII tokens for a tenant. This is an irreversible operation — detokenization of previously redacted content will no longer be possible.
Request
DELETE /admin/v1/pii/tokens/acme-corp
Response (200 OK)
{
"tenant_id": "acme-corp",
"tokens_removed": 1542
}
RBAC
Requires org-admin role.
POST /admin/v1/webhooks
Creates a new webhook subscription. Webhooks receive HTTP POST callbacks when specified gateway events occur.
Request
POST /admin/v1/webhooks
Content-Type: application/json
{
"name": "slack-alerts",
"url": "https://hooks.slack.com/services/...",
"secret": "whsec_my_signing_secret",
"event_types": ["POLICY_DENIAL", "PII_DETECTED"],
"tenant_id": "tenant-1",
"description": "Slack alerts for governance events"
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Display name for the webhook |
| url | string | yes | HTTPS endpoint to receive webhook payloads |
| secret | string | yes | Signing secret for HMAC-SHA256 payload verification |
| event_types | array | yes | List of event types to subscribe to |
| tenant_id | string | no | Scope webhook to a specific tenant (null = all tenants) |
| description | string | no | Human-readable description |
Supported Event Types
| Event Type | Description |
|---|---|
| POLICY_DENIAL | A request was denied by a policy rule |
| PII_DETECTED | PII was detected in a request or response |
| IP_ACCESS_DENIED | A request was blocked by IP access control |
| BUDGET_CAP_SOFT | Soft budget cap threshold reached |
| BUDGET_CAP_HARD | Hard budget cap exceeded |
| INJECTION_DETECTED | Prompt injection detected |
| AGENT_LOOP_DETECTED | Agentic loop detected |
| MCP_APPROVAL_REQUESTED | MCP tool invocation requires human approval |
| MCP_APPROVAL_TIMEOUT | MCP approval request timed out |
Response (201 Created)
The secret field is masked in all responses after creation.
{
"id": "wh-abc123",
"tenant_id": "tenant-1",
"name": "slack-alerts",
"description": "Slack alerts for governance events",
"url": "https://hooks.slack.com/services/...",
"secret": "whs***ret",
"event_types": ["POLICY_DENIAL", "PII_DETECTED"],
"status": "ACTIVE",
"version": 1,
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
GET /admin/v1/webhooks
Lists all webhook subscriptions, optionally filtered by tenant.
Request
GET /admin/v1/webhooks
GET /admin/v1/webhooks?tenant_id=tenant-1
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| tenant_id | string | no | Filter webhooks by tenant ID |
Response (200 OK)
{
"object": "list",
"data": [
{
"id": "wh-abc123",
"tenant_id": "tenant-1",
"name": "slack-alerts",
"description": "Slack alerts for governance events",
"url": "https://hooks.slack.com/services/...",
"secret": "whs***ret",
"event_types": ["POLICY_DENIAL", "PII_DETECTED"],
"status": "ACTIVE",
"version": 1,
"created_at": "2026-01-15T10:30:00Z",
"updated_at": "2026-01-15T10:30:00Z"
}
]
}
GET /admin/v1/webhooks/{id}
Returns a specific webhook subscription.
Request
GET /admin/v1/webhooks/wh-abc123
Response (200 OK)
Same structure as the webhook object in the list response.
Response (404 Not Found)
{
"error": {
"message": "Webhook not found: wh-unknown",
"type": "not_found_error",
"code": "webhook_not_found"
}
}
PUT /admin/v1/webhooks/{id}
Updates a webhook subscription. Each update increments the version number.
Request
PUT /admin/v1/webhooks/wh-abc123
Content-Type: application/json
{
"name": "slack-alerts-v2",
"event_types": ["POLICY_DENIAL", "PII_DETECTED", "BUDGET_CAP_HARD"]
}
All fields are optional -- only provided fields are updated.
Response (200 OK)
Returns the updated webhook with incremented version.
Response (404 Not Found)
Returned when the webhook ID does not exist.
DELETE /admin/v1/webhooks/{id}
Deletes a webhook subscription.
Request
DELETE /admin/v1/webhooks/wh-abc123
Response (204 No Content)
Empty body.
Response (404 Not Found)
Returned when the webhook ID does not exist.
POST /admin/v1/webhooks/{id}/test
Sends a test event to the webhook endpoint to verify connectivity and configuration.
Request
POST /admin/v1/webhooks/wh-abc123/test
Content-Type: application/json
{
"event_type": "POLICY_DENIAL"
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| event_type | string | yes | Event type to simulate for the test |
Response (200 OK)
{
"success": true,
"http_status": 200,
"error_message": null
}
If the target endpoint is unreachable or returns a non-2xx status:
{
"success": false,
"http_status": 503,
"error_message": "Connection refused"
}
Response (404 Not Found)
Returned when the webhook ID does not exist.
GET /admin/v1/webhooks/{id}/deliveries
Returns delivery logs for a webhook, showing the history of event dispatches and their outcomes.
Request
GET /admin/v1/webhooks/wh-abc123/deliveries
Response (200 OK)
{
"object": "list",
"data": [
{
"id": "dl-xyz789",
"webhook_id": "wh-abc123",
"event_id": "evt-456",
"event_type": "POLICY_DENIAL",
"status": "SUCCESS",
"http_status": 200,
"attempt_count": 1,
"error_message": null,
"created_at": "2026-01-15T10:31:00Z",
"last_attempt_at": "2026-01-15T10:31:00Z"
}
]
}
Delivery Status Values
| Status | Description |
|---|---|
| SUCCESS | Payload delivered and target returned 2xx |
| FAILED | All delivery attempts exhausted without success |
| PENDING | Delivery is queued or being retried |
Response (404 Not Found)
Returned when the webhook ID does not exist.
POST /v1/webhooks/actions/{action}
Processes an approval action for MCP tool invocations requiring human approval. This endpoint is unauthenticated -- the token query parameter is self-authenticating via HMAC signature.
Request
POST /v1/webhooks/actions/approve?token=<signed_token>
POST /v1/webhooks/actions/deny?token=<signed_token>
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | yes | Action to take: approve or deny |
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| token | string | yes | HMAC-signed token encoding the approval context |
Response (200 OK)
Returns a confirmation of the action taken.
Response (400 Bad Request)
Returned when the token is invalid, expired, or the action is not recognized.
Webhook Delivery Payload
When a subscribed event occurs, the gateway sends an HTTP POST to the webhook URL with the following structure.
Delivery Headers
| Header | Description |
|---|---|
| Content-Type | application/json |
| X-Gateway-Signature | sha256=<hex_hmac> -- HMAC-SHA256 of the payload body using the webhook secret |
| X-Gateway-Event | The audit event type (e.g., POLICY_DENIAL) |
| X-Gateway-Delivery | Unique delivery ID for idempotency tracking |
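A receiver can use the X-Gateway-Delivery value to avoid processing the same delivery twice. The Python sketch below keeps a bounded in-memory record of recent IDs. The class name and size limit are hypothetical, and it assumes that a retried delivery carries the same X-Gateway-Delivery value as the original attempt, which this section does not confirm.

```python
from collections import OrderedDict


class DeliveryDeduplicator:
    """Remembers recently seen delivery IDs in a bounded, insertion-ordered map."""

    def __init__(self, max_entries: int = 10_000):
        self._seen = OrderedDict()
        self._max_entries = max_entries

    def first_time(self, delivery_id: str) -> bool:
        """Return True the first time an ID is seen, False for repeats."""
        if delivery_id in self._seen:
            return False
        self._seen[delivery_id] = True
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # evict the oldest entry
        return True


dedupe = DeliveryDeduplicator()
results = [dedupe.first_time(d) for d in ("d-1", "d-1", "d-2")]
print(results)
```

A multi-instance receiver would need a shared store in place of this in-process map.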
Payload Structure
{
"id": "<delivery-uuid>",
"webhook_id": "<webhook-id>",
"timestamp": "<ISO-8601>",
"type": "<event_type>",
"tenant_id": "<tenant-id-or-null>",
"data": { ... }
}
The data field contains the event-specific payload from the audit event.
For MCP_APPROVAL_REQUESTED events, the payload includes two additional fields:
{
"id": "<delivery-uuid>",
"webhook_id": "<webhook-id>",
"timestamp": "<ISO-8601>",
"type": "MCP_APPROVAL_REQUESTED",
"tenant_id": "<tenant-id>",
"data": { ... },
"approve_url": "https://gateway.example.com/v1/webhooks/actions/approve?token=<signed_token>",
"deny_url": "https://gateway.example.com/v1/webhooks/actions/deny?token=<signed_token>"
}
Verifying Webhook Signatures
To verify the authenticity of a webhook delivery, compute the HMAC-SHA256 of the raw request body using your webhook secret and compare it to the value in the X-Gateway-Signature header:
expected = "sha256=" + hex(hmac_sha256(webhook_secret, request_body))
actual = request.headers["X-Gateway-Signature"]
secure_compare(expected, actual)
Always use constant-time comparison to prevent timing attacks.
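The pseudocode above maps directly onto Python's standard `hmac` module. The sketch below assumes the secret is UTF-8 encoded and the header carries a lowercase hex digest. Both are assumptions, and the function name is hypothetical. `hmac.compare_digest` performs the constant-time comparison.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, signature_header) -> bool:
    """Check an X-Gateway-Signature value against the raw request body."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # A missing header is treated as an empty string so the comparison still runs.
    actual = signature_header or ""
    return hmac.compare_digest(expected.encode("utf-8"), actual.encode("utf-8"))


secret = "whsec_my_signing_secret"
body = b'{"type": "POLICY_DENIAL"}'
good_header = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
print(verify_signature(secret, body, good_header))
```

Verify against the exact bytes received. Re-serializing parsed JSON can change whitespace or key order and invalidate the signature.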
POST /admin/v1/mcp/servers
Registers a new MCP server in the gateway registry.
Request
POST /admin/v1/mcp/servers
Content-Type: application/json
{
"server_id": "code-search",
"tenant_id": "tenant-1",
"transport": "HTTP_SSE",
"url": "https://mcp.example.com/sse",
"credential_ref": "vault://secret/mcp/code-search",
"tags": ["search", "code"]
}
Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| server_id | string | Yes | Human-readable slug, unique per tenant |
| tenant_id | string | Yes | Tenant that owns this server |
| transport | string | Yes | Transport protocol: HTTP_SSE |
| url | string | Yes | MCP server endpoint URL |
| credential_ref | string | No | Vault path for server credentials |
| tags | string[] | No | Tags for categorization |
Response (201 Created)
Returns the created MCP server with a Location header pointing to the new resource.
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"server_id": "code-search",
"tenant_id": "tenant-1",
"transport": "HTTP_SSE",
"url": "https://mcp.example.com/sse",
"credential_ref": "vault://secret/mcp/code-search",
"tags": ["search", "code"],
"status": "ACTIVE",
"tool_catalog": [],
"version": 1,
"created_at": "2026-01-15T10:00:00Z",
"updated_at": "2026-01-15T10:00:00Z"
}
Response (409 Conflict)
Returned when server_id already exists for the given tenant.
GET /admin/v1/mcp/servers
Lists all registered MCP servers. Supports an optional ?tenant_id= query parameter to filter by tenant.
Request
GET /admin/v1/mcp/servers
GET /admin/v1/mcp/servers?tenant_id=tenant-1
Response (200 OK)
{
"object": "list",
"data": [
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"server_id": "code-search",
"tenant_id": "tenant-1",
"transport": "HTTP_SSE",
"url": "https://mcp.example.com/sse",
"status": "ACTIVE",
"tool_catalog": [],
"version": 1,
"created_at": "2026-01-15T10:00:00Z",
"updated_at": "2026-01-15T10:00:00Z"
}
]
}
GET /admin/v1/mcp/servers/{id}
Returns a specific MCP server by its internal ID.
Request
GET /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Response (200 OK)
Same format as the MCP server object in the list response.
Response (404 Not Found)
Returned when the MCP server ID does not exist.
PUT /admin/v1/mcp/servers/{id}
Updates an MCP server. Only provided fields are updated. Each update increments the version number.
Request
PUT /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Content-Type: application/json
{
"url": "https://mcp.new-url.com/sse",
"status": "SUSPENDED"
}
Response (200 OK)
Returns the updated MCP server with incremented version.
Response (404 Not Found)
Returned when the MCP server ID does not exist.
DELETE /admin/v1/mcp/servers/{id}
Deletes an MCP server registration.
Request
DELETE /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890
Response (204 No Content)
Response (404 Not Found)
Returned when the MCP server ID does not exist.
GET /admin/v1/mcp/servers/{id}/health
Checks live connectivity to the MCP server.
Request
GET /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890/health
Response (200 OK)
{
"reachable": true,
"http_status": 200,
"latency_ms": 42,
"error_message": null,
"checked_at": "2026-01-15T10:05:00Z"
}
Without an enterprise license, health checks always return reachable: false with an error message stating that the feature requires an enterprise license.
POST /admin/v1/mcp/servers/{id}/tools/sync
Fetches the tool catalog from the MCP server and caches it locally.
Request
POST /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890/tools/sync
Response (200 OK)
{
"tools_count": 2,
"synced_at": "2026-01-15T10:05:00Z",
"tools": [
{
"name": "search",
"description": "Search code repositories"
},
{
"name": "read_file",
"description": "Read a file from the repository"
}
]
}
Response (400 Bad Request)
Without an enterprise license, tool sync returns 400 with error code mcp_not_available.
POST /admin/v1/mcp/servers/{id}/credentials/rotate
Rotates the credential reference for an MCP server. Evicts the old credential from cache, optionally sets a new credential reference, and validates the new credential is resolvable. Requires enterprise license and org-admin role.
Request
POST /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890/credentials/rotate
Content-Type: application/json
{
"new_credential_ref": "vault://secret/mcp/new-path"
}
The new_credential_ref field is optional. If omitted, the existing credential reference is kept but the cache is evicted (forcing a fresh fetch from vault on the next request).
Response (200 OK)
{
"success": true,
"message": "Credential rotated successfully",
"old_credential_ref": "vault://secret/mcp/old-path",
"new_credential_ref": "vault://secret/mcp/new-path"
}
Response (400 Bad Request)
Without an enterprise license, returns 400 with error code mcp_not_available.
POST /admin/v1/mcp/servers/{id}/credentials/invalidate
Immediately evicts the cached credential for an MCP server. The next request will trigger a fresh fetch from the vault. Requires enterprise license and org-admin role.
Request
POST /admin/v1/mcp/servers/a1b2c3d4-e5f6-7890-abcd-ef1234567890/credentials/invalidate
Response (204 No Content)
No response body.
Response (404 Not Found)
Returns 404 with error code mcp_server_not_found if the server ID does not exist.
MCP Proxy Endpoints (port 8070)
The MCP Proxy is a standalone Spring Boot application (mcp-proxy-server) that runs on port 8070. It proxies requests from AI agents to registered MCP servers, injecting credentials from vault and applying a governance filter chain.
All MCP proxy endpoints require a valid signed JWT license key via GATEWAY_ENTERPRISE_LICENSE_KEY.
POST /mcp/{serverId}/tools/call
Invokes a tool on the specified MCP server. The proxy resolves the server from the registry, injects credentials, and forwards the request.
Request
POST /mcp/code-search/tools/call
Authorization: Bearer gw_<api-key>
Content-Type: application/json
X-Trace-Id: abc123
X-Session-Id: session-456
{
"name": "search_code",
"arguments": {
"query": "authentication handler",
"language": "java"
}
}
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer token with a valid API key (same keys as LLM Gateway) |
| X-Trace-Id | No | Trace ID for distributed tracing; auto-generated if absent |
| X-Session-Id | No | Session ID for agent chain tracing; stored in audit context |
Response (200 OK)
Returns the upstream MCP server response body as-is:
{
"content": [
{
"type": "text",
"text": "Found 3 matching files..."
}
]
}
The response echoes back the X-Trace-Id header.
Error Responses
| Status | Code | Description |
|---|---|---|
| 401 | mcp_auth_required | Missing Bearer token |
| 401 | mcp_auth_invalid | Invalid API key |
| 401 | mcp_auth_revoked | API key has been revoked |
| 403 | mcp_policy_denied | Request denied by policy (server/tool/arg rule) |
| 404 | mcp_server_not_found | Server ID not found for this tenant |
| 503 | mcp_server_unavailable | Server is suspended or disabled |
| 502 | mcp_upstream_error | Upstream MCP server returned an error |
POST /mcp/{serverId}/tools/list
Lists tools available on the specified MCP server by forwarding to the server's tools/list endpoint.
Request
POST /mcp/code-search/tools/list
Authorization: Bearer gw_<api-key>
Content-Type: application/json
{}
Response (200 OK)
Returns the upstream MCP server's tool listing:
{
"tools": [
{
"name": "search_code",
"description": "Search code repositories",
"inputSchema": { "type": "object", "properties": { "query": { "type": "string" } } }
}
]
}
POST /mcp/{serverId}/resources/{path}
Accesses a resource on the specified MCP server.
Request
POST /mcp/file-server/resources/read
Authorization: Bearer gw_<api-key>
Content-Type: application/json
{
"uri": "file:///data/report.csv"
}
POST /mcp/{serverId}/prompts/{path}
Accesses a prompt template on the specified MCP server.
Request
POST /mcp/prompt-server/prompts/summarize
Authorization: Bearer gw_<api-key>
Content-Type: application/json
{
"arguments": {
"text": "Long document text..."
}
}
MCP Proxy Error Envelope
All MCP proxy errors use a unified error envelope with the mcp_error type:
{
"error": {
"message": "MCP server not found: code-search",
"type": "mcp_error",
"code": "mcp_server_not_found",
"trace_id": "abc123"
}
}
All error codes are lowercased and prefixed with mcp_.
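A client can turn this envelope into a typed exception, as in the Python sketch below. The class, the `mcp_unknown` fallback code, and the choice of which codes count as retryable are all hypothetical client-side decisions and not gateway behavior.

```python
import json

# Assumption: only availability-type failures are worth retrying.
RETRYABLE_CODES = {"mcp_server_unavailable", "mcp_upstream_error"}


class McpProxyError(Exception):
    """Client-side view of the MCP proxy error envelope."""

    def __init__(self, status, code, message, trace_id=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.trace_id = trace_id

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_mcp_error(status: int, body: str) -> McpProxyError:
    """Build an McpProxyError from an HTTP status and response body."""
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        error = {}  # body was not a JSON object; fall back to defaults
    return McpProxyError(
        status,
        error.get("code", "mcp_unknown"),  # hypothetical fallback code
        error.get("message", ""),
        error.get("trace_id"),
    )


sample = ('{"error": {"message": "MCP server not found: code-search", '
          '"type": "mcp_error", "code": "mcp_server_not_found", "trace_id": "abc123"}}')
err = parse_mcp_error(404, sample)
print(err, err.retryable)
```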
MCP Proxy Metrics
The MCP proxy exposes Prometheus metrics at GET /actuator/prometheus:
| Metric | Type | Labels |
|---|---|---|
| mcp_requests_total | Counter | tenant, server_id, operation, status |
| mcp_latency_seconds | Timer (P50/P95/P99) | tenant, server_id, operation |
| mcp_errors_total | Counter | server_id, error_code |
Credential Encryption
Provider API keys can be encrypted at rest using AES-256-GCM. To use encrypted credentials:
- Set the master password via environment variable: GATEWAY_ENCRYPTION_MASTER_PASSWORD=<password>
- Encrypt your API key using the AesEncryptor utility
- Set the encrypted value with an ENC: prefix: OPENAI_API_KEY=ENC:<base64-ciphertext>
The gateway transparently decrypts ENC:-prefixed values at runtime using the configured master password. Enterprise deployments can replace the built-in SecretProvider with vault-backed implementations (HashiCorp Vault, AWS Secrets Manager, etc.) for zero-plaintext credential handling.
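The prefix-based dispatch described above can be pictured as follows. This Python sketch shows only the routing on the ENC: prefix. The `decrypt` callback is a stand-in for whatever AES-256-GCM or vault-backed lookup is configured, the function name is hypothetical, and none of this reflects the gateway's actual classes.

```python
from typing import Callable

ENC_PREFIX = "ENC:"


def resolve_secret(value: str, decrypt: Callable[[str], str]) -> str:
    """Return plaintext values unchanged; pass ENC:-prefixed payloads to `decrypt`."""
    if value.startswith(ENC_PREFIX):
        return decrypt(value[len(ENC_PREFIX):])
    return value


def fake_decrypt(ciphertext: str) -> str:
    """Placeholder only: a real implementation would decrypt with AES-256-GCM."""
    return f"decrypted({ciphertext})"


print(resolve_secret("ENC:<base64-ciphertext>", fake_decrypt))
print(resolve_secret("plain-value", fake_decrypt))
```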