Agentic AI Governance
DVARA provides a comprehensive governance layer for multi-agent workflows — covering tool call visibility, session tracking, loop detection, and human-in-the-loop approval gates. These controls run in the MCP filter chain, are configurable per tenant, and produce forensic-grade audit events.
How It Works
Agentic governance runs as a chain of stages in the MCP Proxy. Every agent tool call passes through authentication, rate limiting, tenant resolution, registry lookup, budget enforcement, policy evaluation, loop detection, approval gating, PII scanning, injection scanning, and audit recording before reaching the upstream MCP server.
1. MCP Tool Call Visibility
Every MCP tool invocation is logged as a tool-call record with full context: actor, tool name, server, latency, HTTP status, PII flags, and policy decision. Records are queryable via admin API and visible in the DVARA Flightdeck.
Admin API
# List tool calls (filterable)
curl "http://localhost:8090/v1/admin/mcp/tool-calls?tenant_id=acme"
# Filter by server or tool
curl "http://localhost:8090/v1/admin/mcp/tool-calls?server_id=code-search"
curl "http://localhost:8090/v1/admin/mcp/tool-calls?tool_name=search"
# Aggregated summary by server + tool
curl "http://localhost:8090/v1/admin/mcp/tool-calls/summary?tenant_id=acme"
Tool Call Record Fields
| Field | Description |
|---|---|
| id | Unique record identifier |
| tenant_id | Tenant that owns the request |
| session_id | Agent session identifier |
| trace_id | Distributed trace identifier |
| server_id | MCP server that handled the call |
| tool_name | Tool that was invoked |
| operation | MCP operation (e.g. tools/call) |
| user_id | Authenticated user (if available) |
| policy_decision | Policy engine result (ALLOW / DENY) |
| http_status | Upstream HTTP response status |
| latency_ms | End-to-end latency |
| response_bytes | Response payload size |
| is_error | Whether the call resulted in an error |
| error_code | Error code (if error) |
| pii_in_args | PII detected in request arguments |
| pii_in_response | PII detected in response |
| timestamp | When the call occurred |
Summary Response
The /summary endpoint aggregates tool calls by server and tool name:
{
"data": [
{
"server_id": "code-search",
"tool_name": "search",
"total_calls": 142,
"error_count": 3,
"avg_latency_ms": 45.2
}
]
}
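As a rough illustration of consuming the summary response, the sketch below derives a per-tool error rate from the payload shape shown above. Only the field names come from the example response; the function name error_rates is hypothetical and not part of DVARA.

```python
import json


def error_rates(summary_json: str) -> dict:
    """Map (server_id, tool_name) to error_count / total_calls."""
    rates = {}
    for row in json.loads(summary_json)["data"]:
        total = row["total_calls"]
        # Guard against division by zero for tools with no recorded calls.
        rate = row["error_count"] / total if total else 0.0
        rates[(row["server_id"], row["tool_name"])] = rate
    return rates


# Sample payload copied from the example response above.
sample = json.dumps({
    "data": [{
        "server_id": "code-search",
        "tool_name": "search",
        "total_calls": 142,
        "error_count": 3,
        "avg_latency_ms": 45.2,
    }]
})
rates = error_rates(sample)
```

A dashboard or alerting job could apply the same calculation to flag tools whose error rate crosses a threshold of your choosing.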
Prometheus Metrics
| Metric | Type | Labels | Notes |
|---|---|---|---|
| mcp_tool_calls_total | Counter | tenant, server_id, tool_name, status | Incremented on every MCP tool call. |
| mcp_tool_call_latency_seconds | Histogram | server_id, tool_name | Tool call latency histogram. |
| mcp_agent_loop_detected_total | Counter | tenant, loop_type | Incremented when loop detection fires (repetition, cycle, or rate). |
| mcp_agent_sessions_killed_total | Counter | tenant | Incremented when an agent session is killed. |
2. Multi-Agent Chain Tracing
Sessions track the complete lifecycle of an agent's interaction — from first tool call to last. Each session aggregates tool call counts, error counts, latency, and the set of distinct servers and tools used.
How Sessions Work
- After every successful tool call, the gateway records the call against a session keyed by the X-Session-Id header.
- Sessions are marked "active" based on a configurable TTL (default: 60 minutes since last activity).
- Killed sessions are blocked immediately: future tool calls for that session return 403.
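The activity rule above can be sketched as a small predicate. This is an illustration only: the function name is_active is hypothetical, and the inclusive boundary at exactly the TTL is an assumption, not documented DVARA behavior.

```python
from datetime import datetime, timedelta, timezone


def is_active(last_seen: datetime, now: datetime,
              ttl_minutes: int = 60, killed: bool = False) -> bool:
    """A session is active if it was not killed and was seen within the TTL."""
    if killed:
        # Killed sessions are never active, regardless of recent activity.
        return False
    # Assumption: activity exactly at the TTL boundary still counts as active.
    return now - last_seen <= timedelta(minutes=ttl_minutes)


last_seen = datetime(2026, 3, 5, 10, 5, 30, tzinfo=timezone.utc)
still_active = is_active(last_seen, last_seen + timedelta(minutes=59))
expired = is_active(last_seen, last_seen + timedelta(minutes=61))
```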
Admin API
# List all sessions
curl http://localhost:8090/v1/admin/sessions
# Filter by tenant or active status
curl "http://localhost:8090/v1/admin/sessions?tenant_id=acme"
curl "http://localhost:8090/v1/admin/sessions?active=true"
# Get session detail
curl http://localhost:8090/v1/admin/sessions/sess-abc123
# Get session timeline (tool call history)
curl http://localhost:8090/v1/admin/sessions/sess-abc123/timeline
# Kill a session (blocks future tool calls)
curl -X POST http://localhost:8090/v1/admin/sessions/sess-abc123/kill
Session Response
{
"session_id": "sess-abc123",
"tenant_id": "acme",
"first_seen": "2026-03-05T10:00:00Z",
"last_seen": "2026-03-05T10:05:30Z",
"tool_call_count": 15,
"error_count": 1,
"total_latency_ms": 2340,
"distinct_servers": ["code-search", "database"],
"distinct_tools": ["search", "query", "read_file"],
"active": true
}
Kill Switch
When a session is killed via POST /v1/admin/sessions/{id}/kill:
- An AGENT_SESSION_KILLED audit event is written.
- All subsequent MCP requests for that session ID receive 403 Forbidden with error code mcp_agent_session_killed in the response body.
- The session is marked inactive.
The response body follows the standard MCP error envelope: {"error": {"code": "mcp_agent_session_killed", "type": "mcp_error", "message": "…", "trace_id": "…"}}.
Configuration
dvara:
mcp-gateway:
agentic:
enabled: true
session-ttl-minutes: 60 # Active session TTL
session-max-capacity: 10000 # Max tracked sessions
Audit Events
| Event Type | When |
|---|---|
| AGENT_SESSION_KILLED | Session terminated via kill API |
3. Agent Loop Detection & Kill Switch
The loop detector monitors agent behavior per session and triggers when it detects repetitive, cyclical, or excessive tool call patterns. This prevents runaway agents from consuming unlimited tokens or causing unintended side effects.
Detection Patterns
| Pattern | Description | Default Threshold |
|---|---|---|
| Repetition | Same tool called N times consecutively | 5 consecutive calls |
| Cycle | A→B→A→B repeating pattern detected | Pattern length 2–4, repeated 3+ times |
| Rate | Exceeds maximum calls per minute in session | 60 calls/minute |
How It Works
- The loop detector evaluates every MCP request against per-session history.
- Per-session history is maintained in a circular buffer (default: 100 entries).
- Three detection algorithms run on each call:
  - Repetition: counts consecutive identical tool keys (serverId::toolName).
  - Cycle: scans for repeating patterns of length 2–4 in the call history.
  - Rate: counts calls within a 60-second sliding window.
- On detection, the gateway writes an audit event, dispatches a webhook, optionally auto-kills the session, and returns 429 Too Many Requests.
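The three checks can be sketched as follows. This is a minimal illustration under stated assumptions, not DVARA's implementation: the class and method names are hypothetical, timestamps are plain seconds supplied by the caller, and a cycle is assumed to need at least two distinct tools so that pure repetition is reported only once.

```python
from collections import deque


class LoopDetector:
    """Illustrative per-session detector using the documented defaults."""

    def __init__(self, repetition_threshold=5, cycle_max_length=4,
                 cycle_repetitions=3, max_calls_per_minute=60,
                 history_size=100):
        self.repetition_threshold = repetition_threshold
        self.cycle_max_length = cycle_max_length
        self.cycle_repetitions = cycle_repetitions
        self.max_calls_per_minute = max_calls_per_minute
        # deque(maxlen=...) behaves as a circular buffer: oldest entries drop.
        self.history = deque(maxlen=history_size)

    def record(self, server_id, tool_name, now):
        """Record one call at time `now` (seconds); return loop type or None."""
        self.history.append((f"{server_id}::{tool_name}", now))
        keys = [key for key, _ in self.history]
        if self._repetition(keys):
            return "repetition"
        if self._cycle(keys):
            return "cycle"
        if self._rate(now):
            return "rate"
        return None

    def _repetition(self, keys):
        # True when the last N tool keys are all identical.
        n = self.repetition_threshold
        return len(keys) >= n and len(set(keys[-n:])) == 1

    def _cycle(self, keys):
        # Look for a pattern of length 2..max repeated back to back at the tail.
        for length in range(2, self.cycle_max_length + 1):
            span = length * self.cycle_repetitions
            if len(keys) < span:
                continue
            tail = keys[-span:]
            pattern = tail[:length]
            if len(set(pattern)) > 1 and tail == pattern * self.cycle_repetitions:
                return True
        return False

    def _rate(self, now):
        # Count buffered calls that fall inside the last 60 seconds.
        recent = sum(1 for _, ts in self.history if now - ts < 60)
        return recent > self.max_calls_per_minute


detector = LoopDetector()
results = [detector.record("code-search", "search", float(i)) for i in range(5)]
```

With the default threshold of 5, the first four identical calls pass and the fifth is reported as a repetition. Note that in this sketch the rate check can only see calls still held in the buffer, so a history size smaller than the per-minute limit would mask rate triggers.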
Configuration
dvara:
mcp-gateway:
agentic:
loop-detection:
enabled: true
repetition-threshold: 5 # Consecutive same-tool threshold
cycle-max-length: 4 # Max cycle pattern length to check
cycle-repetitions: 3 # Required repetitions to trigger
max-calls-per-minute: 60 # Rate limit per session
auto-kill: false # Auto-kill session on detection
history-size: 100 # Per-session history buffer
Per-Tenant Configuration
Override global settings via tenant metadata:
| Metadata Key | Type | Description |
|---|---|---|
| agentic.loop-detection.enabled | boolean | Enable/disable for tenant |
| agentic.loop-detection.repetition-threshold | int | Override repetition threshold |
| agentic.loop-detection.max-calls-per-minute | int | Override rate limit |
| agentic.loop-detection.auto-kill | boolean | Override auto-kill |
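One way to picture the override behavior is a key-by-key merge of tenant metadata over the global defaults. The sketch below assumes exactly that; the merge strategy and the function name effective_config are illustrative assumptions, while the keys and default values are taken from the tables and configuration above.

```python
# Global defaults for the four settings that tenants can override.
GLOBAL_DEFAULTS = {
    "enabled": True,
    "repetition-threshold": 5,
    "max-calls-per-minute": 60,
    "auto-kill": False,
}
PREFIX = "agentic.loop-detection."


def effective_config(tenant_metadata: dict) -> dict:
    """Overlay recognised tenant metadata keys on the global defaults."""
    config = dict(GLOBAL_DEFAULTS)
    for key, value in tenant_metadata.items():
        if not key.startswith(PREFIX):
            continue  # unrelated metadata such as approval.* keys
        setting = key[len(PREFIX):]
        if setting in config:
            config[setting] = value
    return config


acme = effective_config({
    "agentic.loop-detection.auto-kill": True,
    "agentic.loop-detection.max-calls-per-minute": 30,
    "approval.timeout-seconds": 600,
})
```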
Error Response
When a loop is detected, the response follows the standard MCP error envelope:
{
"error": {
"code": "mcp_agent_loop_detected",
"type": "mcp_error",
"message": "Tool 'code-search::search' called 5 times consecutively",
"trace_id": "abc123..."
}
}
HTTP Status: 429 Too Many Requests. All agentic MCP error responses (loop, session-killed, approval-denied, approval-timeout) follow the same envelope shape — lowercase code, type: "mcp_error", and trace_id included.
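Because the envelope shape is shared, an agent client can parse all four agentic errors with one helper. The sketch below is a hypothetical client-side utility (McpError and parse_mcp_error are not DVARA APIs); only the field names and error codes come from this section.

```python
import json
from dataclasses import dataclass

# Error codes named in this section, lowercase as they appear on the wire.
AGENTIC_ERROR_CODES = {
    "mcp_agent_loop_detected",
    "mcp_agent_session_killed",
    "mcp_approval_denied",
    "mcp_approval_timeout",
}


@dataclass(frozen=True)
class McpError:
    code: str
    message: str
    trace_id: str

    @property
    def is_agentic(self) -> bool:
        return self.code in AGENTIC_ERROR_CODES


def parse_mcp_error(body: str):
    """Return an McpError for an mcp_error envelope, otherwise None."""
    error = json.loads(body).get("error", {})
    if error.get("type") != "mcp_error":
        return None
    return McpError(error["code"], error["message"], error["trace_id"])


body = json.dumps({"error": {
    "code": "mcp_agent_loop_detected",
    "type": "mcp_error",
    "message": "Tool 'code-search::search' called 5 times consecutively",
    "trace_id": "abc123",
}})
parsed = parse_mcp_error(body)
```

Logging the trace_id from the parsed error makes it easier to line up a client-side failure with the matching audit event.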
Auto-Kill
When auto-kill: true is configured, the loop detector automatically kills the offending session upon detection. This means:
- The current request returns 429, and an AGENT_LOOP_DETECTED audit event is written (the only audit event written at the trigger point)
- The session is marked killed in the session tracker
- All future requests for that session return 403 mcp_agent_session_killed, but no further audit event is written from those blocked requests; the AGENT_LOOP_DETECTED event already captured the trigger
The AGENT_SESSION_KILLED audit event is written only when an operator explicitly kills a session via the admin API (POST /v1/admin/sessions/{id}/kill) — not on the auto-kill path. To correlate auto-kills in audit logs, filter for AGENT_LOOP_DETECTED with auto-kill: true configured.
Audit Events
| Event Type | Payload Fields |
|---|---|
| AGENT_LOOP_DETECTED | loop_type, session_id, server_id, tool_name, trace_id, message |
Prometheus Metrics
Dedicated counters are available for loop detection and session kills:
sum(rate(mcp_agent_loop_detected_total[5m]))
sum(rate(mcp_agent_sessions_killed_total[5m]))
Use the loop_type label on mcp_agent_loop_detected_total to distinguish repetition, cycle, and rate-limit triggers. Audit events (AGENT_LOOP_DETECTED, AGENT_SESSION_KILLED) carry the full forensic detail for incident reconstruction.
4. Human-in-the-Loop Approval Gates
Approval gates allow you to require human sign-off before high-risk tool calls are executed. When a tool call matches an approval rule, the request blocks until a human approves or denies it, or until a timeout expires.
How It Works
- The approval gate evaluates every MCP request against tenant-specific approval rules.
- Rules match on tool name patterns (glob) and/or server IDs.
- When matched:
  - MCP_APPROVAL_REQUESTED audit event is written.
  - A webhook is dispatched with approve/deny URLs carrying HMAC-signed tokens.
  - The request blocks until a decision arrives or the timeout fires.
- The webhook recipient (Slack bot, approval UI, etc.) calls the approve/deny URL.
- The gateway validates the HMAC token and records the decision.
- The blocked request resumes with the decision.
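To make the HMAC step concrete, here is a generic sketch of signing and validating an action token. The token layout (base64 payload, a dot, base64 signature), the claim names, and both function names are assumptions made for illustration; DVARA's actual token format is not described in this document.

```python
import base64
import hashlib
import hmac
import json


def sign_token(secret: bytes, event_id: str, action: str, expires_at: int) -> str:
    """Build '<payload>.<signature>' where the signature is HMAC-SHA256."""
    payload = json.dumps(
        {"event_id": event_id, "action": action, "exp": expires_at},
        sort_keys=True,
    ).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(signature).decode())


def verify_token(secret: bytes, token: str, now: int):
    """Return the claims if the token is authentic and unexpired, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # wrong number of parts or invalid base64
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    return claims if claims["exp"] > now else None


secret = b"example-secret"
token = sign_token(secret, "evt-1", "approve", expires_at=1000)
claims = verify_token(secret, token, now=999)
```

Binding the action into the signed payload means an approve token cannot be replayed against the deny URL, or the reverse, without failing validation.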
Approval Flow
Agent → MCP Proxy → approval gate
├─ rules match → approval required
├─ audit(MCP_APPROVAL_REQUESTED)
├─ webhook dispatch (approve/deny URLs)
└─ wait for decision
├─ approve → forward to upstream → 200
├─ deny → 403 mcp_approval_denied
└─ timeout → 408 mcp_approval_timeout
Configuration
dvara:
mcp-gateway:
agentic:
approval:
enabled: true
timeout-seconds: 300 # Wait timeout (5 minutes)
default-action: deny # Action on timeout: "deny" or "approve"
max-pending-approvals: 1000 # Max concurrent pending approvals
Per-Tenant Configuration
Approval rules are configured via tenant metadata:
| Metadata Key | Type | Description |
|---|---|---|
| approval.required-tools | string | Comma-separated glob patterns (e.g. "database_*,file_write") |
| approval.required-servers | string | Comma-separated server IDs |
| approval.timeout-seconds | int | Override global timeout |
| approval.default-action | string | Override timeout action ("deny" or "approve") |
Example: Require Approval for Database Writes
Set the following tenant metadata:
{
"approval.required-tools": "db_write,db_delete,database_*",
"approval.required-servers": "production-db",
"approval.timeout-seconds": 600,
"approval.default-action": "deny"
}
Any tool call matching db_write, db_delete, or database_* patterns, or targeting the production-db server, will require human approval.
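The matching logic for this example can be approximated with Python's fnmatch module. This is a sketch under assumptions: DVARA's glob dialect may differ from fnmatch, case-sensitive matching is assumed, and the function name matched_rules and the tool:/server: labels are modelled on the webhook payload rather than taken from the gateway code.

```python
from fnmatch import fnmatchcase


def _split(value: str) -> list:
    """Split a comma-separated metadata value, dropping blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def matched_rules(metadata: dict, server_id: str, tool_name: str) -> list:
    """Return the rules that require approval for this call (empty if none)."""
    rules = [
        f"tool:{pattern}"
        for pattern in _split(metadata.get("approval.required-tools", ""))
        if fnmatchcase(tool_name, pattern)
    ]
    if server_id in _split(metadata.get("approval.required-servers", "")):
        rules.append(f"server:{server_id}")
    return rules


metadata = {
    "approval.required-tools": "db_write,db_delete,database_*",
    "approval.required-servers": "production-db",
}
both = matched_rules(metadata, "production-db", "db_delete")
tool_only = matched_rules(metadata, "staging-db", "database_query")
none = matched_rules(metadata, "staging-db", "read_file")
```

A call needs approval whenever the returned list is non-empty, which mirrors the "and/or" semantics of tool patterns and server IDs described above.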
Webhook Payload
When approval is required, a webhook is dispatched with approve/deny action URLs. The payload structure (verbatim from WebhookPayloadBuilder):
{
"id": "<delivery-uuid>",
"webhook_id": "<webhook id>",
"timestamp": "2026-03-05T10:00:00Z",
"type": "MCP_APPROVAL_REQUESTED",
"tenant_id": "acme",
"data": {
"event_id": "<event-uuid>",
"server_id": "production-db",
"tool_name": "db_delete",
"session_id": "<session-uuid>",
"trace_id": "<trace-id>",
"user_id": "<user-id-if-available>",
"matched_rules": "[tool:database_*, server:production-db]"
},
"approve_url": "https://gateway.example.com/v1/webhooks/actions/approve?token=<hmac-signed>",
"deny_url": "https://gateway.example.com/v1/webhooks/actions/deny?token=<hmac-signed>"
}
Note on the wire format:
- The id field at the top level is the delivery UUID (one per webhook delivery attempt), distinct from data.event_id (one per audit event; the event id is stable across retries of the same delivery).
- tenant_id lives at the top level, not inside data.
- approve_url / deny_url are at the top level (not nested under an actions object). They are only included when the event type is MCP_APPROVAL_REQUESTED and dvara.llm-gateway.webhooks.approval-base-url is configured.
- matched_rules is serialized as a string: Java's List.toString() form ("[tool:..., server:...]"), not a JSON array. Receivers that want to parse the rule list need to strip the brackets and split on commas.
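A receiver could recover the rule list from the matched_rules string roughly as follows. The helper name parse_matched_rules is hypothetical, and the sketch assumes no individual rule contains a comma, since the List.toString() form cannot be split unambiguously in that case.

```python
def parse_matched_rules(raw: str) -> list:
    """Turn '[tool:a, server:b]' into ['tool:a', 'server:b']."""
    inner = raw.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    # Split on commas and trim the space that follows each separator.
    return [item.strip() for item in inner.split(",") if item.strip()]


rules = parse_matched_rules("[tool:database_*, server:production-db]")
```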
Error Responses
| Scenario | HTTP Status | Error Code |
|---|---|---|
| Approval denied | 403 | mcp_approval_denied |
| Approval timed out | 408 | mcp_approval_timeout |
| Max pending approvals reached | 403 (deny) | — |
Audit Events
| Event Type | When |
|---|---|
| MCP_APPROVAL_REQUESTED | Tool call blocked pending approval |
| MCP_APPROVAL_GRANTED | Approval received |
| MCP_APPROVAL_DENIED | Denial received |
| MCP_APPROVAL_TIMEOUT | Timeout expired |
Prometheus Metrics
| Metric | Type | Labels |
|---|---|---|
| mcp_approval_requests_total | Counter | tenant, server_id, tool_name |
| mcp_approval_granted_total | Counter | tenant |
| mcp_approval_denied_total | Counter | tenant |
| mcp_approval_timeout_total | Counter | tenant |
DVARA Flightdeck
Tool Calls Page (/mcp/tool-calls)
The tool calls page provides real-time visibility into all MCP tool call activity:
- Filters: tenant, server, tool name
- Table: timestamp, server, tool, tenant, session, HTTP status, latency, PII flags
- Click to expand: full tool call detail (trace ID, operation, policy decision, etc.)
- Auto-refresh: live polling every 5 seconds
Sessions Page (/mcp/sessions)
The sessions page tracks all agent sessions:
- Filters: tenant, active-only toggle
- Table: session ID, tenant, status, tool calls, errors, servers, latency, last seen
- Detail view: session info + tool call timeline (3-second live polling)
- Kill button: terminate session with confirmation dialog
Navigation
Tool calls, sessions, and the approval queue are accessible from the Agents section of the DVARA Flightdeck sidebar.
RBAC
| Endpoint | Method | owner | policy-admin | billing-admin | developer | viewer |
|---|---|---|---|---|---|---|
| /v1/admin/mcp/tool-calls, /summary | GET | Y | Y | Y | Y | |
| /v1/admin/sessions, /{id}, /{id}/timeline | GET | Y | Y | Y | Y | |
| /v1/admin/sessions/{id}/kill | POST | Y | Y | | | |
Capacity and concurrency
The agentic governance layer runs at the full throughput of the DVARA MCP Proxy. Session tracking, loop detection, and approval-gate bookkeeping add negligible overhead to the request path. Approval gates pause the request while waiting for a human decision, so a pending approval consumes no CPU or RAM beyond what the paused request itself holds.
| Limit | Default | Behavior when full |
|---|---|---|
| Active sessions | 10000 | Oldest inactive sessions are evicted |
| Loop detection history per session | 100 entries | Circular buffer; oldest entries are dropped |
| Pending approvals | 1000 | Excess requests are auto-denied |
| Tool call records | Unbounded (PostgreSQL) | Retained until you archive or delete them |
All limits are configurable through the properties reference. Tool call records are persisted durably to PostgreSQL, so they survive restarts and can feed compliance reports months after the fact.
Enterprise-only
Agentic governance is an enterprise feature. Without an enterprise license, the session tracker, loop detector, and approval gate are all no-ops and tool call records are not persisted. With a license, the full governance layer activates automatically and the DVARA Flightdeck exposes the sessions, tool calls, approval queue, and analytics pages.
5. Multi-Tenancy Isolation
The MCP Proxy enforces strict tenant isolation at multiple layers:
- Tenant context: Any MCP request without a tenant id is rejected with 403 TENANT_REQUIRED (the McpTenantContextFilter builds the response body directly with the uppercase code, bypassing the lowercase + mcp_-prefix transformation other MCP error codes get). The tenant id is also placed in the structured-logging context for every downstream log line.
- Registry isolation: Server lookups are scoped to (tenantId, serverId). Tenant A cannot discover, list, or call servers registered by tenant B, even if serverId strings collide.
- Tool-call record isolation: Tool-call queries return only records belonging to the calling tenant.
- Session isolation: Session queries return only sessions belonging to the calling tenant.
- Shared-server pattern: One physical MCP server can be registered under multiple tenants with separate (tenantId, serverId) entries, each with independent policy surfaces and no shared state.
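The composite-key scoping can be illustrated with a small in-memory registry. This is a conceptual sketch only; the ServerRegistry class, its methods, and the example URLs are hypothetical and say nothing about how DVARA stores its registry.

```python
class ServerRegistry:
    """Toy registry keyed by (tenant_id, server_id)."""

    def __init__(self):
        self._servers = {}

    def register(self, tenant_id: str, server_id: str, upstream_url: str) -> None:
        self._servers[(tenant_id, server_id)] = upstream_url

    def lookup(self, tenant_id: str, server_id: str):
        # A colliding server_id under another tenant is simply a different key.
        return self._servers.get((tenant_id, server_id))

    def list_for(self, tenant_id: str) -> list:
        return sorted(sid for (tid, sid) in self._servers if tid == tenant_id)


registry = ServerRegistry()
registry.register("tenant-a", "code-search", "https://a.example.com/mcp")
registry.register("tenant-b", "code-search", "https://b.example.com/mcp")
registry.register("tenant-b", "database", "https://b.example.com/db")
```

Because every read takes the tenant id as part of the key, there is no code path in this sketch where one tenant's lookup can return another tenant's entry.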
6. Credential Hot-Swap
MCP server credentials can be rotated or invalidated at runtime without gateway downtime.
API Endpoints
# Rotate credentials (optional new credential reference)
curl -X POST http://localhost:8090/v1/admin/mcp/servers/{id}/credentials/rotate \
-H "Content-Type: application/json" \
-d '{"new_credential_ref": "vault://secret/mcp/new-path"}'
# Invalidate cached credentials immediately
curl -X POST http://localhost:8090/v1/admin/mcp/servers/{id}/credentials/invalidate
Rotation Response
{
"success": true,
"message": "Credential rotated successfully",
"old_credential_ref": "vault://secret/mcp/old-path",
"new_credential_ref": "vault://secret/mcp/new-path"
}
How It Works
- Rotate: Evicts the old credential from the gateway's secret cache, updates the server's credentialRef, validates the new credential is resolvable, and writes a CREDENTIAL_ROTATED audit event.
- Invalidate: Evicts the credential from cache immediately without setting a new one. The next request triggers a fresh fetch from the vault. Returns 204 No Content.
RBAC
Credential rotation and invalidation require the owner role.
| Endpoint | Method | owner | policy-admin | developer | viewer |
|---|---|---|---|---|---|
| /{id}/credentials/rotate | POST | Y | | | |
| /{id}/credentials/invalidate | POST | Y | | | |
7. Rich Rate Limit Errors
When rate limits are enforced on MCP tool calls, the 429 response includes actionable detail for intelligent retry/reroute decisions.
Response Format
The rate-limit response nests detail inside the error object. Unlike the other MCP error codes (which get lowercased + mcp_-prefixed by McpExceptionHandler), the rate-limit filter constructs the body directly, so the wire code is the raw uppercase form:
{
"error": {
"type": "rate_limit_error",
"code": "RATE_LIMIT_EXCEEDED",
"message": "Rate limit exceeded",
"rate_limit": {
"limited_resource": "github-api",
"server_id": "github-api",
"limit_type": "requests_per_minute",
"retry_after_seconds": 47,
"alternative_servers": ["github-api-secondary", "github-api-backup"]
}
}
}
alternative_servers only appears when at least one candidate was found; an empty list is omitted from the response rather than serialized as []. Agent orchestrators should read the retry hint from error.rate_limit.retry_after_seconds in the body — the MCP Proxy does not set Retry-After or X-RateLimit-* HTTP headers on rate-limit responses (the value is body-only).
Alternative Servers
alternative_servers lists other ACTIVE MCP servers for the same tenant that share matching tags with the rate-limited server. Up to 5 alternatives are returned. This enables agent orchestrators to automatically reroute to available alternatives instead of blind waiting.
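An orchestrator might turn the rate-limit body into a retry or reroute decision along these lines. The policy shown (prefer the first alternative, otherwise wait) and the function name next_step are illustrative assumptions; only the field names come from the response format above.

```python
def next_step(body: dict) -> tuple:
    """Return ('reroute', server_id) or ('wait', seconds) for a 429 body."""
    rate_limit = body["error"]["rate_limit"]
    # alternative_servers is omitted entirely when no candidate was found.
    alternatives = rate_limit.get("alternative_servers", [])
    if alternatives:
        return ("reroute", alternatives[0])
    return ("wait", rate_limit["retry_after_seconds"])


with_alternatives = {"error": {
    "type": "rate_limit_error",
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded",
    "rate_limit": {
        "limited_resource": "github-api",
        "server_id": "github-api",
        "limit_type": "requests_per_minute",
        "retry_after_seconds": 47,
        "alternative_servers": ["github-api-secondary", "github-api-backup"],
    },
}}
decision = next_step(with_alternatives)
```

Reading the hint from the body rather than from headers matches the note above that the MCP Proxy does not set Retry-After on these responses.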
8. Approval Gate Metrics
The approval gate records Prometheus metrics at each decision point:
| Metric | Type | Labels | When |
|---|---|---|---|
| mcp_approval_requests_total | Counter | tenant, server_id, tool_name | Approval requested |
| mcp_approval_granted_total | Counter | tenant | Approval granted |
| mcp_approval_denied_total | Counter | tenant | Approval denied |
| mcp_approval_timeout_total | Counter | tenant | Approval timed out |
All MCP Proxy metrics are exposed at /actuator/prometheus. The MCP Proxy follows the same authentication model as the LLM Gateway — the endpoint requires Authorization: Bearer $DVARA_ACTUATOR_METRICS_API_KEY (the same shared secret as the LLM Gateway's metrics scrape; set it once for the install and both apps validate against it). The MCP Proxy's /actuator/health and /actuator/health/{liveness,readiness} are anonymous and safe for k8s probes; its dangerous endpoints (/env, /heapdump, /threaddump, etc.) are excluded from the actuator registry and return 404 regardless of auth. See Observability → Health Endpoints for the full auth model.