Agentic AI Governance

DVARA provides a comprehensive governance layer for multi-agent workflows — covering tool call visibility, session tracking, loop detection, and human-in-the-loop approval gates. These controls run in the MCP filter chain, are configurable per tenant, and produce forensic-grade audit events.

How It Works

Agentic governance runs as a chain of stages in the MCP Proxy. Every agent tool call passes through authentication, rate limiting, tenant resolution, registry lookup, budget enforcement, policy evaluation, loop detection, approval gating, PII scanning, injection scanning, and audit recording before reaching the upstream MCP server.

1. MCP Tool Call Visibility

Every MCP tool invocation is logged as a tool-call record with full context: actor, tool name, server, latency, HTTP status, PII flags, and policy decision. Records are queryable via admin API and visible in the DVARA Flightdeck.

Admin API

# List tool calls (filterable)
# URLs are quoted so the shell does not treat "?" as a glob character
curl "http://localhost:8090/v1/admin/mcp/tool-calls?tenant_id=acme"

# Filter by server or tool
curl "http://localhost:8090/v1/admin/mcp/tool-calls?server_id=code-search"
curl "http://localhost:8090/v1/admin/mcp/tool-calls?tool_name=search"

# Aggregated summary by server + tool
curl "http://localhost:8090/v1/admin/mcp/tool-calls/summary?tenant_id=acme"

Tool Call Record Fields

Field	Description
`id`	Unique record identifier
`tenant_id`	Tenant that owns the request
`session_id`	Agent session identifier
`trace_id`	Distributed trace identifier
`server_id`	MCP server that handled the call
`tool_name`	Tool that was invoked
`operation`	MCP operation (e.g. `tools/call`)
`user_id`	Authenticated user (if available)
`policy_decision`	Policy engine result (`ALLOW` / `DENY`)
`http_status`	Upstream HTTP response status
`latency_ms`	End-to-end latency
`response_bytes`	Response payload size
`is_error`	Whether the call resulted in an error
`error_code`	Error code (if error)
`pii_in_args`	PII detected in request arguments
`pii_in_response`	PII detected in response
`timestamp`	When the call occurred

Summary Response

The /summary endpoint aggregates tool calls by server and tool name:

{
  "data": [
    {
      "server_id": "code-search",
      "tool_name": "search",
      "total_calls": 142,
      "error_count": 3,
      "avg_latency_ms": 45.2
    }
  ]
}
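
As a rough illustration of consuming the summary, the sketch below derives a per-tool error rate from a response shaped like the one above. The function name `error_rates` is hypothetical and not part of DVARA; only the field names come from the response.

```python
import json

def error_rates(summary: dict) -> dict:
    """Map (server_id, tool_name) to error_count / total_calls."""
    rates = {}
    for row in summary.get("data", []):
        total = row["total_calls"]
        # Guard against division by zero for rows with no calls.
        rates[(row["server_id"], row["tool_name"])] = (
            row["error_count"] / total if total else 0.0
        )
    return rates

sample = json.loads("""
{"data": [{"server_id": "code-search", "tool_name": "search",
           "total_calls": 142, "error_count": 3, "avg_latency_ms": 45.2}]}
""")
rates = error_rates(sample)
```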

Prometheus Metrics

Metric	Type	Labels	Notes
`mcp_tool_calls_total`	Counter	`tenant`, `server_id`, `tool_name`, `status`	Incremented on every MCP tool call.
`mcp_tool_call_latency_seconds`	Histogram	`server_id`, `tool_name`	Tool call latency histogram.
`mcp_agent_loop_detected_total`	Counter	`tenant`, `loop_type`	Incremented when loop detection fires (repetition, cycle, or rate).
`mcp_agent_sessions_killed_total`	Counter	`tenant`	Incremented when an agent session is killed.

2. Multi-Agent Chain Tracing

Sessions track the complete lifecycle of an agent's interaction — from first tool call to last. Each session aggregates tool call counts, error counts, latency, and the set of distinct servers and tools used.

How Sessions Work

- After every successful tool call, the gateway records the call against a session keyed by the X-Session-Id header.
- Sessions are marked "active" based on a configurable TTL (default: 60 minutes since last activity).
- Killed sessions are immediately blocked: future tool calls for that session return 403.
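
The rules above can be sketched as follows. This is a minimal illustration, not the gateway's implementation: the helper names `session_headers` and `is_active` are hypothetical, and whether the TTL boundary is inclusive or exclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone

def session_headers(session_id: str) -> dict:
    """Header an agent would attach so calls are grouped into one session."""
    return {"X-Session-Id": session_id}

def is_active(last_seen: datetime, now: datetime,
              ttl_minutes: int = 60, killed: bool = False) -> bool:
    """A session is active if it was not killed and was seen within the TTL."""
    if killed:
        return False
    # Assumption: a session exactly at the TTL boundary counts as inactive.
    return now - last_seen < timedelta(minutes=ttl_minutes)

last_seen = datetime(2026, 3, 5, 10, 5, 30, tzinfo=timezone.utc)
still_active = is_active(last_seen, last_seen + timedelta(minutes=30))
```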

Admin API

# List all sessions
curl http://localhost:8090/v1/admin/sessions

# Filter by tenant or active status
curl "http://localhost:8090/v1/admin/sessions?tenant_id=acme"
curl "http://localhost:8090/v1/admin/sessions?active=true"

# Get session detail
curl http://localhost:8090/v1/admin/sessions/sess-abc123

# Get session timeline (tool call history)
curl http://localhost:8090/v1/admin/sessions/sess-abc123/timeline

# Kill a session (blocks future tool calls)
curl -X POST http://localhost:8090/v1/admin/sessions/sess-abc123/kill

Session Response

{
  "session_id": "sess-abc123",
  "tenant_id": "acme",
  "first_seen": "2026-03-05T10:00:00Z",
  "last_seen": "2026-03-05T10:05:30Z",
  "tool_call_count": 15,
  "error_count": 1,
  "total_latency_ms": 2340,
  "distinct_servers": ["code-search", "database"],
  "distinct_tools": ["search", "query", "read_file"],
  "active": true
}

Kill Switch

When a session is killed via POST /v1/admin/sessions/{id}/kill:

- An AGENT_SESSION_KILLED audit event is written.
- All subsequent MCP requests for that session ID receive 403 Forbidden with error code mcp_agent_session_killed in the response body.
- The session is marked inactive.

The response body follows the standard MCP error envelope: {"error": {"code": "mcp_agent_session_killed", "type": "mcp_error", "message": "…", "trace_id": "…"}}.

Configuration

dvara:
  mcp-gateway:
    agentic:
      enabled: true
      session-ttl-minutes: 60        # Active session TTL
      session-max-capacity: 10000    # Max tracked sessions

Audit Events

Event Type	When
`AGENT_SESSION_KILLED`	Session terminated via kill API

3. Agent Loop Detection & Kill Switch

The loop detector monitors agent behavior per session and triggers when it detects repetitive, cyclical, or excessive tool call patterns. This prevents runaway agents from consuming unlimited tokens or causing unintended side effects.

Detection Patterns

Pattern	Description	Default Threshold
Repetition	Same tool called N times consecutively	5 consecutive calls
Cycle	A→B→A→B repeating pattern detected	Pattern length 2–4, repeated 3+ times
Rate	Exceeds maximum calls per minute in session	60 calls/minute

How It Works

1. The loop detector evaluates every MCP request against per-session history.
2. Per-session history is maintained in a circular buffer (default: 100 entries).
3. Three detection algorithms run on each call:
   - Repetition: counts consecutive identical tool keys (serverId::toolName).
   - Cycle: scans for repeating patterns of length 2–4 in the call history.
   - Rate: counts calls within a 60-second sliding window.
4. On detection, the gateway writes an audit event, dispatches a webhook, optionally auto-kills the session, and returns 429 Too Many Requests.
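
To make the three checks concrete, here is a minimal sketch of how they could be implemented over a bounded history. The class `LoopDetector` and its method names are hypothetical, the defaults mirror the documented thresholds, and the real detector may differ in details such as check order and window boundaries.

```python
import time
from collections import deque

class LoopDetector:
    def __init__(self, repetition_threshold=5, cycle_max_length=4,
                 cycle_repetitions=3, max_calls_per_minute=60, history_size=100):
        self.repetition_threshold = repetition_threshold
        self.cycle_max_length = cycle_max_length
        self.cycle_repetitions = cycle_repetitions
        self.max_calls_per_minute = max_calls_per_minute
        # Circular buffer of (timestamp, tool_key); oldest entries drop off.
        self.history = deque(maxlen=history_size)

    def record(self, server_id, tool_name, now=None):
        """Record one call and return the loop type detected, or None."""
        now = time.monotonic() if now is None else now
        self.history.append((now, f"{server_id}::{tool_name}"))
        keys = [key for _, key in self.history]
        if self._repetition(keys):
            return "repetition"
        if self._cycle(keys):
            return "cycle"
        if self._rate(now):
            return "rate"
        return None

    def _repetition(self, keys):
        # The last N calls all used the same tool key.
        n = self.repetition_threshold
        return len(keys) >= n and len(set(keys[-n:])) == 1

    def _cycle(self, keys):
        # The tail of the history is a pattern of length 2..max repeated R times.
        for length in range(2, self.cycle_max_length + 1):
            span = length * self.cycle_repetitions
            if len(keys) < span:
                continue
            tail = keys[-span:]
            pattern = tail[:length]
            if len(set(pattern)) > 1 and tail == pattern * self.cycle_repetitions:
                return True
        return False

    def _rate(self, now):
        # Count calls inside the trailing 60-second window.
        recent = sum(1 for ts, _ in self.history if now - ts < 60)
        return recent > self.max_calls_per_minute

detector = LoopDetector()
verdicts = [detector.record("code-search", "search", now=float(i)) for i in range(5)]
```

In this sketch the rate check can only fire if the history buffer holds more entries than max-calls-per-minute, which is the case for the documented defaults (100 versus 60).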

Configuration

dvara:
  mcp-gateway:
    agentic:
      loop-detection:
        enabled: true
        repetition-threshold: 5      # Consecutive same-tool threshold
        cycle-max-length: 4          # Max cycle pattern length to check
        cycle-repetitions: 3         # Required repetitions to trigger
        max-calls-per-minute: 60     # Rate limit per session
        auto-kill: false             # Auto-kill session on detection
        history-size: 100            # Per-session history buffer

Per-Tenant Configuration

Override global settings via tenant metadata:

Metadata Key	Type	Description
`agentic.loop-detection.enabled`	boolean	Enable/disable for tenant
`agentic.loop-detection.repetition-threshold`	int	Override repetition threshold
`agentic.loop-detection.max-calls-per-minute`	int	Override rate limit
`agentic.loop-detection.auto-kill`	boolean	Override auto-kill
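
One way to picture the override behavior is a merge of global defaults with any matching tenant metadata keys. The sketch below assumes that a key present in tenant metadata always wins over the global value; the function name `effective_loop_config` is hypothetical.

```python
# Global defaults for the four settings that can be overridden per tenant.
GLOBAL_DEFAULTS = {
    "enabled": True,
    "repetition-threshold": 5,
    "max-calls-per-minute": 60,
    "auto-kill": False,
}
PREFIX = "agentic.loop-detection."

def effective_loop_config(tenant_metadata: dict) -> dict:
    """Return global settings with tenant metadata overrides applied."""
    config = dict(GLOBAL_DEFAULTS)
    for key in GLOBAL_DEFAULTS:
        if PREFIX + key in tenant_metadata:
            config[key] = tenant_metadata[PREFIX + key]
    return config

acme = effective_loop_config({
    "agentic.loop-detection.repetition-threshold": 3,
    "agentic.loop-detection.auto-kill": True,
})
```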

Error Response

When a loop is detected, the response follows the standard MCP error envelope:

{
  "error": {
    "code": "mcp_agent_loop_detected",
    "type": "mcp_error",
    "message": "Tool 'code-search::search' called 5 times consecutively",
    "trace_id": "abc123..."
  }
}

HTTP Status: 429 Too Many Requests. All agentic MCP error responses (loop, session-killed, approval-denied, approval-timeout) follow the same envelope shape — lowercase code, type: "mcp_error", and trace_id included.
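
Because the envelope shape is shared, an agent client can route on the status and code pair. The following sketch shows one possible mapping; the action names are illustrative assumptions about what a client might do, not behavior defined by DVARA.

```python
import json

# (HTTP status, error code) -> hypothetical client-side action.
ACTIONS = {
    (429, "mcp_agent_loop_detected"): "back_off",
    (403, "mcp_agent_session_killed"): "stop_session",
    (403, "mcp_approval_denied"): "report_denied",
    (408, "mcp_approval_timeout"): "report_timeout",
}

def classify(status: int, body: str) -> tuple:
    """Return (action, trace_id) for an agentic MCP error response."""
    error = json.loads(body).get("error", {})
    action = ACTIONS.get((status, error.get("code")), "unknown")
    return action, error.get("trace_id")

loop_body = json.dumps({"error": {
    "code": "mcp_agent_loop_detected",
    "type": "mcp_error",
    "message": "Tool 'code-search::search' called 5 times consecutively",
    "trace_id": "abc123",
}})
action, trace_id = classify(429, loop_body)
```

Keeping the trace_id alongside the action lets the client log a value that can be matched against the corresponding audit event.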

Auto-Kill

When auto-kill: true is configured, the loop detector automatically kills the offending session upon detection. This means:

- The current request returns 429 mcp_agent_loop_detected, and an AGENT_LOOP_DETECTED audit event is written (the only audit event written at the trigger point).
- The session is marked killed in the session tracker.
- All future requests for that session return 403 mcp_agent_session_killed, but those blocked requests write no further audit events; the AGENT_LOOP_DETECTED event already captured the trigger.

The AGENT_SESSION_KILLED audit event is written only when an operator explicitly kills a session via the admin API (POST /v1/admin/sessions/{id}/kill), not on the auto-kill path. To correlate auto-kills in audit logs, filter for AGENT_LOOP_DETECTED with auto-kill: true configured.

Audit Events

Event Type	Payload Fields
`AGENT_LOOP_DETECTED`	`loop_type`, `session_id`, `server_id`, `tool_name`, `trace_id`, `message`

Prometheus Metrics

Dedicated counters are available for loop detection and session kills:

sum(rate(mcp_agent_loop_detected_total[5m]))
sum(rate(mcp_agent_sessions_killed_total[5m]))

Use the loop_type label on mcp_agent_loop_detected_total to distinguish repetition, cycle, and rate-limit triggers. Audit events (AGENT_LOOP_DETECTED, AGENT_SESSION_KILLED) carry the full forensic detail for incident reconstruction.

4. Human-in-the-Loop Approval Gates

Approval gates allow you to require human sign-off before high-risk tool calls are executed. When a tool call matches an approval rule, the request blocks until a human approves or denies it, or until a timeout expires.

How It Works

1. The approval gate evaluates every MCP request against tenant-specific approval rules.
2. Rules match on tool name patterns (glob) and/or server IDs.
3. When matched:
   - MCP_APPROVAL_REQUESTED audit event is written.
   - A webhook is dispatched with approve/deny URLs carrying HMAC-signed tokens.
   - The request blocks until a decision arrives or the timeout fires.
4. The webhook recipient (Slack bot, approval UI, etc.) calls the approve/deny URL.
5. The gateway validates the HMAC token and records the decision.
6. The blocked request resumes with the decision.
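
The HMAC-signed token step can be illustrated with a small sketch. The token layout used here (base64url-encoded JSON claims, a dot, then a hex HMAC-SHA256 signature) and the claim names are assumptions chosen for illustration; the document does not specify DVARA's actual token format.

```python
import base64
import hashlib
import hmac
import json

def sign_token(secret: bytes, event_id: str, action: str, expires_at: int) -> str:
    """Build a token binding an approval event to one action and an expiry."""
    claims = {"event_id": event_id, "action": action, "exp": expires_at}
    payload = json.dumps(claims, separators=(",", ":"), sort_keys=True).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + signature

def verify_token(secret: bytes, token: str, now: int):
    """Return the claims if the token is authentic and unexpired, else None."""
    try:
        encoded, signature = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:
        return None  # Malformed token: no separator or invalid base64.
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    claims = json.loads(payload)
    return claims if claims["exp"] > now else None

token = sign_token(b"demo-secret", "evt-1", "approve", expires_at=1000)
claims = verify_token(b"demo-secret", token, now=999)
```

Signing the action into the token means an approve token cannot be replayed against the deny URL or the reverse, which is one reason to sign the pair rather than only the event id.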

Approval Flow

Agent → MCP Proxy → approval gate
                    ├─ rules match → approval required
                    ├─ audit(MCP_APPROVAL_REQUESTED)
                    ├─ webhook dispatch (approve/deny URLs)
                    └─ wait for decision
                         ├─ approve → forward to upstream → 200
                         ├─ deny    → 403 mcp_approval_denied
                         └─ timeout → 408 mcp_approval_timeout
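
The wait-for-decision branch of the flow can be sketched with a blocking event. This is a simplified model under stated assumptions: the class `PendingApproval` is hypothetical, the 200 stands in for "forward to upstream", and treating a timeout with default-action "approve" as a forwarded call is an assumption about behavior the flow above does not spell out.

```python
import threading

class PendingApproval:
    def __init__(self):
        self._event = threading.Event()
        self._decision = None

    def decide(self, decision: str) -> None:
        """Called by the approve/deny callback handler."""
        self._decision = decision
        self._event.set()

    def wait(self, timeout_seconds: float, default_action: str = "deny"):
        """Block until decided or timed out; return (http_status, error_code)."""
        if self._event.wait(timeout_seconds):
            if self._decision == "approve":
                return 200, None
            return 403, "mcp_approval_denied"
        # Timeout path. Assumption: default-action "approve" lets the call through.
        if default_action == "approve":
            return 200, None
        return 408, "mcp_approval_timeout"

pending = PendingApproval()
# Simulate a human approving shortly after the request starts waiting.
threading.Timer(0.05, pending.decide, args=["approve"]).start()
outcome = pending.wait(timeout_seconds=2.0)
```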

Configuration

dvara:
  mcp-gateway:
    agentic:
      approval:
        enabled: true
        timeout-seconds: 300           # Wait timeout (5 minutes)
        default-action: deny           # Action on timeout: "deny" or "approve"
        max-pending-approvals: 1000    # Max concurrent pending approvals

Per-Tenant Configuration

Approval rules are configured via tenant metadata:

Metadata Key	Type	Description
`approval.required-tools`	string	Comma-separated glob patterns (e.g. `"database_*,file_write"`)
`approval.required-servers`	string	Comma-separated server IDs
`approval.timeout-seconds`	int	Override global timeout
`approval.default-action`	string	Override timeout action (`"deny"` or `"approve"`)

Example: Require Approval for Database Writes

Set the following tenant metadata:

{
  "approval.required-tools": "db_write,db_delete,database_*",
  "approval.required-servers": "production-db",
  "approval.timeout-seconds": 600,
  "approval.default-action": "deny"
}

Any tool call matching db_write, db_delete, or database_* patterns, or targeting the production-db server, will require human approval.
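
The matching logic in this example can be approximated as below. The sketch uses Python's `fnmatchcase` as a stand-in for the gateway's glob matcher, so case sensitivity and pattern syntax are assumptions; the function name `matching_rules` is hypothetical.

```python
from fnmatch import fnmatchcase

def matching_rules(metadata: dict, server_id: str, tool_name: str) -> list:
    """Return the rules a call matches; a non-empty list means approval is required."""
    def split(key):
        return [part.strip() for part in metadata.get(key, "").split(",") if part.strip()]

    rules = [f"tool:{pattern}" for pattern in split("approval.required-tools")
             if fnmatchcase(tool_name, pattern)]
    rules += [f"server:{sid}" for sid in split("approval.required-servers")
              if sid == server_id]
    return rules

tenant_metadata = {
    "approval.required-tools": "db_write,db_delete,database_*",
    "approval.required-servers": "production-db",
}
hits = matching_rules(tenant_metadata, "production-db", "db_delete")
```

A call needs approval when either list contributes a match, which reflects the "or" in the sentence above.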

Webhook Payload

When approval is required, a webhook is dispatched with approve/deny action URLs. The payload structure (verbatim from WebhookPayloadBuilder):

{
  "id": "<delivery-uuid>",
  "webhook_id": "<webhook id>",
  "timestamp": "2026-03-05T10:00:00Z",
  "type": "MCP_APPROVAL_REQUESTED",
  "tenant_id": "acme",
  "data": {
    "event_id": "<event-uuid>",
    "server_id": "production-db",
    "tool_name": "db_delete",
    "session_id": "<session-uuid>",
    "trace_id": "<trace-id>",
    "user_id": "<user-id-if-available>",
    "matched_rules": "[tool:db_delete, server:production-db]"
  },
  "approve_url": "https://gateway.example.com/v1/webhooks/actions/approve?token=<hmac-signed>",
  "deny_url": "https://gateway.example.com/v1/webhooks/actions/deny?token=<hmac-signed>"
}

Notes on the wire format:

- The top-level id field is the delivery UUID (one per webhook delivery attempt), distinct from data.event_id (one per audit event; the event id is stable across retries of the same delivery).
- tenant_id lives at the top level, not inside data.
- approve_url / deny_url are at the top level (not nested under an actions object). They are only included when the event type is MCP_APPROVAL_REQUESTED and dvara.llm-gateway.webhooks.approval-base-url is configured.
- matched_rules is serialized as a string in Java's List.toString() form ("[tool:..., server:...]"), not a JSON array. Receivers that want to parse the rule list need to strip the brackets and split on ", " (a comma followed by a space).
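
A receiver-side parser for the matched_rules string might look like the sketch below. The function name `parse_matched_rules` is hypothetical, and the approach assumes no rule value itself contains a comma followed by a space.

```python
def parse_matched_rules(raw: str) -> list:
    """Turn '[tool:x, server:y]' into [('tool', 'x'), ('server', 'y')]."""
    inner = raw.strip()
    # Strip the surrounding brackets produced by List.toString().
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    if not inner:
        return []
    # Split each "kind:value" item on the first colon only.
    return [tuple(item.split(":", 1)) for item in inner.split(", ")]

rules = parse_matched_rules("[tool:database_*, server:production-db]")
```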

Error Responses

Scenario	HTTP Status	Error Code
Approval denied	403	`mcp_approval_denied`
Approval timed out	408	`mcp_approval_timeout`
Max pending approvals reached	403 (deny)	—

Audit Events

Event Type	When
`MCP_APPROVAL_REQUESTED`	Tool call blocked pending approval
`MCP_APPROVAL_GRANTED`	Approval received
`MCP_APPROVAL_DENIED`	Denial received
`MCP_APPROVAL_TIMEOUT`	Timeout expired

Prometheus Metrics

Metric	Type	Labels
`mcp_approval_requests_total`	Counter	tenant, server_id, tool_name
`mcp_approval_granted_total`	Counter	tenant
`mcp_approval_denied_total`	Counter	tenant
`mcp_approval_timeout_total`	Counter	tenant

DVARA Flightdeck

Tool Calls Page (`/mcp/tool-calls`)

The tool calls page provides real-time visibility into all MCP tool call activity:

Filters: tenant, server, tool name
Table: timestamp, server, tool, tenant, session, HTTP status, latency, PII flags
Click to expand: full tool call detail (trace ID, operation, policy decision, etc.)
Auto-refresh: live polling every 5 seconds

Sessions Page (`/mcp/sessions`)

The sessions page tracks all agent sessions:

Filters: tenant, active-only toggle
Table: session ID, tenant, status, tool calls, errors, servers, latency, last seen
Detail view: session info + tool call timeline (3-second live polling)
Kill button: terminate session with confirmation dialog

Navigation

Tool calls, sessions, and the approval queue are accessible from the Agents section of the DVARA Flightdeck sidebar.

RBAC

Endpoint	Method	owner	policy-admin	developer	viewer
`/v1/admin/mcp/tool-calls`, `/summary`	GET	Y	Y	Y	Y
`/v1/admin/sessions`, `/{id}`, `/{id}/timeline`	GET	Y	Y	Y	Y
`/v1/admin/sessions/{id}/kill`	POST	Y	Y	N	N

Capacity and concurrency

The agentic governance layer runs at the full throughput of the DVARA MCP Proxy. Session tracking, loop detection, and approval-gate bookkeeping add negligible overhead to the request path. Approval gates pause the request while it waits for a human decision, so a paused approval does not consume CPU or RAM beyond the request itself.

Limit	Default	Behavior when full
Active sessions	`10000`	Oldest inactive sessions are evicted
Loop detection history per session	`100` entries	Circular buffer; oldest entries are dropped
Pending approvals	`1000`	Excess requests are auto-denied
Tool call records	Unbounded (PostgreSQL)	Retained until you archive or delete them

All limits are configurable through the properties reference. Tool call records are persisted durably to PostgreSQL, so they survive restarts and can feed compliance reports months after the fact.

Enterprise-only

Agentic governance is an enterprise feature. Without an enterprise license, the session tracker, loop detector, and approval gate are all no-ops and tool call records are not persisted. With a license, the full governance layer activates automatically and the DVARA Flightdeck exposes the sessions, tool calls, approval queue, and analytics pages.

5. Multi-Tenancy Isolation

The MCP Proxy enforces strict tenant isolation at multiple layers:

- Tenant context: Any MCP request without a tenant id is rejected with 403 TENANT_REQUIRED. The McpTenantContextFilter builds the response body directly, so the code stays uppercase and skips the lowercase, mcp_-prefixed transformation that other MCP error codes get. The tenant id is also placed in the structured-logging context for every downstream log line.
- Registry isolation: Server lookups are scoped to (tenantId, serverId). Tenant A cannot discover, list, or call servers registered by tenant B, even if serverId strings collide.
- Tool-call record isolation: Tool-call queries return only records belonging to the calling tenant.
- Session isolation: Session queries return only sessions belonging to the calling tenant.
- Shared-server pattern: One physical MCP server can be registered under multiple tenants with separate (tenantId, serverId) entries, each with independent policy surfaces and no shared state.
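
The composite-key scoping can be pictured with a small in-memory model. This is only a sketch: the class `ServerRegistry` and the example URLs are hypothetical, and the real registry is not necessarily structured this way.

```python
class ServerRegistry:
    def __init__(self):
        # Keyed by (tenant_id, server_id), so equal server ids never collide.
        self._servers = {}

    def register(self, tenant_id: str, server_id: str, url: str) -> None:
        self._servers[(tenant_id, server_id)] = url

    def lookup(self, tenant_id: str, server_id: str):
        """Return the tenant's own entry, or None if that tenant has none."""
        return self._servers.get((tenant_id, server_id))

    def list_for(self, tenant_id: str) -> list:
        return sorted(sid for (tid, sid) in self._servers if tid == tenant_id)

registry = ServerRegistry()
registry.register("tenant-a", "code-search", "https://a.example.com/mcp")
registry.register("tenant-b", "code-search", "https://b.example.com/mcp")
registry.register("tenant-b", "database", "https://b.example.com/db")
```

Because every read takes the tenant id as part of the key, there is no code path in this model where one tenant's lookup can return another tenant's entry.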

6. Credential Hot-Swap

MCP server credentials can be rotated or invalidated at runtime without gateway downtime.

API Endpoints

# SERVER_ID is a placeholder shell variable holding the MCP server's {id};
# a literal {id} in the URL would be expanded by curl's URL globbing.

# Rotate credentials (optional new credential reference)
curl -X POST "http://localhost:8090/v1/admin/mcp/servers/${SERVER_ID}/credentials/rotate" \
  -H "Content-Type: application/json" \
  -d '{"new_credential_ref": "vault://secret/mcp/new-path"}'

# Invalidate cached credentials immediately
curl -X POST "http://localhost:8090/v1/admin/mcp/servers/${SERVER_ID}/credentials/invalidate"

Rotation Response

{
  "success": true,
  "message": "Credential rotated successfully",
  "old_credential_ref": "vault://secret/mcp/old-path",
  "new_credential_ref": "vault://secret/mcp/new-path"
}

How It Works

Rotate: Evicts the old credential from the gateway's secret cache, updates the server's credentialRef, validates the new credential is resolvable, and writes a CREDENTIAL_ROTATED audit event.
Invalidate: Evicts the credential from cache immediately without setting a new one. The next request triggers a fresh fetch from the vault. Returns 204 No Content.
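
The difference between the two operations can be sketched with a toy cache. Everything here is a simplified assumption: `SecretCache`, `rotate`, and the in-memory vault stand in for the gateway's real secret cache and vault client, and the audit event is omitted.

```python
class SecretCache:
    def __init__(self, fetch):
        self._fetch = fetch   # callable: credential_ref -> secret
        self._cache = {}

    def resolve(self, ref: str):
        """Return the cached secret, fetching from the vault on a miss."""
        if ref not in self._cache:
            self._cache[ref] = self._fetch(ref)
        return self._cache[ref]

    def invalidate(self, ref: str) -> None:
        """Evict only; the next resolve() triggers a fresh fetch."""
        self._cache.pop(ref, None)

def rotate(cache: SecretCache, server: dict, new_ref=None) -> dict:
    """Evict the old credential, switch the reference, and validate the new one."""
    old_ref = server["credential_ref"]
    cache.invalidate(old_ref)
    if new_ref is not None:
        server["credential_ref"] = new_ref
    # Raises if the (new) reference cannot be resolved.
    cache.resolve(server["credential_ref"])
    return {"old_credential_ref": old_ref,
            "new_credential_ref": server["credential_ref"]}

vault = {"vault://secret/mcp/old-path": "old-secret",
         "vault://secret/mcp/new-path": "new-secret"}
fetches = []

def fetch_from_vault(ref):
    fetches.append(ref)
    return vault[ref]

cache = SecretCache(fetch_from_vault)
server = {"credential_ref": "vault://secret/mcp/old-path"}
```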

RBAC

Credential rotation and invalidation require the owner role.

Endpoint	Method	owner	policy-admin	developer	viewer
`/{id}/credentials/rotate`	POST	Y	N	N	N
`/{id}/credentials/invalidate`	POST	Y	N	N	N

7. Rich Rate Limit Errors

When rate limits are enforced on MCP tool calls, the 429 response includes actionable detail for intelligent retry/reroute decisions.

Response Format

The rate-limit response nests detail inside the error object. Unlike the other MCP error codes (which get lowercased + mcp_-prefixed by McpExceptionHandler), the rate-limit filter constructs the body directly, so the wire code is the raw uppercase form:

{
  "error": {
    "type": "rate_limit_error",
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded",
    "rate_limit": {
      "limited_resource": "github-api",
      "server_id": "github-api",
      "limit_type": "requests_per_minute",
      "retry_after_seconds": 47,
      "alternative_servers": ["github-api-secondary", "github-api-backup"]
    }
  }
}

alternative_servers only appears when at least one candidate was found; an empty list is omitted from the response rather than serialized as []. Agent orchestrators should read the retry hint from error.rate_limit.retry_after_seconds in the body — the MCP Proxy does not set Retry-After or X-RateLimit-* HTTP headers on rate-limit responses (the value is body-only).

Alternative Servers

alternative_servers lists other ACTIVE MCP servers for the same tenant that share matching tags with the rate-limited server. Up to 5 alternatives are returned. This lets agent orchestrators reroute automatically to an available alternative instead of waiting blindly.
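
An orchestrator could act on the body along these lines. The sketch assumes a simple policy (prefer the first alternative, otherwise wait for the hinted delay); the function name `next_step` and that policy are illustrative, not prescribed by DVARA.

```python
def next_step(body: dict) -> tuple:
    """Decide between rerouting and waiting from a rate-limit error body."""
    detail = body["error"]["rate_limit"]
    # The key is omitted entirely when no alternatives exist, so default to [].
    alternatives = detail.get("alternative_servers", [])
    if alternatives:
        return ("reroute", alternatives[0])
    return ("wait", detail["retry_after_seconds"])

response_body = {
    "error": {
        "type": "rate_limit_error",
        "code": "RATE_LIMIT_EXCEEDED",
        "message": "Rate limit exceeded",
        "rate_limit": {
            "limited_resource": "github-api",
            "server_id": "github-api",
            "limit_type": "requests_per_minute",
            "retry_after_seconds": 47,
            "alternative_servers": ["github-api-secondary", "github-api-backup"],
        },
    }
}
decision = next_step(response_body)
```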

8. Approval Gate Metrics

The approval gate records Prometheus metrics at each decision point:

Metric	Type	Labels	When
`mcp_approval_requests_total`	Counter	tenant, server_id, tool_name	Approval requested
`mcp_approval_granted_total`	Counter	tenant	Approval granted
`mcp_approval_denied_total`	Counter	tenant	Approval denied
`mcp_approval_timeout_total`	Counter	tenant	Approval timed out

All MCP Proxy metrics are exposed at /actuator/prometheus. The MCP Proxy follows the same authentication model as the LLM Gateway — the endpoint requires Authorization: Bearer $DVARA_ACTUATOR_METRICS_API_KEY (the same shared secret as the LLM Gateway's metrics scrape; set it once for the install and both apps validate against it). The MCP Proxy's /actuator/health and /actuator/health/{liveness,readiness} are anonymous and safe for k8s probes; its dangerous endpoints (/env, /heapdump, /threaddump, etc.) are excluded from the actuator registry and return 404 regardless of auth. See Observability → Health Endpoints for the full auth model.
