PII Detection and Redaction
Dvara Enterprise detects and enforces policies on Personally Identifiable Information (PII) and Protected Health Information (PHI) in prompts and LLM responses. This prevents sensitive data from being forwarded to external LLM providers without explicit authorization.
Requires: Enterprise license (signed JWT via GATEWAY_ENTERPRISE_LICENSE_KEY). See license-generator module for key generation.
How It Works
PII enforcement runs at two points in the LLM request lifecycle:
- Request scanning — After policy evaluation, before dispatching to the LLM provider. Detects PII in user messages and tool result blocks.
- Response scanning — After receiving the LLM response, before returning to the client. Detects PII in assistant messages (output leak detection).
Additionally, PII is always stripped from requests before they are written to the response cache.
For MCP (Model Context Protocol) traffic, PII enforcement runs via McpPiiArgsFilter (order 600) in the MCP filter chain:
- Argument scanning — For tools/call operations, recursively scans all string values in the tool call arguments map for PII before forwarding to the upstream MCP server.
- Response scanning — After the MCP server responds (2xx only), recursively scans all string values in the response body for PII output leaks before returning to the agent.
MCP PII scanning uses the same configuration (global gateway.pii.* and per-tenant pii.* metadata) and the same detection engine as LLM PII scanning.
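The recursive scan of string values described above can be pictured with a minimal Python sketch. It is not the gateway's McpPiiArgsFilter implementation: the helper names `iter_strings` and `scan_arguments`, the path notation, and the simplified email regex are all assumptions made for illustration.

```python
import re
from typing import Any, Iterator, List, Tuple

# Simplified, illustrative pattern; the gateway's real email regex is not shown here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def iter_strings(value: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, string) for every string nested inside dicts and lists."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from iter_strings(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from iter_strings(child, f"{path}[{index}]")
    # Numbers, booleans, and None carry no text to scan, so they are skipped.


def scan_arguments(arguments: dict) -> List[Tuple[str, str, str]]:
    """Return (path, label, matched text) for each email-like value found."""
    hits = []
    for path, text in iter_strings(arguments):
        for match in EMAIL_RE.finditer(text):
            hits.append((path, "email", match.group()))
    return hits
```

Calling `scan_arguments({"to": "alice@example.com", "meta": {"cc": ["bob@example.org"]}})` in this sketch reports one hit at `$.to` and one at `$.meta.cc[0]`. The same walk would apply to a response body, since both are nested maps and lists of values.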
Supported PII Types
| Type | Label | Detection Method |
|---|---|---|
| Email addresses | email | Regex pattern |
| US phone numbers | phone_us | Regex (with optional +1) |
| International phone numbers | phone_intl | Regex (E.164 format) |
| Social Security Numbers | ssn | Regex (rejects 000/666/9xx prefixes) |
| Credit card numbers | credit_card | Regex + Luhn checksum validation |
| Dates of birth | dob | Regex (MM/DD/YYYY) |
| IPv4 addresses | ipv4 | Regex |
| US passport numbers | passport_us | Regex |
| Driver's license numbers | drivers_license | Regex (generic format) |
| IBAN numbers | iban | Regex (international bank account) |
| Medical Record Numbers | mrn | Keyword (MRN/Medical Record) + digits |
| DEA numbers | dea | Format regex + DEA checksum validation |
| NPI numbers | npi | Context keyword "NPI" required + Luhn checksum |
| Person names | person_name | Salutation heuristic (Mr/Mrs/Dr/Prof), confidence 0.7 |
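Several rows in the table pair a regex with a validation step. The sketch below shows three of those checks in Python, under stated assumptions: the function names and the exact regexes are illustrative, and the DEA format is simplified to two letters followed by seven digits. It is not the gateway's detection code.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, then test mod 10."""
    if not re.fullmatch(r"[0-9]+", digits):
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # every second digit, counting from the right
            d *= 2
            if d > 9:       # same as summing the two digits of the product
                d -= 9
        total += d
    return total % 10 == 0


# SSN shape AAA-GG-SSSS; the lookahead rejects area prefixes 000, 666, and 900-999.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-\d{2}-\d{4}\b")


def dea_valid(value: str) -> bool:
    """DEA check digit: (d1 + d3 + d5) + 2 * (d2 + d4 + d6); last digit must equal d7."""
    if not re.fullmatch(r"[A-Za-z]{2}\d{7}", value):  # simplified format (assumption)
        return False
    d = [int(c) for c in value[2:]]
    total = d[0] + d[2] + d[4] + 2 * (d[1] + d[3] + d[5])
    return total % 10 == d[6]
```

Checksum validation of this kind cuts false positives: a random 16-digit string passes the Luhn test only about one time in ten, so most order numbers and IDs are not flagged as credit cards.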
Actions
Each PII detection can trigger one of three actions:
| Action | Behavior | Audit Event |
|---|---|---|
| LOG | Log the detection, forward request unchanged | PII_DETECTED |
| BLOCK | Reject the request with HTTP 400 (pii_detected) | PII_DETECTED |
| REDACT | Replace PII with reversible tokens, forward modified request | PII_REDACTED |
When REDACT is active, PII values are replaced with tokens such as {{PII_EMAIL_a1b2c3d4}}. The original values are encrypted (AES-256-GCM) and held in memory per tenant. Authorized admins can later detokenize the content via the admin API.
Response scanning always uses REDACT behavior when PII is detected in LLM output, producing a PII_OUTPUT_LEAK audit event.
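To make the reversible-token idea concrete, here is a minimal Python sketch of a per-tenant token store. It is an illustration only: the `TokenVault` class and its methods are hypothetical names, the token suffix is assumed to be 8 random hex characters (matching the example token shown above), and the AES-256-GCM encryption step is deliberately left out so the example stays within the standard library.

```python
import re
import secrets

# Matches tokens shaped like {{PII_EMAIL_a1b2c3d4}} (suffix length is an assumption).
TOKEN_RE = re.compile(r"\{\{PII_[A-Z0-9_]+_[0-9a-f]{8}\}\}")


class TokenVault:
    """In-memory, per-tenant token store. Values are kept in plaintext here;
    the gateway encrypts them, which this sketch omits."""

    def __init__(self):
        self._store = {}  # tenant_id -> {token: original value}

    def redact(self, tenant_id, text, pattern, label):
        """Replace each match of `pattern` with a fresh {{PII_<LABEL>_<hex>}} token."""
        tokens = self._store.setdefault(tenant_id, {})

        def replace(match):
            token = "{{PII_%s_%s}}" % (label.upper(), secrets.token_hex(4))
            tokens[token] = match.group()
            return token

        return pattern.sub(replace, text)

    def detokenize(self, tenant_id, text):
        """Restore originals for this tenant; unknown tokens are left in place."""
        tokens = self._store.get(tenant_id, {})
        return TOKEN_RE.sub(lambda m: tokens.get(m.group(), m.group()), text)

    def purge(self, tenant_id):
        """Drop all tokens for a tenant and return how many were removed."""
        return len(self._store.pop(tenant_id, {}))
```

Because the store is keyed by tenant, a token issued for one tenant does not resolve for another, and purging a tenant makes its redacted text permanently irreversible.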
Configuration
Global Configuration
Add to application.yml:
gateway:
  pii:
    enabled: true                  # enable PII scanning
    default-action: LOG            # LOG, BLOCK, or REDACT
    scan-responses: true           # scan LLM responses for output leaks
    strip-before-cache: true       # always redact before caching
    token-encryption-password: ${GATEWAY_PII_TOKEN_ENCRYPTION_PASSWORD:}
    max-tokens-per-tenant: 50000   # max stored PII tokens per tenant
    token-retention-days: 30       # auto-expire after N days
Per-Tenant Configuration
Override PII behavior per tenant by setting metadata keys on the tenant object:
curl -X PUT http://localhost:8080/admin/v1/tenants/acme-corp \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {
      "pii.enabled": "true",
      "pii.action": "REDACT",
      "pii.scan-responses": "true",
      "pii.custom-patterns": "{\"employee_id\": \"EMP-\\\\d{6}\"}"
    }
  }'
| Metadata Key | Values | Description |
|---|---|---|
| pii.enabled | true / false | Override global PII detection |
| pii.action | BLOCK / REDACT / LOG | Override default action |
| pii.scan-responses | true / false | Override response scanning |
| pii.custom-patterns | JSON string | Add custom regex patterns |
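How a tenant override combines with the global defaults can be sketched as follows. This is a hedged illustration in Python, not the gateway's code: `PiiSettings` and `resolve_settings` are hypothetical names, and falling back to the global action when a tenant supplies an unrecognized value is an assumption, not documented behavior.

```python
from dataclasses import dataclass

VALID_ACTIONS = {"LOG", "BLOCK", "REDACT"}


@dataclass(frozen=True)
class PiiSettings:
    enabled: bool
    action: str
    scan_responses: bool


def resolve_settings(global_cfg: PiiSettings, tenant_metadata: dict) -> PiiSettings:
    """Apply pii.* tenant metadata (string values) on top of the global settings."""

    def as_bool(key: str, default: bool) -> bool:
        raw = tenant_metadata.get(key)
        return default if raw is None else raw.strip().lower() == "true"

    action = tenant_metadata.get("pii.action", global_cfg.action).upper()
    if action not in VALID_ACTIONS:
        action = global_cfg.action  # assumption: ignore invalid overrides

    return PiiSettings(
        enabled=as_bool("pii.enabled", global_cfg.enabled),
        action=action,
        scan_responses=as_bool("pii.scan-responses", global_cfg.scan_responses),
    )
```

A tenant with no pii.* keys simply inherits the global settings, while a tenant that sets only pii.action changes the action and keeps every other value.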
Custom Patterns
Custom patterns are specified as a JSON object where keys are labels and values are regex strings. They are detected as CUSTOM entity type:
{
  "employee_id": "EMP-\\d{6}",
  "internal_project": "PROJ-[A-Z]{2,4}-\\d{4}"
}
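As a rough illustration of how such a JSON object might be turned into detections, the Python sketch below parses the patterns and reports each match with the CUSTOM entity type. The function names and the shape of the hit dictionaries are assumptions for this example; the gateway's internal representation is not documented here.

```python
import json
import re
from typing import Dict, List, Pattern


def load_custom_patterns(raw: str) -> Dict[str, Pattern[str]]:
    """Parse a pii.custom-patterns JSON string into {label: compiled regex}."""
    return {label: re.compile(regex) for label, regex in json.loads(raw).items()}


def find_custom(text: str, patterns: Dict[str, Pattern[str]]) -> List[dict]:
    """Return one hit per match, tagged with the CUSTOM entity type and its label."""
    hits = []
    for label, regex in patterns.items():
        for match in regex.finditer(text):
            hits.append({
                "type": "CUSTOM",
                "label": label,
                "start": match.start(),
                "end": match.end(),
            })
    return hits


# The raw string keeps the JSON escapes (\\d) exactly as they appear in the config.
raw = r'{"employee_id": "EMP-\\d{6}", "internal_project": "PROJ-[A-Z]{2,4}-\\d{4}"}'
patterns = load_custom_patterns(raw)
print(find_custom("EMP-123456 works on PROJ-AB-2024", patterns))
```

Note the escaping layers: in the JSON object a regex backslash is written `\\`, and when that object is itself embedded as a string inside tenant metadata, each backslash doubles again.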
Admin API
Detokenize
Restore original PII values from redacted text:
curl -X POST http://localhost:8080/admin/v1/pii/detokenize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Contact {{PII_EMAIL_a1b2c3d4}} about account",
    "tenant_id": "acme-corp"
  }'
Response:
{
  "text": "Contact user@example.com about account"
}
Requires org-admin or policy-admin role.
Purge Tokens
Remove all stored PII tokens for a tenant (irreversible):
curl -X DELETE http://localhost:8080/admin/v1/pii/tokens/acme-corp
Response:
{
  "tenant_id": "acme-corp",
  "tokens_removed": 1542
}
Requires org-admin role.
Audit Trail
All PII events are written to the audit trail. The audit payload includes entity types and counts but never includes the actual PII values.
LLM Traffic
| Event Type | When |
|---|---|
| PII_DETECTED | PII found in request (action: LOG or BLOCK) |
| PII_REDACTED | PII redacted from request (action: REDACT) |
| PII_OUTPUT_LEAK | PII detected in LLM response |
MCP Traffic
| Event Type | When |
|---|---|
| MCP_PII_DETECTED | PII found in tool call arguments (action: LOG or BLOCK) |
| MCP_PII_REDACTED | PII redacted from tool call arguments (action: REDACT) |
| MCP_PII_OUTPUT_LEAK | PII detected in MCP server response |
MCP PII audit events include server_id, tool_name, entity_count, entity_types, source (request/response), and action. The pii_action field is also propagated to MCP_REQUEST and MCP_RESPONSE audit events written by the downstream McpAuditPreFilter.
Example audit event payload:
{
  "eventType": "PII_REDACTED",
  "payload": {
    "tenant_id": "acme-corp",
    "action": "REDACT",
    "entity_types": ["EMAIL", "SSN"],
    "entity_count": 2
  }
}
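The rule that audit payloads carry entity types and counts but never the matched values can be sketched in Python as below. `build_audit_event` and the shape of the `detections` input are hypothetical, and treating entity_count as the number of detections (not the number of distinct types) is an assumption based on the example payload.

```python
from typing import Dict, List


def build_audit_event(event_type: str, tenant_id: str, action: str,
                      detections: List[Dict[str, str]]) -> dict:
    """Build an audit event from detections shaped like {"type": ..., "value": ...}.

    Only the entity types and the detection count are copied into the payload;
    the "value" field is never read, so raw PII cannot leak into the audit trail.
    """
    entity_types = sorted({d["type"].upper() for d in detections})
    return {
        "eventType": event_type,
        "payload": {
            "tenant_id": tenant_id,
            "action": action,
            "entity_types": entity_types,
            "entity_count": len(detections),  # assumption: counts detections
        },
    }
```

Building the payload from an allow-list of fields, instead of copying the detection objects and deleting the value afterwards, keeps the guarantee intact even if detections later gain new fields.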
RBAC Permissions
| Permission | Roles | Description |
|---|---|---|
| PII_ADMIN | org-admin, policy-admin | Detokenize, configure PII settings |
| PII_READ | org-admin, policy-admin, billing-admin | View PII audit events |
Security Considerations
- PII tokens are encrypted at rest using AES-256-GCM with the configured token-encryption-password
- Token storage is in-memory (not persisted to disk), so tokens are lost on gateway restart
- Audit events never contain the actual PII values, only entity types and counts
- The BLOCK action rejects the entire request; no partial content is forwarded
- Cache stripping always uses REDACT behavior regardless of the tenant's configured action