Version: 1.3.0

Hallucination Detection

When a response is supposed to be grounded in a set of source documents — a RAG answer, a summary, a citation — DVARA can catch claims that aren't actually supported by the sources. The detector splits the response into sentence-level claims, embeds each claim and each source passage, and flags any claim whose maximum cosine similarity to any source falls below a configurable threshold.

Grounding runs as a response-side check on non-streaming calls and as an at-stream-end check on SSE responses.

How it works

The detector is fail-open: if embedding fails, it passes the response through unchanged and writes a warning log. It never blocks a response because of its own failures.

The sentence splitter applies a 5-word minimum per claim: sentences shorter than 5 words (counted after a whitespace split) are filtered out and never embedded. Short answers such as "Yes.", "False.", or "It is.", and short list bullets, are therefore skipped silently, and a response in which no sentence reaches the minimum is treated as grounded. The detector is designed for long-form responses. If your workload needs grounding on short outputs, enforce it another way, for example with structured-output schemas that require citation fields.
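The pipeline described above can be sketched as follows. This is a minimal illustration, not the gateway's implementation: the regex sentence splitter and the `bag_of_words` embedder are hypothetical stand-ins (neither is the simple-hash nor the onnx provider), and `find_ungrounded` is an illustrative name. It does show the documented behavior: the 5-word claim filter, the best cosine match over all sources compared against the threshold, the no-op when sources are absent, and fail-open on embedder errors.

```python
import logging
import math
import re
from collections import Counter
from typing import Callable, List, Sequence

MIN_CLAIM_WORDS = 5  # documented: sentences under 5 words are never embedded

Vector = Counter  # toy sparse vector: token -> count


def bag_of_words(text: str) -> Vector:
    """Hypothetical toy embedder, standing in for the gateway's embedding service."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_claims(response: str) -> List[str]:
    """Naive sentence split, then drop claims shorter than MIN_CLAIM_WORDS words."""
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    return [s for s in sentences if len(s.split()) >= MIN_CLAIM_WORDS]


def find_ungrounded(response: str, sources: Sequence[str],
                    embed: Callable[[str], Vector], threshold: float = 0.7) -> List[str]:
    """Return the claims whose best match against any source is below threshold."""
    claims = split_claims(response)
    if not sources or not claims:
        return []  # no sources: no-op; no claim long enough: treated as grounded
    try:
        source_vecs = [embed(s) for s in sources]
        flagged = []
        for claim in claims:
            vec = embed(claim)
            if max(cosine(vec, s) for s in source_vecs) < threshold:
                flagged.append(claim)
        return flagged
    except Exception:  # fail-open: never block because of the detector's own failure
        logging.warning("grounding check failed; passing response through", exc_info=True)
        return []


if __name__ == "__main__":
    sources = ["DVARA supports three auth modes: built-in, OIDC, and SAML 2.0.",
               "The data plane always requires API keys."]
    answer = ("DVARA supports three auth modes: built-in, OIDC, and SAML 2.0. "
              "Yes. The moon is made of green cheese today.")
    print(find_ungrounded(answer, sources, bag_of_words))
    # → ['The moon is made of green cheese today.']
```

"Yes." is dropped by the 5-word filter before embedding, the first sentence matches a source exactly, and only the unsupported sentence is returned.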

Supplying source documents

The detector looks for sources in request.metadata["grounding.sources"] as a list of strings. Your client supplies them on each chat-completion request:

POST /v1/chat/completions
{
  "model": "gpt-4o",
  "messages": [
    { "role": "system", "content": "Answer based only on the provided context." },
    { "role": "user", "content": "What is DVARA's auth model?" }
  ],
  "metadata": {
    "grounding.sources": [
      "DVARA supports three auth modes: built-in, OIDC, and SAML 2.0.",
      "The data plane always requires API keys."
    ]
  }
}

If metadata["grounding.sources"] is missing or empty and grounding is enabled, the detector is a no-op; it doesn't fabricate sources for you.

Sources longer than dvara.llm-gateway.guardrail.grounding.max-source-length (default 10000 chars) are silently dropped, and when the list holds more than dvara.llm-gateway.guardrail.grounding.max-sources entries (default 50), the entries past that cap are dropped from the end. Both limits protect you from runaway embedding cost on pathological inputs.
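Because both drops are silent, a client may want to check its sources before sending. The sketch below assumes the default limits (pass your deployment's values if they differ); `preflight_sources` and `chat_request` are hypothetical helpers. It does not assume which limit the gateway applies first, so it reports at-risk sources rather than predicting the exact surviving list.

```python
import json
from typing import List, Sequence

MAX_SOURCES = 50            # documented default for max-sources
MAX_SOURCE_LENGTH = 10_000  # documented default for max-source-length


def preflight_sources(sources: Sequence[str],
                      max_sources: int = MAX_SOURCES,
                      max_source_length: int = MAX_SOURCE_LENGTH) -> List[str]:
    """Warn about sources the gateway would silently drop.

    len() counts Python code points, which may differ from the gateway's own
    character count for text outside the Basic Multilingual Plane.
    """
    warnings = [f"source {i} has {len(s)} chars (> {max_source_length})"
                for i, s in enumerate(sources) if len(s) > max_source_length]
    if len(sources) > max_sources:
        warnings.append(f"{len(sources)} sources supplied; cap is {max_sources}")
    return warnings


def chat_request(question: str, sources: Sequence[str], model: str = "gpt-4o") -> str:
    """Serialize a chat-completion body that carries grounding sources."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "Answer based only on the provided context."},
            {"role": "user", "content": question},
        ],
        # A single flat key whose name contains a dot, not a nested object.
        "metadata": {"grounding.sources": list(sources)},
    }
    return json.dumps(body)
```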

Configuration

Set these in application.yml:

dvara:
  llm-gateway:
    guardrail:
      grounding:
        enabled: true
        similarity-threshold: 0.7    # cosine threshold; claims below this are ungrounded
        action: LOG                  # LOG, FLAG, or BLOCK
        max-sources: 50              # max source documents per request
        max-source-length: 10000     # max chars per source document

Or via environment variables — the gateway maps dvara.llm-gateway.guardrail.grounding.* YAML keys to DVARA_LLM_GATEWAY_GUARDRAIL_GROUNDING_* env vars automatically:

DVARA_LLM_GATEWAY_GUARDRAIL_GROUNDING_ENABLED=true
DVARA_LLM_GATEWAY_GUARDRAIL_GROUNDING_SIMILARITY_THRESHOLD=0.7
DVARA_LLM_GATEWAY_GUARDRAIL_GROUNDING_ACTION=LOG
DVARA_LLM_GATEWAY_GUARDRAIL_GROUNDING_MAX_SOURCES=50
DVARA_LLM_GATEWAY_GUARDRAIL_GROUNDING_MAX_SOURCE_LENGTH=10000

Property	Default	Description
`enabled`	`false`	Enable embedding-based grounding detection
`similarity-threshold`	`0.7`	Cosine similarity threshold — claims below this are ungrounded
`action`	`LOG`	`LOG`, `FLAG`, or `BLOCK`
`max-sources`	`50`	Maximum source documents per request (excess silently dropped)
`max-source-length`	`10000`	Maximum chars per source document (oversized silently dropped)
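The environment-variable names follow a mechanical rule: upper-case the YAML key and replace `.` and `-` with `_`. This hypothetical helper reproduces that rule for the variables listed above, which is handy when generating deployment manifests; it mirrors the documented naming, not the gateway's own binding code.

```python
from typing import Dict, List

PREFIX = "dvara.llm-gateway.guardrail.grounding"


def to_env_var(yaml_key: str) -> str:
    """Upper-case the key and turn '.' and '-' separators into '_'."""
    return yaml_key.replace(".", "_").replace("-", "_").upper()


def env_lines(settings: Dict[str, str], prefix: str = PREFIX) -> List[str]:
    """Render NAME=value lines for grounding settings keyed by short YAML name."""
    return [f"{to_env_var(prefix + '.' + key)}={value}" for key, value in settings.items()]


if __name__ == "__main__":
    for line in env_lines({"enabled": "true", "similarity-threshold": "0.7"}):
        print(line)
    # DVARA_LLM_GATEWAY_GUARDRAIL_GROUNDING_ENABLED=true
    # DVARA_LLM_GATEWAY_GUARDRAIL_GROUNDING_SIMILARITY_THRESHOLD=0.7
```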

Actions

LOG (default) — write a HALLUCINATION_DETECTED audit event and a Prometheus metric sample, then pass the response through unchanged. Use this in shadow mode while you tune the threshold.
FLAG — behaviorally identical to LOG in the current release: the audit event and the metric sample are written, and the response is passed through unchanged. The value of setting FLAG is that the action label on the gateway_grounding_check_total counter reflects your intent, so dashboards can distinguish "running in shadow" from "running in warn-before-block" when you're staging a rollout. There is no X-Gateway-Grounding-Warning response header in this release — do not rely on a client-side header to drive UX.
BLOCK — the original response bytes are discarded and the client receives a structured error (see Error code below for the wire format).

Start with LOG, measure the false-positive rate on real traffic, then promote to FLAG or BLOCK once you're confident.
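For non-streaming responses, the three actions reduce to a small decision. This is an illustrative sketch with hypothetical names (`Decision`, `apply_action`); the error body mirrors the envelope documented under Error code, and a blocked decision corresponds to an HTTP 403.

```python
from typing import Dict, List, NamedTuple, Optional


class Decision(NamedTuple):
    blocked: bool                  # True: discard the upstream response, return HTTP 403
    body: Optional[dict]           # None means "forward the upstream response as-is"
    audit_event: Optional[str]     # None when every claim is grounded
    metric_labels: Dict[str, str]  # labels for gateway_grounding_check_total


def apply_action(action: str, ungrounded: List[str], trace_id: str) -> Decision:
    if action not in ("LOG", "FLAG", "BLOCK"):
        raise ValueError(f"unknown grounding action: {action}")
    grounded = not ungrounded
    labels = {"grounded": "true" if grounded else "false", "action": action}
    audit = None if grounded else "HALLUCINATION_DETECTED"
    if grounded or action != "BLOCK":
        # LOG and FLAG behave identically; only the metric's action label differs.
        return Decision(False, None, audit, labels)
    error = {"error": {
        "type": "guardrail_violation",
        "code": "hallucination_detected",
        "message": f"Response blocked: hallucination detected ({len(ungrounded)} ungrounded claims)",
        "trace_id": trace_id,
    }}
    return Decision(True, error, audit, labels)
```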

Per-tenant overrides

The DVARA Flightdeck exposes per-tenant grounding settings on the Grounding tab of the tenant edit form (Identity → Tenants → Edit). The form covers enabled, action, max-sources, and max-source-length with server-side validation. Updates emit a TENANT_METADATA_UPDATED audit event with a diff of every changed key.

For Terraform, CI/CD, and other programmatic tooling, the same overrides go through the Automation API as tenant metadata:

curl -X PUT http://localhost:8090/v1/admin/tenants/acme-corp \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {
      "grounding.enabled": "true",
      "grounding.action": "BLOCK",
      "grounding.max-sources": "20",
      "grounding.max-source-length": "5000"
    }
  }'

Metadata key	Type	Overrides
`grounding.enabled`	boolean	Enable / disable per tenant
`grounding.action`	`BLOCK` / `FLAG` / `LOG`	Action for this tenant
`grounding.max-sources`	int	Per-tenant source cap
`grounding.max-source-length`	int	Per-tenant source length cap

Values are passed as JSON strings, as in the example above. Missing keys fall back to the global config.

similarity-threshold is not overridable per tenant — it's bound at startup and applies to every tenant uniformly. Changing it requires a gateway restart. If different tenants need different strictness, use per-tenant grounding.action (e.g. BLOCK for the strict tenant, LOG for the rest) rather than reaching for a per-tenant threshold.
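Resolving the effective per-tenant settings can be sketched as a merge over the global config. The function name and the lenient parsing (case-insensitive `true`, integers via `int`) are assumptions; the gateway's own validation may be stricter. Any `grounding.similarity-threshold` key in tenant metadata is simply ignored, matching the rule above.

```python
from typing import Dict, Mapping, Union

Setting = Union[bool, int, float, str]

GLOBAL_DEFAULTS: Dict[str, Setting] = {
    "enabled": False, "action": "LOG", "max-sources": 50,
    "max-source-length": 10_000, "similarity-threshold": 0.7,
}


def effective_config(global_cfg: Mapping[str, Setting],
                     tenant_metadata: Mapping[str, str]) -> Dict[str, Setting]:
    cfg = dict(global_cfg)  # missing tenant keys fall back to these values
    if "grounding.enabled" in tenant_metadata:
        cfg["enabled"] = tenant_metadata["grounding.enabled"].strip().lower() == "true"
    if "grounding.action" in tenant_metadata:
        action = tenant_metadata["grounding.action"].strip()
        if action not in ("LOG", "FLAG", "BLOCK"):
            raise ValueError(f"invalid grounding.action: {action}")
        cfg["action"] = action
    for key in ("max-sources", "max-source-length"):
        if f"grounding.{key}" in tenant_metadata:
            cfg[key] = int(tenant_metadata[f"grounding.{key}"])
    # similarity-threshold is never read from tenant metadata: it is global and
    # bound at startup, so a tenant-level value has no effect.
    return cfg
```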

Streaming responses

For streaming (stream=true) responses, the gateway accumulates the full response text across all SSE chunks as they pass through to the client. At stream end, the grounding check runs against the accumulated text. The action determines what happens:

BLOCK — the stream is terminated with a final chunk containing finishReason: content_filter and a HALLUCINATION_DETECTED_STREAMING audit event is written. The client sees a graceful stream termination.
LOG / FLAG — a HALLUCINATION_DETECTED_STREAMING audit event is written and the stream completes normally.

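Because streaming grounding checks happen at stream end, they don't add latency mid-response, and you get full streaming throughput. The flip side is that the client has already received every content chunk by the time the check runs: BLOCK ends the stream but cannot retract text already delivered, so clients should treat a content_filter finish reason as a signal to discard or flag what they have rendered.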
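The end-of-stream flow can be sketched as a generator that forwards each chunk immediately while keeping a copy for the final check. Chunk shapes here are simplified, hypothetical dicts rather than the real SSE wire format, `check` stands in for the grounding detector, and `audit` for the audit writer.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def stream_with_grounding(chunks: Iterable[str],
                          check: Callable[[str], List[str]],
                          action: str,
                          audit: Callable[[Dict], None]) -> Iterator[Dict[str, str]]:
    parts: List[str] = []
    for text in chunks:
        parts.append(text)     # keep a copy for the end-of-stream check
        yield {"delta": text}  # forward immediately: no mid-stream latency
    ungrounded = check("".join(parts))
    if ungrounded:
        audit({"type": "HALLUCINATION_DETECTED_STREAMING",
               "source": "streaming_response",
               "ungrounded_claim_count": len(ungrounded)})
        if action == "BLOCK":
            yield {"finishReason": "content_filter"}
            return
    yield {"finishReason": "stop"}  # LOG / FLAG: the stream completes normally
```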

Observability

The grounding detector emits a single Prometheus counter and two audit-event types. Hook these into your existing dashboards before turning the action up from LOG to BLOCK.

Prometheus metric

Metric	Type	Labels	Description
`gateway_grounding_check_total`	Counter	`grounded` (`true`/`false`), `action` (`LOG`/`FLAG`/`BLOCK`)	Incremented once per non-streaming response that runs through the detector.

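PromQL to dashboard the ungrounded rate, i.e. the fraction (0 to 1) of checked responses in which the detector flagged at least one ungrounded claim. Because the counter is incremented only for non-streaming responses, this rate does not cover streaming traffic: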

sum(rate(gateway_grounding_check_total{grounded="false"}[5m]))
/
sum(rate(gateway_grounding_check_total[5m]))

PromQL to alert on regressions — fire when ungrounded rate exceeds 10% over a 15-minute window:

sum(rate(gateway_grounding_check_total{grounded="false"}[15m]))
/
sum(rate(gateway_grounding_check_total[15m]))
> 0.1

PromQL to track the rollout from LOG → FLAG → BLOCK — a stacked breakdown by action:

sum by (action) (rate(gateway_grounding_check_total[5m]))
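To read the ungrounded rate from a script (for example, a rollout gate in CI), you can call the Prometheus HTTP API's instant-query endpoint, `GET /api/v1/query`. The Prometheus address is a placeholder and the helper names are hypothetical. An empty result (no matching series) yields `None`; when the series exist but saw no traffic in the window, the division evaluates to `NaN`.

```python
import json
from typing import Optional
from urllib.parse import urlencode

UNGROUNDED_RATE = (
    'sum(rate(gateway_grounding_check_total{grounded="false"}[5m]))'
    ' / sum(rate(gateway_grounding_check_total[5m]))'
)


def query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus instant-query URL."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def first_sample(body: str) -> Optional[float]:
    """Return the first sample of an instant-vector response, or None if empty."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "Prometheus query failed"))
    result = payload["data"]["result"]
    return float(result[0]["value"][1]) if result else None

# Usage against a real server (address is a placeholder):
#   from urllib.request import urlopen
#   with urlopen(query_url("http://prometheus:9090", UNGROUNDED_RATE)) as resp:
#       print(first_sample(resp.read().decode()))
```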

Audit events

An audit event is written only when the detector flags at least one ungrounded claim — fully grounded responses do not emit an event. Two event types are emitted depending on the response mode:

HALLUCINATION_DETECTED — non-streaming responses.
HALLUCINATION_DETECTED_STREAMING — streaming responses, written at stream end.

Envelope fields (set by the audit writer): event ID, timestamp, tenantId, and the event type. The trace ID and request metadata live on the surrounding HTTP access log line, not on the audit payload.

Payload fields:

Field	Type	Description
`grounded`	boolean	Always `false` on these events — the event is only written when the check fails.
`confidence`	double	Detector's confidence in the grounding decision (0.0–1.0).
`overall_similarity`	double	Aggregate cosine similarity across all claims.
`ungrounded_claim_count`	int	Number of sentence claims that fell below the similarity threshold.
`ungrounded_claims`	list of strings	The flagged claim text for every claim that failed the check. Truncated to 100 characters with a `...` suffix when the original is longer — reviewers seeing the suffix should cross-reference the original response to read the full sentence.
`source`	string	Present only on `HALLUCINATION_DETECTED_STREAMING`, with value `streaming_response`. Distinguishes the streaming origin in log queries.
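The payload schema above can be expressed as a small builder, which is useful for test fixtures when developing log queries or alerts on these events. It is a sketch with hypothetical function names. Whether the 100-character limit includes the `...` suffix is not specified; this version assumes 100 characters of claim text followed by the suffix.

```python
from typing import Dict, List, Optional, Tuple, Union

MAX_CLAIM_CHARS = 100
Payload = Dict[str, Union[bool, float, int, str, List[str]]]


def truncate_claim(claim: str) -> str:
    # Assumption: keep the first 100 characters, then append "...".
    return claim if len(claim) <= MAX_CLAIM_CHARS else claim[:MAX_CLAIM_CHARS] + "..."


def audit_event(ungrounded: List[str], confidence: float, overall_similarity: float,
                streaming: bool) -> Optional[Tuple[str, Payload]]:
    if not ungrounded:
        return None  # fully grounded responses emit no audit event
    payload: Payload = {
        "grounded": False,
        "confidence": confidence,
        "overall_similarity": overall_similarity,
        "ungrounded_claim_count": len(ungrounded),
        "ungrounded_claims": [truncate_claim(c) for c in ungrounded],
    }
    if streaming:
        payload["source"] = "streaming_response"
        return "HALLUCINATION_DETECTED_STREAMING", payload
    return "HALLUCINATION_DETECTED", payload
```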

Error code

When action=BLOCK and the detector finds an ungrounded claim, the gateway returns HTTP 403 with type: guardrail_violation and code: hallucination_detected — the same envelope shape as the other guardrail enforcement codes (guardrail_blocked, pii_detected).

{
  "error": {
    "type": "guardrail_violation",
    "code": "hallucination_detected",
    "message": "Response blocked: hallucination detected (3 ungrounded claims)",
    "trace_id": "a6783439db1f46a6bfed511a0011e955"
  }
}
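Client code can recognize this error by its `type` and `code` rather than by status alone, since other guardrails share the same 403 envelope. A minimal sketch with hypothetical names; any other 403 body is left to the caller's normal error handling.

```python
import json
from typing import Optional


class HallucinationBlocked(Exception):
    """Raised when the gateway blocked a response as ungrounded."""

    def __init__(self, message: str, trace_id: Optional[str]):
        super().__init__(message)
        self.trace_id = trace_id  # quote this when cross-checking gateway logs


def check_grounding_error(status: int, body: str) -> None:
    """Raise HallucinationBlocked for a grounding BLOCK; ignore everything else."""
    if status != 403:
        return
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return
    error = payload.get("error") if isinstance(payload, dict) else None
    if (isinstance(error, dict)
            and error.get("type") == "guardrail_violation"
            and error.get("code") == "hallucination_detected"):
        raise HallucinationBlocked(error.get("message", ""), error.get("trace_id"))
```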

Tuning the threshold

The default threshold (0.7) is a conservative starting point. Things to keep in mind:

Sentence-splitter quality matters more than the embedding model. A bad split can attribute unrelated content to a claim.
Source passage length affects similarity scores — very long passages dilute similarity. Break sources into semantically coherent chunks (1–3 sentences each).
Embedding model affects absolute similarity values. Grounding uses the same in-process embedding service as the semantic cache. The default simple-hash embedder is deterministic and catches close paraphrases but isn't semantically rich; switching the semantic cache to the neural onnx provider (all-MiniLM-L6-v2) makes grounding semantically richer too.
False positives are most common on responses that paraphrase sources heavily, or on lists where each bullet is short (but still at least 5 words, since shorter bullets are skipped). To reduce them, lower the threshold cautiously; values below 0.5 are usually too permissive.
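Chunking sources into short passages before sending them can be sketched as below. This naive version groups fixed runs of up to three sentences using a simple regex split, which only approximates "semantically coherent" chunks; `chunk_source` and `chunk_sources` are hypothetical helpers. Because chunking multiplies the number of sources, it raises an error instead of letting the gateway silently drop chunks past max-sources.

```python
import re
from typing import Iterable, List


def chunk_source(text: str, max_sentences: int = 3) -> List[str]:
    """Group consecutive sentences into chunks of at most max_sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]


def chunk_sources(docs: Iterable[str], max_sentences: int = 3,
                  max_sources: int = 50) -> List[str]:
    """Chunk every document and fail loudly if the result exceeds the source cap."""
    chunks = [c for doc in docs for c in chunk_source(doc, max_sentences)]
    if len(chunks) > max_sources:
        raise ValueError(f"{len(chunks)} chunks exceed max-sources={max_sources}")
    return chunks
```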

Related

Guardrails overview — the broader guardrail system and how grounding fits in
ML & Plugin Guardrails — complementary input-side protections
Configuration — full property reference
