
Response Caching

Dvara supports exact-match response caching to avoid hitting upstream providers for repeated identical prompts. This reduces latency and cost.

Backends

Two cache backends are available:

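Backend  | Use case                              | Dependencies
-------- | ------------------------------------- | ------------------------------
Caffeine | Single-instance deployments (default) | None (in-memory)
Redis    | Multi-instance deployments            | spring-boot-starter-data-redis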

Enabling Caching

Set gateway.cache.enabled=true in application.yml:

gateway:
  cache:
    enabled: true
    ttl-seconds: 3600   # cache entry time-to-live (default: 3600)
    max-size: 10000     # max entries in Caffeine cache (default: 10000)

When caching is disabled (the default), a no-op cache is injected: every request passes straight through to the provider, and no cache lookups or writes are performed.
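The disabled state can be pictured as a null-object cache that always misses and discards writes. This is a minimal sketch that assumes a hypothetical ResponseCache interface with get/put methods; Dvara's real interface and method names may differ.

```java
import java.util.Optional;

// Null-object sketch of a disabled cache.
// ResponseCache and its get/put signatures are hypothetical, not Dvara's actual API.
class NoOpCacheSketch {

    interface ResponseCache {
        Optional<String> get(String key);           // cached response body, if any
        void put(String key, String responseBody);  // store a provider response
    }

    // Used when gateway.cache.enabled=false: every lookup misses, every write is dropped.
    static final class NoOpResponseCache implements ResponseCache {
        @Override
        public Optional<String> get(String key) {
            return Optional.empty();
        }

        @Override
        public void put(String key, String responseBody) {
            // intentionally does nothing
        }
    }
}
```

Because callers always talk to the same interface, the request path needs no "is caching enabled?" branches.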

Cache Key Derivation

The cache key is a SHA-256 hash of the canonical request:

Included in key: model + messages (role + content, in order) + temperature + maxTokens

Excluded from key:

  • stream — streaming requests bypass the cache entirely
  • metadata — transient, not part of the semantic request

Two requests produce the same cache key only if they have identical model, messages, temperature, and max tokens.
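A minimal sketch of the key derivation, assuming a length-prefixed canonical encoding of the included fields (Dvara's exact serialization is not documented here). Message and cacheKey are hypothetical names; the point is that stream and metadata never reach the hash, while message order does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch of exact-match cache key derivation.
// The canonical encoding (length-prefixed fields) is an assumption, not Dvara's exact format.
class CacheKeySketch {

    record Message(String role, String content) {}

    // stream and metadata are deliberately not parameters: they never affect the key.
    static String cacheKey(String model, List<Message> messages, Double temperature, Integer maxTokens) {
        StringBuilder canonical = new StringBuilder();
        append(canonical, model);
        for (Message m : messages) {                // message order is significant
            append(canonical, m.role());
            append(canonical, m.content());
        }
        append(canonical, String.valueOf(temperature)); // "null" when unset
        append(canonical, String.valueOf(maxTokens));
        return sha256Hex(canonical.toString());
    }

    // Length-prefix each field so ("ab", "c") and ("a", "bc") cannot encode identically.
    private static void append(StringBuilder sb, String field) {
        String value = String.valueOf(field);
        sb.append(value.length()).append(':').append(value).append('|');
    }

    private static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is available on every standard JVM", e);
        }
    }
}
```

Changing any included field, such as temperature from 0.7 to 0.2, yields a different 64-character hex key, so the two requests are cached separately.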

How It Works

  1. A non-streaming request arrives at POST /v1/chat/completions.
  2. If the X-Cache-Control: no-cache header is present, the cache lookup is skipped.
  3. Otherwise, the gateway looks up the request in the cache.
  4. Cache hit: the cached response is returned immediately with X-Cache: HIT. No provider call is made.
  5. Cache miss: the request is forwarded to the provider. The response is stored in the cache and returned with X-Cache: MISS.
  6. Streaming requests ("stream": true) bypass the cache entirely — no lookup, no storage, no X-Cache header.
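The steps above can be sketched as a single handler. A HashMap stands in for the Caffeine or Redis backend and a hypothetical provider function stands in for the upstream call; Result, handle and the parameter names are illustrative, not Dvara's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the documented lookup flow. Neither the map nor `provider` reflects Dvara's real classes.
class CacheFlowSketch {

    // xCache is the X-Cache header value, or null when the header is absent.
    record Result(String body, String xCache) {}

    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> provider;

    CacheFlowSketch(Function<String, String> provider) {
        this.provider = provider;
    }

    Result handle(String cacheKey, String requestBody, boolean stream, String cacheControlHeader) {
        // Step 6: streaming requests skip lookup and storage, and get no X-Cache header.
        if (stream) {
            return new Result(provider.apply(requestBody), null);
        }
        // Steps 2-4: look up unless the client sent X-Cache-Control: no-cache.
        if (!"no-cache".equals(cacheControlHeader)) {
            String cached = cache.get(cacheKey);
            if (cached != null) {
                return new Result(cached, "HIT");
            }
        }
        // Step 5: call the provider, store the response, return MISS.
        // A bypassed request is stored too, so later normal requests can hit.
        String fresh = provider.apply(requestBody);
        cache.put(cacheKey, fresh);
        return new Result(fresh, "MISS");
    }
}
```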

Response Headers

Header  | Value    | When
------- | -------- | ------------------------------------------
X-Cache | HIT      | Response served from cache
X-Cache | MISS     | Response fetched from provider, now cached
X-Cache | (absent) | Streaming request (cache bypassed)

Bypassing the Cache

Send the X-Cache-Control: no-cache header to force a fresh provider call:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Cache-Control: no-cache" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What time is it?"}]
  }'

The fresh provider response is still written to the cache and returned with X-Cache: MISS, so subsequent requests without the bypass header will hit the cache.

Using Redis

For multi-instance deployments, enable the Redis backend:

gateway:
  cache:
    enabled: true
    ttl-seconds: 3600
    redis:
      enabled: true

spring:
  data:
    redis:
      host: localhost
      port: 6379

Redis keys are stored with the prefix dvara:cache: and TTL is set per entry.

Graceful degradation: if Redis is unavailable, the cache silently falls through to the provider. Requests are never blocked by a Redis failure.
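A sketch of prefixed keys with fail-open error handling. KeyValueStore is a hypothetical stand-in for a Redis client (GET, and SET with an expiry); it is not the Spring Data Redis API, and the wrapper class is illustrative only.

```java
import java.util.Optional;

// Sketch of prefixed keys + fail-open behaviour around a hypothetical Redis-like store.
class RedisFailOpenSketch {

    interface KeyValueStore {
        String get(String key);                                      // null when absent
        void setWithTtl(String key, String value, long ttlSeconds);
    }

    static final String PREFIX = "dvara:cache:";

    private final KeyValueStore redis;
    private final long ttlSeconds;

    RedisFailOpenSketch(KeyValueStore redis, long ttlSeconds) {
        this.redis = redis;
        this.ttlSeconds = ttlSeconds;
    }

    Optional<String> get(String cacheKey) {
        try {
            return Optional.ofNullable(redis.get(PREFIX + cacheKey));
        } catch (RuntimeException redisUnavailable) {
            return Optional.empty(); // treat an outage as a miss, so the provider is called
        }
    }

    void put(String cacheKey, String responseBody) {
        try {
            redis.setWithTtl(PREFIX + cacheKey, responseBody, ttlSeconds); // TTL set per entry
        } catch (RuntimeException redisUnavailable) {
            // drop the write; the caller still returns the provider response
        }
    }
}
```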

Cache Metering

Cache hits are logged at INFO level with the number of tokens saved:

Cache HIT for key=a1b2c3d4e5f6 — tokens saved: 33

Token usage from cached responses is still recorded for rate-limiting purposes.
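A sketch of hit metering. It assumes "tokens saved" is the total token count stored in the cached response's usage, and that the log abbreviates the key to its first 12 hex characters as in the example line. The per-tenant tokensUsed map is a hypothetical stand-in for the rate limiter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Sketch of cache-hit metering; Usage, the key abbreviation and the ledger are assumptions.
class CacheMeteringSketch {

    private static final Logger LOG = Logger.getLogger(CacheMeteringSketch.class.getName());

    record Usage(int promptTokens, int completionTokens) {
        int totalTokens() {
            return promptTokens + completionTokens;
        }
    }

    // Hypothetical rate-limit ledger: tokens consumed per tenant in the current window.
    final Map<String, Integer> tokensUsed = new HashMap<>();

    String onCacheHit(String tenant, String cacheKey, Usage cachedUsage) {
        int saved = cachedUsage.totalTokens();
        String shortKey = cacheKey.substring(0, Math.min(12, cacheKey.length()));
        String message = "Cache HIT for key=" + shortKey + " \u2014 tokens saved: " + saved;
        LOG.info(message);
        // The hit still counts against the tenant's token rate limit.
        tokensUsed.merge(tenant, saved, Integer::sum);
        return message;
    }
}
```

Recording the cached usage keeps rate limits meaningful: a client cannot exceed its token budget simply by replaying an identical prompt.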

Enterprise Override

Both CaffeineResponseCache and RedisResponseCache are registered with @ConditionalOnMissingBean semantics. Enterprise deployments can provide a custom ResponseCache bean to integrate with any caching infrastructure.
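A hypothetical custom backend. The ResponseCache interface and its signatures are assumed for illustration; the store is an in-process map with per-entry expiry, standing in for whatever infrastructure you integrate with. The leading comment shows how such a class would typically be exposed as a Spring bean.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical custom backend; verify Dvara's actual ResponseCache interface before implementing.
//
// In a Spring @Configuration class you would register it, e.g.
//   @Bean ResponseCache responseCache() { return new InHouseResponseCache(...); }
// and the built-in Caffeine/Redis beans, being @ConditionalOnMissingBean, would back off.
class CustomCacheSketch {

    interface ResponseCache {
        Optional<String> get(String key);
        void put(String key, String responseBody);
    }

    // Example store: a concurrent map with per-entry expiry driven by an injectable clock.
    static final class InHouseResponseCache implements ResponseCache {
        private record Entry(String body, Instant expiresAt) {}

        private final ConcurrentHashMap<String, Entry> store = new ConcurrentHashMap<>();
        private final Duration ttl;
        private final Clock clock;

        InHouseResponseCache(Duration ttl, Clock clock) {
            this.ttl = ttl;
            this.clock = clock;
        }

        @Override
        public Optional<String> get(String key) {
            Entry e = store.get(key);
            if (e == null) {
                return Optional.empty();
            }
            if (!clock.instant().isBefore(e.expiresAt())) {
                store.remove(key, e); // lazily drop the expired entry
                return Optional.empty();
            }
            return Optional.of(e.body());
        }

        @Override
        public void put(String key, String responseBody) {
            store.put(key, new Entry(responseBody, clock.instant().plus(ttl)));
        }
    }
}
```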

Enterprise Distributed Repository Caching

Requires the enterprise-caching module on the classpath, a valid JWT license key, and gateway.caching.distributed.enabled=true.

For multi-instance enterprise deployments, the enterprise-caching module provides Redis cache-aside decorators that wrap underlying repository implementations. This ensures all gateway instances share a consistent view of configuration data without querying the database on every read.

How It Works

Nine repository decorators follow the cache-aside pattern:

  • Read path: check Redis first; on cache miss, read from the underlying repository and populate Redis.
  • Write path: write to the underlying repository, then evict the corresponding cache entry so the next read fetches fresh data.
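The read and write paths above, sketched for one repository. Tenant, TenantRepository and CachingTenantRepository are hypothetical names, and a HashMap stands in for Redis, so the 60s TTL is not modelled. The key layout follows the dvara:tenants:acme-corp example under Key Format.

```java
import java.util.Map;
import java.util.Optional;

// Sketch of a cache-aside repository decorator; all types here are hypothetical.
class CacheAsideSketch {

    record Tenant(String id, String plan) {}

    interface TenantRepository {
        Optional<Tenant> findById(String id);
        void save(Tenant tenant);
    }

    static final class CachingTenantRepository implements TenantRepository {
        private final TenantRepository delegate; // e.g. the database-backed repository
        private final Map<String, Tenant> redis; // stand-in for the shared Redis cache

        CachingTenantRepository(TenantRepository delegate, Map<String, Tenant> redis) {
            this.delegate = delegate;
            this.redis = redis;
        }

        private static String key(String id) {
            return "dvara:tenants:" + id;
        }

        @Override
        public Optional<Tenant> findById(String id) {
            Tenant cached = redis.get(key(id));              // read path: Redis first
            if (cached != null) {
                return Optional.of(cached);
            }
            Optional<Tenant> loaded = delegate.findById(id); // miss: read the database
            loaded.ifPresent(t -> redis.put(key(id), t));    // populate for the next reader
            return loaded;
        }

        @Override
        public void save(Tenant tenant) {
            delegate.save(tenant);          // write path: database first,
            redis.remove(key(tenant.id())); // then evict so the next read refetches
        }
    }
}
```

Evicting on write, rather than writing the new value into Redis, keeps the database as the single source of truth: the next read repopulates the entry from whatever was actually persisted.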

Decorated Repositories and TTLs

Repository             | Cache TTL | Description
---------------------- | --------- | --------------------------------------------------
TenantRepository       | 60s       | Tenant configuration
ApiKeyRepository       | 30s       | API key lookups (shorter TTL for security)
RouteRepository        | 60s       | Route configurations
PolicyRepository       | 30s       | Policy definitions (shorter TTL for policy changes)
ModelPricingRepository | 60s       | Model pricing entries
BudgetCapRepository    | 30s       | Budget cap definitions
McpServerRepository    | 60s       | MCP server registrations
WebhookRepository      | 60s       | Webhook configurations
OutputSchemaRepository | 60s       | Output schema configs

Key Format

All Redis keys follow the pattern:

{keyPrefix}:{cacheName}:{key}

For example, with the default dvara prefix:

dvara:tenants:acme-corp
dvara:api-keys:sk-abc123
dvara:routes:gpt-route

Configuration

gateway:
  caching:
    distributed:
      enabled: true        # required to activate decorators
      ttl-seconds: 300     # default TTL (individual repositories use their own TTLs)
      max-entries: 10000
      key-prefix: dvara    # Redis key prefix

spring:
  data:
    redis:
      host: localhost
      port: 6379
      password: ${REDIS_PASSWORD:}

See Configuration Reference for the full property table.