Response Caching
Dvara supports exact-match response caching to avoid hitting upstream providers for repeated identical prompts. This reduces latency and cost.
Backends
Two cache backends are available:
| Backend | Use Case | Dependencies |
|---|---|---|
| Caffeine | Single-instance deployments (default) | None (in-memory) |
| Redis | Multi-instance deployments | spring-boot-starter-data-redis |
Enabling Caching
Set gateway.cache.enabled=true in application.yml:
gateway:
  cache:
    enabled: true
    ttl-seconds: 3600   # cache entry time-to-live (default: 3600)
    max-size: 10000     # max entries in Caffeine cache (default: 10000)
When caching is disabled (default), a no-op cache is injected. All requests pass through to providers with zero overhead.
Cache Key Derivation
The cache key is a SHA-256 hash of the canonical request:
Included in key: model + messages (role + content, in order) + temperature + maxTokens
Excluded from key:
- stream: streaming requests bypass the cache entirely
- metadata: transient, not part of the semantic request
Two requests produce the same cache key only if they have identical model, messages, temperature, and max tokens.
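As a rough illustration of the derivation above, the following sketch hashes the included fields with SHA-256. The `Message` record, the `CacheKeys` class, and the length-prefixed canonical form are assumptions made for this example; the gateway's actual canonicalization is not documented here and may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;

// Hypothetical type for illustration; not Dvara's actual request model.
record Message(String role, String content) {}

final class CacheKeys {
    private CacheKeys() {}

    /** Returns the SHA-256 hex digest of model + messages + temperature + maxTokens. */
    static String derive(String model, List<Message> messages,
                         Double temperature, Integer maxTokens) {
        StringBuilder canonical = new StringBuilder();
        append(canonical, model);
        for (Message m : messages) {          // message order is part of the key
            append(canonical, m.role());
            append(canonical, m.content());
        }
        append(canonical, String.valueOf(temperature)); // a null value becomes "null"
        append(canonical, String.valueOf(maxTokens));
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable on this JVM", e);
        }
    }

    // Length-prefix each field so that ("ab", "c") and ("a", "bc") cannot collide.
    private static void append(StringBuilder sb, String field) {
        sb.append(field.length()).append(':').append(field).append('|');
    }
}
```

Note that `stream` and `metadata` never enter the canonical string, so they cannot influence the key.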
How It Works
- A non-streaming request arrives at POST /v1/chat/completions.
- If the X-Cache-Control: no-cache header is present, the cache lookup is skipped.
- Otherwise, the gateway looks up the request in the cache.
- Cache hit: the cached response is returned immediately with X-Cache: HIT. No provider call is made.
- Cache miss: the request is forwarded to the provider. The response is stored in the cache and returned with X-Cache: MISS.
- Streaming requests ("stream": true) bypass the cache entirely: no lookup, no storage, no X-Cache header.
Response Headers
| Header | Value | When |
|---|---|---|
| X-Cache | HIT | Response served from cache |
| X-Cache | MISS | Response fetched from provider, now cached |
| (absent) | — | Streaming request (cache bypassed) |
Bypassing the Cache
Send the X-Cache-Control: no-cache header to force a fresh provider call:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Cache-Control: no-cache" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What time is it?"}]
  }'
The fresh response is returned with X-Cache: MISS and is still stored in the cache, so subsequent requests without the bypass header are served from the cache.
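The lookup flow and the bypass behaviour described in this section and in How It Works can be sketched as follows. `CachingFlow` and `GatewayResult` are hypothetical names, the provider is modelled as a plain function, and TTL expiry is left out for brevity; this is an illustration of the documented behaviour, not the gateway's code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical result type: response body plus the optional X-Cache header value.
record GatewayResult(String body, Optional<String> xCache) {}

final class CachingFlow {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> provider; // stand-in for the upstream call

    CachingFlow(Function<String, String> provider) {
        this.provider = provider;
    }

    GatewayResult handle(String cacheKey, boolean stream, boolean noCache) {
        if (stream) {
            // Streaming: no lookup, no storage, no X-Cache header.
            return new GatewayResult(provider.apply(cacheKey), Optional.empty());
        }
        if (!noCache) {
            String cached = cache.get(cacheKey);
            if (cached != null) {
                return new GatewayResult(cached, Optional.of("HIT"));
            }
        }
        // Cache miss or X-Cache-Control: no-cache: call the provider and store the result.
        String fresh = provider.apply(cacheKey);
        cache.put(cacheKey, fresh);
        return new GatewayResult(fresh, Optional.of("MISS"));
    }
}
```

A request with the bypass flag overwrites the stored entry, which is why a later request without the flag sees the refreshed response.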
Using Redis
For multi-instance deployments, enable the Redis backend:
gateway:
  cache:
    enabled: true
    ttl-seconds: 3600
    redis:
      enabled: true

spring:
  data:
    redis:
      host: localhost
      port: 6379
Redis keys are stored with the prefix dvara:cache: and TTL is set per entry.
Graceful degradation: if Redis is unavailable, the cache silently falls through to the provider. Requests are never blocked by a Redis failure.
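One way to picture this fail-open behaviour is a wrapper that turns backend errors into cache misses. The `SimpleCache` interface and `FailOpenCache` class below are hypothetical and exist only for this sketch; the real `ResponseCache` signature and the way `RedisResponseCache` handles errors may differ.

```java
import java.util.Optional;

// Hypothetical interface for illustration; the real ResponseCache API may differ.
interface SimpleCache {
    Optional<String> get(String key);
    void put(String key, String value);
}

final class FailOpenCache implements SimpleCache {
    private final SimpleCache delegate; // e.g. a Redis-backed cache

    FailOpenCache(SimpleCache delegate) {
        this.delegate = delegate;
    }

    @Override
    public Optional<String> get(String key) {
        try {
            return delegate.get(key);
        } catch (RuntimeException e) {
            // Backend unavailable: report a miss so the caller goes to the provider.
            return Optional.empty();
        }
    }

    @Override
    public void put(String key, String value) {
        try {
            delegate.put(key, value);
        } catch (RuntimeException e) {
            // Backend unavailable: skip storing; the response is still returned.
        }
    }
}
```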
Cache Metering
Cache hits are logged at INFO level with the number of tokens saved:
Cache HIT for key=a1b2c3d4e5f6 — tokens saved: 33
Token usage from cached responses is still recorded for rate-limiting purposes.
Enterprise Override
Both CaffeineResponseCache and RedisResponseCache are registered with @ConditionalOnMissingBean semantics. Enterprise deployments can provide a custom ResponseCache bean to integrate with any caching infrastructure.
Enterprise Distributed Repository Caching
Requires the enterprise-caching module on the classpath, a valid JWT license key, and gateway.caching.distributed.enabled=true.
For multi-instance enterprise deployments, the enterprise-caching module provides Redis cache-aside decorators that wrap underlying repository implementations. This ensures all gateway instances share a consistent view of configuration data without querying the database on every read.
How It Works
Nine repository decorators follow the cache-aside pattern:
- Read path: check Redis first; on cache miss, read from the underlying repository and populate Redis.
- Write path: write to the underlying repository, then evict the corresponding cache entry so the next read fetches fresh data.
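The read and write paths above might look roughly like this for a single repository. `TenantStore` and `CachedTenantStore` are hypothetical names, a plain `Map` stands in for Redis, and TTL handling is omitted; the enterprise module's real interfaces are not shown in this document.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical repository interface for illustration only.
interface TenantStore {
    Optional<String> findById(String id);
    void save(String id, String config);
}

final class CachedTenantStore implements TenantStore {
    private final TenantStore delegate;      // underlying repository (source of truth)
    private final Map<String, String> redis; // in-memory stand-in for Redis, no TTL

    CachedTenantStore(TenantStore delegate, Map<String, String> redis) {
        this.delegate = delegate;
        this.redis = redis;
    }

    @Override
    public Optional<String> findById(String id) {
        String key = "dvara:tenants:" + id;
        String cached = redis.get(key);
        if (cached != null) {
            return Optional.of(cached);                   // cache hit
        }
        Optional<String> loaded = delegate.findById(id);  // cache miss: read the repository
        loaded.ifPresent(value -> redis.put(key, value)); // populate the cache
        return loaded;
    }

    @Override
    public void save(String id, String config) {
        delegate.save(id, config);           // write to the repository first
        redis.remove("dvara:tenants:" + id); // then evict so the next read is fresh
    }
}
```

Evicting on write, instead of updating the cached value, keeps the write path simple: the next read repopulates the entry from the repository.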
Decorated Repositories and TTLs
| Repository | Cache TTL | Description |
|---|---|---|
| TenantRepository | 60s | Tenant configuration |
| ApiKeyRepository | 30s | API key lookups (shorter TTL for security) |
| RouteRepository | 60s | Route configurations |
| PolicyRepository | 30s | Policy definitions (shorter TTL for policy changes) |
| ModelPricingRepository | 60s | Model pricing entries |
| BudgetCapRepository | 30s | Budget cap definitions |
| McpServerRepository | 60s | MCP server registrations |
| WebhookRepository | 60s | Webhook configurations |
| OutputSchemaRepository | 60s | Output schema configs |
Key Format
All Redis keys follow the pattern:
{keyPrefix}:{cacheName}:{key}
For example, with the default dvara prefix:
dvara:tenants:acme-corp
dvara:api-keys:sk-abc123
dvara:routes:gpt-route
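For reference, composing a key in this format takes one line. The `RedisKeys` helper below is a hypothetical name used only to make the pattern concrete.

```java
// Hypothetical helper illustrating the {keyPrefix}:{cacheName}:{key} pattern.
final class RedisKeys {
    private RedisKeys() {}

    static String of(String keyPrefix, String cacheName, String key) {
        return String.join(":", keyPrefix, cacheName, key);
    }
}
```

With the default prefix, `RedisKeys.of("dvara", "tenants", "acme-corp")` yields `dvara:tenants:acme-corp`.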
Configuration
gateway:
  caching:
    distributed:
      enabled: true        # required to activate decorators
      ttl-seconds: 300     # default TTL (individual repos use their own TTLs)
      max-entries: 10000
      key-prefix: dvara    # Redis key prefix

spring:
  data:
    redis:
      host: localhost
      port: 6379
      password: ${REDIS_PASSWORD:}
See Configuration Reference for the full property table.