Rate Limiting
Dvara supports in-memory, per-API-key rate limiting for single-instance deployments. Rate limits are enforced as a servlet filter on all /v1/* endpoints.
Enabling Rate Limiting
Set gateway.rate-limit.enabled=true in application.yml:
gateway:
  rate-limit:
    enabled: true
    global:
      requests-per-second: 100
    default-per-key:
      requests-per-second: 10
      tokens-per-minute: 100000
How It Works
Three limit types are checked on every request:
| Limit | Window | Description |
|---|---|---|
| Global requests/sec | 1-second sliding window | Total requests across all API keys. Checked first. |
| Per-key requests/sec | 1-second sliding window | Requests from a single API key. |
| Per-key tokens/min | 60-second sliding window | Total tokens consumed by a single API key. Tracked after each successful response. |
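As an illustration of the request-rate checks in the table, here is a minimal Java sketch of a sliding-window log that consults the global cap first and the per-key cap second. The class and method names (`SlidingWindowLog`, `RequestRateCheck`, `tryAcquire`) are hypothetical and are not Dvara's API. The sketch also assumes that only admitted requests are recorded in the windows, and it leaves the token budget out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sliding-window log: keeps the timestamps of admitted requests. */
class SlidingWindowLog {
    private final Deque<Long> timestamps = new ArrayDeque<>();
    private final long windowMillis;
    private final int limit;

    SlidingWindowLog(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Drops entries that left the window, then reports whether one more request fits. */
    synchronized boolean hasCapacity(long nowMillis) {
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        return timestamps.size() < limit;
    }

    synchronized void record(long nowMillis) {
        timestamps.addLast(nowMillis);
    }
}

/** Hypothetical two-stage check: global cap first, then the per-key cap. */
class RequestRateCheck {
    private final SlidingWindowLog global;
    private final Map<String, SlidingWindowLog> perKey = new HashMap<>();
    private final int perKeyLimit;

    RequestRateCheck(int globalLimit, int perKeyLimit) {
        this.global = new SlidingWindowLog(globalLimit, 1000);
        this.perKeyLimit = perKeyLimit;
    }

    synchronized boolean tryAcquire(String apiKey, long nowMillis) {
        if (!global.hasCapacity(nowMillis)) {
            return false; // global limit exceeded
        }
        SlidingWindowLog keyLog =
            perKey.computeIfAbsent(apiKey, k -> new SlidingWindowLog(perKeyLimit, 1000));
        if (!keyLog.hasCapacity(nowMillis)) {
            return false; // per-key limit exceeded
        }
        // Assumption: only admitted requests count toward either window.
        global.record(nowMillis);
        keyLog.record(nowMillis);
        return true;
    }
}
```

Passing the clock in as `nowMillis` keeps the sketch deterministic; a real filter would read the current time itself.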
API Key Extraction
The gateway extracts the API key from the Authorization header:
Authorization: Bearer sk-my-api-key
If no Authorization header is present (or the Bearer token is empty), the request is rate-limited under the anonymous key.
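The extraction rule can be sketched as a small helper. `ApiKeyExtractor` and its members are hypothetical names. The sketch makes two assumptions the text above does not state: a non-Bearer scheme is treated like a missing header, and the scheme name is matched case-insensitively.

```java
/** Hypothetical helper; the class, method, and constant names are illustrative. */
final class ApiKeyExtractor {
    static final String ANONYMOUS = "anonymous";
    private static final String PREFIX = "Bearer ";

    private ApiKeyExtractor() {}

    /** Returns the Bearer token, or ANONYMOUS when the header is missing or the token is empty. */
    static String extract(String authorizationHeader) {
        if (authorizationHeader == null) {
            return ANONYMOUS;
        }
        // Assumption: any scheme other than Bearer falls back to the anonymous key.
        if (!authorizationHeader.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            return ANONYMOUS;
        }
        String token = authorizationHeader.substring(PREFIX.length()).trim();
        return token.isEmpty() ? ANONYMOUS : token;
    }
}
```

For example, `ApiKeyExtractor.extract("Bearer sk-my-api-key")` yields `sk-my-api-key`, while a `null` header or `"Bearer "` yields `anonymous`.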
Rate Limit Exceeded Response
When a limit is exceeded, the gateway returns HTTP 429 Too Many Requests with a Retry-After header:
HTTP/1.1 429 Too Many Requests
Retry-After: 1
X-Trace-ID: a6783439db1f46a6bfed511a0011e955

{
  "error": {
    "message": "Per-key request rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "trace_id": "a6783439db1f46a6bfed511a0011e955"
  }
}
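On the client side, a caller can use the status code and the Retry-After header to decide how long to back off. The helper below is a sketch with hypothetical names (`RetryAfter`, `delayFor`). It assumes the header carries a whole number of seconds, as in the response above, and falls back to a caller-supplied default otherwise.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical client-side helper for interpreting a 429 response. */
final class RetryAfter {
    private RetryAfter() {}

    /**
     * Returns how long to wait before retrying, or empty if the response is not a 429.
     * Uses the fallback when the header is missing or is not a non-negative integer
     * (Retry-After may also carry an HTTP date, which this sketch does not parse).
     */
    static Optional<Duration> delayFor(int statusCode, String retryAfterHeader, Duration fallback) {
        if (statusCode != 429) {
            return Optional.empty();
        }
        if (retryAfterHeader == null) {
            return Optional.of(fallback);
        }
        try {
            long seconds = Long.parseLong(retryAfterHeader.trim());
            return Optional.of(seconds >= 0 ? Duration.ofSeconds(seconds) : fallback);
        } catch (NumberFormatException e) {
            return Optional.of(fallback);
        }
    }
}
```

A caller would sleep for the returned duration before resending the request, ideally with a cap on the number of retries.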
Per-Key Overrides
Specific API keys can be given higher or lower limits than the default:
gateway:
  rate-limit:
    enabled: true
    default-per-key:
      requests-per-second: 10
      tokens-per-minute: 100000
    keys:
      sk-premium-key:
        requests-per-second: 100
        tokens-per-minute: 1000000
      sk-trial-key:
        requests-per-second: 2
        tokens-per-minute: 10000
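Conceptually, resolving the limits for a key is a map lookup with a fallback to the defaults. The sketch below uses hypothetical types (`KeyLimits`, `LimitResolver`) and assumes an override entry replaces both values at once. How Dvara treats an override that sets only one of the two properties is not covered here.

```java
import java.util.Map;

/** Hypothetical value type mirroring requests-per-second and tokens-per-minute. */
final class KeyLimits {
    final int requestsPerSecond;
    final long tokensPerMinute;

    KeyLimits(int requestsPerSecond, long tokensPerMinute) {
        this.requestsPerSecond = requestsPerSecond;
        this.tokensPerMinute = tokensPerMinute;
    }
}

/** Hypothetical resolver: per-key override if present, otherwise the defaults. */
final class LimitResolver {
    private final KeyLimits defaults;
    private final Map<String, KeyLimits> overrides;

    LimitResolver(KeyLimits defaults, Map<String, KeyLimits> overrides) {
        this.defaults = defaults;
        this.overrides = overrides;
    }

    KeyLimits resolve(String apiKey) {
        // Assumption: an override entry replaces the defaults as a whole.
        return overrides.getOrDefault(apiKey, defaults);
    }
}
```

With the configuration above, `sk-premium-key` would resolve to 100 requests per second, and any key not listed under `keys` would resolve to the default of 10.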
Full Configuration Example
gateway:
  rate-limit:
    enabled: true
    global:
      requests-per-second: 100      # gateway-wide cap
    default-per-key:
      requests-per-second: 10       # default per-key request rate
      tokens-per-minute: 100000     # default per-key token budget
    keys:
      sk-prod-key1:
        requests-per-second: 50
        tokens-per-minute: 500000
      sk-dev-key:
        requests-per-second: 5
        tokens-per-minute: 50000
Token Budget Tracking
Token usage is recorded after each successful response. The total_tokens value from the provider response is deducted from the key's token budget. When the budget is exhausted, subsequent requests return 429 until the 60-second window slides forward.
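One way to picture the token budget is a sliding window of (timestamp, tokens) entries with a running total. The `TokenBudget` class below is a hypothetical sketch, not Dvara's implementation. It assumes the budget is checked before a request is forwarded and that usage is recorded afterwards.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical 60-second sliding token budget for a single API key. */
class TokenBudget {
    private static final long WINDOW_MILLIS = 60_000;

    // Each entry is {timestampMillis, tokens}.
    private final Deque<long[]> usage = new ArrayDeque<>();
    private final long tokensPerMinute;
    private long used;

    TokenBudget(long tokensPerMinute) {
        this.tokensPerMinute = tokensPerMinute;
    }

    /** Removes entries that have slid out of the window and refunds their tokens. */
    private void evict(long nowMillis) {
        while (!usage.isEmpty() && usage.peekFirst()[0] <= nowMillis - WINDOW_MILLIS) {
            used -= usage.pollFirst()[1];
        }
    }

    /** Checked before a request is forwarded; false corresponds to a 429. */
    synchronized boolean hasBudget(long nowMillis) {
        evict(nowMillis);
        return used < tokensPerMinute;
    }

    /** Called after a successful response with the provider's total_tokens value. */
    synchronized void record(long nowMillis, long totalTokens) {
        evict(nowMillis);
        usage.addLast(new long[] {nowMillis, totalTokens});
        used += totalTokens;
    }
}
```

Because usage is only known after the response, a key in this sketch can overshoot its budget by up to one response's worth of tokens before further requests are rejected.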
Hot Reload
Rate limit configuration is read from GatewayProperties on every request. Changes applied via Spring Boot Actuator /refresh (or any mechanism that updates @ConfigurationProperties) take effect immediately without a restart.
Enterprise Override
The in-memory rate limiter is registered with @ConditionalOnMissingBean semantics. Enterprise deployments can replace it with a distributed implementation (e.g., Redis + Bucket4j) by providing a custom RateLimiter bean.
Enterprise Redis Distributed Rate Limiter
Requires the enterprise-caching module on the classpath, a valid JWT license key, and gateway.ratelimit.distributed.enabled=true.
For multi-instance deployments, the in-memory Caffeine-based rate limiter does not share state across gateway instances. The enterprise-caching module provides a Redis-backed distributed rate limiter that replaces the default in-memory implementation, ensuring consistent rate limiting across all instances.
How It Works
The distributed rate limiter uses Redis sorted sets with an atomic Lua script to implement a sliding window algorithm. Each request adds a timestamped entry to the sorted set and removes expired entries in a single atomic operation. This guarantees accurate counting even under high concurrency across multiple gateway instances.
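The script shipped with enterprise-caching is not reproduced in this document. The sketch below shows one common way to write a sorted-set sliding window in Lua, held in a hypothetical Java holder class (`SlidingWindowScript`). The argument layout is an assumption, as is the choice to add an entry only when the request is admitted.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for a sliding-window Lua script and its arguments. */
final class SlidingWindowScript {
    private SlidingWindowScript() {}

    /** KEYS[1] = rate limit key; ARGV = now (ms), window (ms), limit, unique member id. */
    static final String LUA = String.join("\n",
        "local now = tonumber(ARGV[1])",
        "local window = tonumber(ARGV[2])",
        "local limit = tonumber(ARGV[3])",
        "-- drop entries that have slid out of the window",
        "redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', now - window)",
        "if redis.call('ZCARD', KEYS[1]) >= limit then",
        "  return 0  -- rejected",
        "end",
        "-- assumption: only admitted requests are added to the set",
        "redis.call('ZADD', KEYS[1], now, ARGV[4])",
        "redis.call('PEXPIRE', KEYS[1], window)  -- let idle keys expire",
        "return 1  -- admitted");

    /** Builds the ARGV list in the order the script expects. */
    static List<String> argv(long nowMillis, long windowMillis, int limit, String memberId) {
        return Arrays.asList(
            Long.toString(nowMillis),
            Long.toString(windowMillis),
            Integer.toString(limit),
            memberId);
    }
}
```

With a Redis client, the script would be run through EVAL (or EVALSHA) with the rate limit key as KEYS[1]. Redis executes a script atomically, which is what lets the trim, count, and add steps behave as one operation across gateway instances. The member id must be unique per request (for example a UUID) so that two requests in the same millisecond do not collapse into one entry.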
Key Format
Rate limit state is stored in Redis with the following key pattern:
dvara:ratelimit:{tenantId}:{apiKey}
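For tooling that inspects these keys (for example, a script that looks up a key's current window in Redis), the pattern can be reproduced with a small helper. `RateLimitKeys` is a hypothetical name, and `acme` in the usage note is a made-up tenant id.

```java
/** Hypothetical helper that formats the Redis key for a tenant and API key. */
final class RateLimitKeys {
    private static final String PREFIX = "dvara:ratelimit:";

    private RateLimitKeys() {}

    static String redisKey(String tenantId, String apiKey) {
        return PREFIX + tenantId + ":" + apiKey;
    }
}
```

`RateLimitKeys.redisKey("acme", "sk-prod-key1")` produces `dvara:ratelimit:acme:sk-prod-key1`.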
Configuration
Enable the distributed rate limiter alongside the standard rate limit settings:
gateway:
  ratelimit:
    distributed:
      enabled: true   # replaces in-memory rate limiter
  rate-limit:
    enabled: true
    global:
      requests-per-second: 100
    default-per-key:
      requests-per-second: 10
      tokens-per-minute: 100000
spring:
  data:
    redis:
      host: localhost
      port: 6379
      password: ${REDIS_PASSWORD:}
The distributed rate limiter is registered via CachingAutoConfiguration and replaces the default RateLimiter bean. All existing rate limit configuration properties (gateway.rate-limit.*) continue to work unchanged; only the storage backend changes from in-memory to Redis.
See Configuration Reference for the full property table.