Rate Limiting

Dvara supports in-memory, per-API-key rate limiting for single-instance deployments. Limits are enforced by a servlet filter that applies to all /v1/* endpoints.

Enabling Rate Limiting

Set gateway.rate-limit.enabled=true in application.yml:

gateway:
  rate-limit:
    enabled: true
    global:
      requests-per-second: 100
    default-per-key:
      requests-per-second: 10
      tokens-per-minute: 100000

How It Works

Three limit types are checked on every request:

| Limit | Window | Description |
|---|---|---|
| Global requests/sec | 1-second sliding window | Total requests across all API keys. Checked first. |
| Per-key requests/sec | 1-second sliding window | Requests from a single API key. |
| Per-key tokens/min | 60-second sliding window | Total tokens consumed by a single API key. Tracked after each successful response. |
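
The request-rate checks above can be illustrated with a minimal sliding-window sketch. This is not Dvara's implementation: the class names SlidingWindowCounter and RequestRateCheck, the method names, and the choice to pass the clock in as a parameter are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sliding-window counter; not Dvara's actual class. */
final class SlidingWindowCounter {
    private final int limit;          // max events allowed inside one window
    private final long windowMillis;  // window length, e.g. 1000 for requests/sec
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true and records the event if the window still has room. */
    synchronized boolean tryAcquire(long nowMillis) {
        // Drop events that have slid out of the window.
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= limit) {
            return false; // rejected events are not recorded
        }
        timestamps.addLast(nowMillis);
        return true;
    }
}

/** Hypothetical two-stage check: global limit first, then per-key limit. */
final class RequestRateCheck {
    private final SlidingWindowCounter global;
    private final int perKeyLimit;
    private final Map<String, SlidingWindowCounter> perKey = new ConcurrentHashMap<>();

    RequestRateCheck(int globalLimit, int perKeyLimit) {
        this.global = new SlidingWindowCounter(globalLimit, 1_000);
        this.perKeyLimit = perKeyLimit;
    }

    /** Returns null when the request is allowed, otherwise the limit that was hit. */
    String check(String apiKey, long nowMillis) {
        if (!global.tryAcquire(nowMillis)) {
            return "global";
        }
        SlidingWindowCounter counter =
                perKey.computeIfAbsent(apiKey, k -> new SlidingWindowCounter(perKeyLimit, 1_000));
        return counter.tryAcquire(nowMillis) ? null : "per-key";
    }
}
```

In this sketch a request rejected at the per-key stage has already been counted in the global window. Whether the gateway behaves the same way is not stated here, so treat that detail as a simplification.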

API Key Extraction

The gateway extracts the API key from the Authorization header:

Authorization: Bearer sk-my-api-key

If no Authorization header is present (or the Bearer token is empty), the request is rate-limited under the anonymous key.
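
A minimal sketch of this extraction rule follows. The class name ApiKeyExtractor is hypothetical, and two behaviours are assumptions rather than documented facts: the Bearer scheme is matched case-insensitively, and any non-Bearer scheme falls back to the anonymous key. The literal bucket name "anonymous" is also assumed.

```java
/** Hypothetical helper; the gateway's real extraction code may differ. */
final class ApiKeyExtractor {
    static final String ANONYMOUS = "anonymous"; // assumed bucket name
    private static final String PREFIX = "Bearer ";

    /** Maps an Authorization header value to the key used for rate limiting. */
    static String extract(String authorizationHeader) {
        if (authorizationHeader == null) {
            return ANONYMOUS; // header missing
        }
        String value = authorizationHeader.trim();
        // Assumption: scheme comparison ignores case; other schemes are anonymous.
        if (!value.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            return ANONYMOUS;
        }
        String token = value.substring(PREFIX.length()).trim();
        return token.isEmpty() ? ANONYMOUS : token; // empty Bearer token
    }
}
```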

Rate Limit Exceeded Response

When a limit is exceeded, the gateway returns HTTP 429 Too Many Requests with a Retry-After header:

HTTP/1.1 429 Too Many Requests
Retry-After: 1
X-Trace-ID: a6783439db1f46a6bfed511a0011e955

{
  "error": {
    "message": "Per-key request rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "trace_id": "a6783439db1f46a6bfed511a0011e955"
  }
}
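
On the client side, the Retry-After header can drive a simple back-off. The helper below is a sketch under assumptions: the name RetryAfterPolicy and the fallback parameter are hypothetical, and it only handles the delay-in-seconds form shown in the response above, falling back to a caller-supplied delay for anything else.

```java
import java.time.Duration;

/** Hypothetical client-side helper for reacting to a 429 response. */
final class RetryAfterPolicy {
    /**
     * Returns how long to wait before retrying.
     * Zero means the response was not a 429 and no wait is needed.
     */
    static Duration delayFor(int status, String retryAfterHeader, Duration fallback) {
        if (status != 429) {
            return Duration.ZERO;
        }
        if (retryAfterHeader == null) {
            return fallback;
        }
        try {
            long seconds = Long.parseLong(retryAfterHeader.trim());
            return seconds < 0 ? fallback : Duration.ofSeconds(seconds);
        } catch (NumberFormatException e) {
            // Header was not a plain number of seconds (e.g. an HTTP date).
            return fallback;
        }
    }
}
```

A caller might sleep for the returned duration and then resend the request, ideally with a cap on the number of retries.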

Per-Key Overrides

Specific API keys can be given higher or lower limits than the default:

gateway:
  rate-limit:
    enabled: true
    default-per-key:
      requests-per-second: 10
      tokens-per-minute: 100000
    keys:
      sk-premium-key:
        requests-per-second: 100
        tokens-per-minute: 1000000
      sk-trial-key:
        requests-per-second: 2
        tokens-per-minute: 10000
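
Conceptually, resolving the limits for a request is a map lookup with a default. The sketch below uses hypothetical types (KeyLimits, LimitResolver) and assumes an override replaces the default entry as a whole; how the gateway treats an override that sets only one of the two values is not covered here.

```java
import java.util.Map;

/** Hypothetical value type mirroring one per-key configuration entry. */
final class KeyLimits {
    final int requestsPerSecond;
    final long tokensPerMinute;

    KeyLimits(int requestsPerSecond, long tokensPerMinute) {
        this.requestsPerSecond = requestsPerSecond;
        this.tokensPerMinute = tokensPerMinute;
    }
}

/** Hypothetical resolver: per-key override if present, otherwise the default. */
final class LimitResolver {
    private final KeyLimits defaults;
    private final Map<String, KeyLimits> overrides;

    LimitResolver(KeyLimits defaults, Map<String, KeyLimits> overrides) {
        this.defaults = defaults;
        this.overrides = overrides;
    }

    KeyLimits resolve(String apiKey) {
        // Assumption: an override replaces the default entry entirely.
        return overrides.getOrDefault(apiKey, defaults);
    }
}
```

With the configuration above, sk-premium-key would resolve to 100 requests per second, while an unlisted key would resolve to the default of 10.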

Full Configuration Example

gateway:
  rate-limit:
    enabled: true
    global:
      requests-per-second: 100      # gateway-wide cap
    default-per-key:
      requests-per-second: 10       # default per-key request rate
      tokens-per-minute: 100000     # default per-key token budget
    keys:
      sk-prod-key1:
        requests-per-second: 50
        tokens-per-minute: 500000
      sk-dev-key:
        requests-per-second: 5
        tokens-per-minute: 50000

Token Budget Tracking

Token usage is recorded after each successful response. The total_tokens value from the provider response is deducted from the key's token budget. When the budget is exhausted, subsequent requests return 429 until the 60-second window slides forward.
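
The accounting described above can be sketched as a 60-second sliding sum. The class TokenBudget and its method names are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-key token budget over a 60-second sliding window. */
final class TokenBudget {
    private static final long WINDOW_MILLIS = 60_000;

    private final long tokensPerMinute;
    private final Deque<long[]> usage = new ArrayDeque<>(); // {timestampMillis, tokens}
    private long used; // sum of tokens currently inside the window

    TokenBudget(long tokensPerMinute) {
        this.tokensPerMinute = tokensPerMinute;
    }

    /** Checked before a request is forwarded: is any budget left? */
    synchronized boolean hasBudget(long nowMillis) {
        evict(nowMillis);
        return used < tokensPerMinute;
    }

    /** Called after a successful response with the provider's total_tokens. */
    synchronized void record(long nowMillis, long totalTokens) {
        evict(nowMillis);
        usage.addLast(new long[] {nowMillis, totalTokens});
        used += totalTokens;
    }

    // Remove usage entries that have slid out of the 60-second window.
    private void evict(long nowMillis) {
        while (!usage.isEmpty() && usage.peekFirst()[0] <= nowMillis - WINDOW_MILLIS) {
            used -= usage.pollFirst()[1];
        }
    }
}
```

Because usage is only known after the response arrives, a single large request can push a key past its budget; the limit then applies to the requests that follow.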

Hot Reload

Rate limit configuration is read from GatewayProperties on every request. Changes applied via Spring Boot Actuator /refresh (or any mechanism that updates @ConfigurationProperties) take effect immediately without a restart.

Enterprise Override

The in-memory rate limiter is registered with @ConditionalOnMissingBean semantics. Enterprise deployments can replace it with a distributed implementation (e.g., Redis + Bucket4j) by providing a custom RateLimiter bean.

Enterprise Redis Distributed Rate Limiter

Requires the enterprise-caching module on the classpath, a valid JWT license key, and gateway.ratelimit.distributed.enabled=true.

For multi-instance deployments, the in-memory Caffeine-based rate limiter does not share state across gateway instances. The enterprise-caching module provides a Redis-backed distributed rate limiter that replaces the default in-memory implementation, ensuring consistent rate limiting across all instances.

How It Works

The distributed rate limiter uses Redis sorted sets with an atomic Lua script to implement a sliding window algorithm. Each request adds a timestamped entry to the sorted set and removes expired entries in a single atomic operation. This guarantees accurate counting even under high concurrency across multiple gateway instances.

Key Format

Rate limit state is stored in Redis with the following key pattern:

dvara:ratelimit:{tenantId}:{apiKey}
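
To make the sorted-set approach concrete, the sketch below builds a key in the pattern above and holds an illustrative Lua script as a Java string. The script is an assumption written for this example, not the one shipped in enterprise-caching; the class name, argument order, and the use of a unique member ID per request are all hypothetical. It relies only on standard Redis commands (ZREMRANGEBYSCORE, ZCARD, ZADD, PEXPIRE).

```java
/** Hypothetical sketch of the Redis key and an atomic sliding-window script. */
final class RedisRateLimitSketch {

    /** Builds a key following the dvara:ratelimit:{tenantId}:{apiKey} pattern. */
    static String key(String tenantId, String apiKey) {
        return "dvara:ratelimit:" + tenantId + ":" + apiKey;
    }

    /**
     * Illustrative Lua script (assumed, not the shipped one).
     * KEYS[1] = rate limit key
     * ARGV[1] = current time in ms, ARGV[2] = window in ms,
     * ARGV[3] = limit, ARGV[4] = unique member ID for this request
     * Returns 1 if the request is allowed, 0 if the limit is reached.
     */
    static final String SLIDING_WINDOW_LUA = String.join("\n",
            "local key    = KEYS[1]",
            "local now    = tonumber(ARGV[1])",
            "local window = tonumber(ARGV[2])",
            "local limit  = tonumber(ARGV[3])",
            "-- remove entries that have slid out of the window",
            "redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)",
            "-- count what is left and reject if the window is full",
            "if redis.call('ZCARD', key) >= limit then",
            "  return 0",
            "end",
            "-- record this request, scored by its timestamp",
            "redis.call('ZADD', key, now, ARGV[4])",
            "-- let idle keys expire on their own",
            "redis.call('PEXPIRE', key, window)",
            "return 1");
}
```

Running all steps inside one script is what makes the check atomic: Redis executes a script without interleaving other commands, so two gateway instances cannot both claim the last free slot. The unique member ID keeps two requests with the same millisecond timestamp from overwriting each other in the sorted set.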

Configuration

Enable the distributed rate limiter alongside the standard rate limit settings:

gateway:
  ratelimit:
    distributed:
      enabled: true               # replaces in-memory rate limiter

  rate-limit:
    enabled: true
    global:
      requests-per-second: 100
    default-per-key:
      requests-per-second: 10
      tokens-per-minute: 100000

spring:
  data:
    redis:
      host: localhost
      port: 6379
      password: ${REDIS_PASSWORD:}

The distributed rate limiter is registered via CachingAutoConfiguration and replaces the default RateLimiter bean. All existing rate limit configuration properties (gateway.rate-limit.*) continue to work unchanged; only the storage backend changes from in-memory to Redis.

See Configuration Reference for the full property table.