Rate Limiting
The MATIH Platform enforces per-tenant rate limiting to protect shared infrastructure from abuse and to ensure fair resource distribution across tenants. Limits are applied at the Kong API Gateway, and the values that apply to a tenant come from its tier in the billing service configuration.
Rate Limiting Strategy
Rate limits are applied at two levels:
| Level | Enforcement Point | Scope |
|---|---|---|
| API Gateway | Kong rate-limit plugin | Per-tenant, per-endpoint |
| Service level | Application middleware | Per-tenant, per-resource |
Rate Limits by Tier
| Tier | Requests per Minute | Requests per Hour | Burst Limit |
|---|---|---|---|
| Free | 60 | 1,000 | 10 |
| Professional | 300 | 10,000 | 50 |
| Enterprise | 1,000 | 100,000 | 200 |
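As a rough illustration of how the per-minute and per-hour limits in the table could be checked together, the sketch below counts requests in fixed, clock-aligned windows. The names `Tier` and `TierWindowLimiter` are hypothetical, the in-memory map only stands in for a shared counter store, and the burst limit is carried as data without being modeled, since its exact semantics are not described here.

```java
import java.util.HashMap;
import java.util.Map;

/** Tier limits copied from the table above. */
enum Tier {
    FREE(60, 1_000, 10),
    PROFESSIONAL(300, 10_000, 50),
    ENTERPRISE(1_000, 100_000, 200);

    final int perMinute;
    final int perHour;
    final int burst; // kept as data only; burst handling is not modeled in this sketch

    Tier(int perMinute, int perHour, int burst) {
        this.perMinute = perMinute;
        this.perHour = perHour;
        this.burst = burst;
    }
}

/** Hypothetical in-memory limiter; assumes fixed windows aligned to the clock. */
final class TierWindowLimiter {
    // Old window keys are never evicted here; a shared store would expire them.
    private final Map<String, Integer> counters = new HashMap<>();

    /** Counts the request and returns true only if both windows still have room. */
    synchronized boolean tryAcquire(String tenantId, Tier tier, long epochSeconds) {
        String minuteKey = tenantId + ":m:" + (epochSeconds / 60);
        String hourKey = tenantId + ":h:" + (epochSeconds / 3600);
        int minuteCount = counters.getOrDefault(minuteKey, 0);
        int hourCount = counters.getOrDefault(hourKey, 0);
        if (minuteCount >= tier.perMinute || hourCount >= tier.perHour) {
            return false; // the caller would answer with 429
        }
        counters.put(minuteKey, minuteCount + 1);
        counters.put(hourKey, hourCount + 1);
        return true;
    }
}
```

Under these assumptions a Free tenant that sends 60 requests within one minute is rejected on the 61st, and is admitted again once the next minute window begins, provided the hourly count is still below 1,000.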
Rate Limit Headers
Every API response includes rate limit headers:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the current window |
| X-RateLimit-Remaining | Remaining requests in the current window |
| X-RateLimit-Reset | Unix timestamp when the current window resets |
| Retry-After | Seconds to wait before retrying (only on 429 responses) |
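A client can use these headers to pace itself. The following is a minimal sketch, assuming the header values are plain integers as described in the table; the class `RateLimitInfo` and its method names are illustrative and not part of any MATIH SDK. It relies only on `java.net.http.HttpHeaders` from the JDK.

```java
import java.net.http.HttpHeaders;
import java.time.Duration;
import java.util.OptionalLong;

/** Hypothetical client-side view of the rate limit headers. */
final class RateLimitInfo {
    final long limit;             // -1 when the header is absent
    final long remaining;         // -1 when the header is absent
    final long resetEpochSeconds; // -1 when the header is absent

    private RateLimitInfo(long limit, long remaining, long resetEpochSeconds) {
        this.limit = limit;
        this.remaining = remaining;
        this.resetEpochSeconds = resetEpochSeconds;
    }

    /** Reads the three X-RateLimit-* headers; header lookup is case-insensitive. */
    static RateLimitInfo from(HttpHeaders headers) {
        return new RateLimitInfo(
                headers.firstValueAsLong("X-RateLimit-Limit").orElse(-1),
                headers.firstValueAsLong("X-RateLimit-Remaining").orElse(-1),
                headers.firstValueAsLong("X-RateLimit-Reset").orElse(-1));
    }

    /** Suggests how long to pause before sending the next request. */
    static Duration waitBeforeRetry(int status, HttpHeaders headers, long nowEpochSeconds) {
        if (status == 429) {
            // Retry-After is only sent on 429 responses and takes priority.
            OptionalLong retryAfter = headers.firstValueAsLong("Retry-After");
            if (retryAfter.isPresent()) {
                return Duration.ofSeconds(Math.max(0, retryAfter.getAsLong()));
            }
        }
        RateLimitInfo info = from(headers);
        if (info.remaining == 0 && info.resetEpochSeconds > nowEpochSeconds) {
            // Window exhausted: wait until it resets.
            return Duration.ofSeconds(info.resetEpochSeconds - nowEpochSeconds);
        }
        return Duration.ZERO;
    }
}
```

With `java.net.http.HttpClient`, the `headers` argument would be `response.headers()` and `status` would be `response.statusCode()`.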
Rate Limit Response
When a rate limit is exceeded, the API returns a 429 status:
{
  "success": false,
  "error": {
    "code": "RATE_LIMITED",
    "category": "CLIENT_ERROR",
    "message": "Rate limit exceeded. Try again in 30 seconds.",
    "details": {
      "limit": 300,
      "remaining": 0,
      "resetAt": "2026-02-12T10:31:00Z",
      "retryAfter": 30
    }
  }
}
Per-Endpoint Rate Limits
Some endpoints have specific rate limits in addition to the global tenant limit:
| Endpoint Category | Additional Limit | Rationale |
|---|---|---|
| Authentication (/auth/login) | 10 per minute per IP | Brute-force prevention |
| AI chat (/ai/chat) | 30 per minute | LLM resource protection |
| Query execution (/query/execute) | 100 per minute | Database load protection |
| Export (/bi/export) | 10 per minute | Rendering resource protection |
| Bulk operations (/*/bulk) | 5 per minute | Resource-intensive operations |
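Because these limits apply in addition to the tenant-wide limit, a request has to pass both checks. The sketch below shows one way to look up the additional limit for a path and combine the two checks. It is an illustration only: the class `EndpointLimits` is hypothetical, paths are assumed to be matched relative to the API base path, and the `*` in `/*/bulk` is assumed to stand for a single path segment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;
import java.util.regex.Pattern;

/** Hypothetical lookup of the additional per-minute limits from the table above. */
final class EndpointLimits {
    private static final Map<Pattern, Integer> LIMITS = new LinkedHashMap<>();
    static {
        LIMITS.put(Pattern.compile("^/auth/login$"), 10);    // counted per IP, not per tenant
        LIMITS.put(Pattern.compile("^/ai/chat$"), 30);
        LIMITS.put(Pattern.compile("^/query/execute$"), 100);
        LIMITS.put(Pattern.compile("^/bi/export$"), 10);
        LIMITS.put(Pattern.compile("^/[^/]+/bulk$"), 5);     // assumption: * is one segment
    }

    /** Returns the additional limit for the path, or empty if only the tenant limit applies. */
    static OptionalInt additionalLimit(String path) {
        for (Map.Entry<Pattern, Integer> entry : LIMITS.entrySet()) {
            if (entry.getKey().matcher(path).matches()) {
                return OptionalInt.of(entry.getValue());
            }
        }
        return OptionalInt.empty();
    }

    /**
     * A request is allowed only if the tenant-wide counter and, where one exists,
     * the endpoint counter are both below their limits for the current window.
     */
    static boolean allowed(String path, int tenantCount, int tenantLimit, int endpointCount) {
        if (tenantCount >= tenantLimit) {
            return false;
        }
        OptionalInt extra = additionalLimit(path);
        return !extra.isPresent() || endpointCount < extra.getAsInt();
    }
}
```

For example, a Professional tenant that has used only 10 of its 300 requests this minute is still rejected on `/ai/chat` once 30 chat requests have been counted in the same window.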
Implementation
Gateway Level (Kong)
The Kong rate-limit plugin uses Redis as a backing store for rate limit counters:
Request arrives at Kong Gateway
|
v
Extract tenant_id from JWT claims
|
v
Check Redis counter: rate_limit:{tenant_id}:{window}
|
+-- Counter < limit --> Increment counter, forward request
|
+-- Counter >= limit --> Return 429 Too Many Requests
Service Level
Services can enforce additional rate limits using the RateLimiter from commons-java:
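// Allows at most 30 requests per tenant in each 60-second window,
// matching the AI chat limit in the per-endpoint table.
@RateLimited(
    key = "tenant_id",
    limit = 30,
    window = 60,
    unit = TimeUnit.SECONDS
)
@PostMapping("/api/v1/ai/chat")
public ResponseEntity<ChatResponse> chat(@RequestBody ChatRequest request) {
    // Handler code
}
Quota Management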
The billing service tracks cumulative usage against monthly quotas:
| Metric | Free Tier | Professional | Enterprise |
|---|---|---|---|
| AI tokens per month | 10,000 | 500,000 | Unlimited |
| Queries per month | 1,000 | 50,000 | Unlimited |
| Storage (GB) | 1 | 50 | Custom |
| API calls per month | 10,000 | 100,000 | Unlimited |
When a quota is reached, the service returns a 402 Payment Required response with instructions to upgrade.
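Quotas differ from rate limits in that they accumulate over the month instead of resetting every window, and they produce a 402 where a rate limit produces a 429. A minimal sketch of that decision for the API call quota follows; the class `QuotaCheck`, the lowercase tier keys, and the use of `Long.MAX_VALUE` to represent "Unlimited" are assumptions made for illustration, not the billing service's actual implementation.

```java
import java.util.Map;

/** Hypothetical monthly quota check for API calls, using the values from the table above. */
final class QuotaCheck {
    static final long UNLIMITED = Long.MAX_VALUE; // stand-in for "Unlimited"

    static final Map<String, Long> API_CALLS_PER_MONTH = Map.of(
            "free", 10_000L,
            "professional", 100_000L,
            "enterprise", UNLIMITED);

    /** Returns 402 once cumulative usage has reached the quota, otherwise 200. */
    static int statusForNextCall(String tier, long callsUsedThisMonth) {
        Long quota = API_CALLS_PER_MONTH.get(tier);
        if (quota == null) {
            throw new IllegalArgumentException("unknown tier: " + tier);
        }
        return callsUsedThisMonth >= quota ? 402 : 200;
    }
}
```

Clients should treat the two statuses differently: waiting and retrying helps after a 429, whereas a 402 persists until the quota period rolls over or the tenant upgrades.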
Monitoring
Rate limiting metrics are exposed through Prometheus:
| Metric | Type | Description |
|---|---|---|
| kong_rate_limit_total | Counter | Total rate limit checks |
| kong_rate_limit_exceeded_total | Counter | Total rate limit rejections |
| matih_tenant_api_calls_total | Counter | Per-tenant API call count |
Alerts are configured for:
- Sustained rate limit rejections (indicates a tenant may need a higher tier)
- Unusual spike in API calls (potential abuse or misconfigured client)
Related Pages
- REST Conventions -- Response header conventions
- Error Handling -- 429 error response format
- Multi-Tenancy -- Tenant tier and feature gating