Rate Limiting

The MATIH Platform enforces per-tenant rate limiting to protect shared infrastructure from abuse and to ensure fair resource distribution across tenants. Rate limits are applied at the Kong API Gateway, and the limit values are determined by each tenant's tier in the billing service configuration.

Rate Limiting Strategy

Rate limits are applied at two levels:

Level	Enforcement Point	Scope
API Gateway	Kong rate-limit plugin	Per-tenant, per-endpoint
Service level	Application middleware	Per-tenant, per-resource

Rate Limits by Tier

Tier	Requests per Minute	Requests per Hour	Burst Limit
Free	60	1,000	10
Professional	300	10,000	50
Enterprise	1,000	100,000	200
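
The tier table can be read as a lookup that a client or service consults before sending a request. The following is a minimal sketch rather than the platform's configuration format: the `TierLimits` enum and its `allows` method are hypothetical names, and the sketch assumes that a request must fit within both the per-minute and the per-hour window. The burst limit is stored but not interpreted, because the table does not define its semantics.

```java
/** Hypothetical encoding of the tier table; the values are copied from the table above. */
enum TierLimits {
    FREE(60, 1_000, 10),
    PROFESSIONAL(300, 10_000, 50),
    ENTERPRISE(1_000, 100_000, 200);

    final int perMinute;
    final int perHour;
    final int burst; // stored only; how bursts are measured is not specified in the source

    TierLimits(int perMinute, int perHour, int burst) {
        this.perMinute = perMinute;
        this.perHour = perHour;
        this.burst = burst;
    }

    /** True if one more request fits within both the per-minute and per-hour limits. */
    boolean allows(int usedThisMinute, int usedThisHour) {
        return usedThisMinute < perMinute && usedThisHour < perHour;
    }
}
```

For example, `TierLimits.FREE.allows(60, 0)` is false because the per-minute limit of 60 has already been used up, even though the hourly window is empty.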

Rate Limit Headers

Every API response includes rate limit headers:

Header	Description
`X-RateLimit-Limit`	Maximum requests allowed in the current window
`X-RateLimit-Remaining`	Remaining requests in the current window
`X-RateLimit-Reset`	Unix timestamp when the current window resets
`Retry-After`	Seconds to wait before retrying (only on 429 responses)
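
A client can use these headers to pace itself instead of waiting for a rejection. The sketch below shows one way to turn them into a wait time. `RateLimitHeaders` and `secondsToWait` are hypothetical names and not part of any MATIH SDK, and the sketch assumes the headers arrive in a map keyed by the exact names in the table (real HTTP clients usually treat header names case-insensitively).

```java
import java.util.Map;

/** Hypothetical client-side helper; illustrative only. */
final class RateLimitHeaders {
    private RateLimitHeaders() {}

    /**
     * Returns how many seconds to wait before the next request.
     *
     * @param headers         response headers keyed by the names in the table above
     * @param nowEpochSeconds current Unix time in seconds
     */
    static long secondsToWait(Map<String, String> headers, long nowEpochSeconds) {
        // Retry-After is only sent on 429 responses and takes precedence when present.
        String retryAfter = headers.get("Retry-After");
        if (retryAfter != null) {
            return Math.max(0, Long.parseLong(retryAfter.trim()));
        }
        // Requests remain in the current window (or the header is absent): no wait.
        String remaining = headers.get("X-RateLimit-Remaining");
        if (remaining == null || Long.parseLong(remaining.trim()) > 0) {
            return 0;
        }
        // Window exhausted: wait until the reset timestamp, never a negative time.
        String reset = headers.get("X-RateLimit-Reset");
        if (reset == null) {
            return 0;
        }
        return Math.max(0, Long.parseLong(reset.trim()) - nowEpochSeconds);
    }
}
```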

Rate Limit Response

When a tenant exceeds a rate limit, the API returns a 429 Too Many Requests status with the following body:

{
  "success": false,
  "error": {
    "code": "RATE_LIMITED",
    "category": "CLIENT_ERROR",
    "message": "Rate limit exceeded. Try again in 30 seconds.",
    "details": {
      "limit": 300,
      "remaining": 0,
      "resetAt": "2026-02-12T10:31:00Z",
      "retryAfter": 30
    }
  }
}
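
A well-behaved client pauses for `retryAfter` seconds before retrying. The following is a sketch of such a retry loop under stated assumptions: `RetryOn429`, `Result`, and the injected `sleeper` are hypothetical names, and the response is reduced to its status code and the `details.retryAfter` value so the example needs no HTTP or JSON library.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

/** Hypothetical client-side retry helper; illustrative only. */
final class RetryOn429 {
    private RetryOn429() {}

    /** Minimal view of a response: HTTP status and details.retryAfter in seconds. */
    static final class Result {
        final int status;
        final long retryAfterSeconds;

        Result(int status, long retryAfterSeconds) {
            this.status = status;
            this.retryAfterSeconds = retryAfterSeconds;
        }
    }

    /**
     * Sends the request until it returns a non-429 status or maxAttempts is reached.
     * The sleeper receives the number of seconds to pause between attempts.
     */
    static Result call(Supplier<Result> request, int maxAttempts, LongConsumer sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Result result = request.get();
        for (int attempt = 1; attempt < maxAttempts && result.status == 429; attempt++) {
            sleeper.accept(result.retryAfterSeconds); // honor the server's hint
            result = request.get();
        }
        return result; // may still be a 429 if every attempt was rejected
    }
}
```

Passing the sleeper as a parameter keeps the loop testable. In real use it would wrap `Thread.sleep`, and a cap on the total wait is advisable so that a long `retryAfter` does not block the caller indefinitely.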

Per-Endpoint Rate Limits

Some endpoints have specific rate limits in addition to the global tenant limit:

Endpoint Category	Additional Limit	Rationale
Authentication (`/auth/login`)	10 per minute per IP	Brute-force prevention
AI chat (`/ai/chat`)	30 per minute	LLM resource protection
Query execution (`/query/execute`)	100 per minute	Database load protection
Export (`/bi/export`)	10 per minute	Rendering resource protection
Bulk operations (`/*/bulk`)	5 per minute	Resource-intensive operations
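
To illustrate how a path might be mapped to its additional limit, the sketch below encodes the table as ordered patterns. It is not the gateway's routing configuration: `EndpointLimits` and `additionalLimit` are hypothetical names, and the sketch assumes that the endpoint paths in the table are suffixes of the full request path (for example behind an `/api/v1` prefix) and that `*` in `/*/bulk` stands for exactly one path segment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;
import java.util.regex.Pattern;

/** Hypothetical lookup of the additional per-minute limits from the table above. */
final class EndpointLimits {
    private EndpointLimits() {}

    // Insertion order is preserved, so the first matching pattern wins.
    private static final Map<Pattern, Integer> PER_MINUTE = new LinkedHashMap<>();
    static {
        PER_MINUTE.put(Pattern.compile(".*/auth/login"), 10);   // counted per IP in the table
        PER_MINUTE.put(Pattern.compile(".*/ai/chat"), 30);
        PER_MINUTE.put(Pattern.compile(".*/query/execute"), 100);
        PER_MINUTE.put(Pattern.compile(".*/bi/export"), 10);
        PER_MINUTE.put(Pattern.compile(".*/[^/]+/bulk"), 5);    // assumed meaning of /*/bulk
    }

    /** Returns the additional limit for the path, or empty if only the tenant limit applies. */
    static OptionalInt additionalLimit(String path) {
        for (Map.Entry<Pattern, Integer> entry : PER_MINUTE.entrySet()) {
            if (entry.getKey().matcher(path).matches()) {
                return OptionalInt.of(entry.getValue());
            }
        }
        return OptionalInt.empty();
    }
}
```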

Implementation

Gateway Level (Kong)

The Kong rate-limit plugin uses Redis as a backing store for rate limit counters:

Request arrives at Kong Gateway
  |
  v
Extract tenant_id from JWT claims
  |
  v
Check Redis counter: rate_limit:{tenant_id}:{window}
  |
  +-- Counter < limit --> Increment counter, forward request
  |
  +-- Counter >= limit --> Return 429 Too Many Requests
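
The flow above is a fixed-window counter. The sketch below reproduces its logic with an in-memory map standing in for Redis. It is illustrative only and is not Kong's implementation: `FixedWindowLimiter` and `tryAcquire` are hypothetical names, and the window identifier is assumed to be the current Unix time divided by the window length.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for the Redis counters; a sketch, not Kong's implementation. */
final class FixedWindowLimiter {
    private final ConcurrentHashMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();
    private final int limit;
    private final long windowSeconds;

    FixedWindowLimiter(int limit, long windowSeconds) {
        if (limit < 0 || windowSeconds <= 0) {
            throw new IllegalArgumentException("limit must be >= 0 and windowSeconds > 0");
        }
        this.limit = limit;
        this.windowSeconds = windowSeconds;
    }

    /** Returns true to forward the request, false to answer 429 Too Many Requests. */
    boolean tryAcquire(String tenantId, long nowEpochSeconds) {
        long window = nowEpochSeconds / windowSeconds; // assumed window identifier
        String key = "rate_limit:" + tenantId + ":" + window;
        AtomicInteger counter = counters.computeIfAbsent(key, k -> new AtomicInteger());
        // Check and increment as one atomic step so concurrent requests cannot overshoot.
        while (true) {
            int current = counter.get();
            if (current >= limit) {
                return false;
            }
            if (counter.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }
}
```

This sketch never removes counters for past windows. A Redis-backed store would typically set an expiry on each key so that old windows are cleaned up automatically.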

Service Level

Services can enforce additional rate limits using the RateLimiter support from commons-java, applied to a handler through the `@RateLimited` annotation:

@RateLimited(
    key = "tenant_id",
    limit = 30,
    window = 60,
    unit = TimeUnit.SECONDS
)
@PostMapping("/api/v1/ai/chat")
public ResponseEntity<ChatResponse> chat(@RequestBody ChatRequest request) {
    // Handler code
}

Quota Management

The billing service tracks cumulative usage against monthly quotas:

Metric	Free Tier	Professional	Enterprise
AI tokens per month	10,000	500,000	Unlimited
Queries per month	1,000	50,000	Unlimited
Storage (GB)	1	50	Custom
API calls per month	10,000	100,000	Unlimited

When a quota is reached, the service returns a 402 Payment Required response with instructions to upgrade.
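
The quota check can be sketched as a comparison of cumulative usage against the tier's monthly allowance. This is an assumption-laden illustration and not the billing service's code: `MonthlyQuota` and `statusFor` are hypothetical names, an empty `OptionalLong` is used here to represent "Unlimited", and the sketch assumes the quota counts as reached once usage equals the allowance.

```java
import java.util.OptionalLong;

/** Hypothetical quota check; values are the "API calls per month" row of the table above. */
final class MonthlyQuota {
    private MonthlyQuota() {}

    static final OptionalLong FREE_API_CALLS = OptionalLong.of(10_000);
    static final OptionalLong PROFESSIONAL_API_CALLS = OptionalLong.of(100_000);
    static final OptionalLong ENTERPRISE_API_CALLS = OptionalLong.empty(); // Unlimited

    /** Returns 402 (Payment Required) once the quota is reached, otherwise 200. */
    static int statusFor(long usedThisMonth, OptionalLong quota) {
        if (quota.isPresent() && usedThisMonth >= quota.getAsLong()) {
            return 402;
        }
        return 200;
    }
}
```

Unlike the rate limits, which reset every minute or hour, this counter only resets at the start of the next billing month, which is why the response is 402 rather than 429.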

Monitoring

Rate limiting metrics are exposed through Prometheus:

Metric	Type	Description
`kong_rate_limit_total`	Counter	Total rate limit checks
`kong_rate_limit_exceeded_total`	Counter	Total rate limit rejections
`matih_tenant_api_calls_total`	Counter	Per-tenant API call count

Alerts are configured for:

Sustained rate limit rejections (indicates a tenant may need a higher tier)
Unusual spike in API calls (potential abuse or misconfigured client)

REST Conventions -- Response header conventions
Error Handling -- 429 error response format
Multi-Tenancy -- Tenant tier and feature gating