Per-Tenant Ingress
Each tenant in the MATIH platform receives a dedicated NGINX ingress controller with its own LoadBalancer IP address. This architecture provides complete network isolation between tenants at the ingress layer, enables per-tenant rate limiting, and supports tenant-specific TLS certificates and custom domains.
Ingress Architecture
Internet
    |
    v
Azure Load Balancer (20.85.123.45)
    |
    v
NGINX Ingress Controller (matih-acme namespace)
    |
    +--- Host: acme.matih.ai/api/* ---------> api-gateway:8080
    +--- Host: bi.acme.matih.ai/* -----------> bi-workbench:3000
    +--- Host: acme.matih.ai/ws/* -----------> ai-service:8000
    +--- Host: data.acme.matih.ai/* ---------> data-workbench:3002
Key Components
| Component | Location | Description |
|---|---|---|
| TenantIngressService | control-plane/tenant-service/.../service/TenantIngressService.java | Ingress controller deployment and management |
| IngressProvisioner | control-plane/tenant-service/.../provisioning/IngressProvisioner.java | Provisioning phase integration |
| RetryableHelmService | control-plane/tenant-service/.../service/helm/RetryableHelmService.java | Helm operations with retry logic |
| NGINX values template | infrastructure/helm/ingress-nginx/values-tenant.yaml | Per-tenant NGINX configuration |
Dedicated Ingress Controller Deployment
The TenantIngressService.deployIngressController() method deploys a dedicated NGINX ingress controller per tenant using Helm:
Deployment Configuration
# Per-tenant NGINX ingress controller values
controller:
  ingressClassResource:
    name: nginx-acme # Unique IngressClass per tenant
    controllerValue: k8s.io/ingress-nginx-acme # Unique controller identifier
  ingressClass: nginx-acme
  replicaCount: 2 # Based on tenant tier
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
  admissionWebhooks:
    enabled: false # Disabled for tenant controllers
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
IngressClass Isolation
Each tenant gets a unique IngressClass name following the pattern nginx-{tenant-slug}. This prevents one tenant's ingress controller from processing another tenant's Ingress resources:
| Tenant | IngressClass | Controller Value |
|---|---|---|
| acme | nginx-acme | k8s.io/ingress-nginx-acme |
| beta | nginx-beta | k8s.io/ingress-nginx-beta |
| gamma | nginx-gamma | k8s.io/ingress-nginx-gamma |
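The naming pattern in the table can be expressed as a small helper. This is a minimal sketch, not the platform's code: the class IngressClassNames and its methods are hypothetical, and the slug validation rule (a lowercase DNS label) is an assumption, since the source does not state how slugs are constrained.

```java
// Hypothetical helper illustrating the nginx-{tenant-slug} naming pattern.
final class IngressClassNames {
    private IngressClassNames() {}

    /** IngressClass name for a tenant, e.g. "nginx-acme". */
    static String ingressClass(String tenantSlug) {
        return "nginx-" + requireSlug(tenantSlug);
    }

    /** Controller value watched by that tenant's controller. */
    static String controllerValue(String tenantSlug) {
        return "k8s.io/ingress-nginx-" + requireSlug(tenantSlug);
    }

    // Assumption: slugs are lowercase DNS labels (letters, digits, inner hyphens).
    private static String requireSlug(String slug) {
        if (slug == null || !slug.matches("[a-z0-9]([a-z0-9-]*[a-z0-9])?")) {
            throw new IllegalArgumentException("invalid tenant slug: " + slug);
        }
        return slug;
    }
}
```

Because both values are derived from the same slug, an Ingress whose ingressClassName is nginx-acme can only be claimed by the controller started with the matching controller value.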
Replica Scaling by Tier
The number of ingress controller replicas is determined by the tenant tier:
| Tier | Replicas | Rationale |
|---|---|---|
| Free | 1 | Cost optimization, acceptable single-point risk |
| Professional | 2 | High availability with rolling updates |
| Enterprise | 3 | Full redundancy across availability zones |
The method getTenantIngressReplicas(tenant) in TenantIngressService determines the replica count based on the tenant's tier.
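The tier-to-replica mapping might look like the sketch below. It is an illustration only: the Tier enum and the replicasFor method are hypothetical stand-ins, and the real getTenantIngressReplicas(tenant) takes a tenant object whose shape is not shown in this document.

```java
// Hypothetical sketch of the tier table; not the actual TenantIngressService code.
final class IngressReplicas {
    private IngressReplicas() {}

    enum Tier { FREE, PROFESSIONAL, ENTERPRISE }

    /** Replica count per tier, mirroring the table above. */
    static int replicasFor(Tier tier) {
        switch (tier) {
            case FREE:
                return 1; // single replica, accepts single-point risk
            case PROFESSIONAL:
                return 2; // allows rolling updates without downtime
            case ENTERPRISE:
                return 3; // one replica per availability zone
            default:
                throw new IllegalArgumentException("unknown tier: " + tier);
        }
    }
}
```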
LoadBalancer IP Assignment
After deploying the ingress controller, the TenantIngressService polls the Kubernetes API for the LoadBalancer's external IP:
Polling Logic
public String waitForLoadBalancerIp(String namespace, int maxWaitSeconds) {
    int pollInterval = properties.getIngress()
            .getLoadBalancerPollIntervalSeconds();
    int maxAttempts = maxWaitSeconds / pollInterval;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        ServiceList services = kubernetesClient.services()
                .inNamespace(namespace)
                .withLabel("app.kubernetes.io/name", "ingress-nginx")
                .list();
        for (Service svc : services.getItems()) {
            // The status can be unset until the cloud provider has reported on the Service.
            if (svc.getStatus() == null || svc.getStatus().getLoadBalancer() == null) {
                continue;
            }
            List<LoadBalancerIngress> ingresses =
                    svc.getStatus().getLoadBalancer().getIngress();
            if (ingresses != null && !ingresses.isEmpty()) {
                String ip = ingresses.get(0).getIp();
                if (ip != null && !ip.isBlank()) {
                    return ip;
                }
            }
        }
        try {
            Thread.sleep(pollInterval * 1000L);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop waiting.
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while waiting for LoadBalancer IP", e);
        }
    }
    throw new RuntimeException(
            "LoadBalancer IP not assigned within " + maxWaitSeconds + " seconds");
}
Timeout Configuration
| Property | Default | Description |
|---|---|---|
| matih.azure.ingress.load-balancer-poll-interval-seconds | 10 | Polling interval |
| matih.azure.ingress.load-balancer-max-wait-seconds | 600 | Maximum wait time |
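With the defaults, the polling loop makes 600 / 10 = 60 attempts. The sketch below shows that arithmetic in isolation. The class PollBudget is hypothetical, and the guard against a non-positive interval is an addition for illustration (the document does not say whether the real code validates these properties).

```java
// Hypothetical helper showing how the two properties combine into an attempt count.
final class PollBudget {
    private PollBudget() {}

    /** Number of polling attempts; integer division rounds down. */
    static int maxAttempts(int maxWaitSeconds, int pollIntervalSeconds) {
        if (pollIntervalSeconds <= 0) {
            // A zero interval would otherwise cause a division by zero.
            throw new IllegalArgumentException("poll interval must be positive");
        }
        if (maxWaitSeconds < 0) {
            throw new IllegalArgumentException("max wait must not be negative");
        }
        return maxWaitSeconds / pollIntervalSeconds;
    }
}
```

Because the division truncates, a maximum wait that is not a multiple of the interval is effectively rounded down: 25 seconds at a 10-second interval yields 2 attempts, not 3.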
Ingress Resource Creation
After the ingress controller is running and has an IP, the service creates Kubernetes Ingress resources that define routing rules.
Standard Routing Rules
| Host Pattern | Path | Backend Service | Port |
|---|---|---|---|
| acme.matih.ai | /api/* | api-gateway | 8080 |
| acme.matih.ai | /ws/* | ai-service | 8000 |
| bi.acme.matih.ai | /* | bi-workbench | 3000 |
| data.acme.matih.ai | /* | data-workbench | 3002 |
| ml.acme.matih.ai | /* | ml-workbench | 3001 |
| agentic.acme.matih.ai | /* | agentic-workbench | 3003 |
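To make the table concrete, the sketch below resolves a host and path to a backend the way these rules read. It only models the table for illustration; the class TenantRoutes is hypothetical, and actual matching is done by NGINX from the Ingress resources, not by application code.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the standard routing rules for one tenant.
final class TenantRoutes {
    private TenantRoutes() {}

    // Workbench subdomains and their backends ("service:port"), from the table.
    private static final Map<String, String> WORKBENCHES = Map.of(
            "bi", "bi-workbench:3000",
            "ml", "ml-workbench:3001",
            "data", "data-workbench:3002",
            "agentic", "agentic-workbench:3003");

    /** Returns the backend for a request, or empty when no rule matches. */
    static Optional<String> resolve(String slug, String host, String path) {
        String root = slug + ".matih.ai";
        if (host.equals(root)) {
            // The root host only routes the API and WebSocket prefixes.
            if (path.startsWith("/api/")) return Optional.of("api-gateway:8080");
            if (path.startsWith("/ws/")) return Optional.of("ai-service:8000");
            return Optional.empty();
        }
        String suffix = "." + root;
        if (host.endsWith(suffix)) {
            String subdomain = host.substring(0, host.length() - suffix.length());
            return Optional.ofNullable(WORKBENCHES.get(subdomain));
        }
        return Optional.empty();
    }
}
```

For example, resolve("acme", "ml.acme.matih.ai", "/") yields ml-workbench:3001, while a request to acme.matih.ai with a path outside /api/ and /ws/ matches none of the listed rules.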
TLS Configuration
TLS is terminated at the ingress controller using certificates issued by cert-manager:
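spec:
  tls:
    - hosts:
        - acme.matih.ai
        - "*.acme.matih.ai"
      secretName: acme-tls-certificate
The TLS certificate covers the root domain and, via a wildcard SAN, its first-level subdomains (for example, bi.acme.matih.ai). A wildcard matches exactly one DNS label, so deeper names would need their own SAN entries.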
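A wildcard SAN matches exactly one DNS label, which is why the certificate lists the root domain separately. The sketch below checks a host against a SAN entry under that rule. The class WildcardSan is hypothetical and simplified: it handles only a leading "*." wildcard and is not a full certificate name matcher.

```java
import java.util.Locale;

// Hypothetical, simplified check of a hostname against one SAN entry.
final class WildcardSan {
    private WildcardSan() {}

    static boolean covers(String san, String host) {
        String s = san.toLowerCase(Locale.ROOT);
        String h = host.toLowerCase(Locale.ROOT);
        if (!s.startsWith("*.")) {
            return s.equals(h); // plain entry: exact match only
        }
        String suffix = s.substring(1); // e.g. ".acme.matih.ai"
        if (!h.endsWith(suffix)) {
            return false;
        }
        // The wildcard must stand for exactly one non-empty label.
        String label = h.substring(0, h.length() - suffix.length());
        return !label.isEmpty() && label.indexOf('.') < 0;
    }
}
```

Under this rule, "*.acme.matih.ai" covers bi.acme.matih.ai but neither acme.matih.ai itself nor a deeper name such as a.b.acme.matih.ai.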
NGINX Configuration Tuning
The per-tenant NGINX ingress controller is configured with production-grade settings:
Timeouts and Buffers
| Setting | Value | Description |
|---|---|---|
| proxy-read-timeout | 300s | For long-running queries |
| proxy-send-timeout | 300s | For large result sets |
| proxy-body-size | 50m | For file uploads |
| proxy-buffer-size | 16k | For large headers (JWT tokens) |
| keepalive-timeout | 75s | Connection reuse |
WebSocket Support
The AI service uses WebSocket connections for streaming responses. The ingress controller is configured to support WebSocket upgrades:
annotations:
  nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
  nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
  nginx.ingress.kubernetes.io/upstream-hash-by: "$remote_addr"
Rate Limiting
Per-tenant rate limiting is configured at the ingress level:
| Tier | Rate Limit | Burst |
|---|---|---|
| Free | 100 req/s | 200 |
| Professional | 500 req/s | 1000 |
| Enterprise | Custom | Custom |
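One way such limits could be expressed is through the ingress-nginx limit-rps and limit-burst-multiplier annotations, where the burst equals the rate multiplied by the multiplier. The sketch below derives those two annotations from a rate and burst pair. It rests on assumptions: the document does not say which mechanism the platform actually uses, the class RateLimitAnnotations is hypothetical, and ingress-nginx applies limit-rps per client IP rather than as an aggregate per-tenant cap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mapping from a tier's limits to ingress-nginx annotations.
final class RateLimitAnnotations {
    private RateLimitAnnotations() {}

    static Map<String, String> forLimits(int requestsPerSecond, int burst) {
        if (requestsPerSecond <= 0 || burst < requestsPerSecond
                || burst % requestsPerSecond != 0) {
            // The multiplier is an integer, so burst must be a whole multiple of the rate.
            throw new IllegalArgumentException("burst must be a positive multiple of the rate");
        }
        Map<String, String> annotations = new LinkedHashMap<>();
        annotations.put("nginx.ingress.kubernetes.io/limit-rps",
                String.valueOf(requestsPerSecond));
        annotations.put("nginx.ingress.kubernetes.io/limit-burst-multiplier",
                String.valueOf(burst / requestsPerSecond));
        return annotations;
    }
}
```

Both the Free (100 req/s, burst 200) and Professional (500 req/s, burst 1000) rows correspond to a burst multiplier of 2 under this scheme.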
Ingress Health Monitoring
The ingress controller exposes health endpoints that are monitored by the tenant monitoring stack:
| Endpoint | Purpose |
|---|---|
| /healthz | Liveness probe |
| /metrics | Prometheus metrics |
| /nginx_status | NGINX stub status |
Key Metrics
| Metric | Description |
|---|---|
| nginx_ingress_controller_requests | Total requests by status code |
| nginx_ingress_controller_response_duration_seconds | Request latency histogram |
| nginx_ingress_controller_nginx_process_connections | Active connections |
| nginx_ingress_controller_ssl_certificate_expiry | Certificate expiry time |
Cleanup on Tenant Deletion
When a tenant is deprovisioned, the ingress resources are cleaned up in order:
- Delete Ingress resources (routing rules)
- Delete TLS certificate and secret
- Uninstall NGINX ingress controller Helm release
- Azure releases the LoadBalancer IP
- Delete IngressClass resource
Dev Environment Configuration
In development environments, per-tenant ingress is typically disabled to reduce resource consumption:
| Aspect | Dev | Production |
|---|---|---|
| Dedicated ingress | Disabled | Enabled |
| Access method | kubectl port-forward or shared ingress | Dedicated LoadBalancer |
| TLS | Self-signed or disabled | Let's Encrypt production |
| IngressClass | Default nginx | Per-tenant nginx-{slug} |
Troubleshooting
| Issue | Diagnostic | Resolution |
|---|---|---|
| No external IP | Check Azure LB quota | Request quota increase |
| 502 Bad Gateway | Check backend pod readiness | Verify service health |
| TLS certificate error | Check cert-manager logs | Verify DNS-01 challenge resolution |
| Routing 404 | Check IngressClass match | Verify ingressClassName matches controller |
| Connection timeout | Check NGINX timeout settings | Increase proxy timeouts |
Validation Script
./scripts/tools/validate-tenant-ingress.sh --tenant acme
This script validates:
- Ingress controller pods are running
- LoadBalancer IP is assigned
- Ingress resources exist with correct rules
- TLS certificate is valid and not expiring
- Backend services are reachable through the ingress
Next Steps
- DNS Zone Management -- DNS records that point to the ingress
- Billing Integration -- how ingress usage affects billing
- API Reference -- ingress management endpoints