MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Per-Tenant Ingress

Each tenant in the MATIH platform receives a dedicated NGINX ingress controller with its own LoadBalancer IP address. Because no controller is shared, tenants are isolated from one another at the ingress layer. The design also enables per-tenant rate limiting and supports tenant-specific TLS certificates and custom domains.


Ingress Architecture

Internet
    |
    v
Azure Load Balancer (20.85.123.45)
    |
    v
NGINX Ingress Controller (matih-acme namespace)
    |
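    +--- Host: acme.matih.ai          Path: /api/*  ---> api-gateway:8080
    +--- Host: acme.matih.ai          Path: /ws/*   ---> ai-service:8000
    +--- Host: bi.acme.matih.ai       Path: /*      ---> bi-workbench:3000
    +--- Host: data.acme.matih.ai     Path: /*      ---> data-workbench:3002
    +--- Host: ml.acme.matih.ai       Path: /*      ---> ml-workbench:3001
    +--- Host: agentic.acme.matih.ai  Path: /*      ---> agentic-workbench:3003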

Key Components

Component | Location | Description
TenantIngressService | control-plane/tenant-service/.../service/TenantIngressService.java | Ingress controller deployment and management
IngressProvisioner | control-plane/tenant-service/.../provisioning/IngressProvisioner.java | Provisioning phase integration
RetryableHelmService | control-plane/tenant-service/.../service/helm/RetryableHelmService.java | Helm operations with retry logic
NGINX values template | infrastructure/helm/ingress-nginx/values-tenant.yaml | Per-tenant NGINX configuration

Dedicated Ingress Controller Deployment

The TenantIngressService.deployIngressController() method deploys a dedicated NGINX ingress controller per tenant using Helm:

Deployment Configuration

# Per-tenant NGINX ingress controller values
controller:
  ingressClassResource:
    name: nginx-acme                              # Unique IngressClass per tenant
    controllerValue: k8s.io/ingress-nginx-acme    # Unique controller identifier
  ingressClass: nginx-acme
  replicaCount: 2                                  # Based on tenant tier
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
  admissionWebhooks:
    enabled: false                                 # Disabled for tenant controllers
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
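For orientation, the sketch below shows roughly how a Helm install with these values could be assembled for one tenant. It is not TenantIngressService code: the HelmCommandSketch class, the release name ingress-nginx-{slug}, the chart reference ingress-nginx/ingress-nginx and the namespace pattern matih-{slug} (inferred from the matih-acme namespace in the diagram) are all assumptions. Only the values file path and the per-tenant keys come from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assembles `helm upgrade --install` arguments for one tenant.
// Release name, chart reference, and namespace pattern are assumptions.
final class HelmCommandSketch {
    static List<String> ingressInstallArgs(String slug, int replicas) {
        String className = "nginx-" + slug;                  // unique IngressClass per tenant
        List<String> args = new ArrayList<>(List.of(
                "helm", "upgrade", "--install",
                "ingress-nginx-" + slug,                     // assumed release name
                "ingress-nginx/ingress-nginx",               // assumed chart reference
                "--namespace", "matih-" + slug,              // assumed namespace pattern
                "-f", "infrastructure/helm/ingress-nginx/values-tenant.yaml"));
        // Per-tenant overrides layered on top of the shared values file
        args.addAll(List.of(
                "--set", "controller.ingressClassResource.name=" + className,
                "--set", "controller.ingressClassResource.controllerValue=k8s.io/ingress-" + className,
                "--set", "controller.ingressClass=" + className,
                "--set", "controller.replicaCount=" + replicas,
                "--wait"));                                  // block until the release is ready
        return args;
    }
}
```

Keeping the tenant-specific keys out of values-tenant.yaml lets every tenant share one values file, with only the class names and replica count varying per install.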

IngressClass Isolation

Each tenant gets a unique IngressClass name following the pattern nginx-{tenant-slug}. This prevents one tenant's ingress controller from processing another tenant's Ingress resources:

Tenant | IngressClass | Controller Value
acme | nginx-acme | k8s.io/ingress-nginx-acme
beta | nginx-beta | k8s.io/ingress-nginx-beta
gamma | nginx-gamma | k8s.io/ingress-nginx-gamma
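The naming convention and the isolation rule can be expressed as two small functions. IngressClassNames is a hypothetical helper, not a class from the codebase, and the slug check assumes tenant slugs are lowercase alphanumerics and hyphens so that nginx-{slug} is a valid Kubernetes resource name. The ownership check is a simplified model of how an ingress-nginx controller decides whether to act on an Ingress: spec.ingressClassName is compared first, and the legacy kubernetes.io/ingress.class annotation is only consulted when that field is absent.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the nginx-{tenant-slug} convention and the
// class-matching rule that keeps tenant controllers off each other's Ingresses.
final class IngressClassNames {
    // Assumption: slugs follow DNS-1123 label style (lowercase, digits, inner hyphens)
    private static final Pattern SLUG = Pattern.compile("[a-z0-9]([-a-z0-9]*[a-z0-9])?");

    static String className(String slug) {
        if (!SLUG.matcher(slug).matches()) {
            throw new IllegalArgumentException("invalid tenant slug: " + slug);
        }
        return "nginx-" + slug;
    }

    static String controllerValue(String slug) {
        return "k8s.io/ingress-" + className(slug);
    }

    // Simplified selection rule: spec.ingressClassName wins; otherwise fall back to the
    // legacy annotation. An Ingress matching neither is ignored by this controller.
    static boolean controllerHandles(String controllerClass,
                                     String ingressClassName,
                                     String legacyAnnotation) {
        if (ingressClassName != null) {
            return ingressClassName.equals(controllerClass);
        }
        return controllerClass.equals(legacyAnnotation);
    }
}
```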

Replica Scaling by Tier

The number of ingress controller replicas is determined by the tenant tier:

Tier | Replicas | Rationale
Free | 1 | Cost optimization, acceptable single-point risk
Professional | 2 | High availability with rolling updates
Enterprise | 3 | Full redundancy across availability zones

The method getTenantIngressReplicas(tenant) in TenantIngressService determines the replica count based on the tenant's tier.
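A minimal sketch of the same mapping, assuming a three-value tier enum. IngressReplicaPolicy and its nested Tier enum are hypothetical stand-ins; the page only states that getTenantIngressReplicas(tenant) derives the count from the tier.

```java
// Hypothetical sketch of the tier-to-replica table. The nested Tier enum and this
// helper are assumptions; the real logic is getTenantIngressReplicas(tenant).
final class IngressReplicaPolicy {
    enum Tier { FREE, PROFESSIONAL, ENTERPRISE }

    static int replicasFor(Tier tier) {
        return switch (tier) {
            case FREE -> 1;          // cost optimization; single point of failure accepted
            case PROFESSIONAL -> 2;  // one pod keeps serving while the other is replaced
            case ENTERPRISE -> 3;    // enough pods to place one per zone in a 3-zone region
        };
    }
}
```

Three replicas only end up in three zones if the controller pods are also given zone-aware scheduling (for example through the chart's topologySpreadConstraints or affinity values); the replica count alone does not guarantee zone spread.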


LoadBalancer IP Assignment

After deploying the ingress controller, the TenantIngressService polls the Kubernetes API for the LoadBalancer's external IP:

Polling Logic

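import io.fabric8.kubernetes.api.model.LoadBalancerIngress;
import io.fabric8.kubernetes.api.model.Service;
import io.fabric8.kubernetes.api.model.ServiceList;
import java.util.List;

// Polls the tenant namespace until the ingress-nginx Service reports an external IP.
public String waitForLoadBalancerIp(String namespace, int maxWaitSeconds) {
    int pollInterval = properties.getIngress()
            .getLoadBalancerPollIntervalSeconds();
    // Always make at least one attempt, even if maxWaitSeconds < pollInterval
    int maxAttempts = Math.max(1, maxWaitSeconds / pollInterval);

    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        ServiceList services = kubernetesClient.services()
                .inNamespace(namespace)
                .withLabel("app.kubernetes.io/name", "ingress-nginx")
                .list();

        for (Service svc : services.getItems()) {
            // status/loadBalancer stay empty until Azure has provisioned the IP
            if (svc.getStatus() == null || svc.getStatus().getLoadBalancer() == null) {
                continue;
            }
            List<LoadBalancerIngress> ingresses =
                svc.getStatus().getLoadBalancer().getIngress();
            if (ingresses != null && !ingresses.isEmpty()) {
                String ip = ingresses.get(0).getIp();
                if (ip != null && !ip.isBlank()) {
                    return ip;
                }
            }
        }

        // No need to sleep after the final attempt
        if (attempt < maxAttempts) {
            try {
                Thread.sleep(pollInterval * 1000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt flag
                throw new RuntimeException(
                    "Interrupted while waiting for LoadBalancer IP", e);
            }
        }
    }

    throw new RuntimeException(
        "LoadBalancer IP not assigned within " + maxWaitSeconds + " seconds");
}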

Timeout Configuration

Property | Default | Description
matih.azure.ingress.load-balancer-poll-interval-seconds | 10 | Seconds between polls for the LoadBalancer IP
matih.azure.ingress.load-balancer-max-wait-seconds | 600 | Maximum total wait, in seconds, before giving up with an error
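To make the relationship between the two properties concrete, here is a plain-Java stand-in for the object returned by properties.getIngress(). AzureIngressProperties is a hypothetical name, and in a Spring Boot service such a class would typically be bound from the matih.azure.ingress prefix rather than constructed by hand; the defaults match the table.

```java
// Hypothetical plain-Java mirror of the matih.azure.ingress.* polling properties,
// initialized to the documented defaults.
final class AzureIngressProperties {
    private int loadBalancerPollIntervalSeconds = 10;  // load-balancer-poll-interval-seconds
    private int loadBalancerMaxWaitSeconds = 600;      // load-balancer-max-wait-seconds

    int getLoadBalancerPollIntervalSeconds() { return loadBalancerPollIntervalSeconds; }
    void setLoadBalancerPollIntervalSeconds(int seconds) { loadBalancerPollIntervalSeconds = seconds; }
    int getLoadBalancerMaxWaitSeconds() { return loadBalancerMaxWaitSeconds; }
    void setLoadBalancerMaxWaitSeconds(int seconds) { loadBalancerMaxWaitSeconds = seconds; }

    // Poll budget implied by the two settings (integer division, as in
    // waitForLoadBalancerIp): 600 / 10 = 60 polls with the defaults.
    int pollBudget() {
        if (loadBalancerPollIntervalSeconds <= 0) {
            throw new IllegalStateException("poll interval must be positive");
        }
        return loadBalancerMaxWaitSeconds / loadBalancerPollIntervalSeconds;
    }
}
```

Because the budget is an integer division, a max wait that is not a multiple of the interval is rounded down: 600 seconds at a 7-second interval gives 85 polls (595 seconds of sleeping).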

Ingress Resource Creation

After the ingress controller is running and has an IP, the service creates Kubernetes Ingress resources that define routing rules.

Standard Routing Rules

Host | Path | Backend Service | Port
acme.matih.ai | /api/* | api-gateway | 8080
acme.matih.ai | /ws/* | ai-service | 8000
bi.acme.matih.ai | /* | bi-workbench | 3000
data.acme.matih.ai | /* | data-workbench | 3002
ml.acme.matih.ai | /* | ml-workbench | 3001
agentic.acme.matih.ai | /* | agentic-workbench | 3003
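The routing table can be modeled as data plus a resolver, which is handy for reasoning about which backend a request reaches. This is a hypothetical model: RouteRule and StandardRoutes are illustrative names, and it assumes the /api/* style entries correspond to Ingress paths with pathType: Prefix (the page does not state the path type). Prefix matching in Kubernetes works on whole path segments, so /api matches /api/v1 but not /apix.

```java
import java.util.List;

// Hypothetical model of the standard routing table; not classes from the codebase.
record RouteRule(String host, String path, String service, int port) {}

final class StandardRoutes {
    static List<RouteRule> forTenant(String slug, String baseDomain) {
        String root = slug + "." + baseDomain;               // e.g. acme.matih.ai
        return List.of(
                new RouteRule(root, "/api", "api-gateway", 8080),
                new RouteRule(root, "/ws", "ai-service", 8000),
                new RouteRule("bi." + root, "/", "bi-workbench", 3000),
                new RouteRule("data." + root, "/", "data-workbench", 3002),
                new RouteRule("ml." + root, "/", "ml-workbench", 3001),
                new RouteRule("agentic." + root, "/", "agentic-workbench", 3003));
    }

    // Exact host match, then the longest matching path prefix wins.
    // Returns null when nothing matches (the controller's default backend answers 404).
    static RouteRule resolve(List<RouteRule> rules, String host, String path) {
        RouteRule best = null;
        for (RouteRule r : rules) {
            if (!r.host().equals(host) || !matchesPrefix(r.path(), path)) continue;
            if (best == null || r.path().length() > best.path().length()) best = r;
        }
        return best;
    }

    // Segment-wise prefix match: "/api" matches "/api" and "/api/x", not "/apix"
    private static boolean matchesPrefix(String prefix, String path) {
        if (prefix.equals("/")) return path.startsWith("/");
        return path.equals(prefix) || path.startsWith(prefix + "/");
    }
}
```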

TLS Configuration

TLS is terminated at the ingress controller using certificates issued by cert-manager:

spec:
  tls:
    - hosts:
        - acme.matih.ai
        - "*.acme.matih.ai"
      secretName: acme-tls-certificate

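The certificate covers the root domain (acme.matih.ai) and, through the wildcard SAN *.acme.matih.ai, every first-level subdomain such as bi.acme.matih.ai or data.acme.matih.ai. A wildcard matches exactly one DNS label, so a deeper name such as x.bi.acme.matih.ai would not be covered. Let's Encrypt issues wildcard certificates only through the DNS-01 challenge, so cert-manager needs a DNS-01 solver for this certificate.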
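The one-label wildcard rule can be checked with a few lines of code. SanCoverage is a hypothetical helper, not part of cert-manager or the tenant service; it implements the usual left-most-label wildcard matching used by TLS clients and is meant only to show which tenant hostnames the SAN list above covers.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical check of whether a hostname is covered by a certificate's SAN list.
// A wildcard SAN such as *.acme.matih.ai matches exactly one extra left-most label.
final class SanCoverage {
    static boolean covers(List<String> sans, String host) {
        String h = host.toLowerCase(Locale.ROOT);
        for (String san : sans) {
            String s = san.toLowerCase(Locale.ROOT);
            if (s.startsWith("*.")) {
                String suffix = s.substring(1);              // ".acme.matih.ai"
                if (h.endsWith(suffix)) {
                    String label = h.substring(0, h.length() - suffix.length());
                    // The wildcard stands for one non-empty label with no dots
                    if (!label.isEmpty() && !label.contains(".")) return true;
                }
            } else if (s.equals(h)) {
                return true;                                 // exact SAN match
            }
        }
        return false;
    }
}
```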


NGINX Configuration Tuning

The per-tenant NGINX ingress controller is configured with the following settings for production workloads:

Timeouts and Buffers

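Setting | Value | Purpose
proxy-read-timeout | 300s | Maximum wait between two reads from the upstream; covers long-running queries
proxy-send-timeout | 300s | Maximum wait between two writes to the upstream; covers large request bodies
proxy-body-size | 50m | Maximum client request body size (50 MB), for file uploads
proxy-buffer-size | 16k | Buffer for the first part of the upstream response, sized for large headers (JWT tokens)
keepalive-timeout | 75s | How long idle client keep-alive connections stay open for reuse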

WebSocket Support

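The AI service streams responses over WebSocket connections. ingress-nginx proxies WebSocket upgrades without extra configuration, so the annotations below only tune connection behavior: they raise the read and send timeouts for this Ingress from the controller-wide 300 seconds to 3600 seconds (one hour) so long-lived streams are not cut off, and hash requests by client IP so each client keeps reaching the same ai-service pod: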

annotations:
  nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
  nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
  nginx.ingress.kubernetes.io/upstream-hash-by: "$remote_addr"

Rate Limiting

Per-tenant rate limiting is configured at the ingress level:

Tier | Rate Limit | Burst (requests)
Free | 100 req/s | 200
Professional | 500 req/s | 1000
Enterprise | Custom | Custom
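If these limits are applied with the standard ingress-nginx annotations, the burst column has to be expressed as a multiplier of the rate. The sketch below assumes exactly that (the page does not say which mechanism is used); RateLimitAnnotations is a hypothetical helper. Two ingress-nginx behaviors are worth keeping in mind: limit-rps is enforced per client IP address rather than as a tenant-wide total, and the burst multiplier defaults to 5 when the annotation is omitted.

```java
import java.util.Map;

// Hypothetical mapping from the tier table to ingress-nginx rate-limit annotations.
// Assumes limits are applied with limit-rps and limit-burst-multiplier, which
// ingress-nginx enforces per client IP address.
final class RateLimitAnnotations {
    enum Tier { FREE, PROFESSIONAL, ENTERPRISE }

    record Limit(int rps, int burst) {}

    static Map<String, String> forTier(Tier tier) {
        Limit limit = switch (tier) {
            case FREE -> new Limit(100, 200);
            case PROFESSIONAL -> new Limit(500, 1000);
            case ENTERPRISE -> throw new IllegalArgumentException(
                    "Enterprise limits are custom and must be supplied explicitly");
        };
        // ingress-nginx sets burst as a multiple of the rate: burst = rps * multiplier
        if (limit.burst() % limit.rps() != 0) {
            throw new IllegalArgumentException("burst must be a whole multiple of rps");
        }
        return Map.of(
                "nginx.ingress.kubernetes.io/limit-rps", Integer.toString(limit.rps()),
                "nginx.ingress.kubernetes.io/limit-burst-multiplier",
                Integer.toString(limit.burst() / limit.rps()));
    }
}
```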

Ingress Health Monitoring

The ingress controller exposes health endpoints that are monitored by the tenant monitoring stack:

Endpoint | Purpose
/healthz | Liveness probe
/metrics | Prometheus metrics
/nginx_status | NGINX stub status

Key Metrics

Metric | Description
nginx_ingress_controller_requests | Total requests by status code
nginx_ingress_controller_response_duration_seconds | Request latency histogram
nginx_ingress_controller_nginx_process_connections | Active connections
nginx_ingress_controller_ssl_certificate_expiry | Certificate expiry time
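An alert on the certificate-expiry metric boils down to comparing the expiry timestamp with the current time. The sketch assumes the gauge reports the certificate's notAfter time as Unix epoch seconds, and the 14-day warning threshold and CertExpiryCheck class are hypothetical. cert-manager by default renews a certificate once two thirds of its lifetime has passed (30 days before expiry for a 90-day Let's Encrypt certificate), so crossing a 14-day threshold usually means renewal is failing.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical alert check on the certificate-expiry gauge. Assumes the metric
// value is the certificate's notAfter time in Unix epoch seconds.
final class CertExpiryCheck {
    static final Duration WARN_BEFORE = Duration.ofDays(14);   // assumed threshold

    static Duration remaining(double expiryEpochSeconds, Instant now) {
        Instant expiry = Instant.ofEpochSecond((long) expiryEpochSeconds);
        return Duration.between(now, expiry);                 // negative once expired
    }

    static boolean shouldAlert(double expiryEpochSeconds, Instant now) {
        return remaining(expiryEpochSeconds, now).compareTo(WARN_BEFORE) < 0;
    }
}
```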

Cleanup on Tenant Deletion

When a tenant is deprovisioned, the ingress resources are cleaned up in order:

  1. Delete Ingress resources (routing rules)
  2. Delete TLS certificate and secret
  3. Uninstall NGINX ingress controller Helm release
  4. Azure releases the LoadBalancer IP once step 3 has removed the controller's LoadBalancer Service
  5. Delete IngressClass resource
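The ordering matters because each step removes something a later step's resources depend on. A minimal sketch of an ordered teardown that stops at the first failure is shown below; IngressTeardown, Step and NamedStep are hypothetical names, not the service's API. Step 4 involves no call from the tenant service, so a real sequence would probably model it as a wait rather than an action.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical orchestration of the documented teardown order.
final class IngressTeardown {
    @FunctionalInterface
    interface Step { void run() throws Exception; }

    record NamedStep(String name, Step action) {}

    // Runs steps in order and stops at the first failure, so later steps never run
    // against a partially deleted tenant. Returns the names of completed steps.
    static List<String> run(List<NamedStep> steps) {
        List<String> done = new ArrayList<>();
        for (NamedStep s : steps) {
            try {
                s.action().run();
            } catch (Exception e) {
                throw new IllegalStateException(
                        "Teardown failed at '" + s.name() + "' after " + done, e);
            }
            done.add(s.name());
        }
        return done;
    }
}
```

Stopping rather than skipping ahead keeps the order guarantee intact; the error message records which steps already completed, which is what a retry needs to know.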

Dev Environment Configuration

In development environments, per-tenant ingress is typically disabled to reduce resource consumption:

Aspect | Dev | Production
Dedicated ingress | Disabled | Enabled
Access method | kubectl port-forward or shared ingress | Dedicated LoadBalancer
TLS | Self-signed or disabled | Let's Encrypt production
IngressClass | Default nginx | Per-tenant nginx-{slug}

Troubleshooting

Issue | Diagnostic | Resolution
No external IP | Check Azure LB quota | Request quota increase
502 Bad Gateway | Check backend pod readiness | Verify service health
TLS certificate error | Check cert-manager logs | Verify DNS-01 challenge resolution
Routing 404 | Check IngressClass match | Verify ingressClassName matches controller
Connection timeout | Check NGINX timeout settings | Increase proxy timeouts

Validation Script

./scripts/tools/validate-tenant-ingress.sh --tenant acme

This script validates:

  • Ingress controller pods are running
  • LoadBalancer IP is assigned
  • Ingress resources exist with correct rules
  • TLS certificate is valid and not expiring
  • Backend services are reachable through the ingress

Next Steps