# Cluster Networking
MATIH uses provider-native CNI plugins for pod networking, Calico for network policy enforcement, CoreDNS for service discovery, and NGINX Ingress Controllers for external traffic routing.
## CNI Configuration
Each cloud provider uses its native CNI plugin:
| Provider | CNI | Pod Networking | Key Feature |
|---|---|---|---|
| AKS | Azure CNI | Pods get VNet IPs | Direct VNet integration |
| EKS | Amazon VPC CNI | Pods get VPC IPs | ENI-based networking |
| GKE | GKE VPC-native | Alias IP ranges | Secondary CIDR ranges |
All providers enforce network policies via Calico, which allows the platform to define fine-grained ingress and egress rules per namespace and service.
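As an illustration of what such a rule looks like, the sketch below restricts ingress to the `matih-data-plane` namespace so that only pods in `matih-control-plane` can reach its services. The policy name and label selector are hypothetical; only the namespace names come from the platform layout (Calico enforces standard Kubernetes `NetworkPolicy` resources).

```yaml
# Hypothetical example - not a policy shipped with MATIH
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-control-plane
  namespace: matih-data-plane
spec:
  podSelector: {}          # applies to every pod in matih-data-plane
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # standard label Kubernetes sets on every namespace
              kubernetes.io/metadata.name: matih-control-plane
```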
## Service Discovery
Kubernetes DNS (CoreDNS) provides service discovery across namespaces:
```text
# Within the same namespace
http://service-name:port

# Cross-namespace (FQDN pattern used by MATIH)
http://service-name.namespace.svc.cluster.local:port
```

```yaml
# Examples from ai-service values
QUERY_ENGINE_URL: "http://query-engine.matih-data-plane.svc.cluster.local:8080"
IAM_SERVICE_URL: "http://iam-service.matih-control-plane.svc.cluster.local:8081"
CATALOG_URL: "http://catalog-service.matih-data-plane.svc.cluster.local:8086"
```

MATIH services always use fully qualified domain names (FQDNs) to ensure reliable cross-namespace resolution.
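The FQDN pattern is mechanical enough to generate in code. A small helper (hypothetical, not part of MATIH) could build these URLs:

```python
def service_fqdn(service: str, namespace: str, port: int) -> str:
    """Build a cluster-internal URL following the MATIH FQDN pattern."""
    return f"http://{service}.{namespace}.svc.cluster.local:{port}"

# Matches QUERY_ENGINE_URL from the ai-service values above
print(service_fqdn("query-engine", "matih-data-plane", 8080))
# http://query-engine.matih-data-plane.svc.cluster.local:8080
```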
## Ingress Architecture
External traffic enters through NGINX Ingress Controllers:
```text
Internet
   |
   v
+------------------+
| Azure/AWS/GCP LB |
+------------------+
   |
   v
+------------------+
|  NGINX Ingress   |  (matih-ingress namespace)
|   Controller     |
+------------------+
   |
   +---> matih-control-plane services (IAM, Tenant, Config, etc.)
   |
   +---> matih-data-plane services (AI, Query Engine, BI, etc.)
   |
   +---> matih-frontend applications (Workbenches)
```

### Per-Tenant Ingress
For production multi-tenant deployments, each tenant can have a dedicated NGINX ingress controller:
```yaml
# From infrastructure/helm/ingress-nginx/values-tenant.yaml
# Deployed in the tenant namespace with its own LoadBalancer IP
controller:
  ingressClassResource:
    name: "nginx-tenant-${TENANT_SLUG}"
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-resource-group: "${RESOURCE_GROUP}"
```

## Kafka Networking
Strimzi Kafka exposes two internal listeners; TLS is enforced at the network-policy level:
```yaml
# From infrastructure/k8s/kafka/kafka-cluster.yaml
listeners:
  - name: plain
    port: 9092
    type: internal
    tls: false
  - name: tls
    port: 9093
    type: internal
    tls: true
```

All MATIH services connect to Kafka on port 9093 (TLS). Network policies block access to the plaintext port 9092:
```yaml
# Kafka TLS only (port 9093) - from network-policies.yaml
- to:
    - namespaceSelector: {}
  ports:
    - protocol: TCP
      port: 9093
```

## DNS and TLS
TLS certificates are managed by cert-manager with DNS01 challenges:
```yaml
# From infrastructure/k8s/cert-manager/cluster-issuer-dns01.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-dns01
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ${ACME_EMAIL}
    solvers:
      - dns01:
          azureDNS:
            resourceGroupName: ${RESOURCE_GROUP}
            subscriptionID: ${SUBSCRIPTION_ID}
            managedIdentity:
              clientID: ${AKS_IDENTITY_CLIENT_ID}
```

The DNS zone hierarchy supports per-tenant subdomains:
```text
matih.ai  (Platform DNS zone)
  |
  +-- api.matih.ai      (Control plane API)
  +-- acme.matih.ai     (Tenant: Acme Corp)
  +-- bigcorp.matih.ai  (Tenant: BigCorp)
  +-- staging.matih.ai  (Staging environment)
```

## Internal Communication Ports
Key internal ports used across the cluster:
| Service | Port | Protocol | Purpose |
|---|---|---|---|
| Control plane APIs | 8080-8089 | HTTP | REST/gRPC APIs |
| AI service | 8000 | HTTP | FastAPI with WebSocket |
| Trino | 8080 | HTTP | SQL queries |
| Kafka | 9093 | TCP/TLS | Event streaming |
| PostgreSQL | 5432 | TCP | Database |
| Redis | 6379 | TCP | Cache/pub-sub |
| Spark Connect | 15002 | gRPC | Interactive Spark |
| Qdrant | 6333 | HTTP/gRPC | Vector search |
| Neo4j | 7687 | Bolt | Graph queries |
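Tying these pieces together, a tenant-facing Ingress would combine the per-tenant ingress class, the cert-manager issuer, and the tenant subdomain. The sketch below is illustrative, not a manifest from the repository: the resource name, backend service name, and TLS secret name are assumptions, while the ingress class, issuer, host, and namespace follow the configuration described above.

```yaml
# Hypothetical per-tenant Ingress for the "acme" tenant
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: acme-workbench              # assumed resource name
  namespace: matih-frontend
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod-dns01
spec:
  ingressClassName: nginx-tenant-acme   # from values-tenant.yaml, TENANT_SLUG=acme
  tls:
    - hosts:
        - acme.matih.ai
      secretName: acme-matih-ai-tls     # cert-manager stores the certificate here
  rules:
    - host: acme.matih.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: workbench         # assumed frontend service name
                port:
                  number: 8080
```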