MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Cluster Networking

MATIH uses provider-native CNI plugins for pod networking, Calico for network policy enforcement, CoreDNS for service discovery, and NGINX Ingress Controllers for external traffic routing.


CNI Configuration

Each cloud provider uses its native CNI plugin:

| Provider | CNI | Pod networking | Key feature |
| --- | --- | --- | --- |
| AKS | Azure CNI | Pods get VNet IPs | Direct VNet integration |
| EKS | Amazon VPC CNI | Pods get VPC IPs | ENI-based networking |
| GKE | GKE VPC-native | Alias IP ranges | Secondary CIDR ranges |
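Because pods draw addresses directly from the VNet or VPC, IP capacity has to be planned up front. The sketch below applies the published sizing formulas for the Amazon VPC CNI (secondary-IP mode, without prefix delegation) and for classic Azure CNI; GKE alias ranges are sized separately as secondary CIDRs. The instance figures are illustrative (an m5.large has 3 ENIs with 10 IPv4 addresses each) and are not MATIH's actual node pools. The Azure figure also excludes the addresses Azure reserves in every subnet.

```python
def eks_max_pods(enis: int, ips_per_eni: int) -> int:
    """Max pods per node with the Amazon VPC CNI in secondary-IP mode.

    Each ENI keeps its primary IP for itself, so pods get (ips_per_eni - 1)
    addresses per ENI; the +2 covers host-network pods (aws-node, kube-proxy).
    """
    return enis * (ips_per_eni - 1) + 2


def azure_cni_subnet_ips(nodes: int, max_pods: int) -> int:
    """IPs a classic Azure CNI subnet must hold for a node pool.

    One extra node is budgeted for upgrade surge, and every node (including
    the surge node) pre-allocates one IP per configured pod slot.
    """
    return (nodes + 1) + (nodes + 1) * max_pods


print(eks_max_pods(enis=3, ips_per_eni=10))        # m5.large: 29
print(azure_cni_subnet_ips(nodes=3, max_pods=30))  # 124
```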

All providers enforce network policies via Calico, which allows the platform to define fine-grained ingress and egress rules per namespace and service.
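A common baseline for per-namespace policy is a default-deny policy, on top of which narrower allow rules are layered. The following minimal sketch builds such a manifest as a Python dict; the policy name `default-deny-all` is hypothetical, and MATIH's actual policy files are not reproduced here.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that selects every pod in `namespace` and allows
    no traffic. NetworkPolicy rules are additive allow-lists, so narrower
    policies (e.g. "allow egress to Kafka on 9093") are added on top of this.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},                     # empty selector = all pods
            "policyTypes": ["Ingress", "Egress"],  # no rules listed = deny both
        },
    }


# kubectl accepts JSON manifests, e.g. `python gen.py | kubectl apply -f -`
print(json.dumps(default_deny_policy("matih-data-plane"), indent=2))
```

A deny-all egress baseline also blocks lookups to CoreDNS, so any policy set built this way needs an explicit egress allow rule for port 53 (UDP and TCP).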


Service Discovery

Kubernetes DNS (CoreDNS) provides service discovery across namespaces:

# Within the same namespace
http://service-name:port

# Cross-namespace (FQDN pattern used by MATIH)
http://service-name.namespace.svc.cluster.local:port

# Examples from ai-service values
QUERY_ENGINE_URL: "http://query-engine.matih-data-plane.svc.cluster.local:8080"
IAM_SERVICE_URL:  "http://iam-service.matih-control-plane.svc.cluster.local:8081"
CATALOG_URL:      "http://catalog-service.matih-data-plane.svc.cluster.local:8086"

MATIH services always use fully-qualified domain names (FQDNs) to ensure reliable cross-namespace resolution.
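When a service reads peer URLs from configuration, building them from a single helper keeps the FQDN pattern from drifting between services. A minimal sketch follows; `service_url` is a hypothetical helper, and `cluster.local` is the Kubernetes default cluster domain, which a cluster may be configured to override.

```python
def service_url(service: str, namespace: str, port: int,
                scheme: str = "http", cluster_domain: str = "cluster.local") -> str:
    """Build the cross-namespace FQDN URL pattern used in MATIH service config."""
    return f"{scheme}://{service}.{namespace}.svc.{cluster_domain}:{port}"


QUERY_ENGINE_URL = service_url("query-engine", "matih-data-plane", 8080)
print(QUERY_ENGINE_URL)
# http://query-engine.matih-data-plane.svc.cluster.local:8080
```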


Ingress Architecture

External traffic enters through NGINX Ingress Controllers:

Internet
   |
   v
+------------------+
| Azure/AWS/GCP LB |
+------------------+
   |
   v
+------------------+
| NGINX Ingress    |  (matih-ingress namespace)
| Controller       |
+------------------+
   |
   +---> matih-control-plane services (IAM, Tenant, Config, etc.)
   |
   +---> matih-data-plane services (AI, Query Engine, BI, etc.)
   |
   +---> matih-frontend applications (Workbenches)

Per-Tenant Ingress

For production multi-tenant deployments, each tenant can have a dedicated NGINX ingress controller:

# From infrastructure/helm/ingress-nginx/values-tenant.yaml
# Deployed in the tenant namespace with its own LoadBalancer IP
controller:
  ingressClassResource:
    name: "nginx-tenant-${TENANT_SLUG}"
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-resource-group: "${RESOURCE_GROUP}"
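Helm does not expand `${...}` placeholders in values files, so `TENANT_SLUG` and `RESOURCE_GROUP` must be substituted before `helm install` runs. The excerpt does not say how MATIH performs this step; the sketch below assumes a simple render step using Python's `string.Template`, which understands the same `${NAME}` syntax. The resource group name `rg-matih-prod` is hypothetical.

```python
from string import Template

# Excerpt of values-tenant.yaml with the ${...} placeholders still unresolved.
VALUES_TENANT = """\
controller:
  ingressClassResource:
    name: "nginx-tenant-${TENANT_SLUG}"
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-resource-group: "${RESOURCE_GROUP}"
"""


def render_tenant_values(tenant_slug: str, resource_group: str) -> str:
    # substitute() raises KeyError if any placeholder is left unfilled,
    # which is safer than passing a literal "${TENANT_SLUG}" to Helm.
    return Template(VALUES_TENANT).substitute(
        TENANT_SLUG=tenant_slug, RESOURCE_GROUP=resource_group
    )


print(render_tenant_values("acme", "rg-matih-prod"))
```

With a distinct IngressClass per tenant, each tenant's Ingress resources select their own controller by setting `spec.ingressClassName` (for example `nginx-tenant-acme`).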

Kafka Networking

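Strimzi Kafka defines two internal listeners: plaintext on port 9092 and TLS on port 9093. TLS is enforced at the network layer rather than by disabling the plaintext listener: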

# From infrastructure/k8s/kafka/kafka-cluster.yaml
listeners:
  - name: plain
    port: 9092
    type: internal
    tls: false
  - name: tls
    port: 9093
    type: internal
    tls: true

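All MATIH services connect to Kafka on port 9093 (TLS). NetworkPolicy egress rules are allow-lists, so permitting only port 9093 leaves the plaintext listener on port 9092 unreachable from the selected pods: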

# Kafka TLS only (port 9093) - from network-policies.yaml
- to:
    - namespaceSelector: {}
  ports:
    - protocol: TCP
      port: 9093
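For a pod selected by at least one egress policy, a connection is permitted only if some rule allows it. The sketch below models just the port-matching part of egress evaluation to show why the 9093 rule leaves 9092 closed. It is a simplification: destination peers (`to:`) are assumed to match, `endPort` ranges and named ports are ignored, pods not selected by any egress policy (which are unrestricted) are out of scope, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PortRule:
    protocol: str = "TCP"
    port: Optional[int] = None  # None = every port for this protocol


@dataclass
class EgressRule:
    ports: List[PortRule] = field(default_factory=list)  # empty = every port


def egress_allowed(rules: List[EgressRule], port: int, protocol: str = "TCP") -> bool:
    """Port-only model of NetworkPolicy egress for a pod selected by a policy.

    Destination peers (`to:`) are assumed to match. Rules are OR-ed together;
    anything no rule permits is denied.
    """
    for rule in rules:
        if not rule.ports:  # a rule with no `ports` allows all ports
            return True
        for p in rule.ports:
            if p.protocol == protocol and (p.port is None or p.port == port):
                return True
    return False


# Mirrors the network-policies.yaml excerpt: TCP 9093 to any namespace.
kafka_tls_only = [EgressRule(ports=[PortRule("TCP", 9093)])]
print(egress_allowed(kafka_tls_only, 9093))  # True
print(egress_allowed(kafka_tls_only, 9092))  # False
```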

DNS and TLS

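TLS certificates are issued by cert-manager from Let's Encrypt using ACME DNS01 challenges. The issuer below uses the Azure DNS solver and authenticates with the AKS managed identity: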

# From infrastructure/k8s/cert-manager/cluster-issuer-dns01.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-dns01
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
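    email: ${ACME_EMAIL}
    # privateKeySecretRef (required by cert-manager to store the ACME account key) is omitted from this excerpt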
    solvers:
    - dns01:
        azureDNS:
          resourceGroupName: ${RESOURCE_GROUP}
          subscriptionID: ${SUBSCRIPTION_ID}
          managedIdentity:
            clientID: ${AKS_IDENTITY_CLIENT_ID}
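cert-manager performs the DNS01 steps itself; the sketch below only illustrates what the solver ends up writing into the Azure DNS zone, following RFC 8555 section 8.4. The token and thumbprint values are placeholders, and `dns01_record` is a hypothetical helper, not part of cert-manager.

```python
import base64
import hashlib
from typing import Tuple


def dns01_record(domain: str, key_authorization: str) -> Tuple[str, str]:
    """Return (record name, TXT value) for an ACME DNS-01 challenge.

    key_authorization is "<token>.<base64url JWK thumbprint of the account key>".
    """
    # A wildcard certificate is validated on the base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # ACME uses base64url without '=' padding.
    value = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"_acme-challenge.{base}", value


name, value = dns01_record("*.acme.matih.ai", "example-token.example-thumbprint")
print(name)   # _acme-challenge.acme.matih.ai
print(value)  # 43-character base64url string
```

Because validation happens through a TXT record rather than over HTTP, issuance does not require the ingress to be reachable from the internet, and Let's Encrypt issues wildcard certificates (such as the illustrative `*.acme.matih.ai` above) only through DNS01.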

The DNS zone hierarchy supports per-tenant subdomains:

matih.ai                     (Platform DNS zone)
  |
  +-- api.matih.ai           (Control plane API)
  +-- acme.matih.ai          (Tenant: Acme Corp)
  +-- bigcorp.matih.ai       (Tenant: BigCorp)
  +-- staging.matih.ai       (Staging environment)
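Components that serve several tenants need to tell tenant hosts apart from platform hosts such as `api` and `staging`. The following hypothetical helper sketches that mapping for single-label subdomains of `matih.ai`; the `RESERVED` table mirrors the hierarchy above and is not taken from MATIH code.

```python
from typing import Tuple

PLATFORM_ZONE = "matih.ai"
# Non-tenant subdomains in the zone (hypothetical mapping based on the hierarchy).
RESERVED = {"api": "control-plane", "staging": "environment"}


def classify_host(host: str) -> Tuple[str, str]:
    """Map a request Host header to (kind, label) under the platform zone."""
    # Normalize: lowercase, drop any :port suffix, then any trailing dot.
    host = host.lower().split(":", 1)[0].rstrip(".")
    suffix = "." + PLATFORM_ZONE
    if not host.endswith(suffix):
        raise ValueError(f"{host!r} is not under {PLATFORM_ZONE}")
    label = host[: -len(suffix)]
    if not label or "." in label:
        raise ValueError(f"expected a single label under {PLATFORM_ZONE}: {host!r}")
    if label in RESERVED:
        return RESERVED[label], label
    return "tenant", label


print(classify_host("acme.matih.ai"))  # ('tenant', 'acme')
```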

Internal Communication Ports

Key internal ports used across the cluster:

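| Service | Port | Protocol | Purpose |
| --- | --- | --- | --- |
| Control plane APIs | 8080-8089 | HTTP | REST/gRPC APIs |
| AI service | 8000 | HTTP | FastAPI with WebSocket |
| Trino | 8080 | HTTP | SQL queries |
| Kafka | 9093 | TCP/TLS | Event streaming |
| PostgreSQL | 5432 | TCP | Database |
| Redis | 6379 | TCP | Cache/pub-sub |
| Spark Connect | 15002 | gRPC | Interactive Spark |
| Qdrant | 6333 | HTTP | Vector search (REST; Qdrant's gRPC API listens on 6334 by default) |
| Neo4j | 7687 | Bolt | Graph queries |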
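When a service cannot reach one of these ports, a raw TCP connect from a debug pod separates network problems from application problems. The sketch below is a minimal probe using only the standard library; the Neo4j FQDN in the comment is hypothetical and should be replaced with the real service and namespace names.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    This only checks that the port accepts connections (and that network
    policy lets them through); it does not speak HTTP, TLS, Bolt, or any
    other application protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


# Example from inside the cluster (hypothetical service name):
#   tcp_reachable("neo4j.matih-data-plane.svc.cluster.local", 7687)
```

Calico drops denied packets by default, so a probe that times out (rather than failing immediately with a refused connection) often points to a network policy rather than a stopped service.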