Chapter 17: Kubernetes and Helm Infrastructure
The MATIH Enterprise Platform runs on Kubernetes as a cloud-agnostic, production-grade deployment target. With over 55 Helm charts, seven dedicated namespaces, and a layered architecture spanning control plane, data plane, observability, and frontend workloads, the Kubernetes infrastructure represents the operational backbone of the entire platform. This chapter provides a comprehensive guide to the cluster architecture, namespace topology, Helm chart patterns, data infrastructure deployments, network policies, and autoscaling strategies that keep MATIH running at scale.
What You Will Learn
By the end of this chapter, you will understand:
- Cluster architecture across Azure Kubernetes Service (AKS), Amazon Elastic Kubernetes Service (EKS), and Google Kubernetes Engine (GKE), including node pool design, networking models, and identity integration
- Namespace topology for the seven MATIH namespaces, their isolation boundaries, RBAC policies, resource quotas, and inter-namespace communication patterns
- Helm chart structure including the standard per-service chart template with deployment, service, configmap, secret, ingress, HPA, PDB, ServiceMonitor, NetworkPolicy, and helper templates
- Umbrella charts for `matih-control-plane` (10 services) and `matih-data-plane` (14 services), their dependency management, and value override strategies
- Data infrastructure including Trino, Kafka/Strimzi, PostgreSQL, Redis, Neo4j, Qdrant, MongoDB, Elasticsearch, ChromaDB, Dgraph, and StarRocks deployments
- Network policies enforcing namespace isolation, service-to-service communication rules, and external access controls
- Autoscaling patterns with Horizontal Pod Autoscalers (HPA), Vertical Pod Autoscalers (VPA), custom Prometheus metrics, and Pod Disruption Budgets (PDB)
Chapter Structure
| Section | Description | Audience |
|---|---|---|
| Cluster Architecture | AKS, EKS, and GKE cluster setup, node pools, networking, and identity configuration | Platform engineers, DevOps |
| Namespace Topology | All seven namespaces with isolation, RBAC, resource quotas, and communication patterns | Platform engineers, security teams |
| Helm Chart Structure | Standard chart template, helper functions, values patterns, and template authoring | DevOps engineers, developers |
| Umbrella Charts | Control plane and data plane umbrella charts, dependency management, and deep merge behavior | DevOps engineers, release managers |
| Data Infrastructure | Stateful data services: Trino, Kafka, PostgreSQL, Redis, Neo4j, Qdrant, and more | Data engineers, platform engineers |
| Network Policies | Network isolation, ingress/egress rules, and service mesh considerations | Security engineers, platform engineers |
| Autoscaling Patterns | HPA, VPA, custom metrics, PDB, and scaling behavior configuration | Platform engineers, SREs |
Kubernetes at a Glance
The MATIH platform deploys across seven namespaces on a managed Kubernetes cluster, with each namespace serving a distinct operational purpose:
+------------------------------------------------------------------+
|                        Kubernetes Cluster                        |
|                                                                  |
|  +----------------------+    +------------------------+          |
|  | matih-system         |    | matih-observability    |          |
|  | (Platform infra)     |    | (Prometheus, Grafana,  |          |
|  |                      |    |  Loki, Tempo)          |          |
|  +----------------------+    +------------------------+          |
|                                                                  |
|  +----------------------+    +------------------------+          |
|  | matih-control-       |    | matih-monitoring-      |          |
|  | plane                |    | control-plane          |          |
|  | (IAM, Tenant,        |    | (CP ServiceMonitors)   |          |
|  |  Config, Audit,      |    +------------------------+          |
|  |  Notification)       |                                        |
|  +----------------------+    +------------------------+          |
|                              | matih-monitoring-      |          |
|  +----------------------+    | data-plane             |          |
|  | matih-data-plane     |    | (DP ServiceMonitors)   |          |
|  | (AI, BI, ML, Query,  |    +------------------------+          |
|  |  Catalog, Pipeline   |                                        |
|  |  + Data Infra)       |    +------------------------+          |
|  +----------------------+    | matih-frontend         |          |
|                              | (BI, ML, Data, Agent   |          |
|                              |  Workbenches)          |          |
|                              +------------------------+          |
+------------------------------------------------------------------+
Key Design Principles
The MATIH Kubernetes infrastructure follows several foundational design principles:
1. Security by Default
Every service runs with a hardened security posture:
- Non-root execution: All containers run as a non-root user (UID 1000 or 1001)
- Read-only root filesystem: Containers cannot write to their root filesystem
- Capability dropping: All Linux capabilities are dropped with `capabilities.drop: [ALL]`
- Privilege escalation prevention: `allowPrivilegeEscalation: false` on every container
- Network isolation: NetworkPolicies restrict traffic to explicitly allowed paths
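The hardening settings listed above map onto standard Kubernetes fields. The following is a minimal sketch rather than an excerpt from the MATIH charts: the service name, image, port, labels, and the choice of which namespace is allowed to call the service are all hypothetical, and a real chart would set these fields in a Deployment pod template instead of a bare Pod.

```yaml
# Hypothetical workload showing the pod-level hardening settings.
apiVersion: v1
kind: Pod
metadata:
  name: example-service            # hypothetical name
  namespace: matih-data-plane
  labels:
    app.kubernetes.io/name: example-service
spec:
  securityContext:
    runAsNonRoot: true             # refuse to start if the image would run as root
    runAsUser: 1000                # one of the UIDs mentioned above
  containers:
    - name: app
      image: registry.example.com/example-service:1.0.0   # placeholder image
      ports:
        - containerPort: 8080      # assumed application port
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      volumeMounts:
        - name: tmp                # writable scratch space, since the root filesystem is read-only
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}
---
# Hypothetical policy: only pods in matih-control-plane may reach the service.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: example-service-ingress
  namespace: matih-data-plane
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: example-service
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label that Kubernetes sets automatically on every namespace
              kubernetes.io/metadata.name: matih-control-plane
      ports:
        - protocol: TCP
          port: 8080
```

Because the root filesystem is read-only, any path the application writes to (temporary files, caches) needs an explicit volume such as the `emptyDir` shown here.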
2. Consistent Chart Patterns
Every service chart follows an identical structure:
| File | Purpose |
|---|---|
| `Chart.yaml` | Chart metadata and dependencies |
| `values.yaml` | Production defaults |
| `values-dev.yaml` | Development overrides |
| `templates/_helpers.tpl` | Reusable template functions |
| `templates/deployment.yaml` | Deployment specification |
| `templates/service.yaml` | ClusterIP Service |
| `templates/configmap.yaml` | Non-sensitive configuration |
| `templates/secret.yaml` | Sensitive data references |
| `templates/ingress.yaml` | Optional Ingress resource |
| `templates/hpa.yaml` | Horizontal Pod Autoscaler |
| `templates/pdb.yaml` | Pod Disruption Budget |
| `templates/servicemonitor.yaml` | Prometheus ServiceMonitor |
| `templates/networkpolicy.yaml` | Network isolation rules |
| `templates/NOTES.txt` | Post-install instructions |
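To illustrate how the production defaults and development overrides relate, here is a sketch of what a `values-dev.yaml` overlay might contain. Every key name below is an assumption chosen to mirror the template files in the table; the actual MATIH charts may name and nest their values differently.

```yaml
# values-dev.yaml (hypothetical keys, for illustration only)
# Helm loads values.yaml first, then merges this file on top of it,
# so only the settings that differ from production need to appear here.
replicaCount: 1

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

autoscaling:
  enabled: false        # skip rendering templates/hpa.yaml in development

podDisruptionBudget:
  enabled: false        # a single replica cannot satisfy a disruption budget

ingress:
  enabled: false        # reach the service with port-forwarding instead

serviceMonitor:
  enabled: true         # keep metrics scraping on, as in production

networkPolicy:
  enabled: true         # keep isolation rules on, so policy bugs surface early
```

An overlay like this is passed with the `-f` flag, for example `helm upgrade --install example-service ./example-service -f values-dev.yaml`, where the release name and chart path are placeholders.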
3. Cloud-Agnostic Design
MATIH runs on AKS, EKS, and GKE with identical Helm charts. Cloud-specific concerns (identity, storage classes, load balancers) are abstracted through:
- Terraform modules per cloud provider
- Values overlay files per environment
- External Secrets Operator for secret management
- cert-manager for TLS certificate provisioning
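As a rough sketch of how the last two items keep charts free of cloud-specific details, the manifests below request a secret and a TLS certificate without naming any cloud provider. The store name, issuer name, remote key, and hostname are hypothetical, and the `ExternalSecret` API version depends on which External Secrets Operator release is installed.

```yaml
# The provider-specific backend (Azure Key Vault, AWS Secrets Manager,
# Google Secret Manager) is configured once in the referenced store,
# so this manifest stays identical across AKS, EKS, and GKE.
apiVersion: external-secrets.io/v1beta1   # may differ by operator version
kind: ExternalSecret
metadata:
  name: example-service-db                # hypothetical
  namespace: matih-control-plane
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: cloud-secret-store              # hypothetical store name
  target:
    name: example-service-db              # Kubernetes Secret to create
  data:
    - secretKey: password
      remoteRef:
        key: example-service/db-password  # hypothetical remote key
---
# cert-manager issues the certificate and stores it in the named Secret.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-service-tls               # hypothetical
  namespace: matih-control-plane
spec:
  secretName: example-service-tls
  dnsNames:
    - example-service.example.com         # placeholder hostname
  issuerRef:
    kind: ClusterIssuer
    name: example-issuer                  # hypothetical issuer name
```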
4. Observable Everything
Every service exposes Prometheus metrics, structured logs, and distributed traces:
- Metrics: ServiceMonitor CRDs for automatic Prometheus scraping
- Logs: Structured JSON logging collected by Fluent Bit or Promtail into Loki
- Traces: OpenTelemetry instrumentation with Tempo as the backend
- Health: Startup, liveness, and readiness probes on every container
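The metrics item can be sketched with a Prometheus Operator `ServiceMonitor`. This example assumes, based on the namespace diagram earlier in the chapter, that control plane ServiceMonitors live in `matih-monitoring-control-plane` and select Services in `matih-control-plane`. The service name, label, port name, metrics path, and scrape interval are all hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-service                   # hypothetical
  namespace: matih-monitoring-control-plane
spec:
  namespaceSelector:
    matchNames:
      - matih-control-plane               # where the target Service runs
  selector:
    matchLabels:
      app.kubernetes.io/name: example-service
  endpoints:
    - port: http-metrics                  # must match a named port on the Service
      path: /metrics                      # assumed metrics path
      interval: 30s                       # assumed scrape interval
```

Prometheus only picks up a ServiceMonitor if its own selectors match the monitor's namespace and labels, so the labels here would have to agree with the Prometheus configuration in the observability namespace.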
Resource Summary
The following table summarizes the platform's Kubernetes resource footprint at a glance:
| Metric | Count |
|---|---|
| Namespaces | 7 |
| Helm charts (total) | 55+ |
| Control plane services | 10 |
| Data plane services | 14 |
| Frontend applications | 6 |
| Data infrastructure components | 12+ |
| Custom Prometheus alert rules | 20+ |
| Network policies | 25+ |
| Horizontal Pod Autoscalers | 20+ |
| Pod Disruption Budgets | 20+ |
Prerequisites
Before diving into this chapter, you should be familiar with:
- Kubernetes fundamentals (Pods, Deployments, Services, ConfigMaps, Secrets)
- Helm 3 chart structure and templating with Go templates
- Basic networking concepts (DNS, TLS, network policies)
- Container security fundamentals (Linux capabilities, user namespaces)
- At least one managed Kubernetes provider (AKS, EKS, or GKE)
For installation and initial cluster provisioning, refer to Chapter 4: Installation and Deployment.
Navigation
Proceed to the first section to understand the cluster architecture across all three supported cloud providers:
- Next: Cluster Architecture -- AKS, EKS, and GKE cluster setup