MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Azure Kubernetes Service (AKS)

AKS is the primary deployment target for MATIH. The cluster is provisioned with Terraform and uses Azure CNI networking, Workload Identity for pod-level access to Azure services, and Azure Key Vault for secret management.


Cluster Configuration

The cluster resource is defined in the Terraform module at infrastructure/terraform/modules/azure/aks/:

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "matih-${var.environment}"
  location            = var.location
  resource_group_name = var.resource_group_name
  dns_prefix          = "matih-${var.environment}"
  kubernetes_version  = "1.29"
 
  default_node_pool {
    name                = "system"
    vm_size             = "Standard_D4s_v3"
    node_count          = 3
    vnet_subnet_id      = var.subnet_id
    os_disk_size_gb     = 128
    os_disk_type        = "Managed"
    max_pods            = 110
    zones               = [1, 2, 3]
  }
 
  identity {
    type = "SystemAssigned"
  }
 
  network_profile {
    network_plugin    = "azure"
    network_policy    = "calico"
    service_cidr      = "10.1.0.0/16"
    dns_service_ip    = "10.1.0.10"
  }
 
  oidc_issuer_enabled       = true
  workload_identity_enabled = true
}

Node Pools

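The cluster runs a dedicated node pool for each workload class. Every pool except system, which is fixed at three nodes, autoscales between the minimum and maximum node counts listed below: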

| Node Pool | VM Size | Min/Max | Purpose | Taint |
|---|---|---|---|---|
| system | Standard_D4s_v3 | 3/3 | System components | None |
| ctrlplane | Standard_D4s_v3 | 2/5 | Control plane services | matih.ai/control-plane=true:NoSchedule |
| dataplane | Standard_D8s_v3 | 2/10 | Data plane services | matih.ai/data-plane=true:NoSchedule |
| compute | Standard_E16s_v3 | 2/10 | Trino, Spark workers | matih.ai/compute=true:NoSchedule |
| aicompute | Standard_D8s_v3 | 1/8 | AI/ML workloads | matih.ai/ai-compute=true:NoSchedule |
| gpu | Standard_NC6s_v3 | 0/4 | GPU inference | nvidia.com/gpu=true:NoSchedule |
| playground | Standard_D2s_v3 | 1/3 | Playground/free tier | matih.io/playground=true:NoSchedule |
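
Because every pool other than system carries a NoSchedule taint, a workload lands on a pool only if its pod spec tolerates that taint, and a node selector is still needed to keep it off the untainted system pool. The following is a minimal sketch for the dataplane pool. The Deployment name, namespace, labels, and image name are hypothetical, and the node selector assumes the standard AKS `kubernetes.azure.com/agentpool` node label carries the pool name.

```yaml
# Sketch only: name, namespace, labels, and image are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service            # hypothetical service name
  namespace: matih-data-plane      # hypothetical namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: example-service
  template:
    metadata:
      labels:
        app.kubernetes.io/name: example-service
    spec:
      # Pin the pods to the dataplane pool (assumes the AKS agentpool label).
      nodeSelector:
        kubernetes.azure.com/agentpool: dataplane
      # Tolerate the pool's taint so the scheduler will place pods there.
      tolerations:
        - key: matih.ai/data-plane
          operator: Equal
          value: "true"
          effect: NoSchedule
      containers:
        - name: example-service
          image: matihlabsacr.azurecr.io/matih/example-service:latest  # hypothetical image
```

The toleration only permits scheduling onto tainted nodes; it is the node selector that restricts the pods to the intended pool.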

Workload Identity

AKS Workload Identity allows pods to authenticate to Azure services without embedded credentials:

# ServiceAccount with Workload Identity annotation
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-secrets
  namespace: external-secrets
  annotations:
    azure.workload.identity/client-id: "${AKS_IDENTITY_CLIENT_ID}"
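
The ServiceAccount annotation is only half of the setup: the Workload Identity webhook injects the federated token and the related `AZURE_*` environment variables only into pods that carry the `azure.workload.identity/use: "true"` label and run under the annotated ServiceAccount. A minimal sketch of a pod that opts in follows. The pod name is hypothetical, and the Azure CLI container is used purely as an illustrative workload.

```yaml
# Sketch only: the pod name and the container are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: workload-identity-check    # hypothetical name
  namespace: external-secrets
  labels:
    # Opt-in label required by the Workload Identity mutating webhook.
    azure.workload.identity/use: "true"
spec:
  # Must reference the ServiceAccount that carries the client-id annotation.
  serviceAccountName: external-secrets
  containers:
    - name: azure-cli
      image: mcr.microsoft.com/azure-cli
      command: ["sleep", "3600"]
```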

The following services use Workload Identity:

| Service | Azure Resource | Purpose |
|---|---|---|
| external-secrets | Azure Key Vault | Secret synchronization |
| cert-manager | Azure DNS | DNS01 ACME challenges |
| infrastructure-service | ARM API | Tenant infrastructure provisioning |
| ai-service | Azure OpenAI | LLM inference API calls |
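
As an illustration of the first row, External Secrets Operator can be pointed at Key Vault with Workload Identity authentication through a secret store resource. This is a sketch under assumptions rather than the platform's actual manifest: the store name and vault URL are placeholders, and the `external-secrets.io/v1beta1` API version may differ depending on the operator release installed.

```yaml
# Sketch only: store name and vault URL are placeholders, not platform values.
apiVersion: external-secrets.io/v1beta1   # may be v1 on newer operator releases
kind: ClusterSecretStore
metadata:
  name: azure-keyvault                    # hypothetical store name
spec:
  provider:
    azurekv:
      # Authenticate with the federated token of the referenced ServiceAccount.
      authType: WorkloadIdentity
      vaultUrl: "https://example-vault.vault.azure.net"   # placeholder URL
      serviceAccountRef:
        name: external-secrets
        namespace: external-secrets
```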

Azure Container Registry

Images are stored in Azure Container Registry (ACR), on which the AKS kubelet identity has pull permissions:

# Global image configuration
global:
  imageRegistry: matihlabsacr.azurecr.io/matih
  imagePullSecrets:
    - name: acr-secret
    - name: platform-acr-secret

For multi-tenant deployments, each tenant can have its own ACR, with images synced from the platform ACR by stage 04a of the CD pipeline.


Network Configuration

AKS uses Azure CNI for pod networking, providing each pod with a routable IP from the VNet:

| Setting | Value |
|---|---|
| Network plugin | Azure CNI |
| Network policy | Calico |
| Service CIDR | 10.1.0.0/16 |
| DNS service IP | 10.1.0.10 |
| Pod CIDR | Allocated from VNet subnet |
| Max pods per node | 110 |
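
With Calico as the network policy engine, standard Kubernetes NetworkPolicy resources are enforced on pod traffic. The sketch below shows a common baseline, not a policy taken from the platform: it denies all ingress to a namespace and then allows traffic only between pods in that same namespace. The namespace name is a hypothetical placeholder.

```yaml
# Sketch only: the namespace is an illustrative assumption.
# 1) Deny all ingress to every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: matih-data-plane      # hypothetical namespace
spec:
  podSelector: {}                  # empty selector matches all pods
  policyTypes:
    - Ingress
---
# 2) Allow ingress from pods in the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: matih-data-plane
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}          # any pod in this namespace
```

NetworkPolicy rules are additive, so the second policy opens exactly the traffic it names on top of the default deny.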

Monitoring Integration

AKS integrates with Azure Monitor and the platform observability stack:

  • Container Insights: Azure-native monitoring (optional, can be disabled if Prometheus is preferred)
  • Prometheus: Platform-deployed Prometheus scrapes all service metrics via ServiceMonitor CRDs
  • Log Analytics: Optional forwarding to Azure Log Analytics workspace
  • Grafana: Platform Grafana with Azure Monitor data source for infrastructure metrics
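
For reference, a ServiceMonitor that asks Prometheus to scrape a service might look like the sketch below. The resource name, namespace, labels, port name, and scrape interval are all hypothetical, and the `release` label assumes the Prometheus instance selects ServiceMonitors by that label, which depends on how the stack is installed.

```yaml
# Sketch only: names, labels, port, and interval are illustrative assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-service            # hypothetical name
  namespace: matih-data-plane      # hypothetical namespace
  labels:
    release: prometheus            # assumed to match the Prometheus serviceMonitorSelector
spec:
  # Select the Service objects to scrape by label.
  selector:
    matchLabels:
      app.kubernetes.io/name: example-service
  endpoints:
    - port: http-metrics           # must match a named port on the Service
      path: /metrics
      interval: 30s
```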