MATIH Platform is in active MVP development. Documentation reflects current implementation status.
17. Kubernetes & Helm
Cluster Autoscaler

The Cluster Autoscaler automatically adjusts the number of nodes in the Kubernetes cluster based on pod scheduling demands. When pods cannot be scheduled due to insufficient resources, the Cluster Autoscaler adds nodes. When nodes are underutilized, it removes them to reduce costs.


Cluster Autoscaler Architecture

```
Unschedulable pods  --> Cluster Autoscaler --> Cloud Provider API --> Add nodes
Underutilized nodes --> Cluster Autoscaler --> Cloud Provider API --> Remove nodes
```

Scaling Triggers

Scale Up

The Cluster Autoscaler adds nodes when:

| Condition | Description |
|---|---|
| Unschedulable pods | Pods in Pending state due to insufficient CPU/memory |
| HPA ceiling | HPA wants more replicas but there is no node capacity |
| PVC pending | Persistent volumes cannot be provisioned in the current zone |
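
As a sketch of the first trigger: a Deployment whose per-pod requests exceed the free capacity of every node leaves pods Pending, which the autoscaler resolves by adding a node. The workload name, image, and request sizes below are illustrative, not MATIH services:

```yaml
# Hypothetical workload: if no node has 4 CPU / 8Gi free, these pods
# stay Pending and the Cluster Autoscaler provisions additional nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker          # illustrative name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      containers:
        - name: worker
          image: batch-worker:latest
          resources:
            requests:
              cpu: "4"
              memory: 8Gi
```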

Scale Down

The Cluster Autoscaler removes nodes when:

| Condition | Description |
|---|---|
| Low utilization | Node resource utilization below threshold for 10+ minutes |
| Pods movable | All pods on the node can be rescheduled elsewhere |
| No constraints | No PDBs, local storage, or system pods preventing eviction |
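
Beyond PDBs, individual pods can opt out of scale-down eviction with the upstream Cluster Autoscaler's standard `safe-to-evict` annotation; a node hosting such a pod is never removed. The pod name and image here are illustrative:

```yaml
# A pod carrying this annotation blocks scale-down of its node.
apiVersion: v1
kind: Pod
metadata:
  name: stateful-job          # illustrative name
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
    - name: job
      image: stateful-job:latest
```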

Node Pool Configuration

The MATIH platform uses multiple node pools for workload isolation:

| Node Pool | Instance Type | Min | Max | Autoscale | Purpose |
|---|---|---|---|---|---|
| system | Standard_D4s_v3 | 2 | 4 | Yes | Control plane services |
| dataplane | Standard_D8s_v3 | 2 | 10 | Yes | Data plane services |
| ml-compute | Standard_D16s_v3 | 0 | 6 | Yes | ML training and inference |
| gpu | Standard_NC6s_v3 | 0 | 4 | Yes | GPU workloads (LLM, Triton) |
| monitoring | Standard_D4s_v3 | 1 | 3 | Yes | Prometheus, Grafana, Loki |
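
Workloads target a pool via the `matih.io/node-pool` node label (set in the Terraform node pool definitions). Because `ml-compute` scales from zero, scheduling a pod there also triggers the first scale-up. A sketch, with pod name, image, and request sizes assumed:

```yaml
# Pins a workload to the ml-compute pool; the autoscaler grows that
# pool (0 -> up to 6 nodes) as needed to fit it.
apiVersion: v1
kind: Pod
metadata:
  name: training-job          # illustrative name
spec:
  nodeSelector:
    matih.io/node-pool: ml-compute
  containers:
    - name: trainer
      image: trainer:latest
      resources:
        requests:
          cpu: "8"
          memory: 32Gi
```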

Configuration Parameters

| Parameter | Value | Description |
|---|---|---|
| scan-interval | 10s | How often the autoscaler checks for unschedulable pods |
| scale-down-delay-after-add | 10m | Cooldown after adding a node |
| scale-down-delay-after-delete | 0s | Cooldown after removing a node |
| scale-down-unneeded-time | 10m | Time node must be underutilized before removal |
| scale-down-utilization-threshold | 0.5 | Node utilization below which scale-down is considered |
| max-graceful-termination-sec | 600 | Max time for pod graceful termination during scale-down |
| skip-nodes-with-system-pods | true | Protect nodes running kube-system pods |
| skip-nodes-with-local-storage | true | Protect nodes with local PVs |
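
On AKS these parameters are set through the cluster's autoscaler profile rather than autoscaler flags. A Terraform sketch using the `auto_scaler_profile` block of `azurerm_kubernetes_cluster` (the resource name `main` matches the node pool example in this document; other cluster arguments are omitted):

```hcl
# Sketch: the table above expressed as an AKS autoscaler profile.
resource "azurerm_kubernetes_cluster" "main" {
  # name, resource_group_name, default_node_pool, identity, etc. omitted

  auto_scaler_profile {
    scan_interval                    = "10s"
    scale_down_delay_after_add       = "10m"
    scale_down_delay_after_delete    = "0s"
    scale_down_unneeded              = "10m"
    scale_down_utilization_threshold = "0.5"
    max_graceful_termination_sec     = "600"
    skip_nodes_with_system_pods      = true
    skip_nodes_with_local_storage    = true
  }
}
```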

Cloud Provider Integration

| Provider | Managed Offering | API |
|---|---|---|
| Azure | AKS Cluster Autoscaler | Azure VMSS |
| AWS | EKS Cluster Autoscaler | AWS ASG |
| GCP | GKE Cluster Autoscaler | GCE MIG |

Azure AKS Configuration

For the MATIH Azure deployment, Cluster Autoscaler is managed natively by AKS:

```hcl
# Node pool autoscaling is configured via Terraform
resource "azurerm_kubernetes_cluster_node_pool" "dataplane" {
  name                  = "dataplane"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_D8s_v3"
  enable_auto_scaling   = true
  min_count             = 2
  max_count             = 10

  node_labels = {
    "matih.io/node-pool" = "dataplane"
  }
}
```

Pod Disruption Budgets

Critical services have PDBs to prevent the autoscaler from removing nodes hosting essential pods:

| Service | MinAvailable | MaxUnavailable |
|---|---|---|
| AI Service | 1 | N/A |
| Query Engine | 1 | N/A |
| API Gateway | 1 | N/A |
| PostgreSQL | N/A | 1 |
| Redis | N/A | 1 |
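
As a sketch, the AI Service row of the table corresponds to a manifest like the following; the namespace and label selector are assumptions, not confirmed MATIH values:

```yaml
# PDB guaranteeing at least one AI Service pod survives any voluntary
# disruption, including autoscaler-driven node drains.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ai-service
  namespace: matih            # assumed namespace
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: ai-service         # assumed label
```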

Monitoring

| Metric | Description |
|---|---|
| cluster_autoscaler_nodes_count | Current node count by pool |
| cluster_autoscaler_scaled_up_nodes_total | Nodes added by autoscaler |
| cluster_autoscaler_scaled_down_nodes_total | Nodes removed by autoscaler |
| cluster_autoscaler_unschedulable_pods_count | Pending unschedulable pods |
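
These metrics can feed alerting; for example, a sustained nonzero unschedulable-pod count usually means a pool has hit its max or a quota. A Prometheus rule sketch (group name, duration, and severity label are assumptions):

```yaml
# Alert if pods remain unschedulable for longer than a scale-up
# should reasonably take.
groups:
  - name: cluster-autoscaler
    rules:
      - alert: PodsUnschedulable
        expr: cluster_autoscaler_unschedulable_pods_count > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pods have been unschedulable for 10+ minutes"
```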

Troubleshooting

| Issue | Symptom | Resolution |
|---|---|---|
| Pods stuck Pending | Nodes not added | Check node pool max limit and quotas |
| Slow scale-up | 5+ minutes to add capacity | Check cloud API response time |
| Nodes not removed | Underutilized nodes remain | Check PDBs and local storage constraints |
| Budget exceeded | Too many nodes running | Lower max node count or utilization threshold |
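
Useful starting points for diagnosis (the `cluster-autoscaler-status` ConfigMap is written by the upstream autoscaler; whether it is exposed depends on the managed offering):

```shell
# Scale-up decisions surface as TriggeredScaleUp events
kubectl get events --field-selector reason=TriggeredScaleUp -A

# Autoscaler's own view of node groups and blockers
kubectl describe configmap cluster-autoscaler-status -n kube-system

# Pods currently waiting for capacity
kubectl get pods --field-selector status.phase=Pending -A
```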