Cluster Autoscaler
The Cluster Autoscaler automatically adjusts the number of nodes in the Kubernetes cluster based on pod scheduling demands. When pods cannot be scheduled due to insufficient resources, the Cluster Autoscaler adds nodes. When nodes are underutilized, it removes them to reduce costs.
Cluster Autoscaler Architecture
```
Unschedulable Pods  --> Cluster Autoscaler --> Cloud Provider API --> Add Nodes
Underutilized Nodes --> Cluster Autoscaler --> Cloud Provider API --> Remove Nodes
```

Scaling Triggers
Scale Up
The Cluster Autoscaler adds nodes when:
| Condition | Description |
|---|---|
| Unschedulable pods | Pods in Pending state due to insufficient CPU/memory |
| HPA ceiling | HPA wants more replicas but no node capacity |
| PVC pending | Persistent volumes cannot be provisioned in current zone |
Scale Down
The Cluster Autoscaler removes nodes when:
| Condition | Description |
|---|---|
| Low utilization | Node resource utilization below threshold for 10+ minutes |
| Pods movable | All pods on the node can be rescheduled elsewhere |
| No constraints | No PDBs, local storage, or system pods preventing eviction |
Node Pool Configuration
The MATIH platform uses multiple node pools for workload isolation:
| Node Pool | Instance Type | Min | Max | Autoscale | Purpose |
|---|---|---|---|---|---|
| system | Standard_D4s_v3 | 2 | 4 | Yes | Control plane services |
| dataplane | Standard_D8s_v3 | 2 | 10 | Yes | Data plane services |
| ml-compute | Standard_D16s_v3 | 0 | 6 | Yes | ML training and inference |
| gpu | Standard_NC6s_v3 | 0 | 4 | Yes | GPU workloads (LLM, Triton) |
| monitoring | Standard_D4s_v3 | 1 | 3 | Yes | Prometheus, Grafana, Loki |
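Workloads opt into a pool through node selectors. As a sketch, a training Job pinned to the ml-compute pool might look like the following, assuming every pool carries a `matih.io/node-pool` label (the Terraform below applies that label to the dataplane pool; the same pattern is assumed for the others):

```yaml
# Hypothetical Job pinned to the ml-compute pool via an assumed
# matih.io/node-pool label; image and resource figures are illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: model-training
spec:
  template:
    spec:
      nodeSelector:
        matih.io/node-pool: ml-compute
      containers:
        - name: trainer
          image: registry.example.com/trainer:latest
          resources:
            requests:
              cpu: "8"
              memory: 32Gi
      restartPolicy: Never
```

Because ml-compute has a minimum of 0 nodes, submitting this Job to an empty pool leaves the pod unschedulable, which is exactly the condition that triggers a scale-up.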
Configuration Parameters
| Parameter | Value | Description |
|---|---|---|
| `scan-interval` | 10s | How often the autoscaler checks for unschedulable pods |
| `scale-down-delay-after-add` | 10m | Cooldown after adding a node |
| `scale-down-delay-after-delete` | 0s | Cooldown after removing a node |
| `scale-down-unneeded-time` | 10m | Time a node must be underutilized before removal |
| `scale-down-utilization-threshold` | 0.5 | Node utilization below which scale-down is considered |
| `max-graceful-termination-sec` | 600 | Max time for pod graceful termination during scale-down |
| `skip-nodes-with-system-pods` | true | Protect nodes running kube-system pods |
| `skip-nodes-with-local-storage` | true | Protect nodes with local PVs |
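These parameters correspond one-to-one to cluster-autoscaler command-line flags. On a self-managed deployment they would be passed roughly as below (image tag illustrative; on AKS the managed autoscaler profile sets them instead, as described in the next section):

```yaml
# Illustrative container spec fragment for a self-managed
# cluster-autoscaler Deployment; flag values mirror the table above.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
    command:
      - ./cluster-autoscaler
      - --scan-interval=10s
      - --scale-down-delay-after-add=10m
      - --scale-down-delay-after-delete=0s
      - --scale-down-unneeded-time=10m
      - --scale-down-utilization-threshold=0.5
      - --max-graceful-termination-sec=600
      - --skip-nodes-with-system-pods=true
      - --skip-nodes-with-local-storage=true
```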
Cloud Provider Integration
| Provider | Managed Offering | API |
|---|---|---|
| Azure | AKS Cluster Autoscaler | Azure VMSS |
| AWS | EKS Cluster Autoscaler | AWS ASG |
| GCP | GKE Cluster Autoscaler | GCE MIG |
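On Azure, the parameters above are not passed as raw flags but tuned through the cluster's autoscaler profile. A hedged Terraform sketch, assuming the azurerm provider's `auto_scaler_profile` block (verify attribute names against your provider version):

```hcl
# Sketch: cluster-wide autoscaler tuning on AKS via auto_scaler_profile;
# values mirror the Configuration Parameters table above.
resource "azurerm_kubernetes_cluster" "main" {
  # ... name, location, default_node_pool, identity omitted ...

  auto_scaler_profile {
    scan_interval                    = "10s"
    scale_down_delay_after_add       = "10m"
    scale_down_delay_after_delete    = "0s"
    scale_down_unneeded              = "10m"
    scale_down_utilization_threshold = "0.5"
    max_graceful_termination_sec     = "600"
    skip_nodes_with_system_pods      = true
    skip_nodes_with_local_storage    = true
  }
}
```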
Azure AKS Configuration
For the MATIH Azure deployment, Cluster Autoscaler is managed natively by AKS:
```hcl
# Node pool autoscaling is configured via Terraform
resource "azurerm_kubernetes_cluster_node_pool" "dataplane" {
  name                  = "dataplane"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_D8s_v3"
  enable_auto_scaling   = true
  min_count             = 2
  max_count             = 10

  node_labels = {
    "matih.io/node-pool" = "dataplane"
  }
}
```

Pod Disruption Budgets
Critical services define Pod Disruption Budgets (PDBs) so the autoscaler cannot drain a node if doing so would push a service below its configured availability floor:
| Service | MinAvailable | MaxUnavailable |
|---|---|---|
| AI Service | 1 | N/A |
| Query Engine | 1 | N/A |
| API Gateway | 1 | N/A |
| PostgreSQL | N/A | 1 |
| Redis | N/A | 1 |
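As a sketch, the AI Service budget above might be expressed as follows (the namespace and label selector are assumptions, not taken from the deployment manifests):

```yaml
# Hypothetical PDB for the AI Service; namespace and labels are assumed.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ai-service
  namespace: matih
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: ai-service
```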
Monitoring
| Metric | Description |
|---|---|
| `cluster_autoscaler_nodes_count` | Current node count by pool |
| `cluster_autoscaler_scaled_up_nodes_total` | Nodes added by the autoscaler |
| `cluster_autoscaler_scaled_down_nodes_total` | Nodes removed by the autoscaler |
| `cluster_autoscaler_unschedulable_pods_count` | Pending unschedulable pods |
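These metrics can be queried directly for alerting. Two illustrative PromQL expressions (metric names as emitted by cluster-autoscaler; thresholds and windows are assumptions to adapt to your alerting policy):

```promql
# Fire when pods sit unschedulable, i.e. the autoscaler cannot place them
cluster_autoscaler_unschedulable_pods_count > 0

# Nodes added across all pools over the last hour
sum(increase(cluster_autoscaler_scaled_up_nodes_total[1h]))
```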
Troubleshooting
| Issue | Symptom | Resolution |
|---|---|---|
| Pods stuck Pending | Nodes not added | Check node pool max limit and quotas |
| Slow scale-up | 5+ minutes to add capacity | Check cloud API response time |
| Nodes not removed | Underutilized nodes remain | Check PDBs and local storage constraints |
| Budget exceeded | Too many nodes running | Lower max node count, or raise the utilization threshold so more nodes qualify for scale-down |
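For most of these issues, the first diagnostic steps are the pending pod's scheduling events and the autoscaler's own status. A sketch (the `cluster-autoscaler-status` ConfigMap is written by the autoscaler in kube-system; on managed AKS its availability can vary by version, so verify on your cluster):

```shell
# Why is the pod pending? Look for FailedScheduling events.
kubectl describe pod <pending-pod> -n <namespace>

# Autoscaler's view of each node group: health, scale-up/scale-down state.
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml

# Scale-up decisions recorded as events on the pods that triggered them.
kubectl get events -A --field-selector reason=TriggeredScaleUp
```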