MATIH Platform is in active MVP development. Documentation reflects current implementation status.
17. Kubernetes & Helm
Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pod replicas for MATIH services based on CPU utilization, memory utilization, or custom metrics. HPAs are defined in the Helm chart templates and configured through values files for each environment.


HPA Configuration

The AI Service HPA template in infrastructure/helm/ai-service/templates/hpa.yaml uses the autoscaling/v2 API:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Service HPA Configurations

| Service         | Min | Max | CPU Target | Memory Target | Custom Metric   |
|-----------------|-----|-----|------------|---------------|-----------------|
| AI Service      | 2   | 10  | 70%        | 80%           | LLM queue depth |
| Query Engine    | 2   | 8   | 70%        | N/A           | Query backlog   |
| API Gateway     | 2   | 10  | 60%        | N/A           | Request rate    |
| ML Service      | 1   | 6   | 70%        | 80%           | N/A             |
| BI Service      | 1   | 4   | 70%        | N/A           | N/A             |
| Catalog Service | 1   | 3   | 70%        | N/A           | N/A             |
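The custom metrics above are served through the Kubernetes custom metrics API rather than the Resource metric type. As a sketch of how a metric like the AI Service's LLM queue depth could be wired up, assuming a metrics adapter (for example prometheus-adapter) exposes it via `custom.metrics.k8s.io` — the metric name `llm_queue_depth` and the target value are illustrative, not taken from the chart:

```yaml
# Hypothetical Pods-type custom metric entry for the AI Service HPA.
# Assumes a metrics adapter publishes `llm_queue_depth` per pod;
# the name and threshold below are illustrative.
metrics:
  - type: Pods
    pods:
      metric:
        name: llm_queue_depth
      target:
        type: AverageValue
        averageValue: "10"   # scale out when average queued requests per pod exceeds 10
```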

Scaling Behavior

The MATIH platform uses asymmetric scaling behavior for all HPAs:

Scale Up (Aggressive)

| Parameter              | Value     | Rationale                           |
|------------------------|-----------|-------------------------------------|
| Stabilization window   | 0 seconds | React immediately to load spikes    |
| Max pods per 15s       | 4         | Add capacity quickly                |
| Max percentage per 15s | 100%      | Double capacity if needed           |
| Policy selection       | Max       | Use whichever policy adds more pods |

Scale Down (Conservative)

| Parameter              | Value       | Rationale                          |
|------------------------|-------------|------------------------------------|
| Stabilization window   | 300 seconds | Wait 5 minutes before scaling down |
| Max percentage per 60s | 10%         | Scale down gradually               |
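These parameters map onto the `behavior` stanza of the `autoscaling/v2` API. The following sketch is consistent with the values above; the field names are standard `autoscaling/v2`, though the chart's actual template may differ:

```yaml
# behavior stanza matching the scale-up/scale-down parameters above
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0      # react immediately to load spikes
    policies:
      - type: Pods
        value: 4                       # at most 4 pods added per 15s window
        periodSeconds: 15
      - type: Percent
        value: 100                     # or double capacity per 15s window
        periodSeconds: 15
    selectPolicy: Max                  # use whichever policy adds more pods
  scaleDown:
    stabilizationWindowSeconds: 300    # wait 5 minutes before scaling down
    policies:
      - type: Percent
        value: 10                      # remove at most 10% of pods per 60s
        periodSeconds: 60
```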

Helm Values Configuration

HPA settings are configured in the Helm values files:

# values.yaml (base defaults)
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
 
# values-dev.yaml (development overrides)
autoscaling:
  enabled: false  # Disabled in dev to save resources
  minReplicas: 1
  maxReplicas: 3

Prerequisites

HPA requires the Metrics Server to be deployed in the cluster:

Kubernetes Metrics Server --> kubelet cAdvisor --> Pod CPU/Memory metrics
         |
         v
  HPA Controller --> Scale Deployment replicas

Troubleshooting

| Issue               | Symptom                                            | Resolution                                     |
|---------------------|----------------------------------------------------|------------------------------------------------|
| HPA not scaling     | `kubectl get hpa` shows TARGETS as `<unknown>/70%` | Verify the Metrics Server is running           |
| Scaling too slow    | Pods not added during load spike                   | Reduce stabilization window                    |
| Scaling oscillation | Replicas rapidly increase and decrease             | Increase stabilization window                  |
| At max replicas     | HPA at maxReplicas during peak                     | Increase maxReplicas or optimize pod resources |

Monitoring

| Metric                             | Description                                 |
|------------------------------------|---------------------------------------------|
| kube_hpa_status_current_replicas   | Current replica count                       |
| kube_hpa_status_desired_replicas   | Desired replica count                       |
| kube_hpa_status_condition          | HPA condition (ScalingActive, AbleToScale)  |

Alert rules should be configured for:

  • HPA at maximum replicas for more than 30 minutes
  • HPA unable to scale (metrics unavailable)
  • Rapid scaling events (more than 5 scale changes in 10 minutes)
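The first alert above could be expressed as a Prometheus rule. The sketch below assumes the Prometheus Operator's PrometheusRule CRD and the kube-state-metrics series listed in the Monitoring table; the `kube_hpa_spec_max_replicas` series and label names vary by kube-state-metrics version, so treat them as assumptions:

```yaml
# Hypothetical PrometheusRule for the "at max replicas for 30 minutes" alert.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-alerts
spec:
  groups:
    - name: hpa
      rules:
        - alert: HPAAtMaxReplicas
          expr: kube_hpa_status_current_replicas == kube_hpa_spec_max_replicas
          for: 30m          # sustained saturation, not a transient peak
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.hpa }} has been at maxReplicas for 30 minutes"
```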