Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) automatically scales the number of pod replicas for MATIH services based on CPU utilization, memory utilization, or custom metrics. HPAs are defined in the Helm chart templates and configured through values files for each environment.
HPA Configuration
The AI Service HPA template in infrastructure/helm/ai-service/templates/hpa.yaml uses the autoscaling/v2 API:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Service HPA Configurations
| Service | Min | Max | CPU Target | Memory Target | Custom Metric |
|---|---|---|---|---|---|
| AI Service | 2 | 10 | 70% | 80% | LLM queue depth |
| Query Engine | 2 | 8 | 70% | N/A | Query backlog |
| API Gateway | 2 | 10 | 60% | N/A | Request rate |
| ML Service | 1 | 6 | 70% | 80% | N/A |
| BI Service | 1 | 4 | 70% | N/A | N/A |
| Catalog Service | 1 | 3 | 70% | N/A | N/A |
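The custom metrics in the last column are not part of the core resource-metrics pipeline; they require a metrics adapter (such as prometheus-adapter) to expose them through the custom metrics API. As a hedged sketch, a queue-depth metric like the AI Service's could be added to the HPA metrics list as a Pods metric — the metric name llm_queue_depth and the target value below are illustrative assumptions, not values taken from the chart:

```yaml
  metrics:
    - type: Pods
      pods:
        metric:
          name: llm_queue_depth        # hypothetical metric name, served by a metrics adapter
        target:
          type: AverageValue
          averageValue: "10"           # illustrative target: scale up above 10 queued requests per pod
```

With an AverageValue target, the HPA divides the metric total by the current replica count, so scaling tracks per-pod queue depth rather than the aggregate.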
Scaling Behavior
The MATIH platform uses asymmetric scaling behavior for all HPAs:
Scale Up (Aggressive)
| Parameter | Value | Rationale |
|---|---|---|
| Stabilization window | 0 seconds | React immediately to load spikes |
| Max pods per 15s | 4 | Add capacity quickly |
| Max percentage per 15s | 100% | Double capacity if needed |
| Policy selection | Max | Use whichever policy adds more pods |
Scale Down (Conservative)
| Parameter | Value | Rationale |
|---|---|---|
| Stabilization window | 300 seconds | Wait 5 minutes before scaling down |
| Max percentage per 60s | 10% | Scale down gradually |
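Expressed as an autoscaling/v2 behavior stanza, the two tables above map directly onto the HPA spec. This is a sketch of what the chart templates would render, with values taken from the tables:

```yaml
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # react immediately to load spikes
      selectPolicy: Max                # use whichever policy adds more pods
      policies:
        - type: Pods
          value: 4                     # add at most 4 pods per 15s window
          periodSeconds: 15
        - type: Percent
          value: 100                   # or double capacity per 15s window
          periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 10                    # remove at most 10% of pods per 60s window
          periodSeconds: 60
```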
Helm Values Configuration
HPA settings are configured in the Helm values files:
# values.yaml (base defaults)
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
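These values are typically consumed by the chart's HPA template behind an enabled guard. A minimal sketch, assuming the field names above (the guard pattern is standard Helm practice, not confirmed from the repo):

```yaml
# infrastructure/helm/ai-service/templates/hpa.yaml (sketch)
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-service
spec:
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
{{- end }}
```

Because the resource is wrapped in the guard, setting autoscaling.enabled to false (as in the dev overrides) omits the HPA entirely rather than creating one with inert settings.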
# values-dev.yaml (development overrides)
autoscaling:
  enabled: false # Disabled in dev to save resources
  minReplicas: 1
  maxReplicas: 3
Prerequisites
HPA requires the Metrics Server to be deployed in the cluster:
Kubernetes Metrics Server --> kubelet cAdvisor --> Pod CPU/Memory metrics
          |
          v
HPA Controller --> Scale Deployment replicas
Troubleshooting
| Issue | Symptom | Resolution |
|---|---|---|
| HPA not scaling | TARGETS column shows unknown/70% in kubectl get hpa | Verify Metrics Server is running |
| Scaling too slow | Pods not added during load spike | Reduce stabilization window |
| Scaling oscillation | Replicas rapidly increase and decrease | Increase stabilization window |
| At max replicas | HPA at maxReplicas during peak | Increase maxReplicas or optimize pod resources |
Monitoring
| Metric | Description |
|---|---|
| kube_hpa_status_current_replicas | Current replica count |
| kube_hpa_status_desired_replicas | Desired replica count |
| kube_hpa_status_condition | HPA condition (ScalingActive, AbleToScale) |
Alert rules should be configured for:
- HPA at maximum replicas for more than 30 minutes
- HPA unable to scale (metrics unavailable)
- Rapid scaling events (more than 5 scale changes in 10 minutes)
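As one concrete example, the first rule can be expressed as a Prometheus alerting rule. This sketch uses the kube-state-metrics series from the table above; kube_hpa_spec_max_replicas is an assumed companion metric and the threshold follows the list above:

```yaml
groups:
  - name: hpa-alerts
    rules:
      - alert: HPAAtMaxReplicas
        # assumed companion metric kube_hpa_spec_max_replicas; fires when current == max for 30 minutes
        expr: kube_hpa_status_current_replicas >= kube_hpa_spec_max_replicas
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.hpa }} has been at maxReplicas for 30 minutes"
```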