Autoscaling Overview
The MATIH platform implements a multi-layered autoscaling strategy that automatically adjusts compute resources in response to workload demands. This includes Horizontal Pod Autoscaling (HPA) for pod replicas, Vertical Pod Autoscaling (VPA) for resource requests, custom metrics-based scaling for application-specific signals, and Cluster Autoscaler for node-level capacity management.
Autoscaling Layers
| Layer | Component | Scope | Response Time |
|---|---|---|---|
| Pod horizontal | HPA | Pod replicas | 15-60 seconds |
| Pod vertical | VPA | CPU/memory requests | Minutes (on restart) |
| Application | Custom metrics | Application-specific | 15-60 seconds |
| Cluster | Cluster Autoscaler | Node count | 2-5 minutes |
Autoscaling Architecture
Prometheus (metrics) --> Metrics Server / Prometheus Adapter
|
+-----------+-----------+
| | |
HPA VPA Custom Metrics
| | |
Pod Replicas Pod Resources Pod Replicas
| | |
+-----------+-----------+
|
Cluster Autoscaler
|
Node CountServices with Autoscaling
| Service | HPA | VPA | Custom Metrics | Min Replicas | Max Replicas |
|---|---|---|---|---|---|
| AI Service | Yes | No | LLM queue depth | 2 | 10 |
| Query Engine | Yes | No | Query queue depth | 2 | 8 |
| API Gateway | Yes | No | Request rate | 2 | 10 |
| ML Service | Yes | No | Training job count | 1 | 6 |
| BI Service | Yes | No | Active sessions | 1 | 4 |
| Catalog Service | Yes | No | None | 1 | 3 |
| Render Service | Yes | No | Render queue | 1 | 4 |
Scaling Behavior
All HPAs use controlled scaling behavior to prevent flapping:
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 4
periodSeconds: 15
selectPolicy: Max| Setting | Scale Up | Scale Down |
|---|---|---|
| Stabilization window | 0 seconds (immediate) | 300 seconds (5 minutes) |
| Max change rate | 100% per 15s or 4 pods per 15s | 10% per 60s |
| Policy selection | Maximum (aggressive scale up) | Default |
Monitoring
| Metric | Description |
|---|---|
kube_hpa_status_current_replicas | Current replica count per HPA |
kube_hpa_status_desired_replicas | Target replica count per HPA |
kube_hpa_spec_min_replicas | Minimum replicas configured |
kube_hpa_spec_max_replicas | Maximum replicas configured |
Detailed Sections
| Section | Content |
|---|---|
| Horizontal Pod Autoscaler | CPU/memory-based pod scaling |
| Vertical Pod Autoscaler | Automatic resource right-sizing |
| Custom Metrics | Application-specific scaling signals |
| Cluster Autoscaler | Node-level capacity management |