Vertical Pod Autoscaler
The Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory resource requests for pods based on observed usage. Unlike the Horizontal Pod Autoscaler (HPA), which scales out by adding replicas, the VPA right-sizes individual pods to prevent both over-provisioning and resource starvation.
VPA Modes
| Mode | Description | Pod Restart | Use Case |
|---|---|---|---|
| Off | Recommendations only, no auto-update | No | Initial analysis, production stability |
| Initial | Set resources only at pod creation | No (applies on next restart) | Batch workloads |
| Auto | Automatically update running pods | Yes (eviction) | Non-critical services |
| Recreate | Update by recreating pods | Yes (controlled) | Stateless services |
VPA Architecture
```
VPA Recommender          --> Prometheus (historical metrics)
         |
         v
VPA Admission Controller --> Pod resource requests (on creation)
         |
         v
VPA Updater              --> Evict + recreate pods with new resources (Auto mode)
```
MATIH VPA Strategy
The platform uses VPA in Off mode for most services to generate recommendations without automatic updates. This avoids unexpected pod restarts in production:
| Service | VPA Mode | HPA Active | Rationale |
|---|---|---|---|
| AI Service | Off | Yes | Avoid conflicts with HPA |
| Query Engine | Off | Yes | Avoid conflicts with HPA |
| API Gateway | Off | Yes | Avoid conflicts with HPA |
| Data Infrastructure (Kafka, PG) | Off | No | Stability critical |
| ML Training (Ray workers) | Initial | No | Right-size for training jobs |
| Batch jobs | Initial | No | Optimize per-job resources |
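To illustrate the `Initial` rows above, a VPA for the Ray worker deployment might look like the following sketch. The object name, namespace, and Deployment name are placeholders, not values confirmed by this document:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ray-worker-vpa        # hypothetical name
  namespace: matih-ml-plane   # hypothetical namespace
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ray-worker          # hypothetical Deployment name
  updatePolicy:
    updateMode: "Initial"     # resources applied only when pods are (re)created
```

Because `Initial` never evicts running pods, each new training job picks up the latest recommendation at pod creation without disrupting jobs in flight.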
VPA and HPA Conflict
VPA and HPA should not both target the same resource metric (CPU or memory) on the same deployment. The MATIH platform resolves this by:
- Using HPA for horizontal scaling based on CPU/memory
- Using VPA in `Off` mode for resource recommendations only
- Applying VPA recommendations manually during maintenance windows
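The conflict rule above can be checked mechanically. The following is a minimal sketch, assuming HPA and VPA objects have been fetched as JSON (e.g. via `kubectl get hpa -o json` / `kubectl get vpa -o json`); the helper name and the simplified data shapes are illustrative:

```python
# Sketch: flag Deployments where an HPA scales on a resource metric (CPU/memory)
# while a VPA actively updates the same Deployment (updateMode != "Off").
# Input shapes mirror the `items` of `kubectl get hpa,vpa -o json`, simplified.

def find_conflicts(hpas, vpas):
    """Return names of Deployments targeted by both a resource-metric HPA
    and a VPA that is not in recommendation-only ("Off") mode."""
    resource_hpa_targets = {
        h["spec"]["scaleTargetRef"]["name"]
        for h in hpas
        if any(m.get("type") == "Resource" for m in h["spec"].get("metrics", []))
    }
    active_vpa_targets = {
        v["spec"]["targetRef"]["name"]
        for v in vpas
        # per the VPA API, updateMode defaults to "Auto" when unset
        if v["spec"].get("updatePolicy", {}).get("updateMode", "Auto") != "Off"
    }
    return sorted(resource_hpa_targets & active_vpa_targets)

hpas = [{"spec": {"scaleTargetRef": {"name": "ai-service"},
                  "metrics": [{"type": "Resource"}]}}]
vpas = [{"spec": {"targetRef": {"name": "ai-service"},
                  "updatePolicy": {"updateMode": "Off"}}}]
print(find_conflicts(hpas, vpas))  # "Off" mode -> no conflict -> []
```

With the platform's `Off`-mode policy the check returns no conflicts; flipping the VPA to `Auto` would flag `ai-service`.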
VPA Manifest
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ai-service-vpa
  namespace: matih-data-plane
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: ai-service
        minAllowed:
          cpu: 250m
          memory: 512Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
```
Reading Recommendations
VPA recommendations can be viewed with:

```shell
# Get VPA recommendations (via platform-status.sh)
./scripts/tools/platform-status.sh
```

Recommendations include:
| Field | Description |
|---|---|
| `lowerBound` | Minimum recommended resources |
| `target` | Optimal recommended resources |
| `uncappedTarget` | Target without min/max constraints |
| `upperBound` | Maximum recommended resources |
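These fields live under `status.recommendation.containerRecommendations` in the `autoscaling.k8s.io/v1` API. A minimal sketch of extracting them from a VPA object (e.g. the JSON returned by `kubectl get vpa <name> -o json`); the sample values are illustrative, not real platform output:

```python
# Sketch: summarize the per-container recommendation fields from a VPA status.
# The dict shape follows status.recommendation.containerRecommendations.

def summarize(vpa_status):
    rows = []
    for rec in vpa_status["recommendation"]["containerRecommendations"]:
        rows.append({
            "container": rec["containerName"],
            "lowerBound": rec["lowerBound"],
            "target": rec["target"],
            "uncappedTarget": rec.get("uncappedTarget", rec["target"]),
            "upperBound": rec["upperBound"],
        })
    return rows

sample = {"recommendation": {"containerRecommendations": [{
    "containerName": "ai-service",
    "lowerBound": {"cpu": "300m", "memory": "600Mi"},
    "target": {"cpu": "500m", "memory": "1Gi"},
    "uncappedTarget": {"cpu": "500m", "memory": "1Gi"},
    "upperBound": {"cpu": "1200m", "memory": "2Gi"},
}]}}

for row in summarize(sample):
    print(row["container"], "target:", row["target"])
```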
Right-Sizing Process
- Deploy VPA in `Off` mode for all services
- Collect recommendations over 7+ days of production traffic
- Review recommendations against current resource requests
- Update Helm values with recommended resource requests
- Deploy updated values during maintenance window
- Monitor for performance impact after changes
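The review step above (comparing recommendations to current requests) can be sketched as a simple deviation check. The 20% threshold and helper names are illustrative assumptions, not platform policy:

```python
# Sketch: decide whether a container's CPU request is worth updating, based on
# how far the VPA target deviates from the current request.

def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity ("250m" or "2") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)

def needs_update(current_cpu, target_cpu, threshold=0.20):
    """True when the target deviates from the current request by more than
    `threshold` (20% here is an assumed cutoff)."""
    cur, tgt = parse_cpu(current_cpu), parse_cpu(target_cpu)
    return abs(tgt - cur) / cur > threshold

print(needs_update("250m", "500m"))  # deviates 100% -> True
print(needs_update("500m", "520m"))  # deviates 4%   -> False
```

Containers flagged by such a check would then get updated Helm resource values, rolled out during the maintenance window as described above.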
Configuration
| Setting | Value | Description |
|---|---|---|
| `minAllowed.cpu` | 250m | Minimum CPU per container |
| `minAllowed.memory` | 512Mi | Minimum memory per container |
| `maxAllowed.cpu` | 4000m | Maximum CPU per container |
| `maxAllowed.memory` | 8Gi | Maximum memory per container |
| History window | 8 days | Historical data for recommendations |
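These bounds explain the difference between `uncappedTarget` and `target`: the uncapped recommendation is clamped into the `[minAllowed, maxAllowed]` range. A minimal sketch with simplified quantity parsing (CPU in millicores, memory in Mi/Gi only; sample values are illustrative):

```python
# Sketch: clamp an uncapped VPA target into the configured min/max bounds,
# producing the final target. Quantity parsing is deliberately simplified.

def cpu_millis(quantity):
    """CPU quantity ("250m" or "4") -> millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)

def mem_mi(quantity):
    """Memory quantity ("512Mi" or "8Gi") -> Mi."""
    for suffix, factor in (("Mi", 1), ("Gi", 1024)):
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity}")

def clamp(value, lo, hi):
    return max(lo, min(hi, value))

# An uncapped CPU target of 5000m is capped at maxAllowed.cpu = 4000m:
print(clamp(cpu_millis("5000m"), cpu_millis("250m"), cpu_millis("4000m")))  # 4000
# An uncapped memory target of 256Mi is raised to minAllowed.memory = 512Mi:
print(clamp(mem_mi("256Mi"), mem_mi("512Mi"), mem_mi("8Gi")))  # 512
```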