MATIH Platform is in active MVP development. Documentation reflects current implementation status.
17. Kubernetes & Helm
Vertical Pod Autoscaler

The Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory resource requests for pods based on actual usage patterns. Unlike HPA, which scales horizontally by adding replicas, VPA right-sizes individual pods to prevent both over-provisioning and resource starvation.


VPA Modes

| Mode | Description | Pod Restart | Use Case |
|------|-------------|-------------|----------|
| Off | Recommendations only, no auto-update | No | Initial analysis, production stability |
| Initial | Set resources only at pod creation | No (applies on next restart) | Batch workloads |
| Auto | Automatically update running pods | Yes (eviction) | Non-critical services |
| Recreate | Update by recreating pods | Yes (controlled) | Stateless services |
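Once VPA objects are deployed, the active mode of each can be checked with kubectl's custom-columns output (a sketch; the object names shown depend on what is deployed in the cluster):

```shell
# List all VPA objects and their update modes across namespaces
kubectl get vpa -A \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,MODE:.spec.updatePolicy.updateMode
```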

VPA Architecture

VPA Recommender --> Prometheus (historical metrics)
       |
       v
VPA Admission Controller --> Pod resource requests (on creation)
       |
       v
VPA Updater --> Evict + recreate pods with new resources (Auto mode)

MATIH VPA Strategy

The platform uses VPA in Off mode for most services to generate recommendations without automatic updates. This avoids unexpected pod restarts in production:

| Service | VPA Mode | HPA Active | Rationale |
|---------|----------|------------|-----------|
| AI Service | Off | Yes | Avoid conflicts with HPA |
| Query Engine | Off | Yes | Avoid conflicts with HPA |
| API Gateway | Off | Yes | Avoid conflicts with HPA |
| Data Infrastructure (Kafka, PG) | Off | No | Stability critical |
| ML Training (Ray workers) | Initial | No | Right-size for training jobs |
| Batch jobs | Initial | No | Optimize per-job resources |
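In Helm, the per-service modes above can be driven from chart values. The keys below are an illustrative sketch only; the actual MATIH chart schema may differ:

```yaml
# values.yaml (illustrative layout, not the actual chart schema)
aiService:
  vpa:
    enabled: true
    updateMode: "Off"      # recommendations only, no evictions
rayWorkers:
  vpa:
    enabled: true
    updateMode: "Initial"  # applied when training pods are created
```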

VPA and HPA Conflict

VPA and HPA should not both target the same resource metric (CPU or memory) on the same deployment. The MATIH platform resolves this by:

  1. Using HPA for horizontal scaling based on CPU/memory
  2. Using VPA in Off mode for resource recommendations only
  3. Applying VPA recommendations manually during maintenance windows
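As a sketch of the HPA side of this arrangement, the following manifest scales the same Deployment on CPU utilization while an Off-mode VPA only records recommendations. Replica counts and the utilization threshold here are illustrative, not the platform's actual values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-service-hpa
  namespace: matih-data-plane
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
```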

VPA Manifest

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ai-service-vpa
  namespace: matih-data-plane
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: ai-service
        minAllowed:
          cpu: 250m
          memory: 512Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
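The manifest above can be applied and inspected with standard kubectl commands (the file name here is an assumption for illustration):

```shell
# Apply the VPA manifest (file name assumed) and inspect its status
kubectl apply -f ai-service-vpa.yaml
kubectl describe vpa ai-service-vpa -n matih-data-plane
```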

Reading Recommendations

VPA recommendations can be viewed with:

# Get VPA recommendation (via platform-status.sh)
./scripts/tools/platform-status.sh

Recommendations include:

| Field | Description |
|-------|-------------|
| lowerBound | Minimum recommended resources |
| target | Optimal recommended resources |
| uncappedTarget | Target without min/max constraints |
| upperBound | Maximum recommended resources |
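The same fields can also be read directly from the VPA object's status once the recommender has populated it (a sketch; requires a running cluster with the VPA CRDs installed):

```shell
# Show the recommendation section of the VPA status
kubectl get vpa ai-service-vpa -n matih-data-plane \
  -o jsonpath='{.status.recommendation.containerRecommendations}'
```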

Right-Sizing Process

  1. Deploy VPA in Off mode for all services
  2. Collect recommendations over 7+ days of production traffic
  3. Review recommendations against current resource requests
  4. Update Helm values with recommended resource requests
  5. Deploy updated values during maintenance window
  6. Monitor for performance impact after changes
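Step 3 of the process above can be sketched as a small shell check that compares a VPA target against the current request and flags drift. The values and the 25% tolerance are illustrative assumptions; in practice the inputs would come from `kubectl get vpa` and the Helm values file:

```shell
#!/bin/sh
# Convert a Kubernetes CPU quantity to millicores ("500m" -> 500, "2" -> 2000)
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo "$(( $1 * 1000 ))" ;;
  esac
}

current_request="1000m"   # from Helm values (assumed)
vpa_target="600m"         # from VPA status.recommendation.target (assumed)

cur=$(to_millicores "$current_request")
tgt=$(to_millicores "$vpa_target")

# Flag when the request exceeds the target by more than 25%
if [ "$cur" -gt $(( tgt * 125 / 100 )) ]; then
  echo "over-provisioned: request ${current_request} vs target ${vpa_target}"
else
  echo "within tolerance"
fi
```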

Configuration

| Setting | Value | Description |
|---------|-------|-------------|
| minAllowed.cpu | 250m | Minimum CPU per container |
| minAllowed.memory | 512Mi | Minimum memory per container |
| maxAllowed.cpu | 4000m | Maximum CPU per container |
| maxAllowed.memory | 8Gi | Maximum memory per container |
| History window | 8 days | Historical data used for recommendations |
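The 8-day history window corresponds to the upstream recommender's history settings. When history is sourced from Prometheus, the recommender is typically configured with flags like the following; the flag names come from the upstream autoscaler project and the Prometheus address is an illustrative placeholder, so verify both against the deployed VPA version:

```yaml
# VPA recommender container args (illustrative; upstream flag names)
args:
  - --storage=prometheus
  - --prometheus-address=http://prometheus.monitoring.svc:9090
  - --history-length=8d
```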