MATIH Platform is in active MVP development. Documentation reflects current implementation status.
17. Kubernetes & Helm
Vertical Pod Autoscaler

The Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory resource requests for pods based on actual usage patterns. Unlike HPA, which scales horizontally by adding replicas, VPA right-sizes individual pods to prevent both over-provisioning and resource starvation.


VPA Modes

| Mode | Description | Pod Restart | Use Case |
|------|-------------|-------------|----------|
| Off | Recommendations only, no auto-update | No | Initial analysis, production stability |
| Initial | Set resources only at pod creation | No (applies on next restart) | Batch workloads |
| Auto | Automatically update running pods | Yes (eviction) | Non-critical services |
| Recreate | Update by recreating pods | Yes (controlled) | Stateless services |
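Once VPA objects are deployed, the active mode of each can be checked with kubectl's custom-columns output (a sketch; the object names shown depend on what is deployed in the cluster):

```shell
# List all VPA objects and their update modes across namespaces
kubectl get vpa -A \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,MODE:.spec.updatePolicy.updateMode
```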

VPA Architecture

VPA Recommender --> Prometheus (historical metrics)
       |
       v
VPA Admission Controller --> Pod resource requests (on creation)
       |
       v
VPA Updater --> Evict + recreate pods with new resources (Auto mode)

MATIH VPA Strategy

The platform uses VPA in Off mode for most services to generate recommendations without automatic updates. This avoids unexpected pod restarts in production:

| Service | VPA Mode | HPA Active | Rationale |
|---------|----------|------------|-----------|
| AI Service | Off | Yes | Avoid conflicts with HPA |
| Query Engine | Off | Yes | Avoid conflicts with HPA |
| API Gateway | Off | Yes | Avoid conflicts with HPA |
| Data Infrastructure (Kafka, PG) | Off | No | Stability critical |
| ML Training (Ray workers) | Initial | No | Right-size for training jobs |
| Batch jobs | Initial | No | Optimize per-job resources |
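In Helm, the per-service modes above can be driven from chart values. The keys below are an illustrative sketch only; the actual MATIH chart schema may differ:

```yaml
# values.yaml (illustrative layout, not the actual chart schema)
aiService:
  vpa:
    enabled: true
    updateMode: "Off"      # recommendations only, no evictions
rayWorkers:
  vpa:
    enabled: true
    updateMode: "Initial"  # applied when training pods are created
```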

VPA and HPA Conflict

VPA and HPA should not both target the same resource metric (CPU or memory) on the same deployment. The MATIH platform resolves this by:

  1. Using HPA for horizontal scaling based on CPU/memory
  2. Using VPA in Off mode for resource recommendations only
  3. Applying VPA recommendations manually during maintenance windows
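As a sketch of the HPA side of this arrangement, the following manifest scales the same Deployment on CPU utilization while an Off-mode VPA only records recommendations. Replica counts and the utilization threshold here are illustrative, not the platform's actual values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-service-hpa
  namespace: matih-data-plane
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
```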

VPA Manifest

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ai-service-vpa
  namespace: matih-data-plane
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: ai-service
        minAllowed:
          cpu: 250m
          memory: 512Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
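The manifest above can be applied and inspected with standard kubectl commands (the file name here is an assumption for illustration):

```shell
# Apply the VPA manifest (file name assumed) and inspect its status
kubectl apply -f ai-service-vpa.yaml
kubectl describe vpa ai-service-vpa -n matih-data-plane
```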

Reading Recommendations

VPA recommendations can be viewed with:

# Get VPA recommendation (via platform-status.sh)
./scripts/tools/platform-status.sh

Recommendations include:

| Field | Description |
|-------|-------------|
| lowerBound | Minimum recommended resources |
| target | Optimal recommended resources |
| uncappedTarget | Target without min/max constraints |
| upperBound | Maximum recommended resources |
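The same fields can also be read directly from the VPA object's status once the recommender has populated it (a sketch; requires a running cluster with the VPA CRDs installed):

```shell
# Show the recommendation section of the VPA status
kubectl get vpa ai-service-vpa -n matih-data-plane \
  -o jsonpath='{.status.recommendation.containerRecommendations}'
```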

Right-Sizing Process

  1. Deploy VPA in Off mode for all services
  2. Collect recommendations over 7+ days of production traffic
  3. Review recommendations against current resource requests
  4. Update Helm values with recommended resource requests
  5. Deploy updated values during maintenance window
  6. Monitor for performance impact after changes
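Step 3 of the process above can be sketched as a small shell check that compares a VPA target against the current request and flags drift. The values and the 25% tolerance are illustrative assumptions; in practice the inputs would come from `kubectl get vpa` and the Helm values file:

```shell
#!/bin/sh
# Convert a Kubernetes CPU quantity to millicores ("500m" -> 500, "2" -> 2000)
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo "$(( $1 * 1000 ))" ;;
  esac
}

current_request="1000m"   # from Helm values (assumed)
vpa_target="600m"         # from VPA status.recommendation.target (assumed)

cur=$(to_millicores "$current_request")
tgt=$(to_millicores "$vpa_target")

# Flag when the request exceeds the target by more than 25%
if [ "$cur" -gt $(( tgt * 125 / 100 )) ]; then
  echo "over-provisioned: request ${current_request} vs target ${vpa_target}"
else
  echo "within tolerance"
fi
```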

Configuration

| Setting | Value | Description |
|---------|-------|-------------|
| minAllowed.cpu | 250m | Minimum CPU per container |
| minAllowed.memory | 512Mi | Minimum memory per container |
| maxAllowed.cpu | 4000m | Maximum CPU per container |
| maxAllowed.memory | 8Gi | Maximum memory per container |
| History window | 8 days | Historical data used for recommendations |
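The 8-day history window corresponds to the upstream recommender's history settings. When history is sourced from Prometheus, the recommender is typically configured with flags like the following; the flag names come from the upstream autoscaler project and the Prometheus address is an illustrative placeholder, so verify both against the deployed VPA version:

```yaml
# VPA recommender container args (illustrative; upstream flag names)
args:
  - --storage=prometheus
  - --prometheus-address=http://prometheus.monitoring.svc:9090
  - --history-length=8d
```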