MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Autoscaling Overview

The MATIH platform implements a multi-layered autoscaling strategy that automatically adjusts compute resources in response to workload demands. This includes Horizontal Pod Autoscaling (HPA) for pod replicas, Vertical Pod Autoscaling (VPA) for resource requests, custom metrics-based scaling for application-specific signals, and Cluster Autoscaler for node-level capacity management.


Autoscaling Layers

| Layer | Component | Scope | Response Time |
|---|---|---|---|
| Pod horizontal | HPA | Pod replicas | 15-60 seconds |
| Pod vertical | VPA | CPU/memory requests | Minutes (on restart) |
| Application | Custom metrics | Application-specific | 15-60 seconds |
| Cluster | Cluster Autoscaler | Node count | 2-5 minutes |

Autoscaling Architecture

Prometheus (metrics) --> Metrics Server / Prometheus Adapter
                                |
                    +-----------+-----------+
                    |           |           |
                   HPA         VPA    Custom Metrics
                    |           |           |
               Pod Replicas  Pod Resources  Pod Replicas
                    |           |           |
                    +-----------+-----------+
                                |
                        Cluster Autoscaler
                                |
                           Node Count
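
To make the Prometheus-to-HPA path in the diagram concrete, the following is a minimal sketch of a Prometheus Adapter rule that exposes a Prometheus series through the Kubernetes custom metrics API. The series name `llm_queue_depth` and its `namespace` and `pod` labels are assumptions for illustration; the source does not state the actual metric names used by MATIH.

```yaml
# Prometheus Adapter configuration fragment (illustrative sketch).
# `llm_queue_depth` is a hypothetical series name, not confirmed by the platform docs.
rules:
  # Select the Prometheus series that should back the custom metric
  - seriesQuery: 'llm_queue_depth{namespace!="",pod!=""}'
    # Map Prometheus labels to the Kubernetes resources they describe
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    # Expose the series under the same name in the custom metrics API
    name:
      matches: "^(.*)$"
      as: "${1}"
    # Query the adapter runs when the HPA controller asks for the metric
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```

With a rule of this shape in place, an HPA can reference the metric by name in a `Pods` metric source, and the adapter translates each request into the PromQL query above.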

Services with Autoscaling

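| Service | HPA | VPA | Custom Metrics | Min Replicas | Max Replicas |
|---|---|---|---|---|---|
| AI Service | Yes | No | LLM queue depth | 2 | 10 |
| Query Engine | Yes | No | Query queue depth | 2 | 8 |
| API Gateway | Yes | No | Request rate | 2 | 10 |
| ML Service | Yes | No | Training job count | 1 | 6 |
| BI Service | Yes | No | Active sessions | 1 | 4 |
| Catalog Service | Yes | No | None | 1 | 3 |
| Render Service | Yes | No | Render queue | 1 | 4 |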
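
As an illustration of how one row of this table could translate into a manifest, here is a hedged sketch of an `autoscaling/v2` HPA for the AI Service using the replica bounds listed above. The Deployment name `ai-service`, the 70% CPU target, the metric name `llm_queue_depth`, and its target value are all assumptions, not values taken from the platform's Helm charts.

```yaml
# Illustrative HPA for the AI Service (min 2, max 10 replicas per the table).
# Names and target values below are hypothetical placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-service            # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service          # assumed Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # Built-in resource metric served by Metrics Server
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed target
    # Application-specific metric served through the custom metrics API
    - type: Pods
      pods:
        metric:
          name: llm_queue_depth    # hypothetical metric name
        target:
          type: AverageValue
          averageValue: "10"       # assumed target queue depth per pod
```

When several metrics are listed, the HPA controller computes a desired replica count for each and uses the largest, so the service scales out if either CPU or queue depth exceeds its target.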

Scaling Behavior

All HPAs use controlled scaling behavior to prevent flapping:

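behavior:
  scaleDown:
    # Use the highest recommendation from the last 5 minutes before removing pods
    stabilizationWindowSeconds: 300
    policies:
      # Remove at most 10% of replicas per minute
      - type: Percent
        value: 10
        periodSeconds: 60
  scaleUp:
    # React to load increases immediately
    stabilizationWindowSeconds: 0
    policies:
      # Allow doubling the replica count every 15 seconds
      - type: Percent
        value: 100
        periodSeconds: 15
      # Allow adding up to 4 pods every 15 seconds
      - type: Pods
        value: 4
        periodSeconds: 15
    # Apply whichever policy permits the larger change
    selectPolicy: Max

| Setting | Scale Up | Scale Down |
|---|---|---|
| Stabilization window | 0 seconds (immediate) | 300 seconds (5 minutes) |
| Max change rate | 100% per 15s or 4 pods per 15s | 10% per 60s |
| Policy selection | Maximum (aggressive scale up) | Default (Max) |

For example, a service running 2 replicas can grow to 6 within a single 15-second period, because the Pods policy (+4) permits a larger change than the Percent policy (+2); the result is still capped by the HPA's maximum replica count. In the other direction, a service at 10 replicas sheds at most one pod per minute once the 5-minute stabilization window has passed.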

Monitoring

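| Metric | Description |
|---|---|
| kube_hpa_status_current_replicas | Current replica count per HPA |
| kube_hpa_status_desired_replicas | Target replica count per HPA |
| kube_hpa_spec_min_replicas | Minimum replicas configured |
| kube_hpa_spec_max_replicas | Maximum replicas configured |

These are the metric names exposed by kube-state-metrics 1.x. In kube-state-metrics 2.0 and later the same series carry the prefix kube_horizontalpodautoscaler_ instead of kube_hpa_.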
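
One way these metrics might be used is to alert when an HPA has no headroom left. The following Prometheus rule file is a sketch under assumptions: the group name, alert names, durations, and severity labels are illustrative, and the `hpa` label is the one kube-state-metrics 1.x attaches to these series.

```yaml
# Illustrative Prometheus alerting rules built on the HPA metrics above.
# Group name, alert names, thresholds, and labels are hypothetical.
groups:
  - name: hpa-capacity
    rules:
      # Fires when an HPA has been pinned at its configured maximum
      - alert: HPAAtMaxReplicas
        expr: kube_hpa_status_current_replicas >= kube_hpa_spec_max_replicas
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.hpa }} has run at max replicas for 15 minutes"
      # Fires when the running replica count lags the desired count
      - alert: HPAReplicasMismatch
        expr: kube_hpa_status_desired_replicas != kube_hpa_status_current_replicas
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.hpa }} has not reached its desired replica count"
```

An HPA that stays at its maximum usually means the replica ceiling or the node pool needs review, while a persistent mismatch often points to unschedulable pods waiting on the Cluster Autoscaler.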

Detailed Sections

| Section | Content |
|---|---|
| Horizontal Pod Autoscaler | CPU/memory-based pod scaling |
| Vertical Pod Autoscaler | Automatic resource right-sizing |
| Custom Metrics | Application-specific scaling signals |
| Cluster Autoscaler | Node-level capacity management |