MATIH Platform is in active MVP development. Documentation reflects current implementation status.
17. Kubernetes & Helm
Custom Metrics

Custom metrics enable HPA to scale based on application-specific signals beyond CPU and memory. The MATIH platform uses Prometheus metrics exposed by services and made available to the HPA controller through the Prometheus Adapter, enabling scaling based on queue depth, active sessions, request rate, and other business-relevant metrics.


Custom Metrics Architecture

Application --> Prometheus (scrape metrics) --> Prometheus Adapter --> Kubernetes Custom Metrics API --> HPA

Prometheus Adapter Configuration

The Prometheus Adapter translates Prometheus queries into the Kubernetes custom metrics API format:

rules:
  - seriesQuery: 'ai_service_llm_queue_depth{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)$"
      as: "ai_llm_queue_depth"
    metricsQuery: 'avg(ai_service_llm_queue_depth{<<.LabelMatchers>>})'
 
  - seriesQuery: 'ai_service_active_sessions{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)$"
      as: "ai_active_sessions"
    metricsQuery: 'sum(ai_service_active_sessions{<<.LabelMatchers>>})'

Custom Metrics by Service

| Service | Metric | Prometheus Query | HPA Target |
|---|---|---|---|
| AI Service | LLM queue depth | ai_service_llm_queue_depth | 10 per pod |
| AI Service | Active sessions | ai_service_active_sessions | 50 per pod |
| Query Engine | Query backlog | query_engine_pending_queries | 20 per pod |
| Render Service | Render queue | render_service_queue_size | 5 per pod |
| API Gateway | Request rate | rate(http_requests_total[1m]) | 200 per pod |

HPA with Custom Metrics

Example HPA using custom metrics alongside resource metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: ai_llm_queue_depth
        target:
          type: AverageValue
          averageValue: "10"
    - type: Pods
      pods:
        metric:
          name: ai_active_sessions
        target:
          type: AverageValue
          averageValue: "50"

When multiple metrics are specified, HPA computes a desired replica count for each metric independently and scales to the highest of them, ensuring adequate capacity across all scaling dimensions.
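The selection logic above can be sketched in a few lines. For each metric, HPA computes desiredReplicas = ceil(currentReplicas × currentValue / targetValue) and then takes the maximum, clamped to the min/max replica bounds. The function and metric names below are illustrative, not part of the platform code:

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Per-metric proposal, per the HPA scaling formula."""
    return math.ceil(current_replicas * (current_value / target_value))

def hpa_decision(current_replicas: int, metrics: dict,
                 min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Take the highest per-metric proposal, clamped to [min, max]."""
    proposals = [desired_replicas(current_replicas, cur, tgt)
                 for cur, tgt in metrics.values()]
    return max(min_replicas, min(max_replicas, max(proposals)))

# 3 pods: avg queue depth 15 (target 10), avg sessions 40 (target 50), CPU 80% (target 70%)
replicas = hpa_decision(3, {
    "ai_llm_queue_depth": (15, 10),   # ceil(3 * 15/10) = 5
    "ai_active_sessions": (40, 50),   # ceil(3 * 40/50) = 3
    "cpu_utilization":    (80, 70),   # ceil(3 * 80/70) = 4
})
# queue depth dominates, so the HPA scales to 5 replicas
```

This is why a single saturated dimension (here, queue depth) is enough to trigger a scale-up even when the other metrics are below target.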

Metric Types

| Type | Description | Example |
|---|---|---|
| Resource | Built-in CPU/memory | CPU utilization percentage |
| Pods | Custom metric averaged per pod | Queue depth per pod |
| Object | Custom metric on a Kubernetes object | Ingress request rate |
| External | Metric from an external system | Cloud queue depth |
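As an illustration of the Object type, an HPA metric entry could target the aggregate request rate on the gateway Ingress rather than a per-pod average. The metric name and Ingress name here are assumptions, not part of the platform manifests:

```yaml
metrics:
  - type: Object
    object:
      metric:
        name: requests_per_second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: api-gateway
      target:
        type: Value
        value: "2000"
```

With `type: Value`, the target applies to the whole object rather than being divided by the current pod count, which suits metrics that describe total traffic.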

Application Metric Exposure

Services expose custom metrics via Prometheus format on their metrics endpoint:

# AI Service custom metrics
from prometheus_client import Gauge

# Pending LLM requests, labeled by provider; the adapter averages this per pod
llm_queue_depth = Gauge(
    'ai_service_llm_queue_depth',
    'Number of pending LLM requests',
    ['provider']
)

# Active chat sessions, labeled by tenant; the adapter sums this across labels
active_sessions = Gauge(
    'ai_service_active_sessions',
    'Number of active chat sessions',
    ['tenant_id']
)

Monitoring

Verify custom metrics are available via the Kubernetes API:

# Check custom metrics availability (via platform-status.sh)
./scripts/tools/platform-status.sh
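The same check can also be performed directly against the custom metrics API with kubectl. The namespace and metric name below follow the examples in this section; adjust them for your deployment:

```shell
# List all metrics the Prometheus Adapter is exposing
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .

# Fetch the current queue-depth metric for all ai-service pods in a namespace
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/ai_llm_queue_depth" | jq .
```

If the first command returns an empty `resources` list, the adapter is running but no series match its rules; if it fails entirely, the APIService registration for the adapter is broken.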

Troubleshooting

| Issue | Symptom | Resolution |
|---|---|---|
| Metric not found | HPA shows unknown for custom metric | Verify Prometheus Adapter rules |
| Stale metric | Scaling based on old data | Check Prometheus scrape interval |
| Incorrect scaling | Too many or too few pods | Adjust target value |
| Adapter not running | All custom metrics unavailable | Check prometheus-adapter deployment |
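When a metric shows as unknown, `kubectl describe hpa` surfaces the underlying error in the Conditions and Events sections, and the adapter logs usually name the failing query. The adapter's namespace below is an assumption; use wherever prometheus-adapter is deployed in your cluster:

```shell
# Inspect HPA conditions and recent scaling events
kubectl describe hpa ai-service

# Check the adapter for rule or Prometheus connectivity errors
# (namespace "monitoring" is an assumption)
kubectl logs -n monitoring deploy/prometheus-adapter
```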