Custom Metrics
Custom metrics enable HPA to scale based on application-specific signals beyond CPU and memory. The MATIH platform uses Prometheus metrics exposed by services and made available to the HPA controller through the Prometheus Adapter, enabling scaling based on queue depth, active sessions, request rate, and other business-relevant metrics.
Custom Metrics Architecture
Application --> Prometheus (scrape metrics) --> Prometheus Adapter --> Kubernetes Custom Metrics API --> HPA
Prometheus Adapter Configuration
The Prometheus Adapter translates Prometheus queries into the Kubernetes custom metrics API format:
```yaml
rules:
  - seriesQuery: 'ai_service_llm_queue_depth{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)$"
      as: "ai_llm_queue_depth"
    metricsQuery: 'avg(ai_service_llm_queue_depth{<<.LabelMatchers>>})'
  - seriesQuery: 'ai_service_active_sessions{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)$"
      as: "ai_active_sessions"
    metricsQuery: 'sum(ai_service_active_sessions{<<.LabelMatchers>>})'
```
Custom Metrics by Service
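The `name` block in each rule renames the raw Prometheus series to the metric name the HPA references: `matches` is a regular expression applied to the series name, and `as` is the replacement. A minimal sketch of that renaming step (simplified; the real adapter applies this per discovered series and supports Go-style `$1` capture references):

```python
import re

def rename_metric(series_name: str, matches: str, as_: str) -> str:
    # "matches" is a regex applied to the Prometheus series name;
    # "as" is the replacement, where $1 refers to the first capture group.
    return re.sub(matches, as_.replace("$1", r"\1"), series_name)

# The first rule above: a full-match regex with a constant replacement.
print(rename_metric("ai_service_llm_queue_depth", "^(.*)$", "ai_llm_queue_depth"))
# ai_llm_queue_depth
```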
| Service | Metric | Prometheus Query | HPA Target |
|---|---|---|---|
| AI Service | LLM queue depth | ai_service_llm_queue_depth | 10 per pod |
| AI Service | Active sessions | ai_service_active_sessions | 50 per pod |
| Query Engine | Query backlog | query_engine_pending_queries | 20 per pod |
| Render Service | Render queue | render_service_queue_size | 5 per pod |
| API Gateway | Request rate | rate(http_requests_total[1m]) | 200 per pod |
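For per-pod targets like those in the table, the HPA applies the standard Kubernetes scaling formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetValue), clamped to the min/max replica bounds. A sketch of that calculation (the min/max values mirror the ai-service HPA shown below):

```python
import math

def desired_replicas(current_replicas: int, current_avg: float, target_avg: float,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    # Standard HPA formula for an AverageValue (per-pod) metric,
    # clamped to the configured replica bounds.
    desired = math.ceil(current_replicas * current_avg / target_avg)
    return max(min_replicas, min(max_replicas, desired))

# AI Service example: 3 pods each averaging 25 queued LLM requests,
# against the 10-per-pod target from the table above.
print(desired_replicas(3, 25, 10))  # 8
```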
HPA with Custom Metrics
Example HPA using custom metrics alongside resource metrics:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: ai_llm_queue_depth
        target:
          type: AverageValue
          averageValue: "10"
    - type: Pods
      pods:
        metric:
          name: ai_active_sessions
        target:
          type: AverageValue
          averageValue: "50"
```

When multiple metrics are specified, HPA uses the metric that results in the highest replica count, ensuring adequate capacity for all scaling dimensions.
Metric Types
| API Type | Description | Example |
|---|---|---|
| Resource | Built-in CPU/memory | CPU utilization percentage |
| Pods | Custom metric per pod (average) | Queue depth per pod |
| Object | Custom metric on a Kubernetes object | Ingress request rate |
| External | Metric from an external system | Cloud queue depth |
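For comparison with the Pods metrics used above, Object and External metrics are declared differently in an `autoscaling/v2` manifest. A hedged sketch (the Ingress name, external metric name, and selector label are illustrative, not part of the platform configuration):

```yaml
metrics:
  # Object: metric attached to a single Kubernetes object (here, an Ingress)
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: api-gateway          # illustrative name
      target:
        type: Value
        value: "2000"
  # External: metric from a system outside the cluster (e.g. a cloud queue)
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: render-jobs     # illustrative label
      target:
        type: AverageValue
        averageValue: "30"
```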
Application Metric Exposure
Services expose custom metrics via Prometheus format on their metrics endpoint:
```python
# AI Service custom metrics
from prometheus_client import Gauge

# Pending LLM requests per provider; backs the ai_llm_queue_depth HPA metric.
llm_queue_depth = Gauge(
    'ai_service_llm_queue_depth',
    'Number of pending LLM requests',
    ['provider']
)

# Active chat sessions per tenant; backs the ai_active_sessions HPA metric.
active_sessions = Gauge(
    'ai_service_active_sessions',
    'Number of active chat sessions',
    ['tenant_id']
)
```
Monitoring
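A self-contained sketch of how a service updates one of these gauges and what the scraped exposition text looks like (a private registry and the `openai` label value are illustrative choices for the example, not platform requirements):

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# Private registry so this sketch does not collide with the default registry.
registry = CollectorRegistry()
llm_queue_depth = Gauge(
    'ai_service_llm_queue_depth',
    'Number of pending LLM requests',
    ['provider'],
    registry=registry,
)

# The service updates the gauge as requests enter and leave the queue.
llm_queue_depth.labels(provider='openai').set(3)

# generate_latest renders the text format Prometheus scrapes from /metrics.
output = generate_latest(registry).decode()
print(output)
# ...contains a sample line for ai_service_llm_queue_depth{provider="openai"}
```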
Verify custom metrics are available via the Kubernetes API:
```bash
# Check custom metrics availability (via platform-status.sh)
./scripts/tools/platform-status.sh
```
Troubleshooting
| Issue | Symptom | Resolution |
|---|---|---|
| Metric not found | HPA shows unknown for custom metric | Verify Prometheus Adapter rules |
| Stale metric | Scaling based on old data | Check Prometheus scrape interval |
| Incorrect scaling | Too many or too few pods | Adjust target value |
| Adapter not running | All custom metrics unavailable | Check prometheus-adapter deployment |