Custom Metrics
Custom metrics enable HPA to scale based on application-specific signals beyond CPU and memory. The MATIH platform uses Prometheus metrics exposed by services and made available to the HPA controller through the Prometheus Adapter, enabling scaling based on queue depth, active sessions, request rate, and other business-relevant metrics.
Custom Metrics Architecture
Application --> Prometheus (scrape metrics) --> Prometheus Adapter --> Kubernetes Custom Metrics API --> HPA
Prometheus Adapter Configuration
The Prometheus Adapter translates Prometheus queries into the Kubernetes custom metrics API format:
```yaml
rules:
  - seriesQuery: 'ai_service_llm_queue_depth{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)$"
      as: "ai_llm_queue_depth"
    metricsQuery: 'avg(ai_service_llm_queue_depth{<<.LabelMatchers>>})'
  - seriesQuery: 'ai_service_active_sessions{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace:
          resource: namespace
        pod:
          resource: pod
    name:
      matches: "^(.*)$"
      as: "ai_active_sessions"
    metricsQuery: 'sum(ai_service_active_sessions{<<.LabelMatchers>>})'
```
Custom Metrics by Service
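The `name` block in each rule renames the raw Prometheus series to the metric name the HPA references: `matches` is a regular expression applied to the series name, and `as` is the replacement. A minimal sketch of that renaming step (simplified; the real adapter applies this per discovered series and supports Go-style `$1` capture references):

```python
import re

def rename_metric(series_name: str, matches: str, as_: str) -> str:
    # "matches" is a regex applied to the Prometheus series name;
    # "as" is the replacement, where $1 refers to the first capture group.
    return re.sub(matches, as_.replace("$1", r"\1"), series_name)

# The first rule above: a full-match regex with a constant replacement.
print(rename_metric("ai_service_llm_queue_depth", "^(.*)$", "ai_llm_queue_depth"))
# ai_llm_queue_depth
```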
| Service | Metric | Prometheus Query | HPA Target |
|---|---|---|---|
| AI Service | LLM queue depth | ai_service_llm_queue_depth | 10 per pod |
| AI Service | Active sessions | ai_service_active_sessions | 50 per pod |
| Query Engine | Query backlog | query_engine_pending_queries | 20 per pod |
| Render Service | Render queue | render_service_queue_size | 5 per pod |
| API Gateway | Request rate | rate(http_requests_total[1m]) | 200 per pod |
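For per-pod targets like those in the table, the HPA applies the standard Kubernetes scaling formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetValue), clamped to the min/max replica bounds. A sketch of that calculation (the min/max values mirror the ai-service HPA shown below):

```python
import math

def desired_replicas(current_replicas: int, current_avg: float, target_avg: float,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    # Standard HPA formula for an AverageValue (per-pod) metric,
    # clamped to the configured replica bounds.
    desired = math.ceil(current_replicas * current_avg / target_avg)
    return max(min_replicas, min(max_replicas, desired))

# AI Service example: 3 pods each averaging 25 queued LLM requests,
# against the 10-per-pod target from the table above.
print(desired_replicas(3, 25, 10))  # 8
```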
HPA with Custom Metrics
Example HPA using custom metrics alongside resource metrics:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: ai_llm_queue_depth
        target:
          type: AverageValue
          averageValue: "10"
    - type: Pods
      pods:
        metric:
          name: ai_active_sessions
        target:
          type: AverageValue
          averageValue: "50"
```

When multiple metrics are specified, HPA uses the metric that results in the highest replica count, ensuring adequate capacity for all scaling dimensions.
Metric Types
| API Type | Description | Example |
|---|---|---|
| Resource | Built-in CPU/memory | CPU utilization percentage |
| Pods | Custom metric per pod (average) | Queue depth per pod |
| Object | Custom metric on a Kubernetes object | Ingress request rate |
| External | Metric from an external system | Cloud queue depth |
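For comparison with the Pods metrics used above, Object and External metrics are declared differently in an `autoscaling/v2` manifest. A hedged sketch (the Ingress name, external metric name, and selector label are illustrative, not part of the platform configuration):

```yaml
metrics:
  # Object: metric attached to a single Kubernetes object (here, an Ingress)
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: api-gateway          # illustrative name
      target:
        type: Value
        value: "2000"
  # External: metric from a system outside the cluster (e.g. a cloud queue)
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: render-jobs     # illustrative label
      target:
        type: AverageValue
        averageValue: "30"
```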
Application Metric Exposure
Services expose custom metrics via Prometheus format on their metrics endpoint:
```python
# AI Service custom metrics
from prometheus_client import Gauge

# Pending LLM requests per provider; backs the ai_llm_queue_depth HPA metric.
llm_queue_depth = Gauge(
    'ai_service_llm_queue_depth',
    'Number of pending LLM requests',
    ['provider']
)

# Active chat sessions per tenant; backs the ai_active_sessions HPA metric.
active_sessions = Gauge(
    'ai_service_active_sessions',
    'Number of active chat sessions',
    ['tenant_id']
)
```
Monitoring
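A self-contained sketch of how a service updates one of these gauges and what the scraped exposition text looks like (a private registry and the `openai` label value are illustrative choices for the example, not platform requirements):

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# Private registry so this sketch does not collide with the default registry.
registry = CollectorRegistry()
llm_queue_depth = Gauge(
    'ai_service_llm_queue_depth',
    'Number of pending LLM requests',
    ['provider'],
    registry=registry,
)

# The service updates the gauge as requests enter and leave the queue.
llm_queue_depth.labels(provider='openai').set(3)

# generate_latest renders the text format Prometheus scrapes from /metrics.
output = generate_latest(registry).decode()
print(output)
# ...contains a sample line for ai_service_llm_queue_depth{provider="openai"}
```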
Verify custom metrics are available via the Kubernetes API:
```bash
# Check custom metrics availability (via platform-status.sh)
./scripts/tools/platform-status.sh
```
Troubleshooting
| Issue | Symptom | Resolution |
|---|---|---|
| Metric not found | HPA shows unknown for custom metric | Verify Prometheus Adapter rules |
| Stale metric | Scaling based on old data | Check Prometheus scrape interval |
| Incorrect scaling | Too many or too few pods | Adjust target value |
| Adapter not running | All custom metrics unavailable | Check prometheus-adapter deployment |