Performance Monitoring
The Performance Monitoring module tracks model accuracy, latency, throughput, and resource utilization for deployed models. It provides real-time dashboards, historical trend analysis, and alerting when any metric crosses its configured threshold. The implementation is in src/monitoring/performance_monitoring_service.py.
Monitored Metrics
Prediction Quality
| Metric | Type | Description |
|---|---|---|
| Accuracy | Classification | Percentage of correct predictions |
| F1 Score | Classification | Harmonic mean of precision and recall |
| AUC-ROC | Classification | Area under the ROC curve |
| RMSE | Regression | Root mean squared error |
| MAE | Regression | Mean absolute error |
| MAPE | Regression | Mean absolute percentage error |
Serving Performance
| Metric | Type | Description |
|---|---|---|
| Latency (p50, p95, p99) | Histogram | End-to-end prediction latency |
| Throughput | Gauge | Requests per second |
| Error rate | Gauge | Fraction of failed predictions (derived from error and request counters) |
| Queue depth | Gauge | Pending requests in serving queue |
Resource Utilization
| Metric | Type | Description |
|---|---|---|
| CPU utilization | Gauge | Serving pod CPU usage |
| Memory utilization | Gauge | Serving pod memory usage |
| GPU utilization | Gauge | GPU compute utilization (if applicable) |
| Model cache hit rate | Gauge | In-memory model cache effectiveness |
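The latency percentiles (p50, p95, p99) are computed from per-request samples. As a minimal illustration of nearest-rank percentiles (a hypothetical helper; the service's actual aggregation presumably uses histogram buckets, per the metric type above):

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100) over a list of samples."""
    if not samples:
        raise ValueError("percentile of empty sample set")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# Ten example request latencies in milliseconds (made-up values):
latencies_ms = [5.1, 6.3, 7.8, 8.2, 9.0, 11.4, 14.9, 22.5, 30.2, 48.1]
p50 = percentile(latencies_ms, 50)   # 9.0
p95 = percentile(latencies_ms, 95)   # 48.1
```

Nearest-rank is the simplest definition; histogram-based estimates trade exactness for bounded memory at high request volumes.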
Get Performance Summary
GET /api/v1/monitoring/performance/:model_id

Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| window | string | no | Time window (1h, 6h, 24h, 7d, 30d) |
| granularity | string | no | Metric granularity (minute, hour, day) |
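A request URL for this endpoint can be assembled as follows (the base URL is an assumption; only the path and query parameters come from this doc):

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # hypothetical host, not specified in this doc

def performance_summary_url(model_id: str, window: str = "24h",
                            granularity: str = "hour") -> str:
    """Build the GET URL for the performance summary endpoint."""
    query = urlencode({"window": window, "granularity": granularity})
    return f"{BASE_URL}/api/v1/monitoring/performance/{model_id}?{query}"

print(performance_summary_url("model-xyz789"))
# → https://api.example.com/api/v1/monitoring/performance/model-xyz789?window=24h&granularity=hour
```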
Response
{
  "model_id": "model-xyz789",
  "window": "24h",
  "quality_metrics": {
    "accuracy": {"current": 0.91, "baseline": 0.912, "trend": "stable"},
    "f1_score": {"current": 0.89, "baseline": 0.895, "trend": "slight_decline"}
  },
  "serving_metrics": {
    "latency_p50_ms": 8.2,
    "latency_p95_ms": 22.5,
    "latency_p99_ms": 48.1,
    "throughput_rps": 145,
    "error_rate": 0.001
  },
  "resource_metrics": {
    "cpu_utilization": 0.45,
    "memory_utilization": 0.62,
    "gpu_utilization": 0.0
  }
}

Baseline Comparison
Performance is compared against baselines established at deployment time:
| Metric | Baseline Source | Alert Threshold |
|---|---|---|
| Accuracy | Test set evaluation at deployment | 5% relative degradation |
| Latency p95 | First 24 hours in production | 50% increase |
| Error rate | First 24 hours in production | Above 1% |
| Throughput | Expected based on traffic forecast | 20% below forecast |
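The 5% relative-degradation rule for accuracy can be sketched as follows (the function name is illustrative, not the module's actual API):

```python
def accuracy_degraded(current: float, baseline: float,
                      max_relative_drop: float = 0.05) -> bool:
    """True when accuracy has fallen more than max_relative_drop below baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline > max_relative_drop

# 0.91 against a 0.912 baseline is only ~0.2% relative drop -> no alert
print(accuracy_degraded(0.91, 0.912))   # False
# 0.85 against 0.912 is ~6.8% relative drop -> alert
print(accuracy_degraded(0.85, 0.912))   # True
```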
Performance Trends
The module tracks metric trends over time to detect gradual degradation:
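One way the reported slope and projected breach date could be derived is a least-squares fit over recent daily snapshots; a rough sketch with illustrative helper names (not the module's actual implementation):

```python
def linear_slope(values: list[float]) -> float:
    """Least-squares slope of values against their indices (change per day)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

def days_until_breach(current: float, threshold: float, slope: float):
    """Days until a declining metric crosses threshold; None if not declining."""
    if slope >= 0 or current <= threshold:
        return None
    return (current - threshold) / -slope

# Seven made-up daily accuracy snapshots showing a gradual decline:
daily_accuracy = [0.912, 0.910, 0.909, 0.907, 0.906, 0.904, 0.902]
slope = linear_slope(daily_accuracy)   # ≈ -0.0016 per day
eta = days_until_breach(0.902, 0.866, slope)
```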
{
  "trends": {
    "accuracy": {
      "7_day_trend": "declining",
      "slope": -0.002,
      "projected_baseline_breach": "2025-03-22T00:00:00Z",
      "confidence": 0.78
    }
  }
}

Alerting Rules
{
  "model_id": "model-xyz789",
  "alert_rules": [
    {
      "metric": "accuracy",
      "condition": "below",
      "threshold": 0.85,
      "window": "1h",
      "severity": "critical"
    },
    {
      "metric": "latency_p95",
      "condition": "above",
      "threshold": 100,
      "window": "15m",
      "severity": "warning"
    },
    {
      "metric": "error_rate",
      "condition": "above",
      "threshold": 0.05,
      "window": "5m",
      "severity": "critical"
    }
  ]
}

Prometheus Integration
All metrics are exported in Prometheus format for use in Grafana dashboards:
| Prometheus Metric | Labels | Type |
|---|---|---|
| ml_model_accuracy | model_id, tenant_id | Gauge |
| ml_model_latency_seconds | model_id, quantile | Summary |
| ml_model_predictions_total | model_id, status | Counter |
| ml_model_errors_total | model_id, error_type | Counter |
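As a rough illustration of what one exported sample looks like in the Prometheus text exposition format (a hand-rolled sketch; the service presumably uses a standard Prometheus client library, and the tenant label value here is made up):

```python
def prometheus_line(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample in Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

line = prometheus_line("ml_model_accuracy",
                       {"model_id": "model-xyz789", "tenant_id": "tenant-1"},
                       0.91)
print(line)
# → ml_model_accuracy{model_id="model-xyz789",tenant_id="tenant-1"} 0.91
```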
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| PERF_MONITORING_INTERVAL | 60 | Metric aggregation interval in seconds |
| PERF_ACCURACY_THRESHOLD | 0.05 | Accuracy degradation alert threshold |
| PERF_LATENCY_P95_MAX_MS | 100 | Maximum acceptable p95 latency in milliseconds |
| PERF_ERROR_RATE_MAX | 0.01 | Maximum acceptable error rate |
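These settings could be read with stdlib environment lookups; a minimal sketch using the documented defaults (the dictionary keys and function name are illustrative, not the service's real config loader):

```python
import os

def load_perf_config(env=os.environ) -> dict:
    """Read monitoring settings, falling back to the documented defaults."""
    return {
        "interval_s": int(env.get("PERF_MONITORING_INTERVAL", "60")),
        "accuracy_threshold": float(env.get("PERF_ACCURACY_THRESHOLD", "0.05")),
        "latency_p95_max_ms": float(env.get("PERF_LATENCY_P95_MAX_MS", "100")),
        "error_rate_max": float(env.get("PERF_ERROR_RATE_MAX", "0.01")),
    }

print(load_perf_config({}))  # all defaults when no variables are set
```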