MATIH Platform is in active MVP development. Documentation reflects current implementation status.
# 13. ML Service & MLOps

## Performance Monitoring

The Performance Monitoring module tracks model accuracy, latency, throughput, and resource utilization for deployed models. It provides real-time dashboards, historical trend analysis, and alerting when performance degrades below configured thresholds. The implementation is in `src/monitoring/performance_monitoring_service.py`.


### Monitored Metrics

#### Prediction Quality

| Metric | Type | Description |
|--------|------|-------------|
| Accuracy | Classification | Percentage of correct predictions |
| F1 Score | Classification | Harmonic mean of precision and recall |
| AUC-ROC | Classification | Area under the ROC curve |
| RMSE | Regression | Root mean squared error |
| MAE | Regression | Mean absolute error |
| MAPE | Regression | Mean absolute percentage error |
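To make the first two quality metrics concrete, the sketch below computes accuracy and F1 from logged predictions. This is an illustrative stdlib-only version, not the service's actual implementation, which may delegate to a metrics library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    fp = sum(p == positive and t != positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(accuracy(y_true, y_pred))  # 5 of 6 predictions correct
```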

#### Serving Performance

| Metric | Type | Description |
|--------|------|-------------|
| Latency (p50, p95, p99) | Histogram | End-to-end prediction latency |
| Throughput | Gauge | Requests per second |
| Error rate | Counter | Percentage of failed predictions |
| Queue depth | Gauge | Pending requests in the serving queue |
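The p50/p95/p99 latency figures can be derived from raw per-request latencies with the nearest-rank percentile method, as in this sketch. The service aggregates from histogram buckets in practice; this version is for illustration only.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of
    samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Hypothetical raw latencies for one aggregation window, in milliseconds.
latencies_ms = [5.1, 6.0, 7.2, 8.2, 9.0, 15.4, 22.5, 30.0, 48.1, 50.2]
summary = {
    "latency_p50_ms": percentile(latencies_ms, 50),
    "latency_p95_ms": percentile(latencies_ms, 95),
    "latency_p99_ms": percentile(latencies_ms, 99),
}
print(summary)
```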

#### Resource Utilization

| Metric | Type | Description |
|--------|------|-------------|
| CPU utilization | Gauge | Serving pod CPU usage |
| Memory utilization | Gauge | Serving pod memory usage |
| GPU utilization | Gauge | GPU compute utilization (if applicable) |
| Model cache hit rate | Counter | In-memory model cache effectiveness |

### Get Performance Summary

```http
GET /api/v1/monitoring/performance/:model_id
```

#### Query Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| window | string | no | Time window (`1h`, `6h`, `24h`, `7d`, `30d`) |
| granularity | string | no | Metric granularity (`minute`, `hour`, `day`) |
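A client request against this endpoint can be assembled as below. Only the path and query parameters come from the spec above; the base URL and helper name are assumptions for illustration.

```python
import urllib.parse

BASE_URL = "https://matih.example.com/api/v1"  # assumed host, not from the docs

def performance_summary_url(model_id, window="24h", granularity="hour"):
    """Build the GET URL for /monitoring/performance/:model_id."""
    params = urllib.parse.urlencode({"window": window, "granularity": granularity})
    return f"{BASE_URL}/monitoring/performance/{model_id}?{params}"

print(performance_summary_url("model-xyz789"))
```

The returned URL can then be fetched with any HTTP client, subject to whatever authentication the deployment requires.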

#### Response

```json
{
  "model_id": "model-xyz789",
  "window": "24h",
  "quality_metrics": {
    "accuracy": {"current": 0.91, "baseline": 0.912, "trend": "stable"},
    "f1_score": {"current": 0.89, "baseline": 0.895, "trend": "slight_decline"}
  },
  "serving_metrics": {
    "latency_p50_ms": 8.2,
    "latency_p95_ms": 22.5,
    "latency_p99_ms": 48.1,
    "throughput_rps": 145,
    "error_rate": 0.001
  },
  "resource_metrics": {
    "cpu_utilization": 0.45,
    "memory_utilization": 0.62,
    "gpu_utilization": 0.0
  }
}
```

### Baseline Comparison

Performance is compared against baselines established at deployment time:

| Metric | Baseline Source | Alert Threshold |
|--------|-----------------|-----------------|
| Accuracy | Test set evaluation at deployment | 5% relative degradation |
| Latency p95 | First 24 hours in production | 50% increase |
| Error rate | First 24 hours in production | Above 1% |
| Throughput | Expected based on traffic forecast | 20% below forecast |
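Each row of the table maps to a simple threshold check. The sketch below mirrors that logic; the function names are illustrative, not taken from the service.

```python
def accuracy_degraded(current, baseline, max_rel_drop=0.05):
    """True when accuracy fell more than 5% relative to the deployment baseline."""
    return (baseline - current) / baseline > max_rel_drop

def latency_degraded(current_p95, baseline_p95, max_increase=0.50):
    """True when p95 latency rose more than 50% over the first-24h baseline."""
    return (current_p95 - baseline_p95) / baseline_p95 > max_increase

def error_rate_breached(error_rate, max_rate=0.01):
    """True when the error rate exceeds 1%."""
    return error_rate > max_rate

def throughput_below_forecast(current_rps, forecast_rps, max_shortfall=0.20):
    """True when throughput falls more than 20% below the traffic forecast."""
    return (forecast_rps - current_rps) / forecast_rps > max_shortfall
```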

### Performance Trends

The module tracks metric trends over time to detect gradual degradation:

```json
{
  "trends": {
    "accuracy": {
      "7_day_trend": "declining",
      "slope": -0.002,
      "projected_baseline_breach": "2025-03-22T00:00:00Z",
      "confidence": 0.78
    }
  }
}
```
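The `projected_baseline_breach` field follows from extrapolating the fitted slope (change per day) until the metric crosses its threshold. A sketch of that projection, under the assumption that the trend stays linear; the numbers here are hypothetical:

```python
from datetime import datetime, timedelta, timezone

def projected_breach(current, threshold, slope_per_day, now=None):
    """Project the UTC time when a linearly declining metric crosses its
    threshold; return None if the metric is not declining or has already
    crossed it."""
    if slope_per_day >= 0 or current <= threshold:
        return None
    days_until_breach = (current - threshold) / -slope_per_day
    now = now or datetime.now(timezone.utc)
    return now + timedelta(days=days_until_breach)

now = datetime(2025, 3, 12, tzinfo=timezone.utc)
# 0.02 above threshold, declining 0.002/day -> breach ~10 days out (2025-03-22)
print(projected_breach(0.87, 0.85, -0.002, now))
```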

### Alerting Rules

Alert rules are configured per model. Each rule combines a metric, a comparison condition, a threshold, an evaluation window, and a severity:

```json
{
  "model_id": "model-xyz789",
  "alert_rules": [
    {
      "metric": "accuracy",
      "condition": "below",
      "threshold": 0.85,
      "window": "1h",
      "severity": "critical"
    },
    {
      "metric": "latency_p95",
      "condition": "above",
      "threshold": 100,
      "window": "15m",
      "severity": "warning"
    },
    {
      "metric": "error_rate",
      "condition": "above",
      "threshold": 0.05,
      "window": "5m",
      "severity": "critical"
    }
  ]
}
```
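Rule evaluation reduces to comparing each metric's current value against its rule. The sketch below uses the same rule schema as the JSON above; the evaluation helper itself is illustrative, not the service's code.

```python
def evaluate_rules(rules, metrics):
    """Return (metric, severity) for every rule whose condition is met
    against the current metric snapshot."""
    fired = []
    for rule in rules:
        value = metrics.get(rule["metric"])
        if value is None:
            continue  # metric not reported in this window
        breached = (value < rule["threshold"] if rule["condition"] == "below"
                    else value > rule["threshold"])
        if breached:
            fired.append((rule["metric"], rule["severity"]))
    return fired

rules = [
    {"metric": "accuracy", "condition": "below", "threshold": 0.85, "severity": "critical"},
    {"metric": "latency_p95", "condition": "above", "threshold": 100, "severity": "warning"},
]
print(evaluate_rules(rules, {"accuracy": 0.83, "latency_p95": 48.1}))
# [('accuracy', 'critical')]
```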

### Prometheus Integration

All metrics are exported as Prometheus metrics for Grafana dashboards:

| Prometheus Metric | Labels | Type |
|-------------------|--------|------|
| ml_model_accuracy | model_id, tenant_id | Gauge |
| ml_model_latency_seconds | model_id, quantile | Summary |
| ml_model_predictions_total | model_id, status | Counter |
| ml_model_errors_total | model_id, error_type | Counter |
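In Prometheus text exposition format, each sample is a metric name, a label set, and a value. Real deployments would export these through the official `prometheus_client` library; this stdlib-only sketch just illustrates the name/label shape from the table.

```python
def prom_line(name, labels, value):
    """Format one Prometheus exposition sample, e.g.
    ml_model_accuracy{model_id="m1",tenant_id="t1"} 0.91"""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

print(prom_line("ml_model_accuracy",
                {"model_id": "model-xyz789", "tenant_id": "t-1"}, 0.91))
```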

### Configuration

| Environment Variable | Default | Description |
|----------------------|---------|-------------|
| `PERF_MONITORING_INTERVAL` | 60 | Metric aggregation interval in seconds |
| `PERF_ACCURACY_THRESHOLD` | 0.05 | Relative accuracy degradation that triggers an alert |
| `PERF_LATENCY_P95_MAX_MS` | 100 | Maximum acceptable p95 latency in milliseconds |
| `PERF_ERROR_RATE_MAX` | 0.01 | Maximum acceptable error rate |
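Loading this configuration from the environment with the documented defaults might look like the following sketch; the variable names come straight from the table, but the loader function and returned key names are assumptions.

```python
import os

def load_config(env=None):
    """Read performance-monitoring settings, falling back to the documented defaults."""
    if env is None:
        env = os.environ
    return {
        "interval_s": int(env.get("PERF_MONITORING_INTERVAL", "60")),
        "accuracy_threshold": float(env.get("PERF_ACCURACY_THRESHOLD", "0.05")),
        "latency_p95_max_ms": float(env.get("PERF_LATENCY_P95_MAX_MS", "100")),
        "error_rate_max": float(env.get("PERF_ERROR_RATE_MAX", "0.01")),
    }

print(load_config({}))  # empty env -> all defaults
```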