MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Drift Detection

The Drift Detection module applies statistical tests to detect when production data distributions diverge from the training data. It covers feature, label, concept, prediction, and covariate drift, with configurable severity thresholds and automated alerting.


Drift Categories

| Category | Description | Detection Method |
| --- | --- | --- |
| Feature drift | Input feature distributions change | PSI, KS test, Chi-square |
| Label drift | Target variable distribution changes | PSI, Chi-square |
| Concept drift | Relationship between features and target changes | DDM, Page-Hinkley, accuracy monitoring |
| Prediction drift | Model output distribution changes | PSI, KS test |
| Covariate drift | Multivariate feature relationships change | JS divergence, Wasserstein |
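
PSI is listed above for feature, label, and prediction drift. As a rough illustration of what the statistic measures, the sketch below computes PSI for a numeric feature in plain Python. The function name `psi`, the ten equal-width bins derived from the reference sample, and the `eps` floor for empty bins are assumptions made for illustration; the service's own binning strategy is not documented here and may differ.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], production: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two numeric samples.

    Assumption: equal-width bins spanning the range of the reference sample.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        total = len(values)
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / total, eps) for c in counts]

    ref_p = proportions(reference)
    prod_p = proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))
```

Identical samples yield a PSI of zero, and the value grows as mass moves between bins. Note that the choice of `eps` strongly influences the result when production data falls into bins that are empty in the reference sample.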

Detection Methods

| Method | Type | Best For |
| --- | --- | --- |
| PSI (Population Stability Index) | Statistical | Binned numeric and categorical features |
| KS Test (Kolmogorov-Smirnov) | Statistical | Continuous numeric features |
| Chi-Square | Statistical | Categorical features |
| JS Divergence (Jensen-Shannon) | Information-theoretic | Distribution comparison |
| Wasserstein | Optimal transport | Distribution shape comparison |
| ADWIN (Adaptive Windowing) | Online | Streaming data concept drift |
| DDM (Drift Detection Method) | Online | Error rate monitoring |
| Page-Hinkley | Online | Mean shift detection |
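
The online methods in the table process one observation at a time instead of comparing two fixed samples. The following is a minimal sketch of the Page-Hinkley test for an upward mean shift, for example in a model's error rate. The class name, the default parameters, and the `first_alarm` helper are hypothetical and are not taken from the MATIH code base.

```python
from typing import Iterable, Optional


class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 50.0,
                 min_samples: int = 30) -> None:
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # alarm threshold (lambda)
        self.min_samples = min_samples  # warm-up period before alarms
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0
        self.minimum = 0.0

    def update(self, value: float) -> bool:
        """Consume one observation; return True when drift is signalled."""
        self.n += 1
        self.mean += (value - self.mean) / self.n  # running mean
        self.cumulative += value - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        if self.n < self.min_samples:
            return False
        return self.cumulative - self.minimum > self.threshold


def first_alarm(detector: PageHinkley, stream: Iterable[float]) -> Optional[int]:
    """Return the index of the first observation that triggers an alarm."""
    for index, value in enumerate(stream):
        if detector.update(value):
            return index
    return None
```

A smaller `threshold` reacts faster to a shift but raises the false alarm rate; `delta` controls how much slow wandering of the mean is tolerated before the cumulative sum starts to grow.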

Severity Levels

| Severity | PSI Range | KS p-value | Action |
| --- | --- | --- | --- |
| none | Below 0.1 | Above 0.05 | No action needed |
| low | 0.1 - 0.15 | 0.01 - 0.05 | Log and monitor |
| medium | 0.15 - 0.25 | 0.001 - 0.01 | Alert team, investigate |
| high | 0.25 - 0.5 | 0.0001 - 0.001 | Consider retraining |
| critical | Above 0.5 | Below 0.0001 | Trigger automatic retraining |
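
The thresholds in the table can be read as a simple lookup, checked from the most severe level down. The sketch below is one possible encoding. Two points are assumptions rather than documented behavior: a value exactly on a boundary is assigned to the more severe level (except PSI 0.5 and p-value 0.05, which the table places in the lower level), and when PSI and the KS p-value disagree, `combined_severity` takes the more severe of the two.

```python
SEVERITIES = ["none", "low", "medium", "high", "critical"]


def psi_severity(psi: float) -> str:
    """Map a PSI value to a severity level using the table's PSI ranges."""
    if psi > 0.5:
        return "critical"
    if psi >= 0.25:
        return "high"
    if psi >= 0.15:
        return "medium"
    if psi >= 0.1:
        return "low"
    return "none"


def ks_severity(p_value: float) -> str:
    """Map a KS test p-value to a severity level (smaller is more severe)."""
    if p_value < 0.0001:
        return "critical"
    if p_value < 0.001:
        return "high"
    if p_value < 0.01:
        return "medium"
    if p_value <= 0.05:
        return "low"
    return "none"


def combined_severity(psi: float, p_value: float) -> str:
    """Assumption: report the more severe of the two per-method levels."""
    return max(psi_severity(psi), ks_severity(p_value), key=SEVERITIES.index)
```

For example, `combined_severity(0.32, 0.0003)` evaluates to `"high"` and `combined_severity(0.04, 0.42)` to `"none"`.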

Run Drift Analysis

POST /api/v1/monitoring/drift/analyze
{
  "model_id": "model-xyz789",
  "reference_data": {
    "source": "sql",
    "query": "SELECT * FROM ml_features.customer_churn_training"
  },
  "production_data": {
    "source": "sql",
    "query": "SELECT * FROM ml_features.customer_churn_production WHERE date >= CURRENT_DATE - INTERVAL 7 DAY"
  },
  "methods": ["psi", "ks_test"],
  "features": ["tenure", "monthly_charges", "total_charges", "contract_type"]
}
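
A client could assemble this request with the Python standard library, as in the sketch below. The base URL and the `build_drift_request` helper are hypothetical, and authentication headers are omitted because they are not described here.

```python
import json
import urllib.request
from typing import List

# Hypothetical base URL; replace with the address of your ML service.
BASE_URL = "http://localhost:8000"


def build_drift_request(model_id: str, reference_query: str,
                        production_query: str, methods: List[str],
                        features: List[str],
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for a drift analysis."""
    payload = {
        "model_id": model_id,
        "reference_data": {"source": "sql", "query": reference_query},
        "production_data": {"source": "sql", "query": production_query},
        "methods": methods,
        "features": features,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/monitoring/drift/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; the response body can then be decoded with `json.loads`.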

Response

{
  "model_id": "model-xyz789",
  "overall_drift": "medium",
  "features": [
    {
      "name": "monthly_charges",
      "drift_severity": "high",
      "psi": 0.32,
      "ks_statistic": 0.15,
      "ks_p_value": 0.0003,
      "direction": "higher values in production"
    },
    {
      "name": "tenure",
      "drift_severity": "none",
      "psi": 0.04,
      "ks_statistic": 0.03,
      "ks_p_value": 0.42
    }
  ],
  "recommendations": [
    "Feature 'monthly_charges' shows significant drift (PSI=0.32)",
    "Consider retraining with recent data or investigating pricing changes"
  ]
}
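
A consumer of this response will often want only the features that drifted, ordered by how strongly. The helper below is a sketch of that step; the `drifted_features` name and the default cut-off of `"low"` are assumptions, and only fields shown in the response above are used.

```python
from typing import Any, Dict, List

SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"]


def drifted_features(response: Dict[str, Any],
                     min_severity: str = "low") -> List[Dict[str, Any]]:
    """Return features at or above min_severity, highest PSI first."""
    floor = SEVERITY_ORDER.index(min_severity)
    hits = [
        feature for feature in response.get("features", [])
        if SEVERITY_ORDER.index(feature["drift_severity"]) >= floor
    ]
    # Features without a PSI value sort last.
    return sorted(hits, key=lambda f: f.get("psi", 0.0), reverse=True)
```

Applied to the response above, the helper returns only the `monthly_charges` entry, because `tenure` has severity `none`.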

Continuous Monitoring

Drift detection runs on a configurable schedule for deployed models:

{
  "model_id": "model-xyz789",
  "monitoring_config": {
    "enabled": true,
    "interval_hours": 1,
    "reference_window_days": 30,
    "production_window_days": 7,
    "methods": ["psi", "ks_test"],
    "alert_threshold": "medium"
  }
}
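
To make the roles of `interval_hours` and `alert_threshold` concrete, the sketch below shows how a scheduler might interpret them. Both helper functions are hypothetical, and the rule that an alert fires when the observed severity is at or above `alert_threshold` is an assumption about the service's behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"]


def should_alert(severity: str, alert_threshold: str) -> bool:
    """Assumption: alert when severity is at or above the threshold."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(alert_threshold)


def next_run(last_run: datetime, monitoring_config: Dict[str, Any]) -> datetime:
    """Time of the next scheduled drift check after last_run."""
    return last_run + timedelta(hours=monitoring_config["interval_hours"])


config = {"interval_hours": 1, "alert_threshold": "medium"}
last = datetime(2025, 3, 10, 0, 0, tzinfo=timezone.utc)
print(next_run(last, config).isoformat())
print(should_alert("high", config["alert_threshold"]))
```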

Drift Root Cause Analysis

When drift is detected, the service provides root cause analysis:

{
  "root_cause_analysis": {
    "primary_driver": "monthly_charges",
    "correlation_analysis": [
      {
        "feature": "monthly_charges",
        "contribution_to_drift": 0.45,
        "possible_cause": "Price increase in production data"
      }
    ],
    "temporal_analysis": {
      "drift_onset": "2025-03-10T00:00:00Z",
      "trend": "increasing"
    }
  }
}

Configuration

| Environment Variable | Default | Description |
| --- | --- | --- |
| DRIFT_DETECTION_INTERVAL | 3600 | Check interval in seconds |
| DRIFT_PSI_THRESHOLD | 0.15 | PSI warning threshold |
| DRIFT_KS_ALPHA | 0.05 | KS test significance level |
| DRIFT_REFERENCE_WINDOW | 30 | Reference data window in days |
| DRIFT_PRODUCTION_WINDOW | 7 | Production data window in days |
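
As an illustration of how these variables and their defaults fit together, the sketch below reads them into a typed settings object. The `DriftSettings` class and `load_drift_settings` function are hypothetical names; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DriftSettings:
    interval_seconds: int
    psi_threshold: float
    ks_alpha: float
    reference_window_days: int
    production_window_days: int


def load_drift_settings(env: Optional[Mapping[str, str]] = None) -> DriftSettings:
    """Read drift settings from the environment, using the documented defaults."""
    env = os.environ if env is None else env
    return DriftSettings(
        interval_seconds=int(env.get("DRIFT_DETECTION_INTERVAL", "3600")),
        psi_threshold=float(env.get("DRIFT_PSI_THRESHOLD", "0.15")),
        ks_alpha=float(env.get("DRIFT_KS_ALPHA", "0.05")),
        reference_window_days=int(env.get("DRIFT_REFERENCE_WINDOW", "30")),
        production_window_days=int(env.get("DRIFT_PRODUCTION_WINDOW", "7")),
    )
```

Accepting an explicit mapping makes the loader easy to test without touching the process environment; calling it with no argument reads `os.environ`.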