Drift Detection

The Drift Detection module runs statistical tests to detect when production data distributions diverge from the training data. It covers feature, label, concept, prediction, and covariate drift, with configurable severity thresholds and automated alerting.

Drift Categories

Category	Description	Detection Method
Feature drift	Input feature distributions change	PSI, KS test, Chi-square
Label drift	Target variable distribution changes	PSI, Chi-square
Concept drift	Relationship between features and target changes	DDM, Page-Hinkley, accuracy monitoring
Prediction drift	Model output distribution changes	PSI, KS test
Covariate drift	Multivariate feature relationships change	JS divergence, Wasserstein

Detection Methods

Method	Type	Best For
PSI (Population Stability Index)	Statistical	Binned numeric and categorical features
KS Test (Kolmogorov-Smirnov)	Statistical	Continuous numeric features
Chi-Square	Statistical	Categorical features
JS Divergence (Jensen-Shannon)	Information-theoretic	Distribution comparison
Wasserstein	Optimal transport	Distribution shape comparison
ADWIN (Adaptive Windowing)	Online	Streaming data concept drift
DDM (Drift Detection Method)	Online	Error rate monitoring
Page-Hinkley	Online	Mean shift detection
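
To make the PSI row concrete, here is a minimal sketch of the standard PSI formula, the sum over bins of (production share - reference share) * ln(production share / reference share). The equal-width binning over the reference range, the `bins` and `eps` parameters, and the function name `psi` are assumptions made for illustration; the service's actual binning strategy is not documented here.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], production: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index over equal-width bins spanning the reference range."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    prod_p = proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))


reference = [i / 100 for i in range(1000)]    # roughly uniform on [0, 10)
shifted = [x + 2.0 for x in reference]        # same shape, shifted upward
identical_score = psi(reference, reference)
shifted_score = psi(reference, shifted)
```

Identical samples score zero, while the shifted sample leaves the lowest reference bins empty and scores well above the 0.5 mark.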

Severity Levels

Severity	PSI Range	KS p-value	Action
`none`	Below 0.1	Above 0.05	No action needed
`low`	0.1 - 0.15	0.01 - 0.05	Log and monitor
`medium`	0.15 - 0.25	0.001 - 0.01	Alert team, investigate
`high`	0.25 - 0.5	0.0001 - 0.001	Consider retraining
`critical`	Above 0.5	Below 0.0001	Trigger automatic retraining
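
The table can be read as a lookup from a metric value to a severity label. The sketch below is one possible reading, not the service's implementation: it assumes each range includes its lower PSI bound, that `critical` takes precedence for p-values below 0.0001, and that when both metrics are available the more severe label wins. The function names are hypothetical.

```python
SEVERITIES = ("none", "low", "medium", "high", "critical")


def psi_severity(psi: float) -> str:
    """Map a PSI value to a severity label using the table's PSI ranges."""
    if psi < 0.1:
        return "none"
    if psi < 0.15:
        return "low"
    if psi < 0.25:
        return "medium"
    if psi <= 0.5:
        return "high"
    return "critical"  # "Above 0.5"


def ks_severity(p_value: float) -> str:
    """Map a KS test p-value to a severity label; smaller p means stronger drift."""
    if p_value > 0.05:
        return "none"
    if p_value >= 0.01:
        return "low"
    if p_value >= 0.001:
        return "medium"
    if p_value >= 0.0001:
        return "high"
    return "critical"  # below 0.0001


def combined_severity(psi: float, p_value: float) -> str:
    """Assumption: report the more severe of the two labels."""
    labels = (psi_severity(psi), ks_severity(p_value))
    return max(labels, key=SEVERITIES.index)
```

Under these assumptions, `combined_severity(0.32, 0.0003)` yields `high` and `combined_severity(0.04, 0.42)` yields `none`.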

Run Drift Analysis

POST /api/v1/monitoring/drift/analyze

{
  "model_id": "model-xyz789",
  "reference_data": {
    "source": "sql",
    "query": "SELECT * FROM ml_features.customer_churn_training"
  },
  "production_data": {
    "source": "sql",
    "query": "SELECT * FROM ml_features.customer_churn_production WHERE date >= CURRENT_DATE - INTERVAL 7 DAY"
  },
  "methods": ["psi", "ks_test"],
  "features": ["tenure", "monthly_charges", "total_charges", "contract_type"]
}
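
A minimal sketch of building this request from Python with only the standard library. The host in `BASE_URL` is a placeholder, and authentication is not covered on this page, so none is shown; `build_drift_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://ml-platform.example.com"  # placeholder host, replace with your deployment


def build_drift_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for a drift analysis."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/monitoring/drift/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "model_id": "model-xyz789",
    "reference_data": {
        "source": "sql",
        "query": "SELECT * FROM ml_features.customer_churn_training",
    },
    "production_data": {
        "source": "sql",
        "query": "SELECT * FROM ml_features.customer_churn_production "
                 "WHERE date >= CURRENT_DATE - INTERVAL 7 DAY",
    },
    "methods": ["psi", "ks_test"],
    "features": ["tenure", "monthly_charges", "total_charges", "contract_type"],
}

request = build_drift_request(payload)

# Sending is left commented out because BASE_URL is a placeholder:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```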

Response

{
  "model_id": "model-xyz789",
  "overall_drift": "medium",
  "features": [
    {
      "name": "monthly_charges",
      "drift_severity": "high",
      "psi": 0.32,
      "ks_statistic": 0.15,
      "ks_p_value": 0.0003,
      "direction": "higher values in production"
    },
    {
      "name": "tenure",
      "drift_severity": "none",
      "psi": 0.04,
      "ks_statistic": 0.03,
      "ks_p_value": 0.42
    }
  ],
  "recommendations": [
    "Feature 'monthly_charges' shows significant drift (PSI=0.32)",
    "Consider retraining with recent data or investigating pricing changes"
  ]
}
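
A client will typically want only the features that actually drifted. The sketch below filters a parsed response on `drift_severity` and orders the result by PSI; the helper name `drifted_features` is hypothetical, and the sample dictionary is a trimmed copy of the response above.

```python
def drifted_features(response: dict) -> list[dict]:
    """Return features whose severity is not 'none', highest PSI first."""
    flagged = [f for f in response["features"] if f["drift_severity"] != "none"]
    return sorted(flagged, key=lambda f: f["psi"], reverse=True)


response = {
    "model_id": "model-xyz789",
    "overall_drift": "medium",
    "features": [
        {"name": "monthly_charges", "drift_severity": "high", "psi": 0.32,
         "ks_statistic": 0.15, "ks_p_value": 0.0003},
        {"name": "tenure", "drift_severity": "none", "psi": 0.04,
         "ks_statistic": 0.03, "ks_p_value": 0.42},
    ],
}

flagged_names = [f["name"] for f in drifted_features(response)]
```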

Continuous Monitoring

Drift detection runs on a configurable schedule for deployed models:

{
  "model_id": "model-xyz789",
  "monitoring_config": {
    "enabled": true,
    "interval_hours": 1,
    "reference_window_days": 30,
    "production_window_days": 7,
    "methods": ["psi", "ks_test"],
    "alert_threshold": "medium"
  }
}
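
One plausible reading of `alert_threshold` is "the minimum severity that triggers an alert". The sketch below encodes that reading; it is an assumption about the semantics, not a description of the service's alerting logic, and `should_alert` is a hypothetical name.

```python
SEVERITY_ORDER = ("none", "low", "medium", "high", "critical")


def should_alert(severity: str, alert_threshold: str = "medium") -> bool:
    """True when the detected severity is at or above the configured threshold."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(alert_threshold)
```

With the configuration above, a `high` or `medium` result would raise an alert and a `low` result would not.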

Drift Root Cause Analysis

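When drift is detected, the service returns a root cause analysis that identifies the primary driver, each feature's contribution to the drift, and when the drift began: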

{
  "root_cause_analysis": {
    "primary_driver": "monthly_charges",
    "correlation_analysis": [
      {
        "feature": "monthly_charges",
        "contribution_to_drift": 0.45,
        "possible_cause": "Price increase in production data"
      }
    ],
    "temporal_analysis": {
      "drift_onset": "2025-03-10T00:00:00Z",
      "trend": "increasing"
    }
  }
}

Configuration

Environment Variable	Default	Description
`DRIFT_DETECTION_INTERVAL`	`3600`	Check interval in seconds
`DRIFT_PSI_THRESHOLD`	`0.15`	PSI warning threshold
`DRIFT_KS_ALPHA`	`0.05`	KS test significance level
`DRIFT_REFERENCE_WINDOW`	`30`	Reference data window in days
`DRIFT_PRODUCTION_WINDOW`	`7`	Production data window in days
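
For tooling that needs the same settings, the variables can be read with the defaults from the table. This is a sketch only; the `DriftSettings` class and `load_settings` function are hypothetical names, and how the service itself parses these variables is not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DriftSettings:
    interval_seconds: int
    psi_threshold: float
    ks_alpha: float
    reference_window_days: int
    production_window_days: int


def load_settings(env: Mapping[str, str] = os.environ) -> DriftSettings:
    """Read drift settings from the environment, falling back to the documented defaults."""
    return DriftSettings(
        interval_seconds=int(env.get("DRIFT_DETECTION_INTERVAL", "3600")),
        psi_threshold=float(env.get("DRIFT_PSI_THRESHOLD", "0.15")),
        ks_alpha=float(env.get("DRIFT_KS_ALPHA", "0.05")),
        reference_window_days=int(env.get("DRIFT_REFERENCE_WINDOW", "30")),
        production_window_days=int(env.get("DRIFT_PRODUCTION_WINDOW", "7")),
    )
```

Passing a plain dictionary instead of `os.environ`, for example `load_settings({"DRIFT_PSI_THRESHOLD": "0.2"})`, makes overrides easy to test.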
