Drift Detection
The Drift Detection module analyzes data and model drift across several dimensions, applying statistical tests to detect when production data distributions diverge from the training data. It covers feature, label, concept, prediction, and covariate drift, with configurable severity thresholds and automated alerting.
Drift Categories
| Category | Description | Detection Method |
|---|---|---|
| Feature drift | Input feature distributions change | PSI, KS test, Chi-square |
| Label drift | Target variable distribution changes | PSI, Chi-square |
| Concept drift | Relationship between features and target changes | DDM, Page-Hinkley, accuracy monitoring |
| Prediction drift | Model output distribution changes | PSI, KS test |
| Covariate drift | Multivariate feature relationships change | JS divergence, Wasserstein |
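PSI appears in the feature, label, and prediction drift rows above. The sketch below shows one common way to compute it for a numeric feature: bin both samples using quantile cut points taken from the reference (training) data, then sum (actual - expected) * ln(actual / expected) over the bins. The quantile binning, the 10-bin default, and the 1e-6 floor for empty bins are assumptions for illustration, not documented behavior of this service; `quantile_edges`, `bin_proportions`, and `psi` are hypothetical names.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def quantile_edges(reference: Sequence[float], n_bins: int = 10) -> List[float]:
    """Interior cut points at the reference sample's quantiles (assumed binning scheme)."""
    ordered = sorted(reference)
    cuts = {ordered[int(k * len(ordered) / n_bins)] for k in range(1, n_bins)}
    return sorted(cuts)  # duplicate cut points collapse for low-cardinality features


def bin_proportions(values: Sequence[float], edges: List[float], floor: float = 1e-6) -> List[float]:
    """Share of values in each bin; empty bins get a small floor so log() stays finite."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), floor) for c in counts]


def psi(reference: Sequence[float], production: Sequence[float], n_bins: int = 10) -> float:
    """Population Stability Index: sum of (actual - expected) * ln(actual / expected)."""
    if not reference or not production:
        raise ValueError("both samples must be non-empty")
    edges = quantile_edges(reference, n_bins)
    expected = bin_proportions(reference, edges)  # training (reference) distribution
    actual = bin_proportions(production, edges)   # production distribution
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


if __name__ == "__main__":
    reference = [i / 1000 for i in range(1000)]
    shifted = [x + 0.5 for x in reference]
    print(psi(reference, reference))  # 0.0: identical samples
    print(psi(reference, shifted))    # well above 0.5 for this large shift
```

Because the bins come from the reference sample, production values outside the training range fall into the first or last bin rather than being dropped.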
Detection Methods
| Method | Type | Best For |
|---|---|---|
| PSI (Population Stability Index) | Statistical | Binned numeric and categorical features |
| KS Test (Kolmogorov-Smirnov) | Statistical | Continuous numeric features |
| Chi-Square | Statistical | Categorical features |
| JS Divergence (Jensen-Shannon) | Information-theoretic | Symmetric, bounded comparison of two distributions |
| Wasserstein | Optimal transport | Size of shifts in distribution shape and location |
| ADWIN (Adaptive Windowing) | Online | Streaming data concept drift |
| DDM (Drift Detection Method) | Online | Error rate monitoring |
| Page-Hinkley | Online | Mean shift detection |
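The online methods above process one observation at a time instead of comparing two batches. As an illustration, here is a minimal sketch of the one-sided Page-Hinkley test for an upward shift in the mean. Fed a per-prediction error indicator (1 for a wrong prediction, 0 otherwise), it raises an alarm when the error rate rises, which is one way to watch for concept drift. The `PageHinkley` class and its `delta` and `threshold` defaults are assumptions for this example, not this service's API.

```python
class PageHinkley:
    """One-sided Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta: float = 0.005, threshold: float = 50.0):
        self.delta = delta          # magnitude of change tolerated before it accumulates
        self.threshold = threshold  # lambda: alarm when PH_t exceeds this value
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0        # running mean of all observations so far
        self.cumulative = 0.0  # m_t = sum of (x_i - mean_i - delta)
        self.minimum = 0.0     # M_t = smallest m_t seen so far

    def update(self, x: float) -> bool:
        """Add one observation; return True if a mean increase is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        drift = self.cumulative - self.minimum > self.threshold
        if drift:
            self.reset()  # start a fresh detection window after an alarm
        return drift


if __name__ == "__main__":
    detector = PageHinkley(delta=0.005, threshold=5.0)
    errors = [0.0] * 100 + [1.0] * 20  # error rate jumps after sample 100
    for i, err in enumerate(errors):
        if detector.update(err):
            print(f"drift detected at sample {i}")
```

A larger `threshold` trades slower detection for fewer false alarms, while `delta` sets how much fluctuation in the mean is ignored before it counts toward an alarm.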
Severity Levels
| Severity | PSI Range | KS p-value | Action |
|---|---|---|---|
| none | Below 0.1 | Above 0.05 | No action needed |
| low | 0.1 - 0.15 | 0.01 - 0.05 | Log and monitor |
| medium | 0.15 - 0.25 | 0.001 - 0.01 | Alert team, investigate |
| high | 0.25 - 0.5 | 0.0001 - 0.001 | Consider retraining |
| critical | Above 0.5 | Below 0.0001 | Trigger automatic retraining |
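The severity bands can be applied mechanically. The sketch below maps a PSI value and a KS p-value to severity labels, treating the high KS band as 0.0001 to 0.001 so that it does not overlap the critical band, and compares the result with an alert threshold such as the `alert_threshold` field used in monitoring configuration. Two choices are assumptions not stated in this document: how values lying exactly on a band boundary are assigned, and the rule that a feature's severity is the more severe of its PSI and KS results. All function names are hypothetical.

```python
SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"]

# PSI bands as (exclusive upper bound, severity); values of 0.5 or more are critical.
PSI_BANDS = [(0.1, "none"), (0.15, "low"), (0.25, "medium"), (0.5, "high")]

# KS p-value bands as (exclusive upper bound, severity); p-values of 0.05 or more are none.
KS_BANDS = [(0.0001, "critical"), (0.001, "high"), (0.01, "medium"), (0.05, "low")]


def psi_severity(psi: float) -> str:
    for upper, label in PSI_BANDS:
        if psi < upper:
            return label
    return "critical"


def ks_severity(p_value: float) -> str:
    for upper, label in KS_BANDS:
        if p_value < upper:
            return label
    return "none"


def feature_severity(psi: float, ks_p_value: float) -> str:
    """Assumed combination rule: report the more severe of the two signals."""
    return max(psi_severity(psi), ks_severity(ks_p_value), key=SEVERITY_ORDER.index)


def should_alert(severity: str, alert_threshold: str = "medium") -> bool:
    """True when the severity is at or above the configured alert threshold."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(alert_threshold)


if __name__ == "__main__":
    print(feature_severity(0.32, 0.0003))  # high
    print(feature_severity(0.04, 0.42))    # none
```

For the example response in Run Drift Analysis, `monthly_charges` (PSI 0.32, p = 0.0003) classifies as high and `tenure` (PSI 0.04, p = 0.42) as none, in line with the `drift_severity` values shown there.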
Run Drift Analysis
POST /api/v1/monitoring/drift/analyze

```json
{
  "model_id": "model-xyz789",
  "reference_data": {
    "source": "sql",
    "query": "SELECT * FROM ml_features.customer_churn_training"
  },
  "production_data": {
    "source": "sql",
    "query": "SELECT * FROM ml_features.customer_churn_production WHERE date >= CURRENT_DATE - INTERVAL 7 DAY"
  },
  "methods": ["psi", "ks_test"],
  "features": ["tenure", "monthly_charges", "total_charges", "contract_type"]
}
```

Response
```json
{
  "model_id": "model-xyz789",
  "overall_drift": "medium",
  "features": [
    {
      "name": "monthly_charges",
      "drift_severity": "high",
      "psi": 0.32,
      "ks_statistic": 0.15,
      "ks_p_value": 0.0003,
      "direction": "higher values in production"
    },
    {
      "name": "tenure",
      "drift_severity": "none",
      "psi": 0.04,
      "ks_statistic": 0.03,
      "ks_p_value": 0.42
    }
  ],
  "recommendations": [
    "Feature 'monthly_charges' shows significant drift (PSI=0.32)",
    "Consider retraining with recent data or investigating pricing changes"
  ]
}
```

Continuous Monitoring
Drift detection runs on a configurable schedule for deployed models:
```json
{
  "model_id": "model-xyz789",
  "monitoring_config": {
    "enabled": true,
    "interval_hours": 1,
    "reference_window_days": 30,
    "production_window_days": 7,
    "methods": ["psi", "ks_test"],
    "alert_threshold": "medium"
  }
}
```

Drift Root Cause Analysis
When drift is detected, the service provides root cause analysis:
```json
{
  "root_cause_analysis": {
    "primary_driver": "monthly_charges",
    "correlation_analysis": [
      {
        "feature": "monthly_charges",
        "contribution_to_drift": 0.45,
        "possible_cause": "Price increase in production data"
      }
    ],
    "temporal_analysis": {
      "drift_onset": "2025-03-10T00:00:00Z",
      "trend": "increasing"
    }
  }
}
```

Configuration
| Environment Variable | Default | Description |
|---|---|---|
| DRIFT_DETECTION_INTERVAL | 3600 | Check interval in seconds |
| DRIFT_PSI_THRESHOLD | 0.15 | PSI warning threshold |
| DRIFT_KS_ALPHA | 0.05 | KS test significance level |
| DRIFT_REFERENCE_WINDOW | 30 | Reference data window in days |
| DRIFT_PRODUCTION_WINDOW | 7 | Production data window in days |
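A service or client that reads these variables might load and validate them as in the sketch below, which also converts the check interval from seconds to the `interval_hours` field of the continuous monitoring payload. The correspondence between the environment variables and the `monitoring_config` fields is an assumption based on their matching defaults (3600 seconds versus 1 hour, 30 and 7 days); `load_drift_settings` and `to_monitoring_config` are hypothetical helpers, not part of a documented client library.

```python
import os
from typing import Any, Dict, Mapping

# Defaults copied from the Configuration table.
DEFAULTS = {
    "DRIFT_DETECTION_INTERVAL": "3600",  # seconds
    "DRIFT_PSI_THRESHOLD": "0.15",
    "DRIFT_KS_ALPHA": "0.05",
    "DRIFT_REFERENCE_WINDOW": "30",      # days
    "DRIFT_PRODUCTION_WINDOW": "7",      # days
}


def load_drift_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read the DRIFT_* variables, falling back to the documented defaults."""
    def get(name: str) -> str:
        return env.get(name, DEFAULTS[name])

    settings = {
        "interval_seconds": int(get("DRIFT_DETECTION_INTERVAL")),
        "psi_threshold": float(get("DRIFT_PSI_THRESHOLD")),
        "ks_alpha": float(get("DRIFT_KS_ALPHA")),
        "reference_window_days": int(get("DRIFT_REFERENCE_WINDOW")),
        "production_window_days": int(get("DRIFT_PRODUCTION_WINDOW")),
    }
    # Basic sanity checks; the exact validation rules are assumptions.
    if not 0 < settings["ks_alpha"] < 1:
        raise ValueError("DRIFT_KS_ALPHA must be between 0 and 1")
    if settings["psi_threshold"] <= 0:
        raise ValueError("DRIFT_PSI_THRESHOLD must be positive")
    for key in ("interval_seconds", "reference_window_days", "production_window_days"):
        if settings[key] <= 0:
            raise ValueError(f"{key} must be positive")
    return settings


def to_monitoring_config(model_id: str, settings: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical mapping of the settings onto the monitoring_config payload."""
    return {
        "model_id": model_id,
        "monitoring_config": {
            "enabled": True,
            "interval_hours": settings["interval_seconds"] / 3600,
            "reference_window_days": settings["reference_window_days"],
            "production_window_days": settings["production_window_days"],
            "methods": ["psi", "ks_test"],
            "alert_threshold": "medium",  # copied from the example payload
        },
    }


if __name__ == "__main__":
    print(to_monitoring_config("model-xyz789", load_drift_settings()))
```

Passing the environment as a mapping parameter keeps the loader easy to test with a plain dictionary instead of mutating `os.environ`.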