Retraining Triggers
The Retraining Trigger module automates the decision to retrain models based on drift detection, performance degradation, data freshness, and scheduled intervals. When a trigger fires, it initiates a retraining pipeline with the latest data while maintaining the current model in production until the new version is validated.
Trigger Types
| Trigger | Condition | Response |
|---|---|---|
| Drift-based | Feature or concept drift exceeds threshold | Retrain with recent data |
| Performance-based | Accuracy or F1 drops below baseline | Retrain with expanded dataset |
| Schedule-based | Configured time interval elapsed | Periodic retraining |
| Data-based | Significant new training data available | Retrain with augmented dataset |
| Manual | Engineer explicitly triggers retraining | On-demand retraining |
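The schedule-based and data-based conditions in the table reduce to simple threshold checks. The following is a minimal sketch, assuming the parameter names mirror the `interval_days` and `min_new_samples` fields of the trigger configuration; the function names and the treatment of a never-trained model are hypothetical, not part of the documented service.

```python
from datetime import datetime, timedelta
from typing import Optional


def schedule_due(last_trained_at: Optional[datetime], now: datetime, interval_days: int) -> bool:
    """Schedule-based condition: true once the configured interval has elapsed."""
    if last_trained_at is None:
        # Assumption: a model with no recorded training run is due immediately.
        return True
    return now - last_trained_at >= timedelta(days=interval_days)


def enough_new_data(new_sample_count: int, min_new_samples: int) -> bool:
    """Data-based condition: true once enough new samples have accumulated."""
    return new_sample_count >= min_new_samples
```

Both datetimes passed to `schedule_due` should be either timezone-aware or naive; mixing the two raises a `TypeError` in Python.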
Trigger Configuration
PUT /api/v1/monitoring/config/:model_id
{
  "model_id": "model-xyz789",
  "retraining_triggers": {
    "drift_based": {
      "enabled": true,
      "drift_severity_threshold": "high",
      "min_features_drifted": 2,
      "cooldown_hours": 24
    },
    "performance_based": {
      "enabled": true,
      "accuracy_min": 0.85,
      "f1_min": 0.80,
      "evaluation_window_hours": 6,
      "cooldown_hours": 12
    },
    "schedule_based": {
      "enabled": true,
      "interval_days": 7,
      "preferred_time": "02:00",
      "timezone": "UTC"
    },
    "data_based": {
      "enabled": true,
      "min_new_samples": 5000,
      "check_interval_hours": 6
    }
  }
}
Trigger Evaluation
The retraining trigger service evaluates these conditions at a configurable interval (RETRAINING_TRIGGER_INTERVAL, 3600 seconds by default):
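# DriftSeverity and RetrainingDecision are project types defined elsewhere.
class RetrainingTriggerService:
    async def evaluate(self, model_id: str) -> RetrainingDecision:
        # Fetch the latest drift and performance snapshots for this model.
        drift_status = await self.drift_service.get_status(model_id)
        perf_status = await self.perf_service.get_status(model_id)
        # Drift is checked first, so it takes precedence when both conditions hold.
        # The >= comparison requires DriftSeverity to be an ordered enum (e.g. IntEnum).
        if drift_status.severity >= DriftSeverity.HIGH:
            return RetrainingDecision(
                trigger="drift_based",
                reason=f"High drift in {drift_status.drifted_features} features",
                urgency="high",
            )
        # Performance check: accuracy below the configured minimum.
        if perf_status.accuracy < self.config.accuracy_min:
            return RetrainingDecision(
                trigger="performance_based",
                reason=f"Accuracy {perf_status.accuracy} below threshold",
                urgency="critical",
            )
        # No condition met; urgency is omitted, so RetrainingDecision must default it.
        return RetrainingDecision(trigger=None, reason="No retraining needed")
Retraining Pipeline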
When a trigger fires, the following pipeline executes:
- Data Collection: Gather training data based on trigger type
- Preprocessing: Apply feature engineering pipeline
- Training: Train the model with the same configuration as the current version, unless the request supplies a training_config_override
- Evaluation: Evaluate on holdout set and compare with current model
- Validation: Run governance checks (fairness, explainability)
- Shadow: Deploy as shadow for production validation
- Promotion: Promote to production if criteria are met
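The stage order above can be sketched as a sequential runner. This is an illustration only: the stage handlers, `PipelineResult`, and `run_pipeline` are hypothetical names, and stopping at the first failing stage is an assumption consistent with keeping the current model in production until the new version is validated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Stage names follow the pipeline list; the identifiers themselves are assumed.
STAGES = [
    "data_collection",
    "preprocessing",
    "training",
    "evaluation",
    "validation",
    "shadow",
    "promotion",
]


@dataclass
class PipelineResult:
    completed: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None

    @property
    def promoted(self) -> bool:
        # Promotion only counts if every stage, including the last, succeeded.
        return self.failed_stage is None and "promotion" in self.completed


def run_pipeline(handlers: Dict[str, Callable[[dict], bool]], context: dict) -> PipelineResult:
    """Run the stages in order and stop at the first one that reports failure."""
    result = PipelineResult()
    for stage in STAGES:
        if not handlers[stage](context):
            # Assumption: a failed stage aborts the run and the current
            # production model is left in place.
            result.failed_stage = stage
            break
        result.completed.append(stage)
    return result
```

Each handler receives a shared `context` dictionary and returns `True` on success, so a failed evaluation or governance check prevents the shadow and promotion stages from running.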
Trigger Manual Retraining
POST /api/v1/monitoring/retrain/:model_id
{
  "reason": "New product category added to catalog",
  "training_config_override": {
    "data_query": "SELECT * FROM ml_features.customer_churn WHERE date >= '2025-01-01'",
    "hyperparameters": null
  }
}
Response
{
  "retraining_id": "retrain-abc123",
  "model_id": "model-xyz789",
  "trigger": "manual",
  "status": "submitted",
  "training_job_id": "train-def456",
  "created_at": "2025-03-15T10:00:00Z"
}
Cooldown Periods
To prevent excessive retraining, cooldown periods enforce minimum intervals between triggers:
| Trigger Type | Default Cooldown | Configurable |
|---|---|---|
| Drift-based | 24 hours | Yes |
| Performance-based | 12 hours | Yes |
| Schedule-based | N/A (interval-based) | Yes |
| Data-based | 6 hours | Yes |
| Manual | None | No |
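A cooldown check along these lines could gate each trigger before it fires. This is a minimal sketch using the defaults from the table above; tracking the last firing time per trigger type, and the names `DEFAULT_COOLDOWN_HOURS` and `in_cooldown`, are assumptions rather than the documented implementation.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Defaults from the cooldown table; None means no cooldown applies.
DEFAULT_COOLDOWN_HOURS: Dict[str, Optional[int]] = {
    "drift_based": 24,
    "performance_based": 12,
    "schedule_based": None,  # governed by its interval instead
    "data_based": 6,
    "manual": None,
}


def in_cooldown(
    trigger: str,
    last_fired_at: Optional[datetime],
    now: datetime,
    overrides: Optional[Dict[str, int]] = None,
) -> bool:
    """Return True if the trigger fired too recently to fire again."""
    if trigger == "manual":
        # Manual retraining has no cooldown and is not configurable.
        return False
    hours = (overrides or {}).get(trigger, DEFAULT_COOLDOWN_HOURS[trigger])
    if hours is None or last_fired_at is None:
        return False
    return now - last_fired_at < timedelta(hours=hours)
```

For example, a drift trigger that last fired 13 hours ago is still in cooldown under the 24-hour default, while a performance trigger with the same history is not.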
Retraining History
GET /api/v1/monitoring/retrain/:model_id/history
Response
{
  "retraining_events": [
    {
      "id": "retrain-abc123",
      "trigger": "drift_based",
      "reason": "High drift in monthly_charges (PSI=0.32)",
      "old_version": "v2",
      "new_version": "v3",
      "accuracy_change": 0.013,
      "status": "completed",
      "created_at": "2025-03-15T10:00:00Z"
    }
  ]
}
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| RETRAINING_TRIGGER_INTERVAL | 3600 | Trigger evaluation interval in seconds |
| RETRAINING_MAX_CONCURRENT | 2 | Max concurrent retraining jobs |
| RETRAINING_AUTO_PROMOTE | false | Auto-promote after validation |
| RETRAINING_COOLDOWN_HOURS | 24 | Default cooldown between retraining |
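One way to read these variables with their documented defaults is sketched below. The `RetrainingSettings` class, `load_settings` function, and the accepted boolean spellings are assumptions for illustration; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RetrainingSettings:
    trigger_interval_seconds: int = 3600
    max_concurrent: int = 2
    auto_promote: bool = False
    cooldown_hours: int = 24


def _parse_bool(value: str) -> bool:
    # Assumption: these spellings count as true; anything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> RetrainingSettings:
    """Build settings from environment variables, falling back to the defaults."""
    return RetrainingSettings(
        trigger_interval_seconds=int(env.get("RETRAINING_TRIGGER_INTERVAL", "3600")),
        max_concurrent=int(env.get("RETRAINING_MAX_CONCURRENT", "2")),
        auto_promote=_parse_bool(env.get("RETRAINING_AUTO_PROMOTE", "false")),
        cooldown_hours=int(env.get("RETRAINING_COOLDOWN_HOURS", "24")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the process environment.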