Retraining Triggers

The Retraining Trigger module automates the decision to retrain a model. A trigger can fire on detected drift, performance degradation, newly available training data, or a scheduled interval. When a trigger fires, the module starts a retraining pipeline on the latest data and keeps the current model in production until the new version is validated.

Trigger Types

Trigger	Condition	Response
Drift-based	Feature or concept drift exceeds threshold	Retrain with recent data
Performance-based	Accuracy or F1 drops below baseline	Retrain with expanded dataset
Schedule-based	Configured time interval elapsed	Periodic retraining
Data-based	Significant new training data available	Retrain with augmented dataset
Manual	Engineer explicitly triggers retraining	On-demand retraining

Trigger Configuration

PUT /api/v1/monitoring/config/:model_id

{
  "model_id": "model-xyz789",
  "retraining_triggers": {
    "drift_based": {
      "enabled": true,
      "drift_severity_threshold": "high",
      "min_features_drifted": 2,
      "cooldown_hours": 24
    },
    "performance_based": {
      "enabled": true,
      "accuracy_min": 0.85,
      "f1_min": 0.80,
      "evaluation_window_hours": 6,
      "cooldown_hours": 12
    },
    "schedule_based": {
      "enabled": true,
      "interval_days": 7,
      "preferred_time": "02:00",
      "timezone": "UTC"
    },
    "data_based": {
      "enabled": true,
      "min_new_samples": 5000,
      "check_interval_hours": 6
    }
  }
}
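
Before sending a payload like the one above, a client may want to sanity-check it locally. The following is a minimal sketch, not the platform's own validation: the function name `validate_trigger_config` and the rules it applies (metric floors between 0 and 1, positive integers for hour, day, and sample counts) are assumptions inferred from the example values.

```python
from typing import Any

# Field names come from the example payload; the validation rules themselves
# are assumptions made for illustration.
RATIO_FIELDS = {"accuracy_min", "f1_min"}
POSITIVE_INT_FIELDS = {
    "min_features_drifted", "cooldown_hours", "evaluation_window_hours",
    "interval_days", "min_new_samples", "check_interval_hours",
}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_trigger_config(payload: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: list[str] = []
    if not payload.get("model_id"):
        problems.append("model_id is required")
    triggers = payload.get("retraining_triggers", {})
    for trigger_name, settings in triggers.items():
        if not isinstance(settings.get("enabled"), bool):
            problems.append(f"{trigger_name}.enabled must be true or false")
        for key, value in settings.items():
            if key in RATIO_FIELDS and not (
                _is_number(value) and 0.0 <= value <= 1.0
            ):
                problems.append(f"{trigger_name}.{key} must be between 0 and 1")
            if key in POSITIVE_INT_FIELDS and not (
                isinstance(value, int) and not isinstance(value, bool) and value > 0
            ):
                problems.append(f"{trigger_name}.{key} must be a positive integer")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.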

Trigger Evaluation

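The retraining trigger service evaluates trigger conditions at a fixed interval, set by `RETRAINING_TRIGGER_INTERVAL` (default 3600 seconds). The excerpt below shows the drift and accuracy checks; the first condition that matches determines the decision: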

class RetrainingTriggerService:
    async def evaluate(self, model_id: str) -> RetrainingDecision:
        """Return a retraining decision for the given model."""
        # Fetch the latest drift and performance snapshots for the model.
        drift_status = await self.drift_service.get_status(model_id)
        perf_status = await self.perf_service.get_status(model_id)

        # Drift is checked first, so it takes precedence when both conditions hold.
        if drift_status.severity >= DriftSeverity.HIGH:
            return RetrainingDecision(
                trigger="drift_based",
                reason=f"High drift in {drift_status.drifted_features} features",
                urgency="high",
            )

        # Accuracy below the configured floor is treated as critical.
        if perf_status.accuracy < self.config.accuracy_min:
            return RetrainingDecision(
                trigger="performance_based",
                reason=(
                    f"Accuracy {perf_status.accuracy} below threshold "
                    f"{self.config.accuracy_min}"
                ),
                urgency="critical",
            )

        # No condition matched.
        return RetrainingDecision(trigger=None, reason="No retraining needed")
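
The class above depends on services that are not shown here. To experiment with the decision logic in isolation, the sketch below reduces it to a pure function over two status snapshots. The dataclasses, the `IntEnum` ordering of severities, the function name `decide`, and the extra checks on `min_features_drifted` and `f1_min` are assumptions drawn from the configuration example, not confirmed behavior of the service.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class DriftSeverity(IntEnum):
    # Assumed ordering so that severities can be compared with >=.
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class DriftStatus:
    severity: DriftSeverity
    drifted_features: list[str]


@dataclass
class PerfStatus:
    accuracy: float
    f1: float


@dataclass
class RetrainingDecision:
    trigger: Optional[str]
    reason: str
    urgency: Optional[str] = None


def decide(
    drift: DriftStatus,
    perf: PerfStatus,
    *,
    min_features_drifted: int = 2,
    accuracy_min: float = 0.85,
    f1_min: float = 0.80,
) -> RetrainingDecision:
    # Drift check: severity must be high AND enough features must have drifted.
    if (
        drift.severity >= DriftSeverity.HIGH
        and len(drift.drifted_features) >= min_features_drifted
    ):
        names = ", ".join(drift.drifted_features)
        return RetrainingDecision("drift_based", f"High drift in {names}", "high")
    # Performance check: either metric falling below its floor is enough.
    if perf.accuracy < accuracy_min or perf.f1 < f1_min:
        reason = (
            f"accuracy={perf.accuracy:.3f} (min {accuracy_min}), "
            f"f1={perf.f1:.3f} (min {f1_min})"
        )
        return RetrainingDecision("performance_based", reason, "critical")
    return RetrainingDecision(None, "No retraining needed")
```

Keeping the decision as a pure function makes it easy to unit-test threshold changes without standing up the drift and performance services.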

Retraining Pipeline

When a trigger fires, the following pipeline executes:

1. Data Collection: Gather training data based on the trigger type
2. Preprocessing: Apply the feature engineering pipeline
3. Training: Train the model with the same configuration as the current version
4. Evaluation: Evaluate on a holdout set and compare against the current model
5. Validation: Run governance checks (fairness, explainability)
6. Shadow: Deploy the new version as a shadow model for production validation
7. Promotion: Promote to production if the criteria are met
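
The stages above run in order, and a failure at any stage leaves the current model in production. A minimal sketch of that control flow follows; the `run_pipeline` function, the stage-as-callable convention, and the `stopped_at` key are hypothetical names chosen for illustration, not part of the platform.

```python
from typing import Callable

# Each stage receives a shared context dict and returns True to continue
# or False to stop the pipeline. This convention is an assumption.
Stage = Callable[[dict], bool]

# Stage names follow the list above.
STAGE_ORDER = [
    "data_collection",
    "preprocessing",
    "training",
    "evaluation",
    "validation",
    "shadow",
    "promotion",
]


def run_pipeline(stages: dict[str, Stage], context: dict) -> list[str]:
    """Run stages in order and return the names of those that completed."""
    completed: list[str] = []
    for name in STAGE_ORDER:
        if not stages[name](context):
            # Record where the run stopped; later stages are skipped.
            context["stopped_at"] = name
            break
        completed.append(name)
    return completed


# Example: every stage passes except evaluation, so later stages never run.
stages: dict[str, Stage] = {name: (lambda ctx: True) for name in STAGE_ORDER}
stages["evaluation"] = lambda ctx: ctx["candidate_accuracy"] > ctx["current_accuracy"]
context = {"candidate_accuracy": 0.84, "current_accuracy": 0.86}
completed = run_pipeline(stages, context)
```

In this example the candidate scores below the current model, so the run stops at evaluation and promotion is never reached.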

Trigger Manual Retraining

POST /api/v1/monitoring/retrain/:model_id

{
  "reason": "New product category added to catalog",
  "training_config_override": {
    "data_query": "SELECT * FROM ml_features.customer_churn WHERE date >= '2025-01-01'",
    "hyperparameters": null
  }
}

Response

{
  "retraining_id": "retrain-abc123",
  "model_id": "model-xyz789",
  "trigger": "manual",
  "status": "submitted",
  "training_job_id": "train-def456",
  "created_at": "2025-03-15T10:00:00Z"
}
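
A client could assemble the request above with the Python standard library, as in the sketch below. The base URL `https://ml-platform.example.com` and the helper name `build_retrain_request` are placeholders, and authentication is omitted because the source does not describe it. The function only builds the request; sending it is left to the caller.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_retrain_request(
    base_url: str,
    model_id: str,
    reason: str,
    data_query: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a manual retraining request."""
    # Escape the model ID so it is safe to place in the URL path.
    path = f"/api/v1/monitoring/retrain/{urllib.parse.quote(model_id, safe='')}"
    body = {
        "reason": reason,
        "training_config_override": {
            "data_query": data_query,
            # null, as in the example request above
            "hyperparameters": None,
        },
    }
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_retrain_request(
    "https://ml-platform.example.com",  # placeholder host
    "model-xyz789",
    "New product category added to catalog",
)
# To send it: urllib.request.urlopen(request), then parse the JSON response.
```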

Cooldown Periods

To prevent excessive retraining, cooldown periods enforce minimum intervals between triggers:

Trigger Type	Default Cooldown	Configurable
Drift-based	24 hours	Yes
Performance-based	12 hours	Yes
Schedule-based	N/A (interval-based)	Yes
Data-based	6 hours	Yes
Manual	None	No
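
A cooldown check reduces to comparing the time since the last firing with the configured interval. The sketch below uses the defaults from the table; the function name `in_cooldown`, per-trigger-type tracking of the last firing, and the choice to treat the boundary as already expired are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

# Defaults mirror the table above. None means no cooldown applies.
# Schedule-based triggers are interval-driven and are not listed here.
DEFAULT_COOLDOWN_HOURS: dict[str, Optional[int]] = {
    "drift_based": 24,
    "performance_based": 12,
    "data_based": 6,
    "manual": None,
}


def in_cooldown(
    trigger: str,
    last_fired: Optional[datetime],
    now: datetime,
    cooldowns: Mapping[str, Optional[int]] = DEFAULT_COOLDOWN_HOURS,
) -> bool:
    """Return True if the trigger fired too recently to fire again."""
    hours = cooldowns.get(trigger)
    # No configured cooldown, or the trigger has never fired.
    if hours is None or last_fired is None:
        return False
    return now - last_fired < timedelta(hours=hours)


last = datetime(2025, 3, 15, 10, 0, tzinfo=timezone.utc)
blocked = in_cooldown("drift_based", last, last + timedelta(hours=23))
allowed = not in_cooldown("drift_based", last, last + timedelta(hours=24))
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.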

Retraining History

GET /api/v1/monitoring/retrain/:model_id/history

{
  "retraining_events": [
    {
      "id": "retrain-abc123",
      "trigger": "drift_based",
      "reason": "High drift in monthly_charges (PSI=0.32)",
      "old_version": "v2",
      "new_version": "v3",
      "accuracy_change": 0.013,
      "status": "completed",
      "created_at": "2025-03-15T10:00:00Z"
    }
  ]
}
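
A history response like the one above can be condensed into a quick summary. The following sketch assumes the response shape shown in the example; the helper name `summarize_history` is hypothetical, and the code does not interpret the sign of `accuracy_change`, which the source does not define.

```python
from collections import Counter
from typing import Any


def summarize_history(response: dict[str, Any]) -> dict[str, Any]:
    """Count events per trigger and average accuracy_change over completed runs."""
    events = response.get("retraining_events", [])
    completed = [e for e in events if e.get("status") == "completed"]
    changes = [
        e["accuracy_change"]
        for e in completed
        if e.get("accuracy_change") is not None
    ]
    return {
        "total": len(events),
        "by_trigger": dict(Counter(e["trigger"] for e in events)),
        # None when no completed run reports an accuracy change.
        "mean_accuracy_change": sum(changes) / len(changes) if changes else None,
    }
```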

Configuration

Environment Variable	Default	Description
`RETRAINING_TRIGGER_INTERVAL`	`3600`	Trigger evaluation interval in seconds
`RETRAINING_MAX_CONCURRENT`	`2`	Maximum number of concurrent retraining jobs
`RETRAINING_AUTO_PROMOTE`	`false`	Automatically promote the new version after validation
`RETRAINING_COOLDOWN_HOURS`	`24`	Default cooldown between retraining runs, in hours
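
The variables above could be loaded with their documented defaults as in the sketch below. The `RetrainingSettings` dataclass, the `load_settings` helper, and the set of strings accepted as true for `RETRAINING_AUTO_PROMOTE` are assumptions; the service's actual parsing may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RetrainingSettings:
    trigger_interval_seconds: int
    max_concurrent: int
    auto_promote: bool
    cooldown_hours: int


def load_settings(env: Mapping[str, str] = os.environ) -> RetrainingSettings:
    """Read retraining settings, falling back to the documented defaults."""
    # Assumed truthy spellings; anything else is treated as false.
    auto_promote_raw = env.get("RETRAINING_AUTO_PROMOTE", "false")
    return RetrainingSettings(
        trigger_interval_seconds=int(env.get("RETRAINING_TRIGGER_INTERVAL", "3600")),
        max_concurrent=int(env.get("RETRAINING_MAX_CONCURRENT", "2")),
        auto_promote=auto_promote_raw.strip().lower() in {"1", "true", "yes"},
        cooldown_hours=int(env.get("RETRAINING_COOLDOWN_HOURS", "24")),
    )
```

Accepting any mapping instead of reading `os.environ` directly lets tests pass a plain dict.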
