MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Retraining Triggers

The Retraining Trigger module automates the decision to retrain models based on drift detection, performance degradation, data freshness, and scheduled intervals. When a trigger fires, it initiates a retraining pipeline with the latest data while maintaining the current model in production until the new version is validated.


Trigger Types

| Trigger | Condition | Response |
|---|---|---|
| Drift-based | Feature or concept drift exceeds threshold | Retrain with recent data |
| Performance-based | Accuracy or F1 drops below baseline | Retrain with expanded dataset |
| Schedule-based | Configured time interval elapsed | Periodic retraining |
| Data-based | Significant new training data available | Retrain with augmented dataset |
| Manual | Engineer explicitly triggers retraining | On-demand retraining |

Trigger Configuration

PUT /api/v1/monitoring/config/:model_id
{
  "model_id": "model-xyz789",
  "retraining_triggers": {
    "drift_based": {
      "enabled": true,
      "drift_severity_threshold": "high",
      "min_features_drifted": 2,
      "cooldown_hours": 24
    },
    "performance_based": {
      "enabled": true,
      "accuracy_min": 0.85,
      "f1_min": 0.80,
      "evaluation_window_hours": 6,
      "cooldown_hours": 12
    },
    "schedule_based": {
      "enabled": true,
      "interval_days": 7,
      "preferred_time": "02:00",
      "timezone": "UTC"
    },
    "data_based": {
      "enabled": true,
      "min_new_samples": 5000,
      "check_interval_hours": 6
    }
  }
}
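The `schedule_based` block combines an interval with a preferred wall-clock time and timezone. The sketch below shows one plausible way to turn those three fields into the next run time. It assumes that a run is due at the first `preferred_time` that falls at least `interval_days` after the previous run. That rule and the helper name `next_scheduled_run` are illustrative assumptions, not the MATIH scheduler.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def next_scheduled_run(last_run: datetime, interval_days: int,
                       preferred_time: str, tz_name: str) -> datetime:
    """Hypothetical helper: earliest `preferred_time` (in `tz_name`) that is at
    least `interval_days` after `last_run`. `last_run` must be timezone-aware."""
    # Treat "UTC" specially so the sketch works without a tz database installed.
    tz = timezone.utc if tz_name == "UTC" else ZoneInfo(tz_name)
    hour, minute = (int(part) for part in preferred_time.split(":"))
    earliest = last_run.astimezone(tz) + timedelta(days=interval_days)
    candidate = earliest.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If the preferred time on that day falls before the earliest allowed
    # instant, run on the following day instead.
    if candidate < earliest:
        candidate += timedelta(days=1)
    return candidate
```

With the configuration above (`interval_days: 7`, `preferred_time: "02:00"`, `timezone: "UTC"`), a run that finished at 10:30 UTC on 2025-03-08 would next be scheduled for 02:00 UTC on 2025-03-16.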

Trigger Evaluation

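The retraining trigger service evaluates these conditions periodically. The interval is set by RETRAINING_TRIGGER_INTERVAL (default 3600 seconds). The excerpt below covers the drift and performance checks. Drift is checked first, so it takes precedence when both conditions hold: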

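# Assumes DriftSeverity is an ordered enum (so >= works) and RetrainingDecision
# is a simple record type; both are defined elsewhere in the module.
class RetrainingTriggerService:
    def __init__(self, drift_service, perf_service, config):
        self.drift_service = drift_service  # exposes async get_status(model_id)
        self.perf_service = perf_service    # exposes async get_status(model_id)
        self.config = config                # holds thresholds such as accuracy_min

    async def evaluate(self, model_id: str) -> RetrainingDecision:
        drift_status = await self.drift_service.get_status(model_id)
        perf_status = await self.perf_service.get_status(model_id)

        # Drift is checked first, so it wins when both conditions hold.
        if drift_status.severity >= DriftSeverity.HIGH:
            return RetrainingDecision(
                trigger="drift_based",
                reason=f"High drift in {drift_status.drifted_features} features",
                urgency="high",
            )

        # Accuracy below the configured floor is treated as critical.
        if perf_status.accuracy < self.config.accuracy_min:
            return RetrainingDecision(
                trigger="performance_based",
                reason=(
                    f"Accuracy {perf_status.accuracy:.3f} below threshold "
                    f"{self.config.accuracy_min}"
                ),
                urgency="critical",
            )

        # No condition met: keep the current model in production.
        return RetrainingDecision(trigger=None, reason="No retraining needed")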
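The excerpt above checks only drift severity and accuracy, while the configuration also defines `min_features_drifted`, `f1_min`, `min_new_samples` and per-trigger `enabled` flags. The following self-contained sketch shows how all of those fields could feed a single decision, keeping the drift-then-performance order and adding the data-based check last. The `Severity` scale, the `Decision` type, the `decide` function and the `>=` comparison for `min_new_samples` are hypothetical stand-ins rather than the service's actual types. Cooldowns are left out here.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    # Hypothetical ordered scale; the service's real DriftSeverity levels may differ.
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Decision:
    trigger: Optional[str]  # None means "do not retrain"
    reason: str
    urgency: Optional[str] = None


def decide(severity: Severity, drifted_features: list, accuracy: float,
           f1: float, new_samples: int, cfg: dict) -> Decision:
    """Evaluate enabled triggers in order: drift, performance, new data.

    `cfg` has the same shape as the `retraining_triggers` JSON object.
    """
    drift = cfg.get("drift_based", {})
    if drift.get("enabled"):
        threshold = Severity[drift["drift_severity_threshold"].upper()]
        # Both the severity and the number of drifted features must qualify.
        if severity >= threshold and len(drifted_features) >= drift["min_features_drifted"]:
            return Decision("drift_based",
                            f"Drift in {len(drifted_features)} features", "high")

    perf = cfg.get("performance_based", {})
    if perf.get("enabled"):
        # Either metric falling below its floor is enough to fire.
        if accuracy < perf["accuracy_min"] or f1 < perf["f1_min"]:
            return Decision("performance_based",
                            f"accuracy={accuracy:.3f}, f1={f1:.3f} below minimum",
                            "critical")

    data = cfg.get("data_based", {})
    if data.get("enabled") and new_samples >= data["min_new_samples"]:
        return Decision("data_based", f"{new_samples} new samples available")

    return Decision(None, "No retraining needed")
```

For example, high drift in a single feature does not fire the drift trigger when `min_features_drifted` is 2. An F1 of 0.75 fires the performance trigger even when accuracy is above `accuracy_min`.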

Retraining Pipeline

When a trigger fires, the following pipeline executes:

  1. Data Collection: Gather training data based on trigger type
  2. Preprocessing: Apply feature engineering pipeline
  3. Training: Train model with same configuration as current version
  4. Evaluation: Evaluate on holdout set and compare with current model
  5. Validation: Run governance checks (fairness, explainability)
  6. Shadow: Deploy as shadow for production validation
  7. Promotion: Promote to production if criteria are met
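The seven stages run strictly in order, and a failure at any stage should leave the current production model untouched. The minimal sketch below illustrates that short-circuit behaviour. Stage names are taken from the list above, but the `run_retraining_pipeline` function, the `Stage` callable signature and the shared `context` dict are illustrative assumptions, not the platform's pipeline API.

```python
from typing import Callable, Dict, Optional

# Hypothetical stage signature: takes a shared context dict, returns True on success.
Stage = Callable[[dict], bool]

STAGE_ORDER = ["data_collection", "preprocessing", "training", "evaluation",
               "validation", "shadow", "promotion"]


def run_retraining_pipeline(stages: Dict[str, Stage], context: dict) -> Optional[str]:
    """Run stages in STAGE_ORDER; return the first failing stage name, or None."""
    context.setdefault("completed", [])
    for name in STAGE_ORDER:
        if not stages[name](context):
            # Stop here: later stages (shadow, promotion) never run, so the
            # current production model keeps serving traffic.
            return name
        context["completed"].append(name)
    return None
```

If evaluation shows the candidate underperforming the current model, the function returns `"evaluation"` and neither the shadow deployment nor promotion happens.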

Trigger Manual Retraining

POST /api/v1/monitoring/retrain/:model_id
{
  "reason": "New product category added to catalog",
  "training_config_override": {
    "data_query": "SELECT * FROM ml_features.customer_churn WHERE date >= '2025-01-01'",
    "hyperparameters": null
  }
}

Response

{
  "retraining_id": "retrain-abc123",
  "model_id": "model-xyz789",
  "trigger": "manual",
  "status": "submitted",
  "training_job_id": "train-def456",
  "created_at": "2025-03-15T10:00:00Z"
}
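A manual retrain can be submitted from any HTTP client. The following standard-library sketch builds the POST request and parses the JSON response. Several details are assumptions: the base URL is a placeholder, authentication headers are omitted, and the sketch sends `training_config_override` only when a data query is supplied, which presumes the field is optional. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # placeholder host; substitute your deployment


def build_retrain_request(model_id: str, reason: str,
                          data_query: Optional[str] = None) -> urllib.request.Request:
    body = {"reason": reason}
    if data_query is not None:
        # hyperparameters=None mirrors the example above (keep current settings).
        body["training_config_override"] = {"data_query": data_query,
                                            "hyperparameters": None}
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/monitoring/retrain/{model_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(req: urllib.request.Request) -> dict:
    # Returns the parsed response, e.g. {"retraining_id": ..., "status": "submitted", ...}
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The `training_job_id` in the response identifies the underlying training job, while `retraining_id` is the identifier that later appears in the retraining history.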

Cooldown Periods

To prevent excessive retraining, cooldown periods enforce minimum intervals between triggers:

| Trigger Type | Default Cooldown | Configurable |
|---|---|---|
| Drift-based | 24 hours | Yes |
| Performance-based | 12 hours | Yes |
| Schedule-based | N/A (interval-based) | Yes |
| Data-based | 6 hours | Yes |
| Manual | None | No |
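The cooldown gate can be illustrated as a time-remaining check. This sketch assumes that each trigger type's cooldown is measured from the last time that same trigger type fired, and that per-model `cooldown_hours` values override the defaults in the table. Both assumptions and the `cooldown_remaining` helper are illustrative. Schedule-based and manual triggers are not gated here, matching the table.

```python
from datetime import datetime, timedelta
from typing import Optional

# Defaults from the cooldown table; schedule-based and manual are not cooldown-gated.
DEFAULT_COOLDOWN_HOURS = {"drift_based": 24, "performance_based": 12, "data_based": 6}


def cooldown_remaining(trigger: str, last_fired: Optional[datetime], now: datetime,
                       overrides: Optional[dict] = None) -> timedelta:
    """Time left before `trigger` may fire again; timedelta(0) means it may fire now."""
    hours = (overrides or {}).get(trigger, DEFAULT_COOLDOWN_HOURS.get(trigger))
    if hours is None or last_fired is None:
        return timedelta(0)
    remaining = last_fired + timedelta(hours=hours) - now
    return max(remaining, timedelta(0))
```

Under these defaults, a drift trigger that fired ten hours ago must wait another fourteen hours, while a data-based trigger that fired ten hours ago is already eligible again.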

Retraining History

GET /api/v1/monitoring/retrain/:model_id/history
{
  "retraining_events": [
    {
      "id": "retrain-abc123",
      "trigger": "drift_based",
      "reason": "High drift in monthly_charges (PSI=0.32)",
      "old_version": "v2",
      "new_version": "v3",
      "accuracy_change": 0.013,
      "status": "completed",
      "created_at": "2025-03-15T10:00:00Z"
    }
  ]
}
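The history response can be summarised client-side. The sketch below counts completed events per trigger type and averages their `accuracy_change` values as reported, without interpreting the sign. The `summarize_history` name is hypothetical, and the sketch assumes that only events with `status: "completed"` carry a meaningful `accuracy_change`.

```python
from collections import defaultdict


def summarize_history(payload: dict) -> dict:
    """Per trigger type: number of completed events and mean accuracy_change."""
    changes = defaultdict(list)
    for event in payload.get("retraining_events", []):
        if event.get("status") == "completed":
            changes[event["trigger"]].append(event["accuracy_change"])
    return {
        trigger: {"completed": len(values),
                  "mean_accuracy_change": sum(values) / len(values)}
        for trigger, values in changes.items()
    }
```

Applied to the example response above, this yields a single `drift_based` entry with one completed event and a mean change of 0.013.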

Configuration

| Environment Variable | Default | Description |
|---|---|---|
| RETRAINING_TRIGGER_INTERVAL | 3600 | Trigger evaluation interval in seconds |
| RETRAINING_MAX_CONCURRENT | 2 | Max concurrent retraining jobs |
| RETRAINING_AUTO_PROMOTE | false | Auto-promote after validation |
| RETRAINING_COOLDOWN_HOURS | 24 | Default cooldown between retraining runs, in hours |
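The following sketch loads these variables into a typed settings object, using the defaults from the table. The `RetrainingSettings` and `load_settings` names are hypothetical. The set of strings treated as true for `RETRAINING_AUTO_PROMOTE` (`1`, `true`, `yes`, case-insensitive) is also an assumption, since the table specifies only the default `false`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RetrainingSettings:
    trigger_interval_seconds: int
    max_concurrent: int
    auto_promote: bool
    cooldown_hours: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> RetrainingSettings:
    """Read retraining settings from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return RetrainingSettings(
        trigger_interval_seconds=int(env.get("RETRAINING_TRIGGER_INTERVAL", "3600")),
        max_concurrent=int(env.get("RETRAINING_MAX_CONCURRENT", "2")),
        # Assumed truthy spellings; anything else counts as false.
        auto_promote=env.get("RETRAINING_AUTO_PROMOTE", "false").strip().lower()
        in {"1", "true", "yes"},
        cooldown_hours=int(env.get("RETRAINING_COOLDOWN_HOURS", "24")),
    )
```

With `RETRAINING_AUTO_PROMOTE` left at `false`, a validated model stays in shadow until someone promotes it explicitly.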