MATIH Platform is in active MVP development. Documentation reflects current implementation status.
13. ML Service & MLOps
Inference & Serving
Ensemble Routing

The Ensemble Routing module provides intelligent traffic routing across multiple model versions and variants. It supports A/B testing, canary deployments, multi-armed bandit optimization, feature-based routing, and ensemble aggregation. The implementation is in src/inference/ensemble_router.py.


Routing Strategies

| Strategy | Description | Use Case |
|---|---|---|
| `weighted` | Static traffic split by configured weights | Gradual traffic migration |
| `round_robin` | Even distribution across models | Load balancing |
| `random` | Random model selection | Baseline comparison |
| `canary` | Small percentage to new version | Safe rollouts |
| `feature_based` | Route by input features | Specialized models per segment |
| `bandit` | Multi-armed bandit optimization | Automatic best-model selection |
| `latency_based` | Route to fastest responding model | Latency-sensitive workloads |
| `shadow` | Mirror traffic without serving the response | Pre-production validation |
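
As an illustration, the `weighted` strategy can be sketched as a per-request draw over the configured weights. The function name below is illustrative, not the actual `ensemble_router.py` API:

```python
import random

def select_weighted(model_weights: dict[str, float]) -> str:
    """Pick one model per request according to static traffic weights."""
    models = list(model_weights)
    weights = list(model_weights.values())
    # random.choices normalizes the weights internally,
    # so they need not sum exactly to 1.0
    return random.choices(models, weights=weights, k=1)[0]
```

For example, `select_weighted({"churn-xgb-v2": 0.6, "churn-lgbm-v1": 0.4})` returns `"churn-xgb-v2"` for roughly 60% of requests.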

Aggregation Methods

When multiple models are queried (ensemble mode), outputs are combined:

| Method | Description | Best For |
|---|---|---|
| `vote` | Majority voting for classification | Classification ensembles |
| `average` | Simple average of predictions | Regression ensembles |
| `weighted_average` | Weighted by model confidence | Confidence-aware ensembles |
| `max_confidence` | Use the prediction with the highest confidence | Risk-averse applications |
| `stacking` | Meta-model combines outputs | Advanced ensembles |
| `cascade` | Sequential models, early exit on confidence | Cost-efficient inference |
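
The three simplest methods can be sketched in a few lines. These function names are illustrative, not the module's actual API:

```python
from collections import Counter

def vote(labels: list[str]) -> str:
    """Majority vote for classification ensembles."""
    return Counter(labels).most_common(1)[0][0]

def average(preds: list[float]) -> float:
    """Simple mean for regression ensembles."""
    return sum(preds) / len(preds)

def weighted_average(preds: list[float], confidences: list[float]) -> float:
    """Mean weighted by each model's confidence."""
    total = sum(confidences)
    return sum(p * c for p, c in zip(preds, confidences)) / total
```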

Configuration

@dataclass
class RoutingConfig:
    strategy: RoutingStrategy = RoutingStrategy.WEIGHTED
    model_weights: dict[str, float] = field(default_factory=dict)
    canary_percentage: float = 10.0
    canary_model: str = ""
    baseline_model: str = ""
    feature_routing_rules: list[dict] = field(default_factory=list)
    bandit_exploration_rate: float = 0.1
    aggregation_method: AggregationMethod = AggregationMethod.AVERAGE
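
A minimal sketch of instantiating this config for a canary rollout. The stand-in enums below only exist to make the snippet self-contained; the real `RoutingStrategy` and `AggregationMethod` definitions live in `src/inference/ensemble_router.py`:

```python
from dataclasses import dataclass, field
from enum import Enum

# Stand-in enums for illustration; the real definitions live in
# src/inference/ensemble_router.py.
class RoutingStrategy(Enum):
    WEIGHTED = "weighted"
    CANARY = "canary"

class AggregationMethod(Enum):
    AVERAGE = "average"

@dataclass
class RoutingConfig:
    strategy: RoutingStrategy = RoutingStrategy.WEIGHTED
    model_weights: dict[str, float] = field(default_factory=dict)
    canary_percentage: float = 10.0
    canary_model: str = ""
    baseline_model: str = ""
    feature_routing_rules: list[dict] = field(default_factory=list)
    bandit_exploration_rate: float = 0.1
    aggregation_method: AggregationMethod = AggregationMethod.AVERAGE

# A canary config sending the default 10% of traffic to the new version
config = RoutingConfig(
    strategy=RoutingStrategy.CANARY,
    baseline_model="churn-xgb-v2",
    canary_model="churn-xgb-v3",
)
```

Note the `field(default_factory=...)` usage: mutable defaults like `dict` and `list` must be created per instance, not shared across configs.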

Create Ensemble

POST /api/v1/inference/ensemble
{
  "name": "churn-ensemble",
  "models": [
    {"model_id": "churn-xgb-v2", "weight": 0.6},
    {"model_id": "churn-lgbm-v1", "weight": 0.4}
  ],
  "strategy": "weighted",
  "aggregation": "weighted_average"
}
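
Before POSTing, a client can sanity-check the payload. The check below assumes member weights are expected to sum to 1.0; the helper is a hypothetical client-side utility, not part of the API:

```python
def validate_ensemble_payload(payload: dict) -> dict:
    """Client-side sanity check before POSTing to /api/v1/inference/ensemble
    (assumes member weights should sum to 1.0)."""
    total = sum(m["weight"] for m in payload["models"])
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"model weights sum to {total}, expected 1.0")
    return payload

payload = validate_ensemble_payload({
    "name": "churn-ensemble",
    "models": [
        {"model_id": "churn-xgb-v2", "weight": 0.6},
        {"model_id": "churn-lgbm-v1", "weight": 0.4},
    ],
    "strategy": "weighted",
    "aggregation": "weighted_average",
})
```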

Canary Deployment

Route a small percentage of traffic to a new model version:

{
  "name": "churn-canary",
  "strategy": "canary",
  "baseline_model": "churn-xgb-v2",
  "canary_model": "churn-xgb-v3",
  "canary_percentage": 10.0,
  "promotion_criteria": {
    "metric": "accuracy",
    "min_improvement": 0.01,
    "min_samples": 1000
  }
}
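
The routing decision and the promotion check can be sketched as follows. Both helper names are illustrative, and the promotion logic is a simplified reading of the `promotion_criteria` block above (no statistical significance testing):

```python
import random

def route_canary(baseline: str, canary: str, canary_percentage: float) -> str:
    """Send roughly canary_percentage% of requests to the canary model."""
    return canary if random.random() * 100 < canary_percentage else baseline

def should_promote(baseline_acc: float, canary_acc: float, samples: int,
                   min_improvement: float = 0.01,
                   min_samples: int = 1000) -> bool:
    """Promote only once enough samples show a sufficient improvement."""
    return samples >= min_samples and (canary_acc - baseline_acc) >= min_improvement
```

With the example criteria, a canary at 92% accuracy versus a 90% baseline qualifies for promotion only after at least 1000 feedback samples.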

Multi-Armed Bandit

The bandit strategy automatically optimizes traffic allocation based on reward signals:

import random
from collections import defaultdict

class BanditRouter:
    def __init__(self, exploration_rate: float = 0.1):
        self.exploration_rate = exploration_rate
        self.rewards = defaultdict(float)  # model -> reward score, 0.0 until feedback arrives

    def select_model(self, models: list[str]) -> str:
        if random.random() < self.exploration_rate:
            return random.choice(models)  # Explore
        return max(models, key=lambda m: self.rewards[m])  # Exploit

| Parameter | Default | Description |
|---|---|---|
| `exploration_rate` | 0.1 | Probability of exploring (epsilon) |
| `reward_window` | 1000 | Recent predictions used for reward calculation |
| `reward_metric` | `accuracy` | Metric used as the reward signal |
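
The `reward_window` behavior can be sketched with a bounded deque per model: only the most recent outcomes contribute to the reward. The class and method names here are illustrative:

```python
from collections import defaultdict, deque

class RewardTracker:
    """Keep the last `reward_window` feedback outcomes per model and
    use their mean as that model's reward signal."""

    def __init__(self, reward_window: int = 1000):
        self.outcomes = defaultdict(lambda: deque(maxlen=reward_window))

    def record(self, model: str, correct: bool) -> None:
        # e.g. whether the prediction matched the ground-truth label
        self.outcomes[model].append(1.0 if correct else 0.0)

    def reward(self, model: str) -> float:
        window = self.outcomes[model]
        return sum(window) / len(window) if window else 0.0
```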

Feature-Based Routing

Route requests to specialized models based on input characteristics:

{
  "strategy": "feature_based",
  "rules": [
    {
      "condition": "contract_type == 'enterprise'",
      "model": "churn-enterprise-v1"
    },
    {
      "condition": "contract_type == 'consumer'",
      "model": "churn-consumer-v1"
    }
  ],
  "default_model": "churn-general-v2"
}
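
A minimal sketch of evaluating rules of the form `<feature> == '<value>'` without resorting to `eval()`. The real router may support a richer condition language; the function name is illustrative:

```python
import re

# Matches conditions like "contract_type == 'enterprise'"
_CONDITION = re.compile(r"^\s*(\w+)\s*==\s*'([^']*)'\s*$")

def route_by_features(features: dict, rules: list[dict], default_model: str) -> str:
    """Return the first rule's model whose condition matches, else the default."""
    for rule in rules:
        match = _CONDITION.match(rule["condition"])
        if match and features.get(match.group(1)) == match.group(2):
            return rule["model"]
    return default_model
```

Rules are evaluated in order, so more specific conditions should be listed first; requests matching no rule fall through to `default_model`.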

Monitoring

| Metric | Type | Description |
|---|---|---|
| `ensemble_routing_decisions_total` | Counter | Decisions by strategy and model |
| `ensemble_model_latency_ms` | Histogram | Per-model prediction latency |
| `ensemble_model_accuracy` | Gauge | Per-model accuracy (from feedback) |
| `ensemble_canary_progress` | Gauge | Canary traffic percentage |