Ensemble Routing
The Ensemble Routing module routes inference traffic across multiple model versions and variants. It supports A/B testing, canary deployments, multi-armed bandit optimization, feature-based routing, and ensemble aggregation. The implementation is in `src/inference/ensemble_router.py`.
Routing Strategies
| Strategy | Description | Use Case |
|---|---|---|
| weighted | Static traffic split by configured weights | Gradual traffic migration |
| round_robin | Even distribution across models, in turn | Load balancing |
| random | Random model selection | Baseline comparison |
| canary | Route a small percentage of traffic to a new version | Safe rollouts |
| feature_based | Route by input feature values | Specialized models per segment |
| bandit | Multi-armed bandit optimization | Automatic best-model selection |
| latency_based | Route to the fastest-responding model | Latency-sensitive workloads |
| shadow | Mirror traffic to a model without returning its response | Pre-production validation |
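The `weighted` and `round_robin` strategies reduce to simple per-request selection rules. The following is a minimal sketch, not the module's implementation. It assumes weights are relative (they need not sum to 1) and that each request is routed independently. `select_weighted` and `RoundRobinSelector` are hypothetical names.

```python
import itertools
import random
from typing import Optional


def select_weighted(model_weights: dict[str, float],
                    rng: Optional[random.Random] = None) -> str:
    """Pick one model with probability proportional to its weight (hypothetical helper)."""
    rng = rng or random.Random()
    models = list(model_weights)
    weights = [model_weights[m] for m in models]
    if not models or sum(weights) <= 0:
        raise ValueError("need at least one model with a positive weight")
    # random.choices normalizes the weights, so 0.6/0.4 and 6/4 behave identically.
    return rng.choices(models, weights=weights, k=1)[0]


class RoundRobinSelector:
    """Cycle through models in a fixed order (hypothetical helper)."""

    def __init__(self, models: list[str]):
        if not models:
            raise ValueError("need at least one model")
        self._cycle = itertools.cycle(models)

    def select(self) -> str:
        return next(self._cycle)
```

With weights `{"churn-xgb-v2": 0.6, "churn-lgbm-v1": 0.4}`, about 60% of requests go to the first model. Passing a seeded `random.Random` makes the split reproducible in tests.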
Aggregation Methods
When multiple models are queried for the same request (ensemble mode), their outputs are combined using one of the following methods:
| Method | Description | Best For |
|---|---|---|
| vote | Majority vote over predicted classes | Classification ensembles |
| average | Simple average of predictions | Regression ensembles |
| weighted_average | Average weighted by model confidence | Confidence-aware ensembles |
| max_confidence | Use the prediction with the highest confidence | Risk-averse applications |
| stacking | A meta-model combines the base models' outputs | Advanced ensembles |
| cascade | Query models in sequence, exiting early once a prediction is confident enough | Cost-efficient inference |
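The non-learned aggregation methods are short enough to sketch directly. This is an illustration, not the module's code, and it rests on three assumptions:

- Each model returns a `(prediction, confidence)` pair.
- `weighted_average` weights each prediction by its confidence, as the table describes.
- `cascade` takes a hypothetical `threshold` parameter.

`stacking` is omitted because it needs a trained meta-model.

```python
from collections import Counter
from typing import Any, Callable, Hashable, Sequence


def vote(labels: Sequence[Hashable]) -> Hashable:
    """Majority vote; on a tie, the label encountered first wins."""
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def average(predictions: Sequence[float]) -> float:
    """Unweighted mean of regression outputs."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(predictions) / len(predictions)


def weighted_average(outputs: Sequence[tuple[float, float]]) -> float:
    """Mean of predictions weighted by confidence; outputs are (prediction, confidence)."""
    total = sum(conf for _, conf in outputs)
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")
    return sum(pred * conf for pred, conf in outputs) / total


def max_confidence(outputs: Sequence[tuple[Any, float]]) -> Any:
    """Prediction from the single most confident model (first one wins on ties)."""
    return max(outputs, key=lambda o: o[1])[0]


def cascade(models: Sequence[Callable[[dict], tuple[Any, float]]],
            features: dict, threshold: float) -> Any:
    """Call models in order and stop at the first prediction whose confidence
    reaches `threshold` (hypothetical parameter); otherwise return the last one."""
    if not models:
        raise ValueError("need at least one model")
    for model in models:
        prediction, confidence = model(features)
        if confidence >= threshold:
            break
    return prediction
```

A cascade saves cost only when the models are ordered from cheapest to most expensive. The expensive models then run only on inputs the cheaper ones are unsure about.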
Configuration
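`RoutingConfig` refers to two enums, `RoutingStrategy` and `AggregationMethod`, whose definitions are not shown here. The sketch below is minimal and assumes each member's value is the lowercase name used in the tables above and in the JSON request bodies.

```python
from enum import Enum


class RoutingStrategy(str, Enum):
    # Assumed: values mirror the names in the Routing Strategies table.
    WEIGHTED = "weighted"
    ROUND_ROBIN = "round_robin"
    RANDOM = "random"
    CANARY = "canary"
    FEATURE_BASED = "feature_based"
    BANDIT = "bandit"
    LATENCY_BASED = "latency_based"
    SHADOW = "shadow"


class AggregationMethod(str, Enum):
    # Assumed: values mirror the names in the Aggregation Methods table.
    VOTE = "vote"
    AVERAGE = "average"
    WEIGHTED_AVERAGE = "weighted_average"
    MAX_CONFIDENCE = "max_confidence"
    STACKING = "stacking"
    CASCADE = "cascade"
```

Mixing in `str` makes members compare equal to their values, so `RoutingStrategy.CANARY == "canary"`. A string taken from a request body converts with `RoutingStrategy("canary")`.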
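```python
from dataclasses import dataclass, field

# RoutingStrategy and AggregationMethod are enums whose members correspond
# to the strategy and aggregation names listed in the tables above.

@dataclass
class RoutingConfig:
    strategy: RoutingStrategy = RoutingStrategy.WEIGHTED
    model_weights: dict[str, float] = field(default_factory=dict)  # model_id -> weight
    canary_percentage: float = 10.0  # percent of traffic sent to canary_model
    canary_model: str = ""
    baseline_model: str = ""
    feature_routing_rules: list[dict] = field(default_factory=list)
    bandit_exploration_rate: float = 0.1  # epsilon for the bandit strategy
    aggregation_method: AggregationMethod = AggregationMethod.AVERAGE
```

Create Ensemble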
`POST /api/v1/inference/ensemble`

```json
{
  "name": "churn-ensemble",
  "models": [
    {"model_id": "churn-xgb-v2", "weight": 0.6},
    {"model_id": "churn-lgbm-v1", "weight": 0.4}
  ],
  "strategy": "weighted",
  "aggregation": "weighted_average"
}
```

Canary Deployment
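A canary split sends `canary_percentage` percent of requests to `canary_model` and the rest to `baseline_model`. `promotion_criteria` then decides when the canary has proven itself. The source does not show that logic, so the following is a minimal sketch under these assumptions:

- The split is either random per request or sticky on a hypothetical `request_key`.
- `min_samples` applies to each arm.
- `min_improvement` is an absolute difference in the chosen metric.

`CanaryRouter`, `route`, and `should_promote` are hypothetical names.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class CanaryRouter:
    """Hypothetical canary splitter mirroring the fields in the request body."""
    baseline_model: str
    canary_model: str
    canary_percentage: float = 10.0  # percent of traffic, 0-100
    min_improvement: float = 0.01    # assumed: absolute metric gain needed to promote
    min_samples: int = 1000          # assumed: scored predictions needed on each arm

    def route(self, request_key: Optional[str] = None) -> str:
        if request_key is not None:
            # Sticky assignment: hash the key into one of 10,000 buckets so the
            # same caller always lands on the same model.
            digest = hashlib.sha256(request_key.encode()).digest()
            bucket = int.from_bytes(digest[:8], "big") % 10_000
            in_canary = bucket < self.canary_percentage * 100
        else:
            # Independent random draw for each request.
            in_canary = random.random() * 100 < self.canary_percentage
        return self.canary_model if in_canary else self.baseline_model

    def should_promote(self, baseline_metric: float, canary_metric: float,
                       baseline_samples: int, canary_samples: int) -> bool:
        if min(baseline_samples, canary_samples) < self.min_samples:
            return False  # not enough evidence yet
        return canary_metric - baseline_metric >= self.min_improvement
```

Hashing a stable key such as a user ID keeps each caller on one arm, so a single user's session does not mix both models. Per-request randomness is simpler but not sticky.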
Route a small percentage of traffic to a new model version:
```json
{
  "name": "churn-canary",
  "strategy": "canary",
  "baseline_model": "churn-xgb-v2",
  "canary_model": "churn-xgb-v3",
  "canary_percentage": 10.0,
  "promotion_criteria": {
    "metric": "accuracy",
    "min_improvement": 0.01,
    "min_samples": 1000
  }
}
```

Multi-Armed Bandit
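The bandit strategy shifts traffic toward the best-performing model based on reward signals. Its selection rule is epsilon-greedy. With probability `exploration_rate` it picks a random model; otherwise it picks the model with the highest observed reward: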
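```python
import random

class BanditRouter:
    def __init__(self, exploration_rate: float = 0.1):
        self.exploration_rate = exploration_rate
        self.rewards: dict[str, float] = {}  # model_id -> reward score

    def select_model(self, models: list[str]) -> str:
        if random.random() < self.exploration_rate:
            return random.choice(models)  # Explore
        return max(models, key=lambda m: self.rewards.get(m, 0.0))  # Exploit; unseen models score 0.0
```

| Parameter | Default | Description |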
|---|---|---|
| exploration_rate | 0.1 | Probability of exploring (epsilon) |
| reward_window | 1000 | Number of recent predictions used to compute the reward |
| reward_metric | accuracy | Metric used as the reward signal |
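The `select_model` method above only reads `self.rewards`, so something else has to keep those values current. One way to honor `reward_window` is to keep a fixed-size window of recent per-prediction rewards for each model and average it on demand. This minimal sketch assumes each feedback event yields a reward in [0, 1]. For the default `accuracy` metric, that is 1.0 for a correct prediction and 0.0 otherwise. `RewardTracker` and `EpsilonGreedyRouter` are hypothetical names.

```python
import random
from collections import defaultdict, deque
from typing import Optional


class RewardTracker:
    """Hypothetical per-model sliding-window reward tracker."""

    def __init__(self, reward_window: int = 1000):
        # Each model keeps only its most recent `reward_window` rewards.
        self._windows = defaultdict(lambda: deque(maxlen=reward_window))

    def record(self, model_id: str, reward: float) -> None:
        self._windows[model_id].append(reward)

    def mean_reward(self, model_id: str) -> float:
        window = self._windows.get(model_id)  # .get avoids creating empty windows
        return sum(window) / len(window) if window else 0.0


class EpsilonGreedyRouter:
    """Hypothetical epsilon-greedy router backed by RewardTracker."""

    def __init__(self, exploration_rate: float = 0.1, reward_window: int = 1000,
                 rng: Optional[random.Random] = None):
        self.exploration_rate = exploration_rate
        self.tracker = RewardTracker(reward_window)
        self.rng = rng or random.Random()

    def select_model(self, models: list[str]) -> str:
        if self.rng.random() < self.exploration_rate:
            return self.rng.choice(models)  # explore
        return max(models, key=self.tracker.mean_reward)  # exploit
```

Because the window is bounded, a model whose quality degrades loses traffic within roughly `reward_window` feedback events. Its long history of good results cannot prop it up.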
Feature-Based Routing
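Feature-based routing evaluates rule conditions against the request's features and falls back to `default_model` when no rule matches. The source does not specify the condition grammar or the match order. This sketch therefore assumes first-match-wins and supports only the `field == 'value'` form used in the example that follows. It parses that form with a regular expression rather than calling `eval` on request-controlled strings. `matches` and `route_by_features` are hypothetical names.

```python
import re
from typing import Any

# Matches conditions of the form: field == 'value'  (single or double quotes)
_EQ_CONDITION = re.compile(r"""^\s*(\w+)\s*==\s*(['"])(.*?)\2\s*$""")


def matches(condition: str, features: dict[str, Any]) -> bool:
    """Evaluate a single equality condition against the request features."""
    m = _EQ_CONDITION.match(condition)
    if m is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    field_name, _quote, expected = m.groups()
    return features.get(field_name) == expected


def route_by_features(rules: list[dict], default_model: str,
                      features: dict[str, Any]) -> str:
    """Return the model of the first matching rule, else default_model."""
    for rule in rules:
        if matches(rule["condition"], features):
            return rule["model"]
    return default_model
```

Rejecting unsupported conditions outright, instead of silently treating them as non-matches, surfaces configuration mistakes when rules are loaded rather than as misrouted traffic.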
Route requests to specialized models based on input characteristics:
```json
{
  "strategy": "feature_based",
  "rules": [
    {
      "condition": "contract_type == 'enterprise'",
      "model": "churn-enterprise-v1"
    },
    {
      "condition": "contract_type == 'consumer'",
      "model": "churn-consumer-v1"
    }
  ],
  "default_model": "churn-general-v2"
}
```

Monitoring
| Metric | Type | Description |
|---|---|---|
| ensemble_routing_decisions_total | Counter | Routing decisions, broken down by strategy and model |
| ensemble_model_latency_ms | Histogram | Per-model prediction latency (ms) |
| ensemble_model_accuracy | Gauge | Per-model accuracy, computed from feedback |
| ensemble_canary_progress | Gauge | Current percentage of traffic routed to the canary |