Shadow Deployment
Shadow deployment validates a new model version against mirrored production traffic without affecting live predictions. The shadow model receives the same inputs as the primary model, and its predictions are recorded for offline comparison but never returned to the client. Because no shadow output reaches a client, this is a low-risk way to validate model changes before a canary or full rollout.
Shadow Architecture

    Client Request --> Primary Model --> Response to Client
          |
       (mirror)
          |
    Shadow Model --> Predictions Logged (not served)
                              |
                     Comparison Analytics

Create Shadow Deployment
POST /api/v1/inference/shadow

    {
      "primary_model": "churn-xgb-v2",
      "shadow_model": "churn-xgb-v3",
      "sample_rate": 1.0,
      "comparison_metrics": ["accuracy", "f1_score", "latency_ms"],
      "duration_hours": 48,
      "auto_promote": false,
      "promotion_criteria": {
        "metric": "f1_score",
        "min_improvement": 0.005,
        "max_latency_increase_ms": 10
      }
    }

Response

    {
      "shadow_id": "shadow-abc123",
      "status": "active",
      "primary_model": "churn-xgb-v2",
      "shadow_model": "churn-xgb-v3",
      "started_at": "2025-03-15T10:00:00Z",
      "expires_at": "2025-03-17T10:00:00Z"
    }

Get Shadow Comparison
GET /api/v1/inference/shadow/:shadow_id/comparison

    {
      "shadow_id": "shadow-abc123",
      "total_predictions": 15000,
      "agreement_rate": 0.94,
      "comparison": {
        "primary": {
          "model": "churn-xgb-v2",
          "accuracy": 0.912,
          "f1_score": 0.895,
          "avg_latency_ms": 12.3
        },
        "shadow": {
          "model": "churn-xgb-v3",
          "accuracy": 0.925,
          "f1_score": 0.908,
          "avg_latency_ms": 13.1
        },
        "improvement": {
          "accuracy": 0.013,
          "f1_score": 0.013,
          "latency_ms": 0.8
        }
      },
      "promotion_eligible": true,
      "recommendation": "Shadow model shows consistent improvement; eligible for canary promotion"
    }

Shadow Modes
| Mode | Description | Sample Rate |
|---|---|---|
| Full mirror | Every request is shadowed | 1.0 (100%) |
| Sampled | Random subset of requests | 0.01 - 0.99 |
| Conditional | Only shadow requests matching criteria | Rule-based |
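
One way to picture the three modes is as a single per-request decision. The sketch below is an assumption about how such a decision could be made, not the service's actual logic: `should_shadow` and its `condition` predicate are hypothetical names, and the document does not specify how conditional rules are expressed.

```python
import random
from typing import Callable, Optional


def should_shadow(request: dict,
                  sample_rate: float = 1.0,
                  condition: Optional[Callable[[dict], bool]] = None,
                  rng: Optional[random.Random] = None) -> bool:
    """Decide whether a request is mirrored to the shadow model.

    - Full mirror: sample_rate == 1.0 and no condition.
    - Sampled: 0 < sample_rate < 1, each request chosen independently at random.
    - Conditional: `condition` (hypothetical predicate) must accept the request.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    if condition is not None and not condition(request):
        return False
    if sample_rate >= 1.0:
        return True
    # random.random() returns a float in [0.0, 1.0)
    return (rng or random).random() < sample_rate
```

For example, `should_shadow(req, condition=lambda r: r.get("region") == "eu")` would mirror only requests from one region, where the `region` field is purely illustrative.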
Comparison Metrics
| Metric | Comparison Type | Alert If |
|---|---|---|
| Prediction agreement | Percentage of matching predictions | Below 90% |
| Accuracy | Absolute improvement | Degradation detected |
| F1 score | Absolute improvement | Degradation detected |
| Latency | Millisecond difference | Increase above threshold |
| Error rate | Percentage difference | Shadow errors higher |
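
The comparison values and the `promotion_criteria` fields from the create request can be tied together in a short Python sketch. The function names and the exact eligibility rule (metric gain at or above `min_improvement`, latency increase at or below `max_latency_increase_ms`) are assumptions for illustration; the service may evaluate eligibility differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    f1_score: float
    avg_latency_ms: float


def agreement_rate(primary_preds: Sequence, shadow_preds: Sequence) -> float:
    """Fraction of requests on which both models returned the same prediction."""
    if len(primary_preds) != len(shadow_preds) or not primary_preds:
        raise ValueError("prediction lists must be non-empty and equal in length")
    matches = sum(p == s for p, s in zip(primary_preds, shadow_preds))
    return matches / len(primary_preds)


def improvement(primary: Metrics, shadow: Metrics) -> dict:
    """Absolute differences (shadow minus primary), rounded to hide float noise."""
    return {
        "accuracy": round(shadow.accuracy - primary.accuracy, 6),
        "f1_score": round(shadow.f1_score - primary.f1_score, 6),
        "latency_ms": round(shadow.avg_latency_ms - primary.avg_latency_ms, 6),
    }


def promotion_eligible(primary: Metrics, shadow: Metrics,
                       metric: str = "f1_score",
                       min_improvement: float = 0.005,
                       max_latency_increase_ms: float = 10.0) -> bool:
    """Assumed rule: enough metric gain without too much added latency."""
    delta = improvement(primary, shadow)
    return (delta[metric] >= min_improvement
            and delta["latency_ms"] <= max_latency_increase_ms)
```

Applied to the sample comparison response, the F1 gain of 0.013 exceeds the 0.005 threshold and the 0.8 ms latency increase is under the 10 ms limit, which is consistent with `"promotion_eligible": true`.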
Shadow Lifecycle
- Created: Shadow deployment configured but not yet active
- Active: Traffic is being mirrored to the shadow model
- Analyzing: Data collection complete, running comparison
- Completed: Analysis finished, promotion decision available
- Promoted: Shadow model promoted to primary (if auto_promote is enabled)
- Expired: Duration exceeded without promotion
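
The lifecycle states can be modelled as a small state machine. The state names below come from the list above, but the allowed transitions are an assumption: the document lists the states without saying which moves are legal, so the `ALLOWED` table is a guess at a linear flow that ends in either promotion or expiry.

```python
from enum import Enum


class ShadowState(Enum):
    CREATED = "created"
    ACTIVE = "active"
    ANALYZING = "analyzing"
    COMPLETED = "completed"
    PROMOTED = "promoted"
    EXPIRED = "expired"


# Assumed transitions; the real service may allow others (e.g. active -> expired).
ALLOWED = {
    ShadowState.CREATED: {ShadowState.ACTIVE},
    ShadowState.ACTIVE: {ShadowState.ANALYZING},
    ShadowState.ANALYZING: {ShadowState.COMPLETED},
    ShadowState.COMPLETED: {ShadowState.PROMOTED, ShadowState.EXPIRED},
    ShadowState.PROMOTED: set(),  # terminal
    ShadowState.EXPIRED: set(),   # terminal
}


def transition(current: ShadowState, target: ShadowState) -> ShadowState:
    """Return the new state, or raise if the move is not in the assumed table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

The lowercase enum values match the `"status": "active"` field shown in the create response, so `ShadowState("active")` parses an API status string directly.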
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| SHADOW_MAX_ACTIVE | 3 | Max concurrent shadow deployments |
| SHADOW_DEFAULT_DURATION_HOURS | 48 | Default shadow duration |
| SHADOW_MIN_SAMPLES | 1000 | Minimum samples before comparison |
| SHADOW_STORAGE_RETENTION_DAYS | 30 | Prediction log retention |
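
As an illustration of how these variables might be consumed, the following sketch reads them with the documented defaults. `ShadowConfig` and `load_config` are hypothetical names, and the rule that every value must be a positive integer is an assumption rather than documented behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ShadowConfig:
    max_active: int = 3
    default_duration_hours: int = 48
    min_samples: int = 1000
    storage_retention_days: int = 30


def load_config(env: Mapping[str, str] = os.environ) -> ShadowConfig:
    """Build the config from environment variables, using documented defaults."""

    def read_int(name: str, default: int) -> int:
        raw = env.get(name)
        if raw is None:
            return default
        value = int(raw)  # raises ValueError on non-numeric input
        if value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {raw!r}")
        return value

    return ShadowConfig(
        max_active=read_int("SHADOW_MAX_ACTIVE", 3),
        default_duration_hours=read_int("SHADOW_DEFAULT_DURATION_HOURS", 48),
        min_samples=read_int("SHADOW_MIN_SAMPLES", 1000),
        storage_retention_days=read_int("SHADOW_STORAGE_RETENTION_DAYS", 30),
    )
```

Passing a plain dict, as in `load_config({"SHADOW_MAX_ACTIVE": "5"})`, makes it easy to exercise the parsing without touching the process environment.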