Shadow Deployment

Shadow deployment validates a new model version by mirroring production traffic to it without affecting live predictions. The shadow model receives the same inputs as the primary model; its predictions are recorded for offline comparison but never returned to the client. Because clients only ever see the primary model's output, shadowing is a low-risk way to validate a model change before canary or full rollout.

Shadow Architecture

Client Request --> Primary Model --> Response to Client
                        |
                   (mirror)
                        |
                  Shadow Model --> Predictions Logged (not served)
                                        |
                                  Comparison Analytics
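
The mirroring flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the platform's implementation: the `ShadowMirror` class, its in-memory `log` list, and the thread pool are all hypothetical stand-ins. It shows the two properties the architecture relies on: the client only receives the primary prediction, and a shadow failure is logged rather than propagated.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Model = Callable[[dict], Any]  # hypothetical model interface: features in, prediction out


class ShadowMirror:
    """Serve from the primary model and mirror each request to a shadow model."""

    def __init__(self, primary: Model, shadow: Model, max_workers: int = 4):
        self.primary = primary
        self.shadow = shadow
        self.log = []  # in-memory stand-in for the prediction log store
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def _run_shadow(self, features: dict, primary_pred: Any) -> None:
        record = {"input": features, "primary": primary_pred, "shadow": None, "error": None}
        try:
            record["shadow"] = self.shadow(features)
        except Exception as exc:  # a failing shadow must never affect the client
            record["error"] = repr(exc)
        self.log.append(record)

    def predict(self, features: dict) -> Any:
        primary_pred = self.primary(features)
        # Mirror off the request path; the shadow result is logged, not served.
        self._pool.submit(self._run_shadow, features, primary_pred)
        return primary_pred

    def close(self) -> None:
        # Wait for in-flight shadow calls so the log is complete.
        self._pool.shutdown(wait=True)
```

Running the shadow call on a worker thread keeps its latency off the client's response path, which is why a slower shadow model does not slow down live traffic in this sketch.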

Create Shadow Deployment

POST /api/v1/inference/shadow

{
  "primary_model": "churn-xgb-v2",
  "shadow_model": "churn-xgb-v3",
  "sample_rate": 1.0,
  "comparison_metrics": ["accuracy", "f1_score", "latency_ms"],
  "duration_hours": 48,
  "auto_promote": false,
  "promotion_criteria": {
    "metric": "f1_score",
    "min_improvement": 0.005,
    "max_latency_increase_ms": 10
  }
}

Response

{
  "shadow_id": "shadow-abc123",
  "status": "active",
  "primary_model": "churn-xgb-v2",
  "shadow_model": "churn-xgb-v3",
  "started_at": "2025-03-15T10:00:00Z",
  "expires_at": "2025-03-17T10:00:00Z"
}
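
As a rough illustration, the request above could be assembled with Python's standard library as follows. The host in `BASE_URL` is a placeholder, the helper names are hypothetical, and authentication is omitted because this page does not describe it. The `expected_expiry` helper assumes that `expires_at` is simply `started_at` plus `duration_hours`, which matches the sample response but is not stated explicitly.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

BASE_URL = "https://ml-platform.example.com"  # placeholder host, not from this page
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_create_shadow_request(primary_model, shadow_model,
                                sample_rate=1.0, duration_hours=48):
    """Build (but do not send) the POST request that creates a shadow deployment."""
    if not 0.0 < sample_rate <= 1.0:  # assumed valid range
        raise ValueError("sample_rate must be in (0, 1]")
    payload = {
        "primary_model": primary_model,
        "shadow_model": shadow_model,
        "sample_rate": sample_rate,
        "comparison_metrics": ["accuracy", "f1_score", "latency_ms"],
        "duration_hours": duration_hours,
        "auto_promote": False,
        "promotion_criteria": {
            "metric": "f1_score",
            "min_improvement": 0.005,
            "max_latency_increase_ms": 10,
        },
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/inference/shadow",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def expected_expiry(started_at: str, duration_hours: int) -> str:
    """Assumption: expires_at = started_at + duration_hours (both in UTC)."""
    start = datetime.strptime(started_at, TIME_FORMAT).replace(tzinfo=timezone.utc)
    return (start + timedelta(hours=duration_hours)).strftime(TIME_FORMAT)
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without a live endpoint.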

Get Shadow Comparison

GET /api/v1/inference/shadow/:shadow_id/comparison

Response

{
  "shadow_id": "shadow-abc123",
  "total_predictions": 15000,
  "agreement_rate": 0.94,
  "comparison": {
    "primary": {
      "model": "churn-xgb-v2",
      "accuracy": 0.912,
      "f1_score": 0.895,
      "avg_latency_ms": 12.3
    },
    "shadow": {
      "model": "churn-xgb-v3",
      "accuracy": 0.925,
      "f1_score": 0.908,
      "avg_latency_ms": 13.1
    },
    "improvement": {
      "accuracy": 0.013,
      "f1_score": 0.013,
      "latency_ms": 0.8
    }
  },
  "promotion_eligible": true,
  "recommendation": "Shadow model shows consistent improvement; eligible for canary promotion"
}
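
The `improvement` block and the `promotion_eligible` flag can be reproduced from the per-model metrics. The sketch below is one plausible reading, not the service's documented logic: it assumes improvement is shadow minus primary for every metric (so a positive `latency_ms` means the shadow is slower), and that eligibility requires both the `min_improvement` and `max_latency_increase_ms` criteria from the create request to hold. The function names are hypothetical.

```python
def compute_improvement(primary: dict, shadow: dict) -> dict:
    """Shadow minus primary for each metric; positive latency_ms means slower."""
    return {
        "accuracy": round(shadow["accuracy"] - primary["accuracy"], 6),
        "f1_score": round(shadow["f1_score"] - primary["f1_score"], 6),
        "latency_ms": round(shadow["avg_latency_ms"] - primary["avg_latency_ms"], 6),
    }


def is_promotion_eligible(improvement: dict, criteria: dict) -> bool:
    """Assumed rule: enough gain on the chosen metric and a bounded latency increase."""
    gain = improvement[criteria["metric"]]
    return (
        gain >= criteria["min_improvement"]
        and improvement["latency_ms"] <= criteria["max_latency_increase_ms"]
    )


primary = {"accuracy": 0.912, "f1_score": 0.895, "avg_latency_ms": 12.3}
shadow = {"accuracy": 0.925, "f1_score": 0.908, "avg_latency_ms": 13.1}
criteria = {"metric": "f1_score", "min_improvement": 0.005, "max_latency_increase_ms": 10}

delta = compute_improvement(primary, shadow)
eligible = is_promotion_eligible(delta, criteria)
```

With the sample values, `delta` matches the `improvement` object in the response (0.013, 0.013, 0.8), and the F1 gain of 0.013 clears the 0.005 threshold while the 0.8 ms latency increase stays under the 10 ms limit.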

Shadow Modes

Mode	Description	Sample Rate
Full mirror	Every request is shadowed	1.0 (100%)
Sampled	Random subset of requests	0.01 - 0.99
Conditional	Only shadow requests matching criteria	Rule-based
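
The three modes amount to a per-request decision about whether to mirror. A minimal sketch, assuming hypothetical mode names (`"full"`, `"sampled"`, `"conditional"`) and a caller-supplied predicate for the rule-based case; the actual rule syntax is not described here.

```python
import random
from typing import Callable, Optional


def should_shadow(request: dict,
                  mode: str,
                  sample_rate: float = 1.0,
                  predicate: Optional[Callable[[dict], bool]] = None,
                  rng: Callable[[], float] = random.random) -> bool:
    """Decide whether a single request is mirrored to the shadow model."""
    if mode == "full":
        return True  # every request is shadowed
    if mode == "sampled":
        return rng() < sample_rate  # random subset of requests
    if mode == "conditional":
        if predicate is None:
            raise ValueError("conditional mode needs a predicate")
        return bool(predicate(request))  # only requests matching the rule
    raise ValueError(f"unknown shadow mode: {mode!r}")
```

For example, `should_shadow(req, "conditional", predicate=lambda r: r.get("region") == "eu")` would mirror only requests tagged with a hypothetical `region` field equal to `"eu"`. The `rng` parameter exists so that sampling can be made deterministic in tests.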

Comparison Metrics

Metric	Comparison Type	Alert If
Prediction agreement	Percentage of matching predictions	Below 90%
Accuracy	Absolute improvement	Degradation detected
F1 score	Absolute improvement	Degradation detected
Latency	Millisecond difference	Increase above threshold
Error rate	Percentage difference	Shadow errors higher
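
The alert conditions in the table can be expressed as simple checks. The following sketch is illustrative only: the function names, the alert labels, and the optional `error_rate` key are assumptions, and the 10 ms latency threshold is borrowed from the `max_latency_increase_ms` value in the create example.

```python
def agreement_rate(primary_preds: list, shadow_preds: list) -> float:
    """Fraction of requests where both models returned the same prediction."""
    if not primary_preds or len(primary_preds) != len(shadow_preds):
        raise ValueError("prediction lists must be non-empty and of equal length")
    matches = sum(p == s for p, s in zip(primary_preds, shadow_preds))
    return matches / len(primary_preds)


def comparison_alerts(primary: dict, shadow: dict, agreement: float,
                      min_agreement: float = 0.90,
                      max_latency_increase_ms: float = 10.0) -> list:
    """Return the names of the metrics whose alert condition is met."""
    alerts = []
    if agreement < min_agreement:
        alerts.append("prediction_agreement")
    if shadow["accuracy"] < primary["accuracy"]:
        alerts.append("accuracy")
    if shadow["f1_score"] < primary["f1_score"]:
        alerts.append("f1_score")
    if shadow["avg_latency_ms"] - primary["avg_latency_ms"] > max_latency_increase_ms:
        alerts.append("latency")
    # error_rate is a hypothetical key; it is treated as 0.0 when absent
    if shadow.get("error_rate", 0.0) > primary.get("error_rate", 0.0):
        alerts.append("error_rate")
    return alerts
```

Applied to the sample comparison shown earlier (agreement 0.94, shadow better on accuracy and F1, 0.8 ms slower), no alert condition is met and the function returns an empty list.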

Shadow Lifecycle

Created: Shadow deployment configured but not yet active
Active: Traffic is being mirrored to the shadow model
Analyzing: Data collection complete, running comparison
Completed: Analysis finished, promotion decision available
Promoted: Shadow model promoted to primary (if auto_promote is true)
Expired: Duration exceeded without promotion
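
The lifecycle reads naturally as a small state machine. The states below come from the list above, but the allowed transitions are an assumption, since the page does not say, for instance, whether a deployment can expire while still active. Treat this as a sketch of one plausible ordering.

```python
from enum import Enum


class ShadowState(Enum):
    CREATED = "created"
    ACTIVE = "active"
    ANALYZING = "analyzing"
    COMPLETED = "completed"
    PROMOTED = "promoted"
    EXPIRED = "expired"


# Assumed transitions; PROMOTED and EXPIRED are treated as terminal states.
ALLOWED_TRANSITIONS = {
    ShadowState.CREATED: {ShadowState.ACTIVE},
    ShadowState.ACTIVE: {ShadowState.ANALYZING, ShadowState.EXPIRED},
    ShadowState.ANALYZING: {ShadowState.COMPLETED},
    ShadowState.COMPLETED: {ShadowState.PROMOTED, ShadowState.EXPIRED},
    ShadowState.PROMOTED: set(),
    ShadowState.EXPIRED: set(),
}


def transition(current: ShadowState, target: ShadowState) -> ShadowState:
    """Return the target state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```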

Configuration

Environment Variable	Default	Description
`SHADOW_MAX_ACTIVE`	`3`	Max concurrent shadow deployments
`SHADOW_DEFAULT_DURATION_HOURS`	`48`	Default shadow duration
`SHADOW_MIN_SAMPLES`	`1000`	Minimum samples before comparison
`SHADOW_STORAGE_RETENTION_DAYS`	`30`	Prediction log retention
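
For reference, these variables and their defaults could be loaded as shown below. The `ShadowConfig` class and the `comparison_ready` helper are hypothetical; only the variable names and default values come from the table. The helper assumes that `SHADOW_MIN_SAMPLES` is an inclusive lower bound on logged predictions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ShadowConfig:
    max_active: int = 3
    default_duration_hours: int = 48
    min_samples: int = 1000
    storage_retention_days: int = 30

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "ShadowConfig":
        """Read settings from the environment, falling back to the documented defaults."""
        env = os.environ if env is None else env
        return cls(
            max_active=int(env.get("SHADOW_MAX_ACTIVE", 3)),
            default_duration_hours=int(env.get("SHADOW_DEFAULT_DURATION_HOURS", 48)),
            min_samples=int(env.get("SHADOW_MIN_SAMPLES", 1000)),
            storage_retention_days=int(env.get("SHADOW_STORAGE_RETENTION_DAYS", 30)),
        )


def comparison_ready(config: ShadowConfig, total_predictions: int) -> bool:
    """Assumption: comparison runs once at least min_samples predictions are logged."""
    return total_predictions >= config.min_samples
```

Under that reading, the 15000 predictions in the sample comparison comfortably exceed the default minimum of 1000.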
