LLM Router
Production: multi-provider routing with five strategies, tenant-level configuration, and budget limits
The LLM Router manages multiple LLM providers with tenant-specific configuration, routing strategies, budget limits, and usage tracking. Defined in data-plane/ai-service/src/llm/providers/router.py.
12.6.1.1 Routing Strategies
| Strategy | Selection Logic |
|---|---|
| cost_optimized | vLLM > Gemini Flash > Bedrock Haiku > Claude Haiku > GPT-4o-mini |
| latency_optimized | vLLM > Gemini Flash > GPT-4o-mini > Claude Haiku |
| quality_optimized | Claude Opus > GPT-4 > Gemini Pro > Bedrock Sonnet |
| round_robin | Cycles through enabled providers |
| fallback | Uses the default provider; falls back to the next one on failure |
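One way to read the table above is that each strategy produces an ordered list of candidates to try. The sketch below illustrates that idea only; it is not the code in router.py. The class name `StrategySelector`, the `candidates` method, and the lowercase target labels (such as "gemini-flash") are hypothetical names chosen for illustration, and the `RoutingStrategy` enum is redefined locally so the example is self-contained.

```python
from enum import Enum


class RoutingStrategy(Enum):
    # Local stand-in for the router's RoutingStrategy enum (assumption).
    COST_OPTIMIZED = "cost_optimized"
    LATENCY_OPTIMIZED = "latency_optimized"
    QUALITY_OPTIMIZED = "quality_optimized"
    ROUND_ROBIN = "round_robin"
    FALLBACK = "fallback"


# Preference orders copied from the table; the labels are hypothetical.
PREFERENCE = {
    RoutingStrategy.COST_OPTIMIZED: [
        "vllm", "gemini-flash", "bedrock-haiku", "claude-haiku", "gpt-4o-mini",
    ],
    RoutingStrategy.LATENCY_OPTIMIZED: [
        "vllm", "gemini-flash", "gpt-4o-mini", "claude-haiku",
    ],
    RoutingStrategy.QUALITY_OPTIMIZED: [
        "claude-opus", "gpt-4", "gemini-pro", "bedrock-sonnet",
    ],
}


class StrategySelector:
    """Hypothetical helper: orders a tenant's enabled targets per strategy."""

    def __init__(self, enabled, default):
        self.enabled = list(enabled)
        self.default = default
        self._rr_index = 0  # position of the next round-robin start

    def candidates(self, strategy):
        """Return enabled targets in the order they should be tried."""
        if not self.enabled:
            return []
        if strategy in PREFERENCE:
            # Keep the table's order, dropping targets the tenant lacks.
            return [t for t in PREFERENCE[strategy] if t in self.enabled]
        if strategy is RoutingStrategy.ROUND_ROBIN:
            # Rotate the starting point by one on every call.
            start = self._rr_index % len(self.enabled)
            self._rr_index += 1
            return self.enabled[start:] + self.enabled[:start]
        # FALLBACK: default first, then the remaining enabled targets.
        return [self.default] + [t for t in self.enabled if t != self.default]
```

A caller would try the first candidate and move down the list on failure. Whether the real router retries down the list for every strategy, or only for fallback, is not stated here, so treat that behaviour as an assumption of the sketch.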
Tenant Configuration
config = TenantLLMConfig(
    tenant_id="acme-corp",
    providers={
        ProviderType.OPENAI: {"api_key": "sk-...", "default_model": "gpt-4o"},
        ProviderType.ANTHROPIC: {"api_key": "sk-ant-...", "default_model": "claude-3-5-sonnet"},
        ProviderType.VLLM: {"base_url": "http://vllm:8000"},
    },
    default_provider=ProviderType.OPENAI,
    routing_strategy=RoutingStrategy.COST_OPTIMIZED,
    budget_limit_usd=500.0,  # Monthly budget cap
)
router.register_tenant_config(config)

12.6.1.2 Usage Tracking
Every LLM call records a UsageRecord:
from dataclasses import dataclass
from datetime import datetime

@dataclass
class UsageRecord:
    tenant_id: str
    provider: ProviderType
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: float
    timestamp: datetime

# Get usage summary
curl "http://localhost:8000/api/v1/llm/router/usage?tenant_id=acme-corp&period_days=30"

{
    "tenant_id": "acme-corp",
    "period_days": 30,
    "total_requests": 12450,
    "total_tokens": 45200000,
    "total_cost_usd": 128.45,
    "by_provider": {
        "openai": {"requests": 8200, "tokens": 32000000, "cost_usd": 96.00},
        "vllm": {"requests": 3100, "tokens": 11000000, "cost_usd": 0.00},
        "anthropic": {"requests": 1150, "tokens": 2200000, "cost_usd": 32.45}
    }
}

# Health check all providers
curl "http://localhost:8000/api/v1/llm/router/health?tenant_id=acme-corp"
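
The usage summary returned by the usage endpoint can be thought of as an aggregation over UsageRecord entries. The following is a minimal sketch of that aggregation, not the router's actual code. It assumes that "tokens" means input plus output tokens, that timestamps are timezone-aware, and that the budget cap is compared against total cost for the period. The functions `summarize_usage` and `is_over_budget` are hypothetical names, and `provider` is a plain string standing in for ProviderType.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class UsageRecord:
    tenant_id: str
    provider: str  # plain string stands in for ProviderType (assumption)
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: float
    timestamp: datetime  # assumed to be timezone-aware


def summarize_usage(records, tenant_id, period_days, now=None):
    """Aggregate one tenant's records from the last `period_days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=period_days)
    by_provider = defaultdict(
        lambda: {"requests": 0, "tokens": 0, "cost_usd": 0.0}
    )
    for record in records:
        # Skip other tenants and records older than the reporting window.
        if record.tenant_id != tenant_id or record.timestamp < cutoff:
            continue
        bucket = by_provider[record.provider]
        bucket["requests"] += 1
        bucket["tokens"] += record.input_tokens + record.output_tokens
        bucket["cost_usd"] += record.cost_usd
    buckets = list(by_provider.values())
    return {
        "tenant_id": tenant_id,
        "period_days": period_days,
        "total_requests": sum(b["requests"] for b in buckets),
        "total_tokens": sum(b["tokens"] for b in buckets),
        "total_cost_usd": round(sum(b["cost_usd"] for b in buckets), 2),
        "by_provider": dict(by_provider),
    }


def is_over_budget(summary, budget_limit_usd):
    """Hypothetical check: has spend reached the tenant's budget cap?"""
    return summary["total_cost_usd"] >= budget_limit_usd
```

With the sample response shown earlier (128.45 USD spent against a 500.0 USD cap), `is_over_budget` would return False. How the real router reacts when the cap is reached (rejecting requests, switching providers, or only alerting) is not described in this section.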