LLM Router

Production - Multi-provider routing with 5 strategies, tenant-level config, budget limits

The LLM Router manages multiple LLM providers with tenant-specific configuration, routing strategies, budget limits, and usage tracking. It is defined in data-plane/ai-service/src/llm/providers/router.py.

12.6.1.1 Routing Strategies

Strategy	Selection Logic (`>` means "preferred over")
`cost_optimized`	vLLM > Gemini Flash > Bedrock Haiku > Claude Haiku > GPT-4o-mini
`latency_optimized`	vLLM > Gemini Flash > GPT-4o-mini > Claude Haiku
`quality_optimized`	Claude Opus > GPT-4 > Gemini Pro > Bedrock Sonnet
`round_robin`	Cycles through enabled providers
`fallback`	Uses the default provider; falls back to the next provider on failure
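
The selection logic in the table can be sketched roughly as follows. This is a minimal illustration, not the code in router.py: the `StrategySelector` class, its `select` method, and the plain-string provider labels are all hypothetical, and the preference lists are copied from the table above. It assumes that a preference-based strategy picks the first listed entry that is enabled and has not failed.

```python
from enum import Enum
from typing import Iterable, Sequence


class RoutingStrategy(str, Enum):
    COST_OPTIMIZED = "cost_optimized"
    LATENCY_OPTIMIZED = "latency_optimized"
    QUALITY_OPTIMIZED = "quality_optimized"
    ROUND_ROBIN = "round_robin"
    FALLBACK = "fallback"


# Preference orders copied from the table; the labels are illustrative strings.
PREFERENCE = {
    RoutingStrategy.COST_OPTIMIZED: [
        "vLLM", "Gemini Flash", "Bedrock Haiku", "Claude Haiku", "GPT-4o-mini",
    ],
    RoutingStrategy.LATENCY_OPTIMIZED: [
        "vLLM", "Gemini Flash", "GPT-4o-mini", "Claude Haiku",
    ],
    RoutingStrategy.QUALITY_OPTIMIZED: [
        "Claude Opus", "GPT-4", "Gemini Pro", "Bedrock Sonnet",
    ],
}


class StrategySelector:
    """Hypothetical helper that picks one enabled provider per request."""

    def __init__(self, enabled: Sequence[str], default: str) -> None:
        if not enabled:
            raise ValueError("at least one provider must be enabled")
        self.enabled = list(enabled)
        self.default = default
        self._rr_index = 0  # position of the next round-robin pick

    def select(self, strategy: RoutingStrategy, failed: Iterable[str] = ()) -> str:
        failed = set(failed)
        candidates = [p for p in self.enabled if p not in failed]
        if not candidates:
            raise RuntimeError("no enabled provider available")

        if strategy in PREFERENCE:
            # First entry in the preference list that is enabled and healthy.
            for name in PREFERENCE[strategy]:
                if name in candidates:
                    return name
            return candidates[0]

        if strategy is RoutingStrategy.ROUND_ROBIN:
            choice = candidates[self._rr_index % len(candidates)]
            self._rr_index += 1
            return choice

        # FALLBACK: use the default unless it has failed, then the next candidate.
        return self.default if self.default in candidates else candidates[0]
```

A caller would pass the providers that already failed for the current request in `failed`, so a retry moves on to the next candidate instead of repeating the same choice.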

Tenant Configuration

config = TenantLLMConfig(
    tenant_id="acme-corp",
    providers={
        ProviderType.OPENAI: {"api_key": "sk-...", "default_model": "gpt-4o"},
        ProviderType.ANTHROPIC: {"api_key": "sk-ant-...", "default_model": "claude-3-5-sonnet"},
        ProviderType.VLLM: {"base_url": "http://vllm:8000"},
    },
    default_provider=ProviderType.OPENAI,
    routing_strategy=RoutingStrategy.COST_OPTIMIZED,
    budget_limit_usd=500.0,  # Monthly budget cap
)
# `router` is an already-constructed LLM Router instance
router.register_tenant_config(config)
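
The `budget_limit_usd` field suggests a per-tenant spend check before each call. The sketch below shows one way such a check could look. `BudgetTracker` and its methods are hypothetical names, and this section does not specify what the router does once the cap is reached (reject the request, or route to a zero-cost provider such as vLLM), so the sketch only exposes the check itself.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Hypothetical per-tenant tracker for a monthly budget cap in USD."""

    budget_limit_usd: float
    spent_usd: float = 0.0

    def remaining(self) -> float:
        # Never report a negative remaining budget.
        return max(self.budget_limit_usd - self.spent_usd, 0.0)

    def can_spend(self, estimated_cost_usd: float) -> bool:
        # True if the estimated call cost still fits under the cap.
        return self.spent_usd + estimated_cost_usd <= self.budget_limit_usd

    def record(self, cost_usd: float) -> None:
        # Add the actual cost of a completed call to the running total.
        if cost_usd < 0:
            raise ValueError("cost_usd must be non-negative")
        self.spent_usd += cost_usd
```

Resetting `spent_usd` at the start of each month is left out; a real implementation would more likely derive the running total from stored usage records than hold it in memory.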

12.6.1.2 Usage Tracking

Every LLM call records a UsageRecord:

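from dataclasses import dataclass
from datetime import datetime

@dataclass
class UsageRecord:
    tenant_id: str
    provider: ProviderType
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float      # cost of this call in USD
    latency_ms: float    # call latency in milliseconds
    timestamp: datetime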

# Get usage summary
curl "http://localhost:8000/api/v1/llm/router/usage?tenant_id=acme-corp&period_days=30"

{
  "tenant_id": "acme-corp",
  "period_days": 30,
  "total_requests": 12450,
  "total_tokens": 45200000,
  "total_cost_usd": 128.45,
  "by_provider": {
    "openai": {"requests": 8200, "tokens": 32000000, "cost_usd": 96.00},
    "vllm": {"requests": 3100, "tokens": 11000000, "cost_usd": 0.00},
    "anthropic": {"requests": 1150, "tokens": 2200000, "cost_usd": 32.45}
  }
}
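
To show how individual usage records could roll up into a summary of this shape, here is a minimal aggregation sketch. It is an assumption-laden illustration rather than the service's implementation: `summarize_usage` is a hypothetical function, `provider` is a plain string instead of `ProviderType` so the example is self-contained, and `tokens` is assumed to mean input plus output tokens.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class UsageRecord:
    tenant_id: str
    provider: str  # ProviderType in the original; a plain string here
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: float
    timestamp: datetime  # expected to be timezone-aware (UTC)


def summarize_usage(
    records: Iterable[UsageRecord],
    tenant_id: str,
    period_days: int,
    now: Optional[datetime] = None,
) -> dict:
    """Aggregate one tenant's records from the last `period_days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=period_days)
    by_provider = defaultdict(lambda: {"requests": 0, "tokens": 0, "cost_usd": 0.0})

    for rec in records:
        # Skip other tenants and records older than the reporting window.
        if rec.tenant_id != tenant_id or rec.timestamp < cutoff:
            continue
        entry = by_provider[rec.provider]
        entry["requests"] += 1
        entry["tokens"] += rec.input_tokens + rec.output_tokens
        entry["cost_usd"] += rec.cost_usd

    return {
        "tenant_id": tenant_id,
        "period_days": period_days,
        "total_requests": sum(e["requests"] for e in by_provider.values()),
        "total_tokens": sum(e["tokens"] for e in by_provider.values()),
        "total_cost_usd": round(sum(e["cost_usd"] for e in by_provider.values()), 2),
        "by_provider": {
            name: {**e, "cost_usd": round(e["cost_usd"], 2)}
            for name, e in by_provider.items()
        },
    }
```

The totals are computed from the per-provider entries, so they always equal the sum of the `by_provider` values, as they do in the sample response above.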

# Health check all providers
curl "http://localhost:8000/api/v1/llm/router/health?tenant_id=acme-corp"
