LLM Router Endpoints
The LLM Router API provides endpoints for managing multi-provider LLM routing, viewing routing decisions, configuring provider priorities, and monitoring provider health. The endpoints are backed by the MultiLLMRouterService, which wraps the core LLMRouter and adds persistent storage, analytics, and alert generation.
Route Request
Sends a completion request through the LLM router, which selects the optimal provider based on the active routing strategy.
| Property | Value |
|---|---|
| Method | POST |
| Path | /api/v1/llm/route |
| Auth | JWT required |
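As a minimal sketch of calling this endpoint, the helper below assembles the request with Python's standard library without sending it. The base URL, the `Bearer` scheme for the JWT, the function name `build_route_request`, and the choice to treat every field except `messages` as optional are assumptions for illustration, not documented behavior.

```python
import json
import urllib.request

# Hypothetical base URL; replace with your deployment's address.
BASE_URL = "https://router.example.com"


def build_route_request(messages, token, strategy=None, max_tokens=None,
                        temperature=None, tenant_id=None):
    """Build (but do not send) a POST request for /api/v1/llm/route."""
    body = {"messages": messages}
    # Only include the optional fields the caller actually set.
    optional = {
        "strategy": strategy,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "tenant_id": tenant_id,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/llm/route",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the JWT is passed as a Bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_route_request(
    [{"role": "user", "content": "Explain this SQL query..."}],
    token="<jwt>",
    strategy="cost_optimized",
    max_tokens=2048,
)
# Sending would be urllib.request.urlopen(req), followed by json.load() on the result.
```

Building the request separately from sending it keeps the payload easy to inspect and log before any network call is made.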
Request Body
{
"messages": [
{"role": "system", "content": "You are a data analyst."},
{"role": "user", "content": "Explain this SQL query..."}
],
"strategy": "cost_optimized",
"max_tokens": 2048,
"temperature": 0.7,
"tenant_id": "acme-corp"
}
Response
{
"content": "This SQL query performs...",
"provider": "openai",
"model": "gpt-4",
"tokens_used": {
"prompt": 150,
"completion": 420,
"total": 570
},
"latency_ms": 1230,
"cost_estimate_usd": 0.0285,
"routing_decision_id": "rd-abc123"
}
Get Router Configuration
Returns the current LLM router configuration including provider priorities and strategies.
| Property | Value |
|---|---|
| Method | GET |
| Path | /api/v1/llm/config |
| Auth | JWT required (admin) |
Response
{
"default_strategy": "cost_optimized",
"providers": [
{
"name": "openai",
"enabled": true,
"priority": 1,
"models": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo"],
"rate_limit_rpm": 500,
"max_tokens": 8192
},
{
"name": "anthropic",
"enabled": true,
"priority": 2,
"models": ["claude-3-opus", "claude-3-sonnet"],
"rate_limit_rpm": 300,
"max_tokens": 200000
}
],
"fallback_chain": ["openai", "anthropic", "vllm"]
}
Update Router Configuration
Updates the default routing strategy, the provider configuration, or both.
| Property | Value |
|---|---|
| Method | PUT |
| Path | /api/v1/llm/config |
| Auth | JWT required (admin) |
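A client may want to validate an update before sending it. The sketch below builds a PUT body and rejects strategy names that are not among the five listed in this document. The helper name `build_config_update` and the client-side checks are assumptions; the server's own validation rules are not described here.

```python
# Strategy names taken from the Routing Strategies table in this document.
KNOWN_STRATEGIES = {
    "cost_optimized",
    "latency_optimized",
    "quality_optimized",
    "round_robin",
    "fallback",
}


def build_config_update(default_strategy=None, providers=None):
    """Return a PUT body for /api/v1/llm/config, rejecting unknown strategies."""
    body = {}
    if default_strategy is not None:
        if default_strategy not in KNOWN_STRATEGIES:
            raise ValueError(f"unknown strategy: {default_strategy!r}")
        body["default_strategy"] = default_strategy
    if providers:
        for provider in providers:
            # Each entry must identify the provider it modifies.
            if "name" not in provider:
                raise ValueError("each provider entry needs a 'name'")
        body["providers"] = list(providers)
    return body


update = build_config_update(
    default_strategy="quality_optimized",
    providers=[{"name": "openai", "priority": 1, "enabled": True}],
)
```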
Request Body
{
"default_strategy": "quality_optimized",
"providers": [
{
"name": "openai",
"priority": 1,
"enabled": true
}
]
}
Routing Strategies
| Strategy | Description |
|---|---|
| cost_optimized | Routes to the cheapest available provider |
| latency_optimized | Routes to the provider with lowest average latency |
| quality_optimized | Routes to the provider with highest quality scores |
| round_robin | Distributes requests evenly across providers |
| fallback | Uses providers in priority order, falling back on failure |
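To make the differences between the strategies concrete, here is a rough sketch of how a selector could pick a provider under each one. It is not the LLMRouter implementation. The `Provider` fields `cost_per_1k_usd` and `quality_score`, the numeric values, the reading of a lower `priority` as "tried first", and the use of a `healthy` flag to model failure are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Provider:
    name: str
    priority: int            # assumption: lower value is tried first
    cost_per_1k_usd: float   # hypothetical field
    avg_latency_ms: float
    quality_score: float     # hypothetical field, higher is better
    healthy: bool = True


_rr_counter = count()  # shared counter used by round_robin


def select_provider(strategy, providers):
    """Pick one provider for a request under the given strategy."""
    available = [p for p in providers if p.healthy]
    if not available:
        raise RuntimeError("no healthy providers")
    if strategy == "cost_optimized":
        return min(available, key=lambda p: p.cost_per_1k_usd)
    if strategy == "latency_optimized":
        return min(available, key=lambda p: p.avg_latency_ms)
    if strategy == "quality_optimized":
        return max(available, key=lambda p: p.quality_score)
    if strategy == "round_robin":
        # Cycle through the healthy providers in list order.
        return available[next(_rr_counter) % len(available)]
    if strategy == "fallback":
        # First healthy provider in priority order.
        return min(available, key=lambda p: p.priority)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Illustrative values only; they are not measurements.
providers = [
    Provider("openai", 1, 0.040, 850, 0.92),
    Provider("anthropic", 2, 0.039, 900, 0.95),
    Provider("vllm", 3, 0.005, 400, 0.80),
]
```

With these sample values, `cost_optimized` and `latency_optimized` both pick `vllm`, `quality_optimized` picks `anthropic`, and `fallback` picks `openai` until it is marked unhealthy, at which point it moves on to `anthropic`.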
Get Provider Health
Returns health status and latency metrics for all configured LLM providers.
| Property | Value |
|---|---|
| Method | GET |
| Path | /api/v1/llm/health |
| Auth | JWT required |
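One way a monitoring script might consume this endpoint is to flag providers that fall outside chosen thresholds. The sketch below works on a payload shaped like the response example in this section. The function name `degraded_providers` and the threshold defaults are assumptions, not values used by the service.

```python
def degraded_providers(health, min_success_rate=0.99, max_p99_ms=5000):
    """Return names of providers that are unhealthy or outside the thresholds."""
    flagged = []
    for p in health["providers"]:
        if (not p["healthy"]
                or p["success_rate"] < min_success_rate
                or p["p99_latency_ms"] > max_p99_ms):
            flagged.append(p["name"])
    return flagged


# Payload copied from the response example in this section.
health = {
    "providers": [
        {"name": "openai", "healthy": True, "avg_latency_ms": 850,
         "p99_latency_ms": 2400, "success_rate": 0.997,
         "requests_last_hour": 1250},
    ]
}
print(degraded_providers(health))                   # []
print(degraded_providers(health, max_p99_ms=2000))  # ['openai']
```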
Response
{
"providers": [
{
"name": "openai",
"healthy": true,
"avg_latency_ms": 850,
"p99_latency_ms": 2400,
"success_rate": 0.997,
"requests_last_hour": 1250
}
]
}
List Routing Decisions
Returns recent routing decisions for analytics and debugging.
| Property | Value |
|---|---|
| Method | GET |
| Path | /api/v1/llm/decisions |
| Auth | JWT required (admin) |
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| limit | integer | no | Max results (default 50) |
| provider | string | no | Filter by provider |
| strategy | string | no | Filter by strategy |
| outcome | string | no | Filter by outcome (success, failure, fallback) |
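The query parameters above can be combined freely. As a small sketch, the helper below builds the request URL and omits any filter that was not supplied. The function name `decisions_url`, the example host, and the client-side checks on `limit` and `outcome` are assumptions for illustration.

```python
from urllib.parse import urlencode

# Outcome values listed in the parameter table.
VALID_OUTCOMES = {"success", "failure", "fallback"}


def decisions_url(base_url, limit=None, provider=None, strategy=None,
                  outcome=None):
    """Build the GET URL for /api/v1/llm/decisions with optional filters."""
    if outcome is not None and outcome not in VALID_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    if limit is not None and limit < 1:
        raise ValueError("limit must be a positive integer")
    params = {
        "limit": limit,
        "provider": provider,
        "strategy": strategy,
        "outcome": outcome,
    }
    # Drop unset filters so the server applies its own defaults.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    path = f"{base_url}/api/v1/llm/decisions"
    return f"{path}?{query}" if query else path


print(decisions_url("https://router.example.com", limit=10, outcome="fallback"))
# https://router.example.com/api/v1/llm/decisions?limit=10&outcome=fallback
```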
Get Usage Summary
Returns token usage and cost summary across providers.
| Property | Value |
|---|---|
| Method | GET |
| Path | /api/v1/llm/usage |
| Auth | JWT required (admin) |
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| period | string | no | Time period (hour, day, week, month) |
| tenant_id | string | no | Filter by tenant |
Response
{
"period": "day",
"total_requests": 5420,
"total_tokens": 2150000,
"total_cost_usd": 85.40,
"by_provider": {
"openai": {"requests": 4200, "tokens": 1700000, "cost_usd": 68.00},
"anthropic": {"requests": 1220, "tokens": 450000, "cost_usd": 17.40}
}
}
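A consumer of this response may want to cross-check the per-provider figures against the totals and derive each provider's share of spend. The sketch below does so for the example payload. The function name `summarize_usage` is hypothetical, and the expectation that `by_provider` sums exactly to the top-level totals is an assumption drawn from the example.

```python
import math


def summarize_usage(usage):
    """Check per-provider sums against the totals and compute cost shares."""
    by_provider = usage["by_provider"]
    consistent = (
        sum(p["requests"] for p in by_provider.values()) == usage["total_requests"]
        and sum(p["tokens"] for p in by_provider.values()) == usage["total_tokens"]
        # Costs are floats, so compare with a half-cent tolerance.
        and math.isclose(
            sum(p["cost_usd"] for p in by_provider.values()),
            usage["total_cost_usd"],
            abs_tol=0.005,
        )
    )
    total_cost = usage["total_cost_usd"]
    rows = {}
    for name, p in by_provider.items():
        rows[name] = {
            "cost_share": p["cost_usd"] / total_cost if total_cost else 0.0,
            "usd_per_1k_tokens": (
                p["cost_usd"] / p["tokens"] * 1000 if p["tokens"] else 0.0
            ),
        }
    return consistent, rows


# Payload copied from the response example above.
usage = {
    "period": "day",
    "total_requests": 5420,
    "total_tokens": 2150000,
    "total_cost_usd": 85.40,
    "by_provider": {
        "openai": {"requests": 4200, "tokens": 1700000, "cost_usd": 68.00},
        "anthropic": {"requests": 1220, "tokens": 450000, "cost_usd": 17.40},
    },
}
consistent, rows = summarize_usage(usage)
for name, row in rows.items():
    print(f"{name}: {row['cost_share']:.1%} of spend, "
          f"${row['usd_per_1k_tokens']:.4f} per 1K tokens")
# openai: 79.6% of spend, $0.0400 per 1K tokens
# anthropic: 20.4% of spend, $0.0387 per 1K tokens
```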