LLM Router Endpoints

The LLM Router API provides endpoints for managing multi-provider LLM routing, viewing routing decisions, configuring provider priorities, and monitoring provider health. These endpoints are served by the MultiLLMRouterService, which wraps the core LLMRouter with persistent storage, analytics, and alert generation.

Route Request

Sends a completion request through the LLM router, which selects the optimal provider based on the active routing strategy.

Property	Value
Method	`POST`
Path	`/api/v1/llm/route`
Auth	JWT required

Request Body

{
  "messages": [
    {"role": "system", "content": "You are a data analyst."},
    {"role": "user", "content": "Explain this SQL query..."}
  ],
  "strategy": "cost_optimized",
  "max_tokens": 2048,
  "temperature": 0.7,
  "tenant_id": "acme-corp"
}

Response

{
  "content": "This SQL query performs...",
  "provider": "openai",
  "model": "gpt-4",
  "tokens_used": {
    "prompt": 150,
    "completion": 420,
    "total": 570
  },
  "latency_ms": 1230,
  "cost_estimate_usd": 0.0285,
  "routing_decision_id": "rd-abc123"
}
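The following is a minimal Python client sketch for this endpoint. It uses only the standard library. It assumes the JWT is sent as a standard `Authorization: Bearer` header and that `base_url` points at the API host. Both of these are assumptions, and so are the helper names `build_route_request` and `summarize_route_response`. The request is built but not sent, so you can inspect it before passing it to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_route_request(base_url, jwt, messages, strategy=None,
                        max_tokens=None, temperature=None, tenant_id=None):
    """Build (but do not send) a POST /api/v1/llm/route request.

    Hypothetical helper; the Bearer-token header format is an assumption.
    """
    body = {"messages": messages}
    # Only include optional fields the caller actually set.
    optional = {
        "strategy": strategy,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "tenant_id": tenant_id,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/llm/route",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {jwt}",
        },
    )


def summarize_route_response(resp):
    """Extract the fields most useful for logging from a route response."""
    tokens = resp["tokens_used"]
    return {
        "provider": resp["provider"],
        "model": resp["model"],
        "total_tokens": tokens["prompt"] + tokens["completion"],
        "decision_id": resp["routing_decision_id"],
    }
```

To send the request, call `with urllib.request.urlopen(req) as r: data = json.load(r)`. Then pass `data` to `summarize_route_response`. Keep the `routing_decision_id` so the request can be matched against the routing-decisions listing later.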

Get Router Configuration

Returns the current LLM router configuration: the default routing strategy, per-provider settings (priority, models, rate limit, and token limit), and the fallback chain.

Property	Value
Method	`GET`
Path	`/api/v1/llm/config`
Auth	JWT required (admin)

Response

{
  "default_strategy": "cost_optimized",
  "providers": [
    {
      "name": "openai",
      "enabled": true,
      "priority": 1,
      "models": ["gpt-4", "gpt-4-turbo", "gpt-3.5-turbo"],
      "rate_limit_rpm": 500,
      "max_tokens": 8192
    },
    {
      "name": "anthropic",
      "enabled": true,
      "priority": 2,
      "models": ["claude-3-opus", "claude-3-sonnet"],
      "rate_limit_rpm": 300,
      "max_tokens": 200000
    }
  ],
  "fallback_chain": ["openai", "anthropic", "vllm"]
}
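Clients often need the effective provider order from this response. The sketch below computes it. It assumes a lower `priority` number means a more preferred provider, which matches the example, where `openai` has priority 1 and heads the fallback chain. It also lists fallback-chain names that have no entry under `providers`. In the example above this flags `vllm`, which may simply be omitted from the sample or configured elsewhere. The function names are hypothetical.

```python
def enabled_providers_by_priority(config):
    """Names of enabled providers, lowest priority number first.

    Assumes priority 1 is the most preferred, as the example suggests.
    """
    enabled = [p for p in config["providers"] if p.get("enabled", False)]
    return [p["name"] for p in sorted(enabled, key=lambda p: p["priority"])]


def unknown_fallback_entries(config):
    """Fallback-chain names with no matching entry in `providers`."""
    known = {p["name"] for p in config["providers"]}
    return [name for name in config.get("fallback_chain", []) if name not in known]
```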

Update Router Configuration

Updates the default routing strategy, provider settings, or both.

Property	Value
Method	`PUT`
Path	`/api/v1/llm/config`
Auth	JWT required (admin)

Request Body

{
  "default_strategy": "quality_optimized",
  "providers": [
    {
      "name": "openai",
      "priority": 1,
      "enabled": true
    }
  ]
}
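Below is a client-side sketch for building this request body. It rejects strategy names that are not listed in the Routing Strategies table. It assumes that fields left out of the body are not changed. The example omits most provider fields, but this page does not confirm partial-update semantics. `build_config_update` and `KNOWN_STRATEGIES` are hypothetical names, and the server's own validation rules are not documented here.

```python
import json

# Strategy names as listed in the Routing Strategies table.
KNOWN_STRATEGIES = frozenset({
    "cost_optimized", "latency_optimized", "quality_optimized",
    "round_robin", "fallback",
})


def build_config_update(default_strategy=None, providers=None):
    """Build a JSON body for PUT /api/v1/llm/config (client-side checks only)."""
    if default_strategy is None and not providers:
        raise ValueError("nothing to update")
    body = {}
    if default_strategy is not None:
        if default_strategy not in KNOWN_STRATEGIES:
            raise ValueError(f"unknown strategy: {default_strategy!r}")
        body["default_strategy"] = default_strategy
    if providers:
        # Each provider entry is matched by name; other fields are passed through.
        for p in providers:
            if "name" not in p:
                raise ValueError("each provider entry needs a 'name'")
        body["providers"] = providers
    return json.dumps(body)
```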

Routing Strategies

Strategy	Description
`cost_optimized`	Routes to the cheapest available provider
`latency_optimized`	Routes to the provider with lowest average latency
`quality_optimized`	Routes to the provider with highest quality scores
`round_robin`	Distributes requests evenly across providers
`fallback`	Uses providers in priority order, falling back on failure
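The toy selector below illustrates how each strategy in the table could choose a provider. It is a sketch only and is not the LLMRouter's actual implementation. The per-provider inputs (`cost_per_1k_tokens`, `avg_latency_ms`, `quality_score`, `healthy`) are hypothetical, as are the class names. For `fallback`, `pick` returns the highest-priority healthy provider. `call_with_fallback` shows the "fall back on failure" behavior by trying providers in priority order until one succeeds.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ProviderStats:
    # Hypothetical per-provider inputs; the real router's fields are not documented.
    name: str
    priority: int
    cost_per_1k_tokens: float
    avg_latency_ms: float
    quality_score: float
    healthy: bool = True


class ToyRouter:
    """Illustrative selector for the five documented strategies."""

    def __init__(self, providers):
        self.providers = sorted(providers, key=lambda p: p.priority)
        self._rr = itertools.cycle(range(len(self.providers)))

    def _available(self):
        return [p for p in self.providers if p.healthy]

    def pick(self, strategy):
        candidates = self._available()
        if not candidates:
            raise RuntimeError("no healthy providers")
        if strategy == "cost_optimized":
            return min(candidates, key=lambda p: p.cost_per_1k_tokens)
        if strategy == "latency_optimized":
            return min(candidates, key=lambda p: p.avg_latency_ms)
        if strategy == "quality_optimized":
            return max(candidates, key=lambda p: p.quality_score)
        if strategy == "round_robin":
            # Advance through all providers in turn, skipping unhealthy ones.
            for _ in range(len(self.providers)):
                p = self.providers[next(self._rr)]
                if p.healthy:
                    return p
        if strategy == "fallback":
            return candidates[0]  # highest-priority healthy provider
        raise ValueError(f"unknown strategy: {strategy!r}")

    def call_with_fallback(self, send):
        """Try healthy providers in priority order until `send` succeeds."""
        errors = []
        for p in self._available():
            try:
                return p.name, send(p.name)
            except Exception as exc:  # any provider error triggers fallback
                errors.append((p.name, exc))
        raise RuntimeError(f"all providers failed: {errors}")
```

The strategies differ only in the key they rank by. Because of this, a router can reuse the same health filter for all of them and apply the strategy as a final ordering step.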

Get Provider Health

Returns health status, latency, success-rate, and request-volume metrics for each configured LLM provider.

Property	Value
Method	`GET`
Path	`/api/v1/llm/health`
Auth	JWT required

Response

{
  "providers": [
    {
      "name": "openai",
      "healthy": true,
      "avg_latency_ms": 850,
      "p99_latency_ms": 2400,
      "success_rate": 0.997,
      "requests_last_hour": 1250
    }
  ]
}
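A monitoring script might flag providers from this response as in the sketch below. The thresholds are illustrative defaults chosen for the example. They are not values the service uses for its own `healthy` flag or alerts, and `unhealthy_providers` is a hypothetical name.

```python
def unhealthy_providers(health, min_success_rate=0.99, max_p99_ms=5000):
    """Names of providers to flag from a GET /api/v1/llm/health response.

    Thresholds are illustrative, not the service's own alerting rules.
    """
    flagged = []
    for p in health["providers"]:
        if (not p["healthy"]
                or p["success_rate"] < min_success_rate
                or p["p99_latency_ms"] > max_p99_ms):
            flagged.append(p["name"])
    return flagged
```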

List Routing Decisions

Returns recent routing decisions for analytics and debugging.

Property	Value
Method	`GET`
Path	`/api/v1/llm/decisions`
Auth	JWT required (admin)

Query Parameters

Parameter	Type	Required	Description
limit	integer	no	Maximum number of decisions to return (default 50)
provider	string	no	Filter by provider
strategy	string	no	Filter by strategy
outcome	string	no	Filter by outcome (success, failure, fallback)
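The sketch below builds the request URL for this endpoint. Filters the caller leaves unset are omitted from the query string. It checks `outcome` against the three values listed in the table. It does not enforce any upper bound on `limit`, because none is documented. `decisions_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Outcome values listed in the query-parameter table.
VALID_OUTCOMES = {"success", "failure", "fallback"}


def decisions_url(base_url, limit=None, provider=None, strategy=None, outcome=None):
    """Build a GET /api/v1/llm/decisions URL, omitting unset filters."""
    if outcome is not None and outcome not in VALID_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    params = {
        "limit": limit,
        "provider": provider,
        "strategy": strategy,
        "outcome": outcome,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + "/api/v1/llm/decisions"
    return f"{url}?{query}" if query else url
```

For example, `decisions_url(base, outcome="fallback")` lists only decisions where the router had to fall back. This is a quick way to see which providers are failing in practice.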

Get Usage Summary

Returns token usage and cost summary across providers.

Property	Value
Method	`GET`
Path	`/api/v1/llm/usage`
Auth	JWT required (admin)

Query Parameters

Parameter	Type	Required	Description
period	string	no	Time period (hour, day, week, month)
tenant_id	string	no	Filter by tenant

Response

{
  "period": "day",
  "total_requests": 5420,
  "total_tokens": 2150000,
  "total_cost_usd": 85.40,
  "by_provider": {
    "openai": {"requests": 4200, "tokens": 1700000, "cost_usd": 68.00},
    "anthropic": {"requests": 1220, "tokens": 450000, "cost_usd": 17.40}
  }
}
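The usage summary can be turned into a blended cost per 1,000 tokens for each provider, and its totals can be checked against the per-provider breakdown. The sketch below does both. The rate mixes prompt and completion tokens, because the summary does not separate them. In the example, this gives $0.04 per 1K tokens for `openai` and about $0.0387 for `anthropic`. The function names are hypothetical.

```python
def cost_per_1k_tokens(usage):
    """Blended cost per 1,000 tokens per provider from a usage response."""
    return {
        name: round(stats["cost_usd"] / stats["tokens"] * 1000, 4)
        for name, stats in usage["by_provider"].items()
        if stats["tokens"]  # skip providers with no token usage
    }


def totals_match(usage, tol=0.005):
    """Check that per-provider figures add up to the reported totals."""
    by = usage["by_provider"].values()
    return (
        sum(s["requests"] for s in by) == usage["total_requests"]
        and sum(s["tokens"] for s in by) == usage["total_tokens"]
        # Costs are floats, so compare within a small tolerance.
        and abs(sum(s["cost_usd"] for s in by) - usage["total_cost_usd"]) <= tol
    )
```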
