Performance Monitoring

Production - Latency tracking, throughput monitoring, provider health

The Performance Monitoring system tracks LLM request latency, throughput, error rates, and provider health across all tenants and providers.

12.6.7.1Performance Metrics

# Get performance summary
curl "http://localhost:8000/api/v1/llm/performance/summary?tenant_id=acme-corp&window_minutes=60"

{
  "summary": {
    "total_requests": 1247,
    "avg_latency_ms": 1850,
    "p50_latency_ms": 1200,
    "p95_latency_ms": 4500,
    "p99_latency_ms": 8200,
    "error_rate": 0.02,
    "tokens_per_second": 45.2,
    "by_provider": {
      "openai": {"avg_latency_ms": 1500, "error_rate": 0.01},
      "anthropic": {"avg_latency_ms": 2200, "error_rate": 0.03},
      "vllm": {"avg_latency_ms": 800, "error_rate": 0.005}
    }
  }
}

Model Context Protocol LLM Providers