Performance Monitoring
Production - Latency tracking, throughput monitoring, provider health
The Performance Monitoring system tracks LLM request latency, throughput, error rates, and provider health across all tenants and providers.
12.6.7.1 Performance Metrics
# Get performance summary
curl "http://localhost:8000/api/v1/llm/performance/summary?tenant_id=acme-corp&window_minutes=60"

{
  "summary": {
    "total_requests": 1247,
    "avg_latency_ms": 1850,
    "p50_latency_ms": 1200,
    "p95_latency_ms": 4500,
    "p99_latency_ms": 8200,
    "error_rate": 0.02,
    "tokens_per_second": 45.2,
    "by_provider": {
      "openai": {"avg_latency_ms": 1500, "error_rate": 0.01},
      "anthropic": {"avg_latency_ms": 2200, "error_rate": 0.03},
      "vllm": {"avg_latency_ms": 800, "error_rate": 0.005}
    }
  }
}
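
The summary response can also be consumed programmatically. The following is a minimal Python sketch, not part of the documented API: the helper names (build_summary_url, fetch_summary, flag_providers) and both threshold defaults are hypothetical, and it assumes the endpoint needs no authentication header and returns the JSON shape shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above; host and port will differ per deployment.
BASE_URL = "http://localhost:8000/api/v1/llm/performance/summary"


def build_summary_url(tenant_id: str, window_minutes: int = 60) -> str:
    """Build the summary URL with the two query parameters used in the curl example."""
    query = urlencode({"tenant_id": tenant_id, "window_minutes": window_minutes})
    return f"{BASE_URL}?{query}"


def fetch_summary(tenant_id: str, window_minutes: int = 60) -> dict:
    """Fetch and decode the summary (assumption: no auth header is required)."""
    with urlopen(build_summary_url(tenant_id, window_minutes), timeout=10) as resp:
        return json.load(resp)


def flag_providers(
    payload: dict,
    max_error_rate: float = 0.02,       # hypothetical threshold
    max_avg_latency_ms: float = 2000,   # hypothetical threshold
) -> list[str]:
    """Return, sorted by name, the providers that exceed either threshold."""
    providers = payload["summary"]["by_provider"]
    return sorted(
        name
        for name, stats in providers.items()
        if stats["error_rate"] > max_error_rate
        or stats["avg_latency_ms"] > max_avg_latency_ms
    )
```

Applied to the example response above with the default thresholds, flag_providers returns ["anthropic"], since that provider exceeds both the error-rate and the average-latency limit. Calling flag_providers(fetch_summary("acme-corp")) would run the same check against a live instance.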