LLM Infrastructure Overview
Production - Multi-provider router, response cache, context management, validation, MCP, performance monitoring
The LLM Infrastructure provides a comprehensive layer for managing Large Language Model interactions across the AI Service. It includes a multi-provider router with tenant-level configuration, response caching with cost tracking, context window optimization, input/output validation, Model Context Protocol (MCP) integration, and performance monitoring.
12.6.1 LLM Architecture
API Layer
- Router API
- Completion API
- Cache API
- Context API
- MCP API
- GraphQL API

Services
- LLMRouter
- ResponseCacheService
- ContextOptimizer
- ValidationService
- MCPService
- PerformanceService

Providers
- OpenAI
- Anthropic
- Azure OpenAI
- Vertex AI (Gemini)
- AWS Bedrock
- vLLM (Self-hosted)

Storage
- CacheStore (Redis)
- UsageRecords
- ValidationHistory
- PerformanceMetrics
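The layering above (API → LLMRouter → providers) can be sketched as a small dispatch table. The class and method names below are illustrative only, not the service's actual API; the real LLMRouter also applies routing strategies, caching, and validation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class LLMRouter:
    # Map provider name -> completion callable (OpenAI, Anthropic, vLLM, ...).
    providers: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    default_provider: str = "openai"

    def register(self, name: str, complete: Callable[[str], str]) -> None:
        self.providers[name] = complete

    def complete(self, prompt: str, provider: Optional[str] = None) -> str:
        # Fall back to the configured default when no provider is requested.
        name = provider or self.default_provider
        if name not in self.providers:
            raise KeyError(f"unknown provider: {name}")
        return self.providers[name](prompt)


router = LLMRouter()
router.register("openai", lambda p: f"[openai] {p}")
router.register("vllm", lambda p: f"[vllm] {p}")
print(router.complete("hello"))          # routed to the default provider
print(router.complete("hello", "vllm"))  # explicit provider override
```

In the real service, the registered callables would wrap provider SDK clients, and the default would come from the `llm_provider` setting shown below.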
Provider Configuration

```python
# From src/config/settings.py
llm_provider: str = "openai"     # Default provider
openai_api_key: str = ""         # OpenAI key
openai_model: str = "gpt-4o"     # Default model
anthropic_api_key: str = ""      # Anthropic key
anthropic_model: str = "claude-opus-4-5-20250120"
azure_openai_endpoint: str = ""  # Azure endpoint
vllm_base_url: str = "http://vllm.matih-data-plane.svc.cluster.local:8000/v1"
# Extended thinking support
extended_thinking_enabled: bool = False
extended_thinking_budget_tokens: int = 10000
```

12.6.2 Section Pages
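A minimal, stdlib-only sketch of how such settings might be loaded from the environment. Field names mirror the provider-configuration snippet; the class name, environment-variable names, and the budget default of 10000 are assumptions, and the actual service loads these via `src/config/settings.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    llm_provider: str = "openai"
    openai_model: str = "gpt-4o"
    extended_thinking_enabled: bool = False
    extended_thinking_budget_tokens: int = 10000

    @classmethod
    def from_env(cls) -> "LLMSettings":
        # Environment variables override the defaults above (names assumed).
        return cls(
            llm_provider=os.getenv("LLM_PROVIDER", cls.llm_provider),
            openai_model=os.getenv("OPENAI_MODEL", cls.openai_model),
            extended_thinking_enabled=(
                os.getenv("EXTENDED_THINKING_ENABLED", "false").lower() == "true"
            ),
            extended_thinking_budget_tokens=int(
                os.getenv(
                    "EXTENDED_THINKING_BUDGET_TOKENS",
                    str(cls.extended_thinking_budget_tokens),
                )
            ),
        )


settings = LLMSettings.from_env()
print(settings.llm_provider)  # "openai" unless LLM_PROVIDER is set
```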
| Page | Description |
|---|---|
| LLM Router | Multi-provider routing with 5 strategies |
| Response Cache | Tenant-scoped caching with cost tracking |
| Context Management | Context window optimization |
| Context Intelligence | Intelligent context analysis and truncation |
| Validation | Input/output validation and safety filters |
| Model Context Protocol | MCP integration for external tools |
| Performance | Latency tracking, throughput monitoring |
| Providers | Provider-specific configuration |
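As an illustration of the tenant-scoped caching mentioned in the Response Cache page, a cache key might combine tenant, model, and a prompt hash so entries never collide across tenants. This is a sketch; the service's actual key scheme is internal.

```python
import hashlib


def cache_key(tenant_id: str, model: str, prompt: str) -> str:
    # Hash the prompt so arbitrarily long inputs yield fixed-size keys,
    # and prefix with tenant and model so entries are scoped per tenant.
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]
    return f"llm:{tenant_id}:{model}:{digest}"


print(cache_key("acme", "gpt-4o", "What is MCP?"))
```

Identical (tenant, model, prompt) triples map to the same key, which is what makes cache hits possible; changing any component yields a distinct key.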