LLM Infrastructure Overview
Production - Multi-provider router, response cache, context management, validation, MCP, performance monitoring
The LLM Infrastructure provides a comprehensive layer for managing Large Language Model interactions across the AI Service. It includes a multi-provider router with tenant-level configuration, response caching with cost tracking, context window optimization, input/output validation, Model Context Protocol (MCP) integration, and performance monitoring.
12.6.1 LLM Architecture
API Layer
- Router API
- Completion API
- Cache API
- Context API
- MCP API
- GraphQL API

Services
- LLMRouter
- ResponseCacheService
- ContextOptimizer
- ValidationService
- MCPService
- PerformanceService

Providers
- OpenAI
- Anthropic
- Azure OpenAI
- Vertex AI (Gemini)
- AWS Bedrock
- vLLM (Self-hosted)

Storage
- CacheStore (Redis)
- UsageRecords
- ValidationHistory
- PerformanceMetrics
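The layering above (API → LLMRouter → providers) can be sketched as a small dispatch table. The class and method names below are illustrative only, not the service's actual API; the real LLMRouter also applies routing strategies, caching, and validation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class LLMRouter:
    # Map provider name -> completion callable (OpenAI, Anthropic, vLLM, ...).
    providers: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    default_provider: str = "openai"

    def register(self, name: str, complete: Callable[[str], str]) -> None:
        self.providers[name] = complete

    def complete(self, prompt: str, provider: Optional[str] = None) -> str:
        # Fall back to the configured default when no provider is requested.
        name = provider or self.default_provider
        if name not in self.providers:
            raise KeyError(f"unknown provider: {name}")
        return self.providers[name](prompt)


router = LLMRouter()
router.register("openai", lambda p: f"[openai] {p}")
router.register("vllm", lambda p: f"[vllm] {p}")
print(router.complete("hello"))          # routed to the default provider
print(router.complete("hello", "vllm"))  # explicit provider override
```

In the real service, the registered callables would wrap provider SDK clients, and the default would come from the `llm_provider` setting shown below.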
Provider Configuration

```python
# From src/config/settings.py
llm_provider: str = "openai"     # Default provider
openai_api_key: str = ""         # OpenAI key
openai_model: str = "gpt-4o"     # Default model
anthropic_api_key: str = ""      # Anthropic key
anthropic_model: str = "claude-opus-4-5-20250120"
azure_openai_endpoint: str = ""  # Azure endpoint
vllm_base_url: str = "http://vllm.matih-data-plane.svc.cluster.local:8000/v1"
# Extended thinking support
extended_thinking_enabled: bool = False
extended_thinking_budget_tokens: int = 10000
```

12.6.2 Section Pages
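A minimal, stdlib-only sketch of how such settings might be loaded from the environment. Field names mirror the provider-configuration snippet; the class name, environment-variable names, and the budget default of 10000 are assumptions, and the actual service loads these via `src/config/settings.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    llm_provider: str = "openai"
    openai_model: str = "gpt-4o"
    extended_thinking_enabled: bool = False
    extended_thinking_budget_tokens: int = 10000

    @classmethod
    def from_env(cls) -> "LLMSettings":
        # Environment variables override the defaults above (names assumed).
        return cls(
            llm_provider=os.getenv("LLM_PROVIDER", cls.llm_provider),
            openai_model=os.getenv("OPENAI_MODEL", cls.openai_model),
            extended_thinking_enabled=(
                os.getenv("EXTENDED_THINKING_ENABLED", "false").lower() == "true"
            ),
            extended_thinking_budget_tokens=int(
                os.getenv(
                    "EXTENDED_THINKING_BUDGET_TOKENS",
                    str(cls.extended_thinking_budget_tokens),
                )
            ),
        )


settings = LLMSettings.from_env()
print(settings.llm_provider)  # "openai" unless LLM_PROVIDER is set
```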
| Page | Description |
|---|---|
| LLM Router | Multi-provider routing with 5 strategies |
| Response Cache | Tenant-scoped caching with cost tracking |
| Context Management | Context window optimization |
| Context Intelligence | Intelligent context analysis and truncation |
| Validation | Input/output validation and safety filters |
| Model Context Protocol | MCP integration for external tools |
| Performance | Latency tracking, throughput monitoring |
| Providers | Provider-specific configuration |
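As an illustration of the tenant-scoped caching mentioned in the Response Cache page, a cache key might combine tenant, model, and a prompt hash so entries never collide across tenants. This is a sketch; the service's actual key scheme is internal.

```python
import hashlib


def cache_key(tenant_id: str, model: str, prompt: str) -> str:
    # Hash the prompt so arbitrarily long inputs yield fixed-size keys,
    # and prefix with tenant and model so entries are scoped per tenant.
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]
    return f"llm:{tenant_id}:{model}:{digest}"


print(cache_key("acme", "gpt-4o", "What is MCP?"))
```

Identical (tenant, model, prompt) triples map to the same key, which is what makes cache hits possible; changing any component yields a distinct key.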