# AI Service Chart
The AI Service is the core intelligence engine of MATIH, providing natural language to SQL conversion, conversational analytics, LLM orchestration, and agent-based workflows. It is the most complex service chart, with GPU support, multi-provider LLM configuration, and extensive infrastructure connections.
## Chart Configuration

```yaml
# From infrastructure/helm/ai-service/values.yaml
billing:
  costCenter: "CC-ML"
  application: "data-plane"
  team: "ml-engineering"
  workloadType: "api"
  service: "ai-service"

replicaCount: 2

image:
  registry: matihlabsacr.azurecr.io
  repository: matih/ai-service
  tag: ""
  pullPolicy: Always

service:
  type: ClusterIP
  port: 8000
  targetPort: 8000

resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2000m
    memory: 4Gi
    nvidia.com/gpu: 0  # Set to 1 for GPU inference
```
## Autoscaling Configuration

The AI service uses a conservative HPA profile with custom Prometheus metrics:
```yaml
autoscaling:
  enabled: true
  profile: conservative
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60
  targetMemoryUtilizationPercentage: 70
  prometheusMetrics:
    - name: ai_service_inference_requests_per_second
      targetAverageValue: "20"
    - name: ai_service_inference_latency_seconds_p95
      targetAverageValue: "2"
    - name: ai_service_active_requests
      targetAverageValue: "15"
    - name: ai_service_llm_token_usage_rate
      targetAverageValue: "5000"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 120
```
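Behind these targets sits the standard HPA calculation, `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, evaluated per metric with the largest result winning. A minimal sketch of that calculation (standard Kubernetes behavior, not chart code):

```python
import math

def desired_replicas(current: int, metric_value: float, target: float) -> int:
    """Standard HPA formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current * metric_value / target)

# 2 replicas averaging 30 inference req/s against the "20" target above:
per_metric = [
    desired_replicas(2, 30.0, 20.0),  # inference_requests_per_second -> 3
    desired_replicas(2, 1.0, 2.0),    # p95 latency vs target "2"     -> 1
]
desired = max(per_metric)  # the HPA takes the largest recommendation
```

The result is then clamped to `minReplicas`/`maxReplicas` and throttled by the `behavior` policies, so here the service would add at most 2 pods per 60 s on the way up.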
## LLM Provider Configuration

The AI service supports multiple LLM providers with cloud-native authentication:
```yaml
providers:
  azure:
    enabled: true
    deploymentName: "gpt-4o"
    deploymentMini: "gpt-4o-mini"
    deploymentEmbedding: "text-embedding-3-large"
  openai:
    enabled: true
    defaultModel: "gpt-4-turbo-preview"
  anthropic:
    enabled: true
    defaultModel: "claude-3-5-sonnet-20241022"
  vertexai:
    enabled: false
    useWorkloadIdentity: true
  bedrock:
    enabled: false
    useIRSA: true
  vllm:
    enabled: false
    baseUrl: "http://vllm:8000"
```

API keys are sourced from Kubernetes secrets, never hardcoded:
```yaml
- name: OPENAI_API_KEY
  valueFrom:
    secretKeyRef:
      name: ai-service-secrets
      key: openai-api-key
- name: AZURE_OPENAI_API_KEY
  valueFrom:
    secretKeyRef:
      name: ai-service-secrets
      key: azure-api-key
```
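At runtime the service resolves a provider's key from these injected variables. A hypothetical sketch of that lookup (only the env-var names come from the chart; the mapping and helper function are illustrative):

```python
import os

# Env-var names match the chart's secretKeyRef entries above;
# the lookup helper itself is illustrative, not part of the service.
PROVIDER_KEY_ENV = {
    "openai": "OPENAI_API_KEY",
    "azure": "AZURE_OPENAI_API_KEY",
}

def resolve_api_key(provider: str) -> str:
    """Return the provider's API key, failing loudly if the secret is missing."""
    env_var = PROVIDER_KEY_ENV[provider]
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; check the ai-service-secrets secret")
    return key
```

Failing at startup when a key is absent surfaces a misconfigured secret immediately instead of on the first LLM call.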
## Infrastructure Connections

The AI service connects to numerous infrastructure components:
| Component | Host (FQDN) | Port | Purpose |
|---|---|---|---|
| PostgreSQL | postgresql.matih-data-plane.svc | 5432 | Persistent storage |
| Redis | redis-master.matih-data-plane.svc | 6379 | Cache, sessions |
| Kafka | strimzi-kafka-kafka-bootstrap...svc | 9093 | Event streaming (TLS) |
| Qdrant | qdrant.matih-data-plane.svc | 6333 | Vector embeddings |
| Trino | trino.matih-data-plane.svc | 8080 | SQL execution |
| ClickHouse | clickhouse.matih-data-plane.svc | 8123 | OLAP queries |
| Spark Connect | spark-connect.matih-data-plane.svc | 15002 | Complex analytics |
| Polaris | polaris.matih-data-plane.svc | 8181 | Iceberg catalog |
| OpenMetadata | openmetadata.matih-data-plane.svc | 8585 | Data catalog |
| Query Engine | query-engine.matih-data-plane.svc | 8080 | SQL routing |
| Semantic Layer | semantic-layer.matih-data-plane.svc | 8086 | Semantic models |
| IAM Service | iam-service.matih-control-plane.svc | 8081 | Auth (cross-namespace) |
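A hypothetical startup reachability check against a few of these endpoints (hosts and ports are taken from the table above; the helper itself is illustrative, not part of the chart):

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if the service DNS name resolves and the port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# In-cluster FQDNs from the table above; only resolvable inside the cluster.
DEPENDENCIES = [
    ("postgresql.matih-data-plane.svc", 5432),
    ("redis-master.matih-data-plane.svc", 6379),
    ("qdrant.matih-data-plane.svc", 6333),
]
# e.g. unreachable = [(h, p) for h, p in DEPENDENCIES if not is_reachable(h, p)]
```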
## Kafka Topics

The AI service creates Strimzi `KafkaTopic` custom resources for domain events:
```yaml
kafkaTopics:
  enabled: true
  clusterName: "strimzi-kafka"
  topics:
    stateChanges:
      name: "matih.ai.state-changes"
      partitions: 12
      retentionMs: "2592000000"  # 30 days
    agentTraces:
      name: "matih.ai.agent-traces"
      partitions: 12
    evaluations:
      name: "matih.ai.evaluations"
      partitions: 6
      retentionMs: "7776000000"  # 90 days
    llmOps:
      name: "matih.ai.llm-ops"
      partitions: 12
      retentionMs: "604800000"   # 7 days
```
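The `retentionMs` strings are day counts expressed in milliseconds; a quick check of the arithmetic, useful when adding topics with new retention windows:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86_400_000 ms in a day

assert 30 * MS_PER_DAY == 2_592_000_000  # matih.ai.state-changes: 30 days
assert 90 * MS_PER_DAY == 7_776_000_000  # matih.ai.evaluations:   90 days
assert 7 * MS_PER_DAY == 604_800_000     # matih.ai.llm-ops:        7 days
```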
## VPA Configuration

```yaml
vpa:
  enabled: true
  updateMode: "Off"  # Recommendations only - never auto-restart AI services
  minAllowed:
    cpu: "500m"
    memory: "2Gi"
  maxAllowed:
    cpu: "8"
    memory: "32Gi"
```
## Init Container: Database Migration

When using the PostgreSQL backend, an Alembic migration init container runs before the main application:
```yaml
initContainers:
  - name: alembic-migrate
    image: "{{ image }}"
    command: ["python", "-m", "alembic", "upgrade", "head"]
    env:
      - name: DATABASE_URL
        value: "postgresql+asyncpg://..."
```
## Module Feature Flags

The AI service supports modular deployment with feature flags:
```yaml
modules:
  core: true           # Agents, LLM, guardrails
  biPlatform: true     # BI analytics
  mlPlatform: true     # ML training/serving
  dataPlatform: true   # dbt, quality, pipeline
  contextGraph: true   # Context graph, ontology
  enterprise: true     # Security, performance
  supplementary: true  # FDME, search, DNN builder
```
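A common pattern for flags like these is to gate optional routers or subsystems at startup. An illustrative sketch (the module names come from the chart; the helper and its use are hypothetical):

```python
# Mirrors the chart's modules block; in the service these would be
# loaded from configuration rather than hardcoded.
MODULES = {
    "core": True,           # Agents, LLM, guardrails
    "biPlatform": True,     # BI analytics
    "mlPlatform": True,     # ML training/serving
    "dataPlatform": True,   # dbt, quality, pipeline
    "contextGraph": True,   # Context graph, ontology
    "enterprise": True,     # Security, performance
    "supplementary": True,  # FDME, search, DNN builder
}

def enabled_modules(flags: dict[str, bool]) -> list[str]:
    """Names of modules whose feature flag is on, in declaration order."""
    return [name for name, on in flags.items() if on]
```

At startup, only the routers for `enabled_modules(...)` would be mounted, keeping disabled modules entirely out of the process.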