# AI Service Chart
The AI Service is the core intelligence engine of MATIH, providing natural language to SQL conversion, conversational analytics, LLM orchestration, and agent-based workflows. It is the most complex service chart, with GPU support, multi-provider LLM configuration, and extensive infrastructure connections.
## Chart Configuration

```yaml
# From infrastructure/helm/ai-service/values.yaml
billing:
  costCenter: "CC-ML"
  application: "data-plane"
  team: "ml-engineering"
  workloadType: "api"
  service: "ai-service"

replicaCount: 2

image:
  registry: matihlabsacr.azurecr.io
  repository: matih/ai-service
  tag: ""
  pullPolicy: Always

service:
  type: ClusterIP
  port: 8000
  targetPort: 8000

resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2000m
    memory: 4Gi
    nvidia.com/gpu: 0  # Set to 1 for GPU inference
```
## Autoscaling Configuration

The AI service uses a conservative HPA profile with custom Prometheus metrics:
```yaml
autoscaling:
  enabled: true
  profile: conservative
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60
  targetMemoryUtilizationPercentage: 70
  prometheusMetrics:
    - name: ai_service_inference_requests_per_second
      targetAverageValue: "20"
    - name: ai_service_inference_latency_seconds_p95
      targetAverageValue: "2"
    - name: ai_service_active_requests
      targetAverageValue: "15"
    - name: ai_service_llm_token_usage_rate
      targetAverageValue: "5000"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 120
```
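Behind these targets sits the standard HPA calculation, `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, evaluated per metric with the largest result winning. A minimal sketch of that calculation (standard Kubernetes behavior, not chart code):

```python
import math

def desired_replicas(current: int, metric_value: float, target: float) -> int:
    """Standard HPA formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current * metric_value / target)

# 2 replicas averaging 30 inference req/s against the "20" target above:
per_metric = [
    desired_replicas(2, 30.0, 20.0),  # inference_requests_per_second -> 3
    desired_replicas(2, 1.0, 2.0),    # p95 latency vs target "2"     -> 1
]
desired = max(per_metric)  # the HPA takes the largest recommendation
```

The result is then clamped to `minReplicas`/`maxReplicas` and throttled by the `behavior` policies, so here the service would add at most 2 pods per 60 s on the way up.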
## LLM Provider Configuration

The AI service supports multiple LLM providers with cloud-native authentication:
```yaml
providers:
  azure:
    enabled: true
    deploymentName: "gpt-4o"
    deploymentMini: "gpt-4o-mini"
    deploymentEmbedding: "text-embedding-3-large"
  openai:
    enabled: true
    defaultModel: "gpt-4-turbo-preview"
  anthropic:
    enabled: true
    defaultModel: "claude-3-5-sonnet-20241022"
  vertexai:
    enabled: false
    useWorkloadIdentity: true
  bedrock:
    enabled: false
    useIRSA: true
  vllm:
    enabled: false
    baseUrl: "http://vllm:8000"
```

API keys are sourced from Kubernetes secrets, never hardcoded:
```yaml
- name: OPENAI_API_KEY
  valueFrom:
    secretKeyRef:
      name: ai-service-secrets
      key: openai-api-key
- name: AZURE_OPENAI_API_KEY
  valueFrom:
    secretKeyRef:
      name: ai-service-secrets
      key: azure-api-key
```
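At runtime the service resolves a provider's key from these injected variables. A hypothetical sketch of that lookup (only the env-var names come from the chart; the mapping and helper function are illustrative):

```python
import os

# Env-var names match the chart's secretKeyRef entries above;
# the lookup helper itself is illustrative, not part of the service.
PROVIDER_KEY_ENV = {
    "openai": "OPENAI_API_KEY",
    "azure": "AZURE_OPENAI_API_KEY",
}

def resolve_api_key(provider: str) -> str:
    """Return the provider's API key, failing loudly if the secret is missing."""
    env_var = PROVIDER_KEY_ENV[provider]
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; check the ai-service-secrets secret")
    return key
```

Failing at startup when a key is absent surfaces a misconfigured secret immediately instead of on the first LLM call.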
## Infrastructure Connections

The AI service connects to numerous infrastructure components:
| Component | Host (FQDN) | Port | Purpose |
|---|---|---|---|
| PostgreSQL | postgresql.matih-data-plane.svc | 5432 | Persistent storage |
| Redis | redis-master.matih-data-plane.svc | 6379 | Cache, sessions |
| Kafka | strimzi-kafka-kafka-bootstrap...svc | 9093 | Event streaming (TLS) |
| Qdrant | qdrant.matih-data-plane.svc | 6333 | Vector embeddings |
| Trino | trino.matih-data-plane.svc | 8080 | SQL execution |
| ClickHouse | clickhouse.matih-data-plane.svc | 8123 | OLAP queries |
| Spark Connect | spark-connect.matih-data-plane.svc | 15002 | Complex analytics |
| Polaris | polaris.matih-data-plane.svc | 8181 | Iceberg catalog |
| OpenMetadata | openmetadata.matih-data-plane.svc | 8585 | Data catalog |
| Query Engine | query-engine.matih-data-plane.svc | 8080 | SQL routing |
| Semantic Layer | semantic-layer.matih-data-plane.svc | 8086 | Semantic models |
| IAM Service | iam-service.matih-control-plane.svc | 8081 | Auth (cross-namespace) |
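A hypothetical startup reachability check against a few of these endpoints (hosts and ports are taken from the table above; the helper itself is illustrative, not part of the chart):

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if the service DNS name resolves and the port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# In-cluster FQDNs from the table above; only resolvable inside the cluster.
DEPENDENCIES = [
    ("postgresql.matih-data-plane.svc", 5432),
    ("redis-master.matih-data-plane.svc", 6379),
    ("qdrant.matih-data-plane.svc", 6333),
]
# e.g. unreachable = [(h, p) for h, p in DEPENDENCIES if not is_reachable(h, p)]
```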
## Kafka Topics

The AI service creates Strimzi `KafkaTopic` custom resources for domain events:
```yaml
kafkaTopics:
  enabled: true
  clusterName: "strimzi-kafka"
  topics:
    stateChanges:
      name: "matih.ai.state-changes"
      partitions: 12
      retentionMs: "2592000000"  # 30 days
    agentTraces:
      name: "matih.ai.agent-traces"
      partitions: 12
    evaluations:
      name: "matih.ai.evaluations"
      partitions: 6
      retentionMs: "7776000000"  # 90 days
    llmOps:
      name: "matih.ai.llm-ops"
      partitions: 12
      retentionMs: "604800000"   # 7 days
```
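The `retentionMs` strings are day counts expressed in milliseconds; a quick check of the arithmetic, useful when adding topics with new retention windows:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86_400_000 ms in a day

assert 30 * MS_PER_DAY == 2_592_000_000  # matih.ai.state-changes: 30 days
assert 90 * MS_PER_DAY == 7_776_000_000  # matih.ai.evaluations:   90 days
assert 7 * MS_PER_DAY == 604_800_000     # matih.ai.llm-ops:        7 days
```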
## VPA Configuration

```yaml
vpa:
  enabled: true
  updateMode: "Off"  # Recommendations only - never auto-restart AI services
  minAllowed:
    cpu: "500m"
    memory: "2Gi"
  maxAllowed:
    cpu: "8"
    memory: "32Gi"
```
## Init Container: Database Migration

When using the PostgreSQL backend, an Alembic migration init container runs before the main application:
```yaml
initContainers:
  - name: alembic-migrate
    image: "{{ image }}"
    command: ["python", "-m", "alembic", "upgrade", "head"]
    env:
      - name: DATABASE_URL
        value: "postgresql+asyncpg://..."
```
## Module Feature Flags

The AI service supports modular deployment with feature flags:
```yaml
modules:
  core: true           # Agents, LLM, guardrails
  biPlatform: true     # BI analytics
  mlPlatform: true     # ML training/serving
  dataPlatform: true   # dbt, quality, pipeline
  contextGraph: true   # Context graph, ontology
  enterprise: true     # Security, performance
  supplementary: true  # FDME, search, DNN builder
```
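A common pattern for flags like these is to gate optional routers or subsystems at startup. An illustrative sketch (the module names come from the chart; the helper and its use are hypothetical):

```python
# Mirrors the chart's modules block; in the service these would be
# loaded from configuration rather than hardcoded.
MODULES = {
    "core": True,           # Agents, LLM, guardrails
    "biPlatform": True,     # BI analytics
    "mlPlatform": True,     # ML training/serving
    "dataPlatform": True,   # dbt, quality, pipeline
    "contextGraph": True,   # Context graph, ontology
    "enterprise": True,     # Security, performance
    "supplementary": True,  # FDME, search, DNN builder
}

def enabled_modules(flags: dict[str, bool]) -> list[str]:
    """Names of modules whose feature flag is on, in declaration order."""
    return [name for name, on in flags.items() if on]
```

At startup, only the routers for `enabled_modules(...)` would be mounted, keeping disabled modules entirely out of the process.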