MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Prometheus

Prometheus collects metrics from all MATIH services via ServiceMonitor CRDs, with separate instances for the control plane and data plane.


Deployment

Each plane has a dedicated Prometheus instance:

| Instance | Namespace | Retention | Storage |
|---|---|---|---|
| Control Plane | matih-monitoring-control-plane | 15 days | 50Gi SSD |
| Data Plane | matih-monitoring-data-plane | 15 days | 100Gi SSD |
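As a rough illustration of how the two instances in the table could be declared with the Prometheus Operator, the sketch below builds one `Prometheus` custom resource per plane as a Python dict. Only the namespaces, the 15-day retention and the storage sizes come from the table. The storage class name `ssd`, the object names and the `matih.io/plane` namespace label are hypothetical placeholders, not the platform's actual values.

```python
import json

# Hypothetical values: this section only says "SSD" and names the two
# namespaces, so the storage class and the namespace label key are placeholders.
SSD_STORAGE_CLASS = "ssd"
PLANE_LABEL = "matih.io/plane"


def prometheus_cr(plane: str, storage: str, retention: str = "15d") -> dict:
    """Build one Prometheus custom resource (Prometheus Operator, monitoring.coreos.com/v1)."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "Prometheus",
        "metadata": {
            "name": f"prometheus-{plane}",  # hypothetical object name
            "namespace": f"matih-monitoring-{plane}",
        },
        "spec": {
            "retention": retention,
            # Empty selector: accept every ServiceMonitor in the selected namespaces.
            "serviceMonitorSelector": {},
            # Restrict discovery to namespaces labelled with this plane (hypothetical label).
            "serviceMonitorNamespaceSelector": {"matchLabels": {PLANE_LABEL: plane}},
            "storage": {
                "volumeClaimTemplate": {
                    "spec": {
                        "storageClassName": SSD_STORAGE_CLASS,
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    }
                }
            },
        },
    }


instances = [
    prometheus_cr("control-plane", "50Gi"),
    prometheus_cr("data-plane", "100Gi"),
]

if __name__ == "__main__":
    # kubectl accepts JSON manifests, so this List can be piped to `kubectl apply -f -`.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": instances}, indent=2))
```

Keeping one builder for both planes means the retention policy can only diverge if a caller passes a different `retention` value explicitly.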

ServiceMonitor Pattern

Every MATIH service deploys a ServiceMonitor. For example, the ai-service ServiceMonitor:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ai-service
  labels:
    app.kubernetes.io/name: ai-service
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ai-service
  endpoints:
    - port: http
      path: /metrics          # Python services
      interval: 30s
      scrapeTimeout: 10s
```

For Java Spring Boot services, the metrics path is /actuator/prometheus.
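The ServiceMonitor locates its target Service through `spec.selector.matchLabels`, and the endpoint path depends on the service runtime. The sketch below builds a ServiceMonitor shaped like the ai-service example and checks Kubernetes `matchLabels` semantics (every selector key must be present on the Service with the same value; extra labels are allowed). The runtime keys, the `app.kubernetes.io/part-of` label and the `other-service` name are hypothetical.

```python
# Metrics paths stated in this section; the runtime keys are illustrative.
METRICS_PATHS = {"python": "/metrics", "java": "/actuator/prometheus"}


def service_monitor(name: str, runtime: str) -> dict:
    """Build a ServiceMonitor shaped like the ai-service example."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "labels": {"app.kubernetes.io/name": name}},
        "spec": {
            "selector": {"matchLabels": {"app.kubernetes.io/name": name}},
            "endpoints": [
                {
                    "port": "http",  # name of the Service port, not a number
                    "path": METRICS_PATHS[runtime],
                    "interval": "30s",
                    "scrapeTimeout": "10s",
                }
            ],
        },
    }


def selects(monitor: dict, service_labels: dict) -> bool:
    """matchLabels semantics: every selector key must be present with an equal value."""
    wanted = monitor["spec"]["selector"]["matchLabels"]
    return all(service_labels.get(key) == value for key, value in wanted.items())


ai = service_monitor("ai-service", "python")
# Extra labels on the Service do not prevent a match (part-of label is hypothetical).
print(selects(ai, {"app.kubernetes.io/name": "ai-service", "app.kubernetes.io/part-of": "matih"}))  # True
print(selects(ai, {"app.kubernetes.io/name": "other-service"}))  # False
```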


Scrape Configuration

| Service Type | Metrics Path | Port | Interval |
|---|---|---|---|
| Java Spring Boot | /actuator/prometheus | 8080 | 30s |
| Python FastAPI | /metrics | 8000 | 30s |
| Node.js | /metrics | 3000 | 30s |
| Trino | /v1/status | 8080 | 30s |
| Kafka (JMX) | /metrics | 9404 | 30s |
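One way to keep these per-runtime defaults in a single place is a lookup table that emits ServiceMonitor endpoint entries, as in the sketch below. The values are copied from the table above; the dictionary keys and the `endpoint_for` helper are hypothetical. The check reflects a real Prometheus rule: a configuration whose scrape timeout exceeds its scrape interval is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeDefaults:
    path: str
    port: int
    interval_s: int


# Values copied from the scrape configuration table; the keys are illustrative.
SCRAPE_DEFAULTS = {
    "java-spring-boot": ScrapeDefaults("/actuator/prometheus", 8080, 30),
    "python-fastapi": ScrapeDefaults("/metrics", 8000, 30),
    "nodejs": ScrapeDefaults("/metrics", 3000, 30),
    "trino": ScrapeDefaults("/v1/status", 8080, 30),
    "kafka-jmx": ScrapeDefaults("/metrics", 9404, 30),
}


def endpoint_for(service_type: str, timeout_s: int = 10) -> dict:
    """Return a ServiceMonitor endpoint entry for one service type."""
    d = SCRAPE_DEFAULTS[service_type]
    # Prometheus refuses a config whose scrape timeout exceeds the scrape interval.
    if timeout_s > d.interval_s:
        raise ValueError(f"scrapeTimeout {timeout_s}s exceeds interval {d.interval_s}s")
    return {
        "targetPort": d.port,  # container port behind the Service
        "path": d.path,
        "interval": f"{d.interval_s}s",
        "scrapeTimeout": f"{timeout_s}s",
    }


print(endpoint_for("kafka-jmx"))
```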

Key Metrics

| Metric | Type | Purpose |
|---|---|---|
| http_requests_total | Counter | Cumulative requests per endpoint (request rate via rate()) |
| http_request_duration_seconds | Histogram | Latency distribution |
| ai_service_inference_requests_per_second | Gauge | AI inference throughput |
| ai_service_llm_token_usage_rate | Gauge | LLM token consumption |
| kafka_consumer_lag | Gauge | Kafka consumer lag |
| trino_query_duration_seconds | Histogram | Query execution time |
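Counters such as http_requests_total and histograms such as http_request_duration_seconds are usually read through PromQL's rate() and histogram_quantile(). The sketch below is a simplified Python re-implementation of both, intended only to show the arithmetic. It assumes non-negative bucket bounds, treats any drop in a counter as a reset to zero, and, unlike PromQL's rate(), does not extrapolate to the edges of the query window. The sample bucket counts are made up for illustration.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Simplified PromQL histogram_quantile over one series' buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound and
    ending with the +Inf bucket, like the le-labelled *_bucket series.
    Assumes non-negative bucket bounds (true for latencies).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower, below = 0.0, 0.0  # lower edge of the current bucket and count beneath it
    for upper, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper):
                # The quantile lies in the +Inf bucket: report the largest finite bound.
                return buckets[-2][0]
            # Linear interpolation, assuming observations are spread evenly in the bucket.
            return lower + (upper - lower) * (rank - below) / (cumulative - below)
        lower, below = upper, cumulative
    return math.nan


def counter_rate(samples: list) -> float:
    """Per-second rate from (timestamp_s, value) counter samples.

    A drop in value is treated as a counter reset to zero; no extrapolation
    to the window edges is performed (PromQL's rate() does extrapolate).
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


latency = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, latency))  # about 0.75 seconds
print(counter_rate([(0, 100), (30, 130), (60, 10), (90, 40)]))  # 70 requests over 90 s
```

The interpolation step is why histogram quantiles are estimates: their accuracy depends on how closely the bucket boundaries bracket the latencies of interest.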

Pod Annotations

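Services also set the conventional prometheus.io/* pod annotations (through the Helm podAnnotations value) for legacy, annotation-based scraping. These annotations are honored only by a Prometheus scrape job that relabels on them; ServiceMonitor-based discovery ignores them.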

```yaml
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8000"
  prometheus.io/path: "/metrics"
```
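An annotation-based scrape job typically turns these annotations into a target URL through relabeling. The sketch below mimics that common convention in Python: pods without `prometheus.io/scrape: "true"` are skipped, the port annotation replaces the port, and the path defaults to /metrics. The pod IP is hypothetical, and `default_port` stands in for the pod's declared container port, which a real scrape job would keep when no port annotation is present.

```python
from typing import Optional


def scrape_url(pod_ip: str, annotations: dict, default_port: int = 8000) -> Optional[str]:
    """Resolve a scrape URL from prometheus.io/* pod annotations.

    Mirrors the common annotation-based relabeling convention: only pods with
    prometheus.io/scrape="true" are scraped, and the path defaults to /metrics.
    default_port is a hypothetical stand-in for the pod's container port.
    """
    if annotations.get("prometheus.io/scrape") != "true":
        return None
    port = annotations.get("prometheus.io/port", str(default_port))
    path = annotations.get("prometheus.io/path", "/metrics")
    return f"http://{pod_ip}:{port}{path}"


annotations = {
    "prometheus.io/scrape": "true",
    "prometheus.io/port": "8000",
    "prometheus.io/path": "/metrics",
}
print(scrape_url("10.0.0.12", annotations))  # http://10.0.0.12:8000/metrics
```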