# Monitoring Namespaces
MATIH uses separate monitoring namespaces for the control plane and data plane, each hosting its own Prometheus instance and ServiceMonitor resources. This separation ensures that monitoring workloads do not compete with application workloads for resources.
## Namespace Layout
| Namespace | Purpose | Key Components |
|---|---|---|
| matih-monitoring-control-plane | Control plane observability | Prometheus, Alertmanager, ServiceMonitors |
| matih-monitoring-data-plane | Data plane observability | Prometheus, Alertmanager, ServiceMonitors |
| matih-observability | Shared observability stack | Grafana, Loki, Tempo, OTEL Collector |
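For the per-plane Prometheus instances to discover ServiceMonitors, the application namespaces need the labels that the `serviceMonitorNamespaceSelector` matches on. A minimal sketch of such a namespace manifest is shown below; the label key and value mirror the selector used later in this page, but the exact labels in the real MATIH manifests may differ:

```yaml
# Illustrative sketch only: labels a plane's application namespace so the
# matching Prometheus serviceMonitorNamespaceSelector can select it.
apiVersion: v1
kind: Namespace
metadata:
  name: matih-control-plane
  labels:
    # Matched by serviceMonitorNamespaceSelector.matchLabels in the
    # control plane Prometheus spec below.
    name: matih-control-plane
```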
## Prometheus Per Plane
Each plane has a dedicated Prometheus instance that scrapes only its own services:
```yaml
# Control plane Prometheus
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-control-plane
  namespace: matih-monitoring-control-plane
spec:
  serviceMonitorNamespaceSelector:
    matchLabels:
      name: matih-control-plane
  serviceMonitorSelector: {}
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
  retention: 15d
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: gp3
        resources:
          requests:
            storage: 50Gi
```

## ServiceMonitor Pattern
Every MATIH service deploys a ServiceMonitor CRD for automatic metric scraping:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ai-service
  namespace: matih-data-plane
  labels:
    app.kubernetes.io/name: ai-service
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ai-service
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s
```

## Cross-Namespace Scraping
Network policies explicitly allow Prometheus to scrape metrics across namespaces:
```yaml
# From matih-data-plane network policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: matih-data-plane-prometheus-scrape
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/part-of: matih-platform
  ingress:
    - from:
        # namespaceSelector and podSelector in a single "from" entry:
        # both must match, so only Prometheus pods running in the
        # monitoring namespace are admitted. Listing them as separate
        # entries would OR them and allow far more traffic.
        - namespaceSelector:
            matchLabels:
              name: matih-monitoring
          podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 8080
```

## OTEL Collector
The OpenTelemetry Collector runs in the shared observability namespace and receives traces from all services:
```yaml
# Tracing endpoint configuration from base chart
tracing:
  enabled: true
  endpoint: "http://monitoring-control-plane-otel-collector.matih-monitoring-control-plane.svc.cluster.local:4317"
```

Services configure the OTEL endpoint via environment variables:
```yaml
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: "http://otel-collector.matih-observability.svc.cluster.local:4317"
```
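On the receiving side, the Collector needs an OTLP receiver listening on the gRPC port 4317 that services export to. The following is a minimal sketch of what that Collector configuration could look like; the exporter name and the Tempo service address are assumptions, not taken from the actual MATIH deployment:

```yaml
# Illustrative sketch of a minimal OTEL Collector pipeline.
# Receives OTLP/gRPC traces on 4317 (the port services export to)
# and forwards them to Tempo. The Tempo endpoint is hypothetical.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  otlp/tempo:
    endpoint: tempo.matih-observability.svc.cluster.local:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
```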