MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Alertmanager

Alertmanager handles alert deduplication, grouping, routing, and notification delivery for all Prometheus-generated alerts.


Alert Routing

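route:
  receiver: "default"            # fallback for alerts that match no child route
  group_by: ["alertname", "namespace", "service"]  # alerts sharing these labels are batched together
  group_wait: 30s                # delay before the first notification for a new group
  group_interval: 5m             # delay before notifying about new alerts in an existing group
  repeat_interval: 4h            # re-send interval for alerts that are still firing
  routes:                        # evaluated top to bottom; the first match wins
    - match:
        severity: critical
      receiver: "pagerduty-critical"
      repeat_interval: 1h        # overrides the 4h default for critical alerts
    - match:
        severity: warning
      receiver: "slack-warnings"
      repeat_interval: 4h        # same as the inherited default, stated explicitly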
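
Alertmanager 0.22 and later deprecate `match` in favor of `matchers`. As a sketch, the same routing tree could be expressed with the newer syntax as shown here. This is an equivalent form of the configuration above, not necessarily what the platform deploys.

```yaml
route:
  receiver: "default"
  group_by: ["alertname", "namespace", "service"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity = "critical"
      receiver: "pagerduty-critical"
      repeat_interval: 1h
    - matchers:
        - severity = "warning"
      receiver: "slack-warnings"
      # repeat_interval is omitted here; the route inherits 4h from its parent
```

With either syntax, an alert whose `severity` label is neither `critical` nor `warning` matches no child route and is delivered to the `default` receiver.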

Receivers

| Receiver | Channel | Severity |
|----------|---------|----------|
| pagerduty-critical | PagerDuty | Critical |
| slack-warnings | Slack | Warning |
| email-daily | Email | Info (daily digest) |
| webhook-custom | HTTP POST | Custom integrations |
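
A minimal sketch of how these receivers might be declared in the Alertmanager configuration. The receiver names come from the table above; every key, URL, address, and channel name below is a placeholder assumption, not a value from the platform.

```yaml
global:
  # SMTP settings are required for email receivers (placeholder values)
  smtp_smarthost: "smtp.example.com:587"
  smtp_from: "alertmanager@example.com"

receivers:
  # Fallback receiver referenced by the root route; no integrations configured here
  - name: "default"

  - name: "pagerduty-critical"
    pagerduty_configs:
      - routing_key: "REPLACE_WITH_PAGERDUTY_INTEGRATION_KEY"  # placeholder

  - name: "slack-warnings"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/REPLACE/ME"  # placeholder webhook URL
        channel: "#alerts-warnings"                              # hypothetical channel
        send_resolved: true

  - name: "email-daily"
    email_configs:
      - to: "oncall@example.com"                                 # placeholder address

  - name: "webhook-custom"
    webhook_configs:
      - url: "http://alert-hook.example.internal:8080/alerts"    # hypothetical endpoint
```

Note that the routing tree shown earlier only sends alerts to `default`, `pagerduty-critical`, and `slack-warnings`; `email-daily` and `webhook-custom` would need their own child routes before they receive anything.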

Key Alert Rules

| Alert | Condition | Severity |
|-------|-----------|----------|
| ServiceDown | Instance unreachable for 5m | Critical |
| HighErrorRate | Error rate > 5% for 10m | Warning |
| HighLatency | P95 latency > 5s for 10m | Warning |
| KafkaConsumerLag | Lag > 10000 for 15m | Warning |
| PostgresReplicationLag | Lag > 30s | Critical |
| DiskSpaceLow | Disk usage > 85% | Warning |
| MemoryPressure | Memory usage > 90% | Critical |
| PodCrashLooping | > 3 restarts in 10m | Critical |
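
As an illustration, three of these rules could be written in Prometheus rule-file format roughly as follows. The thresholds and severities come from the table; the group name, the expressions, and the metric names are assumptions (`node_filesystem_*` presumes node_exporter, `kube_pod_container_status_restarts_total` presumes kube-state-metrics), and the platform's actual rules may differ.

```yaml
groups:
  - name: platform-key-alerts        # hypothetical group name
    rules:
      - alert: ServiceDown
        expr: up == 0                # scrape target is unreachable
        for: 5m
        labels:
          severity: critical         # lowercase, to match the Alertmanager routes
        annotations:
          summary: "{{ $labels.instance }} has been unreachable for 5 minutes"

      - alert: DiskSpaceLow
        # Percentage of filesystem space in use, derived from available vs. total bytes
        expr: (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
        labels:
          severity: warning
        annotations:
          summary: "Disk usage above 85% on {{ $labels.instance }}"

      - alert: PodCrashLooping
        # Container restart count grew by more than 3 over the last 10 minutes
        expr: increase(kube_pod_container_status_restarts_total[10m]) > 3
        labels:
          severity: critical
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"
```

The `severity` label on each rule is what the routing tree matches on, so its value determines whether the alert goes to PagerDuty, Slack, or the default receiver.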