MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Alertmanager Setup

Alertmanager receives alerts from Prometheus, deduplicates them, groups related alerts, applies silencing and inhibition rules, and routes notifications to the configured channels. It is deployed as part of the kube-prometheus-stack.


Configuration

alertmanager:
  config:
    global:
      resolve_timeout: 5m
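      pagerduty_url: https://events.pagerduty.com/v2/enqueue
      # The Slack receivers below also need slack_api_url or slack_api_url_file set here
      # (or api_url / api_url_file on each slack_configs entry).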
 
    route:
      receiver: default
      group_by: [alertname, namespace, job]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
 
      routes:
        - match:
            severity: critical
          receiver: pagerduty-critical
          group_wait: 10s
          repeat_interval: 1h
 
        - match:
            severity: warning
          receiver: slack-warnings
          repeat_interval: 4h
 
        - match:
            category: provisioning
          receiver: slack-provisioning
          group_by: [alertname, tenant]
 
    receivers:
      - name: default
        slack_configs:
          - channel: "#matih-alerts"
            send_resolved: true
 
      - name: pagerduty-critical
        pagerduty_configs:
          - service_key_file: /etc/alertmanager/secrets/pagerduty-key
 
      - name: slack-warnings
        slack_configs:
          - channel: "#matih-warnings"
            send_resolved: true
 
      - name: slack-provisioning
        slack_configs:
          - channel: "#matih-provisioning"
            send_resolved: true

Routing Rules

Match                  | Receiver           | Group Wait | Repeat
severity: critical     | PagerDuty          | 10s        | 1h
severity: warning      | Slack warnings     | 30s        | 4h
category: provisioning | Slack provisioning | 30s        | 4h
Default                | Slack alerts       | 30s        | 4h
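
The following is a minimal Python sketch of how the route tree resolves a receiver and its effective timings. It is an illustration under stated assumptions, not Alertmanager's implementation: the `Route` class and `resolve` function are hypothetical names, only exact-match label comparison is modeled, and no route sets `continue: true`, so the first matching child route wins and unset timings are inherited from the parent route.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Route:
    # Hypothetical model of one node in the routing tree.
    receiver: str
    match: dict = field(default_factory=dict)
    group_wait: Optional[str] = None
    repeat_interval: Optional[str] = None
    routes: list = field(default_factory=list)


# Mirrors the route tree from the Configuration section.
ROOT = Route(
    receiver="default",
    group_wait="30s",
    repeat_interval="4h",
    routes=[
        Route("pagerduty-critical", {"severity": "critical"}, "10s", "1h"),
        Route("slack-warnings", {"severity": "warning"}, repeat_interval="4h"),
        Route("slack-provisioning", {"category": "provisioning"}),
    ],
)


def resolve(labels: dict, route: Route = ROOT, inherited: Optional[dict] = None) -> dict:
    """Return the receiver and effective timings for an alert's labels."""
    inherited = inherited or {}
    current = {
        "receiver": route.receiver,
        # Settings not set on this route fall back to the parent's values.
        "group_wait": route.group_wait or inherited.get("group_wait"),
        "repeat_interval": route.repeat_interval or inherited.get("repeat_interval"),
    }
    for child in route.routes:
        if all(labels.get(k) == v for k, v in child.match.items()):
            # Without `continue: true`, evaluation stops at the first matching child.
            return resolve(labels, child, current)
    return current


print(resolve({"severity": "critical"}))
print(resolve({"category": "provisioning"}))
# Route order matters: this alert matches the warning route first.
print(resolve({"severity": "warning", "category": "provisioning"})["receiver"])  # slack-warnings
```

One consequence worth noting: a provisioning alert that also carries `severity: warning` or `severity: critical` is handled by the severity route listed before it and does not reach `slack-provisioning` unless the routes are reordered or `continue: true` is set.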

Grouping

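By default, alerts are grouped by alertname, namespace, and job to reduce notification noise; the provisioning route overrides this and groups by alertname and tenant. For example, if 10 pods of the same service are failing with the same alert, they are delivered as a single notification.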
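
To make the grouping behavior concrete, here is a small Python sketch that buckets alerts by their `group_by` label values. The function name `group_alerts` and the alert, namespace, job, and pod names are hypothetical and chosen only for illustration; the real grouping happens inside Alertmanager.

```python
from collections import defaultdict

# Default grouping labels from the top-level route.
GROUP_BY = ("alertname", "namespace", "job")


def group_alerts(alerts, group_by=GROUP_BY):
    """Bucket alerts by the values of their group_by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        # A missing label is treated as an empty value for grouping purposes.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Ten failing pods of one hypothetical service, plus one alert from another job.
alerts = [
    {
        "alertname": "PodCrashLooping",
        "namespace": "example-ns",
        "job": "example-service",
        "pod": f"example-service-{i}",
    }
    for i in range(10)
]
alerts.append(
    {"alertname": "PodCrashLooping", "namespace": "example-ns", "job": "other-service", "pod": "other-service-0"}
)

groups = group_alerts(alerts)
for key, members in groups.items():
    print(key, len(members))
```

The ten pod alerts share the same alertname, namespace, and job, so they fall into one group and produce one notification; the `pod` label differs but is not part of `group_by`.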


Silencing

Temporarily silence alerts during maintenance windows:

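# Create a 2-hour silence with amtool (assumes the port-forward from "Accessing Alertmanager" is running)
amtool --alertmanager.url=http://localhost:9093 silence add alertname="ServiceDown" --duration=2h --comment="Planned maintenance"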
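
The same silence can also be created by calling the Alertmanager v2 HTTP API directly. The sketch below builds the request body in Python and, optionally, posts it. It assumes the UI is reachable at http://localhost:9093 through a port-forward; the helper names `build_silence` and `post_silence` and the author value are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_silence(matchers: dict, hours: float, comment: str, author: str, now=None) -> dict:
    """Build a request body for POST /api/v2/silences with exact-match matchers."""
    now = now or datetime.now(timezone.utc)
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": False, "isEqual": True}
            for name, value in matchers.items()
        ],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "createdBy": author,
        "comment": comment,
    }


def post_silence(payload: dict, base_url: str = "http://localhost:9093") -> dict:
    """Send the silence to Alertmanager; base_url is an assumption (port-forward)."""
    request = urllib.request.Request(
        f"{base_url}/api/v2/silences",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_silence(
    {"alertname": "ServiceDown"}, hours=2, comment="Planned maintenance", author="ops"
)
print(json.dumps(payload, indent=2))
# Uncomment once Alertmanager is reachable:
# print(post_silence(payload))
```

Building the payload separately from sending it keeps the time-window arithmetic easy to check before anything is submitted.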

Inhibition Rules

Suppress notifications for lower-severity alerts while a higher-severity alert with the same alertname and namespace is already firing:

inhibit_rules:
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal: [alertname, namespace]
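
The rule's logic can be sketched in a few lines of Python. This is a simplified model, not Alertmanager's code: `is_inhibited` is a hypothetical helper, only exact-match comparison is modeled, and the alert and namespace names are made up for illustration. A target alert is muted when some other firing alert matches `source_match` and both alerts agree on every label listed in `equal`.

```python
RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "namespace"],
}


def is_inhibited(target: dict, firing: list, rule: dict = RULE) -> bool:
    """Return True if `target` is muted by any alert in `firing` under `rule`."""
    if not all(target.get(k) == v for k, v in rule["target_match"].items()):
        return False
    for source in firing:
        if source is target:
            continue
        if not all(source.get(k) == v for k, v in rule["source_match"].items()):
            continue
        # A missing label and an empty label are treated as the same value.
        if all(source.get(n, "") == target.get(n, "") for n in rule["equal"]):
            return True
    return False


critical = {"alertname": "HighErrorRate", "namespace": "example-ns", "severity": "critical"}
same_alert_warning = {"alertname": "HighErrorRate", "namespace": "example-ns", "severity": "warning"}
other_alert_warning = {"alertname": "DiskFilling", "namespace": "example-ns", "severity": "warning"}
firing = [critical, same_alert_warning, other_alert_warning]

print(is_inhibited(same_alert_warning, firing))   # True
print(is_inhibited(other_alert_warning, firing))  # False
```

Because `alertname` is in the `equal` list, a critical alert only mutes the warning-level variant of the same alert in the same namespace; unrelated warnings still notify.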

Accessing Alertmanager

kubectl port-forward svc/monitoring-alertmanager 9093:9093 -n matih-monitoring

Then open http://localhost:9093 in a browser to reach the Alertmanager UI.