Alertmanager Setup
Alertmanager receives alerts from Prometheus, deduplicates them, groups related alerts, applies silencing and inhibition rules, and routes notifications to the configured channels. It is deployed as part of the kube-prometheus-stack.
Configuration
alertmanager:
  config:
    global:
      resolve_timeout: 5m
      pagerduty_url: https://events.pagerduty.com/v2/enqueue
    route:
      receiver: default
      group_by: [alertname, namespace, job]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      routes:
        - match:
            severity: critical
          receiver: pagerduty-critical
          group_wait: 10s
          repeat_interval: 1h
        - match:
            severity: warning
          receiver: slack-warnings
          repeat_interval: 4h
        - match:
            category: provisioning
          receiver: slack-provisioning
          group_by: [alertname, tenant]
    receivers:
      - name: default
        slack_configs:
          - channel: "#matih-alerts"
            send_resolved: true
      - name: pagerduty-critical
        pagerduty_configs:
          - service_key_file: /etc/alertmanager/secrets/pagerduty-key
      - name: slack-warnings
        slack_configs:
          - channel: "#matih-warnings"
            send_resolved: true
      - name: slack-provisioning
        slack_configs:
          - channel: "#matih-provisioning"
            send_resolved: true
Routing Rules
| Match | Receiver | Group Wait | Repeat |
|---|---|---|---|
| severity: critical | PagerDuty | 10s | 1h |
| severity: warning | Slack warnings | 30s | 4h |
| category: provisioning | Slack provisioning | 30s | 4h |
| Default | Slack alerts | 30s | 4h |
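The table can be reproduced with a small simulation of how the routing tree is walked. This is a minimal sketch rather than Alertmanager's actual implementation: the `Route` class and `resolve` function are hypothetical names introduced for illustration, and only `group_wait` and `repeat_interval` inheritance is modeled. It assumes the default behavior in which the first matching child route wins because `continue` is not set.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Route:
    """Hypothetical, simplified stand-in for one node of the routing tree."""
    receiver: str
    match: dict = field(default_factory=dict)
    group_wait: Optional[str] = None
    repeat_interval: Optional[str] = None
    routes: list = field(default_factory=list)


# Mirrors the route section of the configuration above.
ROOT = Route(
    receiver="default",
    group_wait="30s",
    repeat_interval="4h",
    routes=[
        Route("pagerduty-critical", {"severity": "critical"},
              group_wait="10s", repeat_interval="1h"),
        Route("slack-warnings", {"severity": "warning"}, repeat_interval="4h"),
        Route("slack-provisioning", {"category": "provisioning"}),
    ],
)


def resolve(labels, route=ROOT, inherited=None):
    """Return (receiver, group_wait, repeat_interval) for an alert's labels."""
    inherited = inherited or {}
    # A route keeps its own settings and inherits unset ones from its parent.
    settings = {
        "group_wait": route.group_wait or inherited.get("group_wait"),
        "repeat_interval": route.repeat_interval or inherited.get("repeat_interval"),
    }
    # Children are checked in order; the first match wins (no `continue`).
    for child in route.routes:
        if all(labels.get(key) == value for key, value in child.match.items()):
            return resolve(labels, child, settings)
    return route.receiver, settings["group_wait"], settings["repeat_interval"]
```

One consequence of the ordering is worth noting: an alert labeled both `severity: warning` and `category: provisioning` resolves to `slack-warnings`, because the warning route is listed before the provisioning route.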
Grouping
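By default, alerts are grouped by alertname, namespace, and job to reduce notification noise. The provisioning route overrides this and groups by alertname and tenant. For example, if 10 pods of the same service fire the same alert, they share those three labels and are delivered as a single notification instead of 10.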
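The 10-pod example can be sketched as follows. This is an illustration of the grouping idea only, not Alertmanager code: `group_alerts` is a hypothetical helper, and the alert name, namespace, job, and pod names are made-up sample values.

```python
from collections import defaultdict

# Default group_by labels from the route configuration.
GROUP_BY = ("alertname", "namespace", "job")


def group_alerts(alerts, group_by=GROUP_BY):
    """Bucket alerts by the values of their group_by labels."""
    groups = defaultdict(list)
    for labels in alerts:
        # A missing label is treated as an empty value.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Ten pods of one (hypothetical) service firing the same alert.
alerts = [
    {
        "alertname": "PodCrashLooping",
        "namespace": "matih-data",
        "job": "query-service",
        "pod": f"query-service-{i}",
    }
    for i in range(10)
]

groups = group_alerts(alerts)
print(len(groups))  # number of notifications that would be sent
```

Because `pod` is not part of `group_by`, all ten alerts fall into one group. Adding `pod` to the grouping labels would split them into ten separate notifications.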
Silencing
Temporarily silence alerts during maintenance windows:
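# Create a silence with amtool, which calls the Alertmanager API.
# amtool needs the Alertmanager address, from --alertmanager.url or its config file.
amtool silence add alertname="ServiceDown" --duration=2h --comment="Planned maintenance"
Inhibition Rules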
Suppress notifications for warning alerts while a critical alert with the same alertname and namespace is already firing:
inhibit_rules:
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal: [alertname, namespace]
Accessing Alertmanager
kubectl port-forward svc/monitoring-alertmanager 9093:9093 -n matih-monitoring
Then open http://localhost:9093 to reach the Alertmanager UI.
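The same port-forward also exposes the Alertmanager HTTP API, so the maintenance silence from the Silencing section could be created programmatically. The sketch below is one possible approach, assuming the v2 API endpoint `/api/v2/silences` and the port-forward address above. `build_silence` and `create_silence` are hypothetical helper names, and the `created_by` value is a placeholder.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumes the port-forward above is running.
ALERTMANAGER_URL = "http://localhost:9093"


def build_silence(alertname, hours, comment, created_by, now=None):
    """Build a silence payload that matches one alertname for `hours` hours."""
    now = now or datetime.now(timezone.utc)
    return {
        "matchers": [{"name": "alertname", "value": alertname, "isRegex": False}],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


def create_silence(payload, base_url=ALERTMANAGER_URL):
    """POST the payload to Alertmanager and return the new silence ID."""
    request = urllib.request.Request(
        f"{base_url}/api/v2/silences",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["silenceID"]


# Equivalent of the amtool example; "ops" is a placeholder author.
payload = build_silence("ServiceDown", 2, "Planned maintenance", created_by="ops")
# silence_id = create_silence(payload)  # uncomment with the port-forward active
```

Keeping payload construction separate from the HTTP call makes the silence easy to inspect or log before it is submitted.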