Config Backup
Configuration backup covers Kubernetes resources, Helm release values, ConfigMaps, and Secrets. These are critical for rebuilding the platform from scratch in a disaster recovery scenario.
What to Back Up
| Resource Type | Backup Method | Frequency |
|---|---|---|
| Helm values files | Git repository | Every commit |
| Kubernetes ConfigMaps | Velero | Daily |
| Kubernetes Secrets | External Secrets Operator (synced from vault) | Continuous |
| CRDs (ServiceMonitors, Certificates) | Velero | Daily |
| Terraform state | Remote backend (Azure Storage / S3) | Every apply |
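The "Daily" Velero rows in the table could be covered by a single Velero schedule. The sketch below only prints the command so it can be reviewed first. The schedule name, cron time, and TTL are assumptions, as is the guess that ServiceMonitors and Certificates come from the Prometheus Operator and cert-manager API groups.

```shell
#!/bin/sh
# Sketch: print a Velero schedule covering the daily ConfigMap and custom-resource backups.
# Assumptions (not from this document): schedule name, cron time, TTL, and API groups.
set -eu

SCHEDULE_NAME="matih-config-daily"   # hypothetical name
CRON="0 2 * * *"                     # hypothetical: every day at 02:00
TTL="720h"                           # hypothetical: keep backups for 30 days
RESOURCES="configmaps,servicemonitors.monitoring.coreos.com,certificates.cert-manager.io"

velero_schedule_cmd() {
  # Emits the command as text; nothing is executed here.
  printf 'velero schedule create %s --schedule="%s" --include-resources %s --ttl %s\n' \
    "$SCHEDULE_NAME" "$CRON" "$RESOURCES" "$TTL"
}

velero_schedule_cmd
```

Once the printed command looks right for the cluster, it can be piped to `sh` to create the schedule.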
Helm Values
All Helm values files are stored in Git and represent the declarative state of the platform:
| Values File | Purpose | Path |
|---|---|---|
| values.yaml | Base defaults | infrastructure/helm/{service}/values.yaml |
| values-dev.yaml | Dev overrides | infrastructure/helm/{service}/values-dev.yaml |
| values-prod.yaml | Prod overrides | infrastructure/helm/{service}/values-prod.yaml |
Because these files live in Git, every change is versioned, and any clone of the repository serves as a backup.
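The layering of base and environment values can be illustrated with a small helper that builds the Helm command. This is a sketch, not the project's deployment script. It assumes the release name equals the service name and that the chart sits in the same directory as its values files. `example-service` is a hypothetical service name.

```shell
#!/bin/sh
# Sketch: build a Helm command that layers base values with an environment override.
# Assumptions: release name == service name; chart lives in infrastructure/helm/<service>.
set -eu

helm_deploy_cmd() {
  service="$1"       # directory name under infrastructure/helm/
  environment="$2"   # dev or prod
  chart_dir="infrastructure/helm/${service}"
  # Helm gives precedence to the right-most -f file, so the base file comes first.
  printf 'helm upgrade --install %s %s -f %s/values.yaml -f %s/values-%s.yaml\n' \
    "$service" "$chart_dir" "$chart_dir" "$chart_dir" "$environment"
}

# "example-service" is a hypothetical service name used for illustration.
helm_deploy_cmd "example-service" "prod"
```

Passing the base file first matters: values in `values-prod.yaml` then override the defaults instead of being overridden by them.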
ConfigMap Backup
ConfigMaps containing runtime configuration should be backed up via Velero (see Velero):
| ConfigMap | Namespace | Content |
|---|---|---|
| Grafana dashboards | matih-monitoring | Dashboard JSON definitions |
| Prometheus rules | matih-monitoring | Alerting and recording rules |
| Tenant configurations | matih-control-plane | Per-tenant settings |
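An on-demand backup of just these ConfigMaps, for example before a risky change, might look like the following sketch. The namespaces come from the table above; the backup naming scheme is an assumption. The helper prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: print an ad hoc Velero backup limited to ConfigMaps in the two namespaces above.
# Assumption: the date-based backup name is illustrative only.
set -eu

backup_configmaps_cmd() {
  backup_name="$1"   # must be a valid lowercase Kubernetes object name
  printf 'velero backup create %s --include-namespaces matih-monitoring,matih-control-plane --include-resources configmaps --wait\n' \
    "$backup_name"
}

backup_configmaps_cmd "configmaps-$(date +%Y%m%d)"
```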
Secret Management
Secrets are never stored in Git. They are managed through:
| Method | Environment | Description |
|---|---|---|
| External Secrets Operator | Production | Syncs from Azure Key Vault / AWS Secrets Manager |
| dev-secrets.sh | Development | Creates dev secrets from templates |
| Velero | All | Cluster-level backup includes encrypted secrets |
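One way to confirm that External Secrets Operator has finished syncing is to wait for each ExternalSecret to report a Ready condition. The sketch below prints a `kubectl wait` command per namespace. The namespaces listed and the 120s default timeout are assumptions; the document does not say where ExternalSecret objects live.

```shell
#!/bin/sh
# Sketch: print kubectl commands that wait for ExternalSecrets to become Ready.
# Assumptions: the namespaces below contain ExternalSecret objects; timeout is arbitrary.
set -eu

eso_wait_cmd() {
  namespace="$1"
  timeout="${2:-120s}"   # hypothetical default
  printf 'kubectl wait externalsecret --all --namespace %s --for=condition=Ready --timeout=%s\n' \
    "$namespace" "$timeout"
}

for ns in matih-monitoring matih-control-plane; do
  eso_wait_cmd "$ns"
done
```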
Terraform State
Terraform state is stored in a remote backend:
| Backend | Environment | Path |
|---|---|---|
| Azure Storage | Azure | matih-tfstate container |
| S3 | AWS | matih-terraform-state bucket |
State locking prevents concurrent modifications.
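For the AWS case, pointing a working directory at the remote state could be sketched as below, assuming the Terraform configuration declares an empty `backend "s3" {}` block and receives its settings at init time. The bucket name comes from the table above. The object key and region are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: print a terraform init command using partial backend configuration for S3.
# Assumptions: an empty backend "s3" {} block exists; key and region are placeholders.
set -eu

tf_init_s3_cmd() {
  state_key="$1"   # hypothetical object key inside the bucket
  region="$2"      # hypothetical AWS region
  printf 'terraform init -backend-config=bucket=matih-terraform-state -backend-config=key=%s -backend-config=region=%s\n' \
    "$state_key" "$region"
}

tf_init_s3_cmd "platform/terraform.tfstate" "us-east-1"
```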
Recovery Procedure
To rebuild the platform from backups, work through these steps in order:
- Provision infrastructure using Terraform
- Restore Kubernetes cluster resources from Velero backup
- Verify secrets are synced from the vault via ESO
- Deploy services using Helm with values from Git
- Restore databases from PostgreSQL backups
- Verify all health checks pass by running ./scripts/disaster-recovery/health-check.sh
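The recovery steps can be laid out as a dry-run plan that prints one command per step in the order given above. This is a sketch under several assumptions: the backup name and service name are supplied by the operator, production values are used, and the 300s timeout is arbitrary. The database step is left as a comment because this section does not describe that procedure.

```shell
#!/bin/sh
# Sketch: print the recovery steps in order. Nothing is executed.
# Assumptions: arguments are operator-supplied; timeout and prod values are illustrative.
set -eu

recovery_plan() {
  backup_name="$1"   # Velero backup to restore from
  service="$2"       # directory name under infrastructure/helm/
  cat <<EOF
terraform apply
velero restore create --from-backup ${backup_name} --wait
kubectl wait externalsecret --all --all-namespaces --for=condition=Ready --timeout=300s
helm upgrade --install ${service} infrastructure/helm/${service} -f infrastructure/helm/${service}/values.yaml -f infrastructure/helm/${service}/values-prod.yaml
# restore databases from PostgreSQL backups (procedure not covered in this section)
./scripts/disaster-recovery/health-check.sh
EOF
}

# Both arguments are hypothetical placeholders.
recovery_plan "example-backup" "example-service"
```

Printing the plan first gives the operator a chance to check backup names and paths before running anything against a freshly provisioned cluster.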