Velero Backup Operator
Velero provides cluster-level backup and restore for Kubernetes resources and persistent volumes. MATIH uses Velero for daily backups of the Kubernetes resources in its platform namespaces, including Deployments, Services, ConfigMaps, Secrets, PersistentVolumeClaims, and custom resources such as Certificates and ServiceMonitors.
Installation
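The install command reads Azure credentials from `./credentials-velero` via `--secret-file`. For service-principal authentication, the Azure plugin's README describes this file as `KEY=value` lines (`AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_RESOURCE_GROUP`, `AZURE_CLOUD_NAME`); other auth modes use different keys, so check the README for the plugin version you deploy. A minimal sketch that writes the file from environment variables of the same names, assuming service-principal auth; `write_velero_credentials` is a hypothetical helper:

```bash
# Sketch: write the credentials file read by `velero install --secret-file`.
# Assumes service-principal auth; values come from the caller's variables of
# the same name. write_velero_credentials is hypothetical, not part of Velero.
write_velero_credentials() {
  out="${1:-./credentials-velero}"
  # Refuse to write a partial file if any required value is missing.
  for var in AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID AZURE_CLIENT_ID \
             AZURE_CLIENT_SECRET AZURE_RESOURCE_GROUP; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing required variable: $var" >&2
      return 1
    fi
  done
  # umask 077 in a subshell: a newly created file is readable by its owner
  # only, since it holds a client secret.
  (
    umask 077
    cat > "$out" <<EOF
AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}
AZURE_TENANT_ID=${AZURE_TENANT_ID}
AZURE_CLIENT_ID=${AZURE_CLIENT_ID}
AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}
AZURE_RESOURCE_GROUP=${AZURE_RESOURCE_GROUP}
AZURE_CLOUD_NAME=${AZURE_CLOUD_NAME:-AzurePublicCloud}
EOF
  )
}

# Usage, before running `velero install`:
#   write_velero_credentials ./credentials-velero
```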
```bash
# Install Velero CLI
brew install velero

# Install Velero server with Azure plugin
velero install \
  --provider azure \
  --plugins velero/velero-plugin-for-microsoft-azure:v1.9.0 \
  --bucket matih-velero-backups \
  --backup-location-config resourceGroup=matih-rg,storageAccount=matihvelero \
  --secret-file ./credentials-velero \
  --use-node-agent
```
Backup Schedule
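Once the Schedule manifest in this section is saved to a file, it can be applied and exercised immediately rather than waiting for the first 2 AM UTC run. `velero backup create --from-schedule` builds a one-off backup from the Schedule's template. The sketch below assumes a hypothetical filename (`matih-daily-backup.yaml`) and helper name; the `KUBECTL` and `VELERO` variables exist only so the commands can be swapped for stubs in a dry run:

```bash
# Sketch: apply the Schedule manifest, confirm Velero registered it, then
# run one backup from it right away. KUBECTL and VELERO may be overridden.
KUBECTL="${KUBECTL:-kubectl}"
VELERO="${VELERO:-velero}"

apply_and_test_schedule() {
  manifest="$1"   # e.g. matih-daily-backup.yaml (hypothetical filename)
  schedule="$2"   # metadata.name of the Schedule, e.g. matih-daily-backup
  "$KUBECTL" apply -f "$manifest" || return 1
  # Fails if the Schedule object is not visible to Velero.
  "$VELERO" schedule get "$schedule" >/dev/null || return 1
  # --from-schedule copies the Schedule's template into a one-off backup.
  "$VELERO" backup create --from-schedule "$schedule" --wait
}

# Usage:
#   apply_and_test_schedule matih-daily-backup.yaml matih-daily-backup
```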
```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: matih-daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *" # Daily at 2 AM UTC
  template:
    includedNamespaces:
      - matih-control-plane
      - matih-data-plane
      - matih-monitoring
    includedResources:
      - deployments
      - services
      - configmaps
      - secrets
      - persistentvolumeclaims
      - serviceaccounts
      - ingresses
      - certificates
      - servicemonitors
    ttl: 720h # 30 days retention
    snapshotVolumes: true
```
Backup Types
| Type | Description | When to Use |
|---|---|---|
| Full cluster | All namespaces and resources | Daily scheduled backup |
| Namespace | Specific namespace only | Before risky changes |
| Resource | Specific resource types | Targeted backup |
| Volume snapshot | PersistentVolume snapshots | Database volumes |
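Each row of the table maps to a small set of `velero backup create` filters. A hedged sketch, using hypothetical type labels (`full`, `namespace`, `resource`, `volume`) and a hypothetical helper name. With no filters, Velero backs up every namespace and cluster-scoped resource; the `volume` case is one possible reading of the last row, limiting the backup to claims and volumes in a namespace while requesting snapshots:

```bash
# Sketch: on-demand backup for each row of the Backup Types table.
#   backup_by_type full      NAME
#   backup_by_type namespace NAME NAMESPACES   (comma-separated)
#   backup_by_type resource  NAME RESOURCES    (comma-separated)
#   backup_by_type volume    NAME NAMESPACES
VELERO="${VELERO:-velero}"

backup_by_type() {
  type="$1"; name="$2"; target="${3:-}"
  # Validate the type and make sure filtered types received a target.
  case "$type" in
    full) ;;
    namespace|resource|volume)
      if [ -z "$target" ]; then
        echo "backup type '$type' needs a target" >&2
        return 2
      fi
      ;;
    *) echo "unknown backup type: $type" >&2; return 2 ;;
  esac
  case "$type" in
    full)
      "$VELERO" backup create "$name" --wait ;;
    namespace)
      "$VELERO" backup create "$name" --include-namespaces "$target" --wait ;;
    resource)
      "$VELERO" backup create "$name" --include-resources "$target" --wait ;;
    volume)
      "$VELERO" backup create "$name" --include-namespaces "$target" \
        --include-resources persistentvolumeclaims,persistentvolumes \
        --snapshot-volumes --wait ;;
  esac
}
```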
Manual Backup
Create a manual backup before risky operations:
```bash
velero backup create pre-upgrade-backup \
  --include-namespaces matih-control-plane,matih-data-plane \
  --snapshot-volumes \
  --wait
```
Restore Procedures
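Restores reference a backup by its exact name, and the examples in this section use a shortened date. Backups created by a Schedule are named `<schedule-name>-<YYYYMMDDHHMMSS>` and labelled `velero.io/schedule-name`, so the newest usable one can be found by listing that schedule's backups and keeping only those in the `Completed` phase. A sketch with a hypothetical helper name; `KUBECTL` can be overridden with a stub:

```bash
# Sketch: print the newest Completed backup created by a schedule.
KUBECTL="${KUBECTL:-kubectl}"

latest_completed_backup() {
  schedule="$1"
  # The 14-digit timestamp suffix is fixed-width, so a lexical sort of the
  # names is also chronological.
  name=$("$KUBECTL" get backups.velero.io -n velero \
      -l "velero.io/schedule-name=${schedule}" \
      -o jsonpath='{range .items[?(@.status.phase=="Completed")]}{.metadata.name}{"\n"}{end}' |
    sort | tail -n 1)
  if [ -z "$name" ]; then
    echo "no Completed backups found for schedule $schedule" >&2
    return 1
  fi
  echo "$name"
}

# Usage:
#   velero restore create --from-backup "$(latest_completed_backup matih-daily-backup)" \
#     --include-namespaces matih-data-plane --wait
```

For the simple case, `velero restore create --from-schedule matih-daily-backup` restores from the schedule's most recent successful backup without looking up the name first.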
Full Cluster Restore
```bash
velero restore create --from-backup matih-daily-backup-20250615 \
  --include-namespaces matih-control-plane,matih-data-plane \
  --wait
```
Namespace Restore
```bash
velero restore create --from-backup matih-daily-backup-20250615 \
  --include-namespaces matih-data-plane \
  --wait
```
Selective Resource Restore
```bash
velero restore create --from-backup matih-daily-backup-20250615 \
  --include-resources configmaps,secrets \
  --include-namespaces matih-monitoring \
  --wait
```
Backup Storage
| Provider | Storage | Encryption |
|---|---|---|
| Azure | Blob Storage | SSE with customer-managed keys |
| AWS | S3 | SSE-KMS |
| GCP | Cloud Storage | CMEK |
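For the Azure row, whether the storage account actually encrypts with a customer-managed key can be checked with the Azure CLI: `az storage account show` reports `encryption.keySource` as `Microsoft.Keyvault` for customer-managed keys and `Microsoft.Storage` for Microsoft-managed keys. A sketch reusing the account and resource group from the install command; the helper name is hypothetical, `AZ` can be overridden with a stub, and only the Azure provider is covered:

```bash
# Sketch: check that the Velero storage account uses a customer-managed key.
AZ="${AZ:-az}"

check_cmk_encryption() {
  account="$1"; group="$2"
  key_source=$("$AZ" storage account show --name "$account" \
    --resource-group "$group" --query encryption.keySource --output tsv) || return 1
  # Microsoft.Keyvault = customer-managed key; Microsoft.Storage = Microsoft-managed.
  if [ "$key_source" != "Microsoft.Keyvault" ]; then
    echo "$account: key source is '${key_source:-unknown}', expected Microsoft.Keyvault" >&2
    return 1
  fi
}

# Usage, with the names from the install command:
#   check_cmk_encryption matihvelero matih-rg
```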
Monitoring
| Metric | Alert Condition | Description |
|---|---|---|
| Backup age | Over 48 hours | Last successful backup is too old |
| Backup size | Unexpected change | May indicate data loss or corruption |
| Backup failures | Any failure | Backup job did not complete |
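The first two alert conditions reduce to simple arithmetic. A sketch of both checks, assuming timestamps in Unix epoch seconds and sizes in bytes; the helper names and the 20% default size threshold are assumptions, since the table only says "unexpected change":

```bash
# Sketch: arithmetic behind the "Backup age" and "Backup size" alerts.

# Succeeds if the last successful backup (epoch seconds) finished no more
# than max_hours before now (epoch seconds).
backup_age_ok() {
  last_success="$1"; now="$2"; max_hours="${3:-48}"
  [ $(( now - last_success )) -le $(( max_hours * 3600 )) ]
}

# Succeeds if the size changed by at most max_pct percent relative to the
# previous backup. Integer math only: |cur - prev| * 100 <= prev * max_pct.
size_change_ok() {
  prev="$1"; cur="$2"; max_pct="${3:-20}"
  if [ "$prev" -eq 0 ]; then
    # No baseline: only an empty-to-empty change counts as normal.
    [ "$cur" -eq 0 ]
    return
  fi
  delta=$(( cur - prev ))
  if [ "$delta" -lt 0 ]; then
    delta=$(( -delta ))
  fi
  [ $(( delta * 100 )) -le $(( prev * max_pct )) ]
}

# Usage (GNU date; NAME is a backup name):
#   ts=$(kubectl get backups.velero.io NAME -n velero -o jsonpath='{.status.completionTimestamp}')
#   backup_age_ok "$(date -u -d "$ts" +%s)" "$(date -u +%s)" || echo "last backup is older than 48h"
```

When Velero's Prometheus metrics are scraped, the age alert is usually written against the `velero_backup_last_successful_timestamp` gauge instead, for example `time() - velero_backup_last_successful_timestamp{schedule="matih-daily-backup"} > 48 * 3600`.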
Verification
Regularly test restores to verify backup integrity:
- Create a test namespace
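- Restore a backup into the test namespace with `velero restore create --from-backup <backup> --namespace-mappings <source-namespace>:<test-namespace>`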
- Verify resource counts and configurations
- Clean up the test namespace
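The verification steps above can be scripted. Velero's `--namespace-mappings` flag restores a namespace under a different name, and Velero creates the target namespace if it does not already exist, so the first step folds into the restore. This sketch compares a few resource counts between the live source namespace and the restored copy, so changes made since the backup also show up as differences. The helper names, the scratch namespace name, and the resource list are assumptions; `KUBECTL` and `VELERO` can be overridden with stubs:

```bash
# Sketch: scripted version of the verification steps.
KUBECTL="${KUBECTL:-kubectl}"
VELERO="${VELERO:-velero}"

# Print how many objects of one resource type exist in a namespace.
count_resources() {
  "$KUBECTL" get "$1" -n "$2" --no-headers 2>/dev/null | wc -l | tr -d ' '
}

# Restore src_ns from a backup into test_ns, compare counts, then clean up.
verify_restore() {
  backup="$1"; src_ns="$2"; test_ns="$3"
  # --namespace-mappings restores src_ns under the name test_ns.
  "$VELERO" restore create --from-backup "$backup" \
    --include-namespaces "$src_ns" \
    --namespace-mappings "${src_ns}:${test_ns}" \
    --wait >/dev/null || return 1
  rc=0
  for res in deployments services configmaps secrets persistentvolumeclaims; do
    want=$(count_resources "$res" "$src_ns")
    got=$(count_resources "$res" "$test_ns")
    if [ "$want" != "$got" ]; then
      echo "$res: $src_ns has $want, $test_ns has $got" >&2
      rc=1
    fi
  done
  # Clean up the scratch namespace whether or not the counts matched.
  "$KUBECTL" delete namespace "$test_ns" --wait=false >/dev/null
  return "$rc"
}

# Usage:
#   verify_restore matih-daily-backup-20250615 matih-data-plane matih-restore-test
```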