Certificate Renewal
This runbook covers TLS certificate renewal procedures for the MATIH platform, including automated cert-manager renewals, manual certificate rotation, and troubleshooting certificate issues.
Symptoms
- `CertificateExpiringSoon` alert firing
- Browser TLS warnings when accessing the platform
- Services failing TLS handshakes
- cert-manager Certificate resource showing a `Ready` condition of `False`
Impact
Expired certificates cause:
- HTTPS access failures for users
- Service-to-service communication failures
- Webhook failures for admission controllers
Automated Renewal (cert-manager)
MATIH uses cert-manager for automated TLS certificate management. Certificates are automatically renewed 30 days before expiry.
Check Certificate Status
Inspect the cert-manager Certificate resources in the affected namespace. Each Certificate reports its `Ready` condition, its expiry time (`status.notAfter`), its scheduled renewal time (`status.renewalTime`), and events describing any renewal failure.
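The check above can be run with `kubectl`. This is a minimal sketch, assuming the cert-manager CRDs are installed and that you pass the namespace and Certificate name yourself. The helper names `check_certificates` and `describe_certificate` are hypothetical, not part of the MATIH scripts.

```bash
# Hypothetical helper: summarize all cert-manager Certificates in a namespace.
check_certificates() {
  local ns="$1"
  # The READY column reflects each Certificate's Ready condition
  kubectl get certificates -n "$ns"
  # Name, expiry (status.notAfter) and scheduled renewal (status.renewalTime), tab-separated
  kubectl get certificates -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.notAfter}{"\t"}{.status.renewalTime}{"\n"}{end}'
}

# Hypothetical helper: drill into one Certificate and the resources it spawns.
describe_certificate() {
  local ns="$1" name="$2"
  # The Events section usually explains why Ready is False
  kubectl describe certificate "$name" -n "$ns"
  # ACME issuance pipeline: CertificateRequest -> Order -> Challenge
  kubectl get certificaterequests,orders,challenges -n "$ns"
}
```

For example, `check_certificates <namespace>` followed by `describe_certificate <namespace> <certificate-name>` for any entry that is not ready.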
Verify ClusterIssuer
MATIH issues certificates using the DNS01 challenge via Azure DNS. Confirm that the ClusterIssuer for your environment is ready:
| Issuer | Environment | Description |
|---|---|---|
| `letsencrypt-staging-dns01` | Development | Let's Encrypt staging (for testing) |
| `letsencrypt-prod-dns01` | Production | Let's Encrypt production |
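A quick readiness check for the issuers in the table might look like the sketch below. It assumes only standard `kubectl` access to cluster-scoped cert-manager resources; `check_cluster_issuer` is a hypothetical helper name.

```bash
# Hypothetical helper: confirm a ClusterIssuer is Ready before debugging Certificates.
check_cluster_issuer() {
  local issuer="$1"
  # Summary row, including the READY column
  kubectl get clusterissuer "$issuer"
  # Just the Ready condition status: True, False or Unknown
  kubectl get clusterissuer "$issuer" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}{"\n"}'
  # Full spec (including the DNS01 solver config) plus recent events
  kubectl describe clusterissuer "$issuer"
}
```

For production, this would be `check_cluster_issuer letsencrypt-prod-dns01`.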
Common Renewal Failures
| Issue | Cause | Resolution |
|---|---|---|
| DNS challenge failed | Azure DNS credentials expired | Rotate workload identity credentials |
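| Rate limit exceeded | Too many certificate requests | Wait for the limit to reset (the failed-validation limit resets after 1 hour; other Let's Encrypt limits use longer windows, so check the error message for the limit that was hit) |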
| Domain validation failed | DNS zone delegation incorrect | Verify NS records in parent zone |
| Issuer not ready | cert-manager pod unhealthy | Restart cert-manager |
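The resolutions in the table map roughly onto the commands below. This is a sketch, assuming cert-manager runs as the deployment `cert-manager` in the namespace `cert-manager` (the upstream default; override `CM_NS` if MATIH installs it elsewhere) and that `dig` is available. All function names are hypothetical.

```bash
# Assumption: default install namespace; override CM_NS if MATIH differs.
CM_NS="${CM_NS:-cert-manager}"

# "DNS challenge failed" / "Rate limit exceeded": the ACME error text appears
# on the Order and Challenge resources in the Certificate's namespace.
show_acme_errors() {
  local ns="$1"
  kubectl describe orders -n "$ns"
  kubectl describe challenges -n "$ns"
}

# Renewal failures are also logged by the cert-manager controller.
show_controller_logs() {
  kubectl logs -n "$CM_NS" deployment/cert-manager --since=1h
}

# "Domain validation failed": list the NS records public DNS returns for the zone,
# to compare against the name servers assigned to the Azure DNS zone.
show_ns_records() {
  local zone="$1"
  dig +short NS "$zone"
}

# "Issuer not ready": restart the controller and wait for the rollout to finish.
restart_cert_manager() {
  kubectl rollout restart deployment/cert-manager -n "$CM_NS"
  kubectl rollout status deployment/cert-manager -n "$CM_NS"
}
```

Once the underlying cause is fixed, the `cmctl` CLI (if installed) can trigger an immediate reissue with `cmctl renew <certificate-name> --namespace <namespace>` instead of waiting for the next retry.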
Manual Certificate Rotation
For certificates not managed by cert-manager:
1. Generate New Certificate
Follow your organization's certificate request process to obtain a new certificate and key.
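Before storing a newly issued certificate, it is worth confirming that the certificate and key actually belong together and that the expiry date is what you expect. A minimal sketch using `openssl`, assuming PEM-encoded files. `cert_key_match` and `show_cert_dates` are hypothetical names.

```bash
# Hypothetical helper: succeed only if the certificate's public key
# matches the public key derived from the private key.
cert_key_match() {
  local crt="$1" key="$2"
  local crt_pub key_pub
  crt_pub="$(openssl x509 -in "$crt" -noout -pubkey)" || return 1
  key_pub="$(openssl pkey -in "$key" -pubout)" || return 1
  [ "$crt_pub" = "$key_pub" ]
}

# Hypothetical helper: print the subject and expiry date of a PEM certificate.
show_cert_dates() {
  openssl x509 -in "$1" -noout -subject -enddate
}
```

Comparing public keys works for both RSA and EC keys, because both commands emit the same PEM `PUBLIC KEY` encoding.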
2. Update the Kubernetes Secret
The new certificate and key must be stored in the appropriate Kubernetes secret. Use the platform's secret management scripts rather than manual creation.
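After the platform scripts update the secret, you can read it back to confirm the new certificate landed. This sketch assumes a standard `kubernetes.io/tls` secret with a `tls.crt` key and GNU `base64`. `show_secret_cert` is a hypothetical helper that only reads the secret and never modifies it.

```bash
# Hypothetical helper: decode tls.crt from a TLS secret and print its subject and expiry.
show_secret_cert() {
  local ns="$1" secret="$2"
  # The dot in "tls.crt" must be escaped in kubectl JSONPath
  kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -subject -enddate
}
```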
3. Restart Affected Services
Services that mount TLS secrets may need a restart to pick up the new certificate:
```bash
./scripts/tools/service-build-deploy.sh <service-name>
```
Verification
After certificate renewal:
- Verify the certificate is valid and not expired
- Test HTTPS access to the affected endpoints
- Check that services are successfully completing TLS handshakes
- Verify the alert has resolved
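The first three checks can be scripted from outside the cluster, as in the sketch below. It assumes `openssl` and `curl` are installed and that the endpoint uses SNI. `show_served_cert` and `check_https` are hypothetical names.

```bash
# Hypothetical helper: show the certificate an endpoint actually serves.
# -servername sends SNI so the ingress returns the certificate for this host.
show_served_cert() {
  local host="$1" port="${2:-443}"
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates
}

# Hypothetical helper: end-to-end HTTPS request; curl verifies the chain by default
# and exits non-zero with an error message if validation fails.
check_https() {
  curl -sS -o /dev/null -w '%{http_code}\n' "https://$1/"
}
```

A `notAfter` date that is still the old value usually means the serving pods have not picked up the new secret yet.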
Prevention
- Ensure cert-manager is running and healthy at all times
- Set up `CertificateExpiringSoon` alerts with a 14-day threshold
- Monitor cert-manager logs for renewal failures
- Regularly validate that DNS01 challenge credentials are current
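As a backstop to the alert, a periodic sweep can list every Certificate whose `status.notAfter` falls within the 14-day threshold. This is a sketch that assumes GNU `date` (for `-d` parsing of RFC 3339 timestamps) and cluster-wide read access to Certificates. Both function names are hypothetical.

```bash
# Hypothetical helper: succeed if an RFC 3339 timestamp is less than <days> days from now.
# The optional third argument overrides "now" (epoch seconds), which is useful for testing.
expires_within_days() {
  local not_after="$1" days="$2" now="${3:-$(date +%s)}"
  local expiry
  expiry="$(date -d "$not_after" +%s)" || return 2
  [ $(( expiry - now )) -lt $(( days * 86400 )) ]
}

# Hypothetical helper: report Certificates in all namespaces expiring within <days> (default 14).
report_expiring_certificates() {
  local days="${1:-14}"
  kubectl get certificates -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.status.notAfter}{"\n"}{end}' \
  | while read -r ns name not_after; do
      # A Certificate that was never issued has no notAfter
      if [ -z "$not_after" ]; then
        echo "$ns/$name: no notAfter (never issued?)"
        continue
      fi
      if expires_within_days "$not_after" "$days"; then
        echo "$ns/$name expires $not_after"
      fi
    done
}
```

With renewal scheduled 30 days before expiry, any Certificate this reports has already missed roughly two weeks of renewal attempts and warrants investigation.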