MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Certificate Renewal

This runbook covers TLS certificate renewal procedures for the MATIH platform, including automated cert-manager renewals, manual certificate rotation, and troubleshooting certificate issues.


Symptoms

  • CertificateExpiringSoon alert firing
  • Browser TLS warnings when accessing the platform
  • Services failing TLS handshake
  • cert-manager Certificate resource reporting its Ready condition as False

Impact

Expired certificates cause:

  • HTTPS access failures for users
  • Service-to-service communication failures
  • Webhook failures for admission controllers

Automated Renewal (cert-manager)

MATIH uses cert-manager for automated TLS certificate management. Certificates are automatically renewed 30 days before expiry.

Check Certificate Status

Review cert-manager Certificate resources for the affected namespace. The Certificate resources will show their current status, expiry date, and any renewal issues.
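The review can be sketched with standard kubectl commands. This is a minimal sketch, assuming kubectl access to the cluster and the cert-manager CRDs installed; the function names and the namespace and certificate arguments are hypothetical helpers for illustration, not part of the MATIH tooling.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not MATIH scripts.

# List every Certificate in a namespace, including its READY state and issuer.
list_certificates() {
  local namespace="$1"
  kubectl get certificate -n "$namespace" -o wide
}

# Print the expiry time and the next scheduled renewal time of one Certificate.
show_certificate_dates() {
  local namespace="$1" name="$2"
  kubectl get certificate "$name" -n "$namespace" \
    -o jsonpath='{.status.notAfter}{"\t"}{.status.renewalTime}{"\n"}'
}

# Show conditions and recent events, which usually explain a failed renewal.
describe_certificate() {
  local namespace="$1" name="$2"
  kubectl describe certificate "$name" -n "$namespace"
}
```

For example, `list_certificates <namespace>` followed by `describe_certificate <namespace> <certificate-name>` for any entry whose READY column is False.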

Verify ClusterIssuer

MATIH issues certificates through the ACME DNS01 challenge, solved via Azure DNS. The following issuers are used:

Issuer                      Environment   Description
letsencrypt-staging-dns01   Development   Let's Encrypt staging (for testing)
letsencrypt-prod-dns01      Production    Let's Encrypt production
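One way to verify an issuer is to read its Ready condition. The sketch below assumes kubectl access and the cert-manager CRDs; the helper function names are hypothetical, while the issuer names come from the table above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Print the Ready condition status ("True" or "False") of a ClusterIssuer.
issuer_ready_status() {
  local issuer="$1"
  kubectl get clusterissuer "$issuer" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Succeed only if the issuer reports Ready=True.
issuer_is_ready() {
  [ "$(issuer_ready_status "$1")" = "True" ]
}
```

A possible use: `issuer_is_ready letsencrypt-prod-dns01 || kubectl describe clusterissuer letsencrypt-prod-dns01`, which prints the issuer's conditions and events only when it is not ready.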

Common Renewal Failures

Issue                      Cause                           Resolution
DNS challenge failed       Azure DNS credentials expired   Rotate workload identity credentials
Rate limit exceeded        Too many certificate requests   Wait for rate limit reset (1 hour)
Domain validation failed   DNS zone delegation incorrect   Verify NS records in parent zone
Issuer not ready           cert-manager pod unhealthy      Restart cert-manager
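To narrow down which of these failures applies, the ACME order and challenge resources and the controller logs are usually the most informative. The sketch below assumes cert-manager is installed with upstream defaults (namespace and deployment both named cert-manager), which this runbook does not confirm; override CERT_MANAGER_NS if the install differs. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: cert-manager runs as deployment "cert-manager" in namespace
# "cert-manager" (the upstream default). Adjust if MATIH installs it elsewhere.
CERT_MANAGER_NS="${CERT_MANAGER_NS:-cert-manager}"

# List ACME orders and challenges in all namespaces; a stuck DNS01
# challenge reports its state and reason here.
list_acme_state() {
  kubectl get orders.acme.cert-manager.io,challenges.acme.cert-manager.io --all-namespaces
}

# Print the last hour of cert-manager controller logs.
cert_manager_logs() {
  kubectl logs -n "$CERT_MANAGER_NS" deployment/cert-manager --since=1h
}

# Restart the controller and wait up to two minutes for the rollout to finish.
restart_cert_manager() {
  kubectl rollout restart deployment/cert-manager -n "$CERT_MANAGER_NS" &&
    kubectl rollout status deployment/cert-manager -n "$CERT_MANAGER_NS" --timeout=120s
}
```

Restarting is only suggested for the "Issuer not ready" row; for the other rows, inspect the challenge state and logs first.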

Manual Certificate Rotation

For certificates not managed by cert-manager:

1. Generate New Certificate

Follow your organization's certificate request process to obtain a new certificate and key.

2. Update the Kubernetes Secret

The new certificate and key must be stored in the appropriate Kubernetes secret. Use the platform's secret management scripts rather than manual creation.
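Before handing the files to the secret management scripts, it can help to confirm that the certificate and private key actually belong together. The following is a sketch of one common check, comparing the public key embedded in the certificate with the one derived from the private key; the function name and file paths are hypothetical, and it assumes PEM files and an unencrypted key.

```shell
#!/usr/bin/env bash
# Sketch only: cert_matches_key is a hypothetical helper.

# Succeed only if the PEM certificate and PEM private key share a public key.
cert_matches_key() {
  local cert="$1" key="$2" cert_pub key_pub
  # Public key embedded in the certificate.
  cert_pub="$(openssl x509 -in "$cert" -noout -pubkey)" || return 1
  # Public key derived from the private key.
  key_pub="$(openssl pkey -in "$key" -pubout)" || return 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}
```

For example: `cert_matches_key new.crt new.key && echo "certificate and key match"`.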

3. Restart Affected Services

Services that mount TLS secrets may need a restart to pick up the new certificate:

./scripts/tools/service-build-deploy.sh <service-name>

Verification

After certificate renewal:

  1. Verify the certificate is valid and not expired
  2. Test HTTPS access to the affected endpoints
  3. Check that services are successfully completing TLS handshakes
  4. Verify the alert has resolved
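The first three checks can be sketched with openssl. This is an illustrative sketch rather than MATIH tooling: the function names are hypothetical, and the hostname and certificate path are placeholders to fill in.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Print subject, issuer, and validity dates of the certificate served at
# host:port (port defaults to 443). A successful run also confirms that the
# TLS handshake completes.
show_served_certificate() {
  local host="$1" port="${2:-443}"
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null |
    openssl x509 -noout -subject -issuer -dates
}

# Succeed only if the PEM certificate file stays valid for at least N more days.
cert_valid_for_days() {
  local cert="$1" days="$2"
  openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}
```

For example, `show_served_certificate <hostname>` shows the dates of the certificate currently being served, and `cert_valid_for_days tls.crt 14` fails if the file expires within the 14-day alert threshold.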

Prevention

  • Ensure cert-manager is running and healthy at all times
  • Set up CertificateExpiringSoon alerts with a 14-day threshold
  • Monitor cert-manager logs for renewal failures
  • Regularly validate that DNS01 challenge credentials are current