cert-manager
cert-manager automates TLS certificate provisioning and renewal for the MATIH platform. It integrates with Let's Encrypt for publicly trusted certificates and uses DNS-01 challenges via Azure DNS for validation, supporting both staging and production certificate issuers.
cert-manager Architecture
Certificate Resource --> cert-manager Controller --> ACME Challenge
                                                           |
                                                   DNS-01 (Azure DNS)
                                                           |
                                                    Let's Encrypt CA
                                                           |
                                                   TLS Secret Created
                                                           |
                                                    Ingress / Service

ClusterIssuers
The platform defines two ClusterIssuers, one for staging and one for production:
Staging Issuer
Located at infrastructure/k8s/cert-manager/cluster-issuer-dns01-staging.yaml:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging-dns01
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: platform-team@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-dns01
    solvers:
      - dns01:
          azureDNS:
            subscriptionID: AZURE_SUBSCRIPTION_ID
            resourceGroupName: matih-dns-rg
            hostedZoneName: matih-dev.example.com
            managedIdentity:
              clientID: MANAGED_IDENTITY_CLIENT_ID

Production Issuer
Located at infrastructure/k8s/cert-manager/cluster-issuer-dns01.yaml:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-dns01
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform-team@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-dns01
    solvers:
      - dns01:
          azureDNS:
            subscriptionID: AZURE_SUBSCRIPTION_ID
            resourceGroupName: matih-dns-rg
            hostedZoneName: matih.ai
            managedIdentity:
              clientID: MANAGED_IDENTITY_CLIENT_ID

Certificate Resources
Certificates are requested by creating Certificate resources or annotating Ingress resources:
Explicit Certificate
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: acme-matih-ai-tls
  namespace: tenant-acme
spec:
  secretName: acme-matih-ai-tls
  issuerRef:
    name: letsencrypt-prod-dns01
    kind: ClusterIssuer
  dnsNames:
    - acme.matih.ai
    - "*.acme.matih.ai"

Ingress Annotation
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod-dns01
spec:
  tls:
    - hosts:
        - acme.matih.ai
      secretName: acme-matih-ai-tls

DNS-01 Challenge
The DNS-01 challenge method is used because:
| Advantage | Description |
|---|---|
| Wildcard support | Can issue wildcard certificates |
| No ingress required | Works without public HTTP endpoints |
| Azure DNS integration | Uses managed identity for secure access |
Certificate Lifecycle
| Event | Timing | Action |
|---|---|---|
| Initial request | On Certificate creation | ACME challenge + issuance |
| Renewal | 30 days before expiry (cert-manager's default: the last third of the 90-day certificate lifetime) | Automatic re-issuance |
| Rotation | On renewal | Secret updated, Ingress reloads |
| Failure | Challenge fails | cert-manager retries with backoff |
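
The renewal window in the table can also be set explicitly on a Certificate. The following is a minimal sketch based on the `acme-matih-ai-tls` Certificate from the Explicit Certificate section. `renewBefore` is a standard cert-manager field, but the value `720h` (30 days) is an assumption for illustration and is not taken from the platform's manifests.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: acme-matih-ai-tls
  namespace: tenant-acme
spec:
  secretName: acme-matih-ai-tls
  # Assumed value: begin renewal 720h (30 days) before the certificate expires.
  renewBefore: 720h
  issuerRef:
    name: letsencrypt-prod-dns01
    kind: ClusterIssuer
  dnsNames:
    - acme.matih.ai
    - "*.acme.matih.ai"
```

Only the renewal window is set here. The certificate lifetime itself is decided by Let's Encrypt, so a `duration` field would probably have no effect with these issuers.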
Dev vs. Production
| Aspect | Development | Production |
|---|---|---|
| Issuer | letsencrypt-staging-dns01 | letsencrypt-prod-dns01 |
| Domain | matih-dev.example.com | matih.ai |
| Trust | Not publicly trusted (staging CA) | Publicly trusted |
| Rate limits | Generous | 50 certs per domain per week |
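
A development certificate can be requested in the same way by pointing `issuerRef` at the staging issuer. This is a sketch under stated assumptions: the host `acme.matih-dev.example.com` and the secret name `acme-matih-dev-tls` are hypothetical and do not appear in the platform's manifests.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: acme-matih-dev-tls          # hypothetical name
  namespace: tenant-acme
spec:
  secretName: acme-matih-dev-tls    # hypothetical secret name
  issuerRef:
    # Staging issuer: certificates are not publicly trusted.
    name: letsencrypt-staging-dns01
    kind: ClusterIssuer
  dnsNames:
    - acme.matih-dev.example.com    # hypothetical host in the dev zone
```

Validating a new configuration against the staging issuer first avoids spending production rate limit on misconfigured requests.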
Monitoring
| Metric | Description |
|---|---|
| certmanager_certificate_ready_status | Certificate readiness (1 = ready) |
| certmanager_certificate_expiration_timestamp_seconds | Expiration time |
| certmanager_certificate_renewal_timestamp_seconds | Next renewal time |
Alerts should be configured for:
- Certificates expiring within 14 days
- Certificate renewal failures
- ACME challenge failures
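
The first two alerts could be expressed roughly as follows. This sketch assumes the Prometheus Operator is installed (it provides the `PrometheusRule` resource) and that cert-manager metrics are already being scraped. The rule name, namespace, alert names, `for` durations, and severities are all hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cert-manager-alerts   # hypothetical name
  namespace: monitoring       # hypothetical namespace
spec:
  groups:
    - name: cert-manager
      rules:
        # Fires when a certificate has fewer than 14 days of validity left.
        - alert: CertificateExpiringSoon
          expr: certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 24 * 3600
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} expires in under 14 days"
        # Fires when a certificate reports Ready=False for a sustained period.
        - alert: CertificateNotReady
          expr: certmanager_certificate_ready_status{condition="False"} == 1
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} is not ready"
```

None of the metrics listed above reports ACME challenge failures directly. In this sketch they surface indirectly: a failed challenge leaves the Certificate not ready, which triggers the second rule.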
Troubleshooting
| Issue | Symptom | Resolution |
|---|---|---|
| Certificate not ready | Ready: False status | Check Challenge and Order resources |
| DNS-01 challenge failed | Waiting for DNS propagation | Verify Azure DNS permissions |
| Rate limited | too many certificates error | Use staging issuer or wait |
| Secret not created | Ingress has no TLS | Check cert-manager logs for errors |