DNS Zone Management
The MATIH platform provides each tenant with a dedicated DNS subdomain, enabling tenant-branded URLs for all data plane services. DNS management is handled by the AzureDnsService, which creates child DNS zones, configures NS delegation, and manages A records pointing to tenant-specific LoadBalancer IP addresses.
DNS Architecture
```text
                 matih.ai (Platform DNS Zone)
                               |
                               |  Terraform-managed
                               |  (infrastructure/terraform/modules/azure/networking)
                               |
              +----------------+----------------+
              |                |                |
              v                v                v
        acme.matih.ai    beta.matih.ai   gamma.matih.ai
        (Tenant zone)    (Tenant zone)   (Tenant zone)
              |
              +-- A  @    -> 20.85.123.45
              +-- A  *    -> 20.85.123.45
              +-- A  api  -> 20.85.123.45
              +-- A  bi   -> 20.85.123.45
```
DNS Hierarchy
| Level | Zone | Managed By | Example |
|---|---|---|---|
| Platform | matih.ai | Terraform | Root platform domain |
| Tenant | {slug}.matih.ai | AzureDnsService | acme.matih.ai |
| Service | {service}.{slug}.matih.ai | Kubernetes Ingress | bi.acme.matih.ai |
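Each level of the hierarchy prepends one DNS label to the level above it. The sketch below derives tenant and service names that way. The TenantDnsNames class and its validation rule are assumptions for illustration, not code taken from AzureDnsService. The rule allows lowercase letters, digits, and inner hyphens, 1 to 63 characters per label, in the spirit of RFC 1123.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: builds tenant zone and service host names from a slug. */
final class TenantDnsNames {
    // Assumed rule: a single lowercase DNS label, 1-63 chars, no leading/trailing hyphen.
    private static final Pattern LABEL =
            Pattern.compile("^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$");

    private TenantDnsNames() {}

    static String requireLabel(String label) {
        if (label == null || !LABEL.matcher(label).matches()) {
            throw new IllegalArgumentException("Not a valid DNS label: " + label);
        }
        return label;
    }

    /** Tenant level: {slug}.{parentDomain}, e.g. acme.matih.ai */
    static String tenantZone(String slug, String parentDomain) {
        return requireLabel(slug) + "." + parentDomain;
    }

    /** Service level: {service}.{slug}.{parentDomain}, e.g. bi.acme.matih.ai */
    static String serviceHost(String service, String slug, String parentDomain) {
        return requireLabel(service) + "." + tenantZone(slug, parentDomain);
    }
}
```

Validating the slug before any Azure call surfaces a malformed tenant slug as an argument error. Without it, the problem would appear as a failure partway through provisioning.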
Key Components
| Component | Location | Description |
|---|---|---|
| AzureDnsService | control-plane/tenant-service/.../service/AzureDnsService.java | Core DNS operations |
| CustomDomainService | control-plane/tenant-service/.../service/CustomDomainService.java | Custom domain management |
| Platform DNS zone | infrastructure/terraform/modules/azure/networking/main.tf | Terraform-managed root zone |
| cert-manager ClusterIssuer | infrastructure/k8s/cert-manager/cluster-issuer-dns01*.yaml | DNS-01 challenge configuration |
Tenant DNS Zone Creation
The AzureDnsService.createTenantDnsZone() method creates a child DNS zone for each tenant:
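```java
public DnsZone createTenantDnsZone(String tenantSlug, String parentDomain,
                                   String resourceGroup) {
    // Child zone name, e.g. "acme" + "." + "matih.ai" -> "acme.matih.ai"
    String zoneName = tenantSlug + "." + parentDomain;

    // Idempotent: return the existing zone if a previous attempt already created it.
    // Depending on the Azure SDK version, a missing zone may surface as a
    // ManagementException (HTTP 404) rather than a null return.
    DnsZone existing = azureResourceManager.dnsZones()
            .getByResourceGroup(resourceGroup, zoneName);
    if (existing != null) {
        return existing;
    }

    // Create the zone; Azure assigns its nameservers at creation time.
    DnsZone zone = azureResourceManager.dnsZones()
            .define(zoneName)
            .withExistingResourceGroup(resourceGroup)
            .create();
    return zone;
}
```
Idempotency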
The DNS zone creation is idempotent. If the zone already exists (from a previous attempt or retry), it is returned without modification. This is critical for the retry-based error recovery of the provisioning orchestrator.
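The get-or-create contract can be shown without Azure. In this minimal sketch, InMemoryDnsZones is a hypothetical stand-in for the SDK's zone collection. It keys zones by resource group and zone name, so a retried call returns the zone that the first attempt created.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory stand-in for a DNS zone collection (not the Azure SDK). */
final class InMemoryDnsZones {
    /** Minimal zone value: the resource group and the fully qualified zone name. */
    static final class Zone {
        final String resourceGroup;
        final String name;

        Zone(String resourceGroup, String name) {
            this.resourceGroup = resourceGroup;
            this.name = name;
        }
    }

    private final Map<String, Zone> zones = new ConcurrentHashMap<>();
    private final AtomicInteger creations = new AtomicInteger();

    /** Same contract as createTenantDnsZone: return the existing zone or create it once. */
    Zone createTenantDnsZone(String tenantSlug, String parentDomain, String resourceGroup) {
        String zoneName = tenantSlug + "." + parentDomain;
        // computeIfAbsent runs the factory only when no zone is stored under this key.
        return zones.computeIfAbsent(resourceGroup + "/" + zoneName, key -> {
            creations.incrementAndGet();
            return new Zone(resourceGroup, zoneName);
        });
    }

    /** How many zones were actually created (retries do not increment this). */
    int creations() {
        return creations.get();
    }
}
```

For the in-memory map, computeIfAbsent makes lookup and creation one atomic step. Against Azure, createTenantDnsZone issues two separate calls: a lookup, then a create. Its idempotency therefore comes from re-checking for the zone on every retry.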
NS Delegation
After creating the child zone, the service configures NS delegation records in the parent zone. This enables DNS resolution for the tenant subdomain.
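Delegation works as a chain of referrals. The parent zone holds only an NS record set named after the child label, and the servers it lists answer for everything under the child. The hypothetical DelegationModel below models that chain in memory; it is not an Azure API, and real resolvers query nameservers over the network. It also models the failure mode where the parent's NS set does not match the servers that host the child zone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of NS delegation; real resolvers query over the network. */
final class DelegationModel {
    static final class Zone {
        final String name;
        final List<String> nameservers;                               // assigned by the provider
        final Map<String, List<String>> nsRecords = new HashMap<>(); // child label -> NS set
        final Map<String, String> aRecords = new HashMap<>();        // relative name -> IPv4

        Zone(String name, List<String> nameservers) {
            this.name = name;
            this.nameservers = List.copyOf(nameservers);
        }
    }

    private final Map<String, Zone> zones = new HashMap<>();

    Zone addZone(String name, List<String> nameservers) {
        Zone zone = new Zone(name, nameservers);
        zones.put(name, zone);
        return zone;
    }

    /** Parent-side step: an NS record set named after the child label (e.g. "acme"). */
    void delegate(Zone parent, Zone child) {
        String label = child.name.substring(0, child.name.length() - parent.name.length() - 1);
        parent.nsRecords.put(label, child.nameservers);
    }

    /** Follows referrals from {@code start}; returns the zones visited, then the answer. */
    List<String> resolve(Zone start, String fqdn) {
        List<String> path = new ArrayList<>();
        Zone zone = start;
        while (true) {
            path.add(zone.name);
            if (!fqdn.equals(zone.name) && !fqdn.endsWith("." + zone.name)) {
                throw new IllegalArgumentException(fqdn + " is not under " + zone.name);
            }
            String rel = fqdn.equals(zone.name)
                    ? "@"
                    : fqdn.substring(0, fqdn.length() - zone.name.length() - 1);
            // The label closest to the zone apex decides whether a delegation applies.
            String nearest = rel.substring(rel.lastIndexOf('.') + 1);
            List<String> ns = zone.nsRecords.get(nearest);
            if (ns != null) {
                // Referral: the child only answers if it is hosted on the listed servers.
                Zone child = zones.get(nearest + "." + zone.name);
                if (child == null || !child.nameservers.containsAll(ns)) {
                    path.add("SERVFAIL");
                    return path;
                }
                zone = child;
                continue;
            }
            String ip = zone.aRecords.get(rel);
            path.add(ip != null ? ip : "NXDOMAIN"); // simplified: no wildcard or NODATA handling
            return path;
        }
    }
}
```

Before delegate(parent, child) runs, resolving bi.acme.matih.ai from matih.ai stops at the parent with NXDOMAIN. Afterwards, the path is matih.ai, then acme.matih.ai, then the A record value.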
Delegation Flow
```text
1. Create child zone: acme.matih.ai
   -> Azure assigns nameservers: ns1-01.azure-dns.com, ns2-01.azure-dns.net, ...

2. Create NS records in parent zone: matih.ai
   -> NS record: acme -> ns1-01.azure-dns.com
   -> NS record: acme -> ns2-01.azure-dns.net
   -> NS record: acme -> ns3-01.azure-dns.org
   -> NS record: acme -> ns4-01.azure-dns.info

3. DNS resolution path:
   Client -> matih.ai NS -> acme.matih.ai NS -> A record -> LoadBalancer IP
```
NS Delegation Code
```java
public void createNsDelegation(String parentDomain, String childName,
                               List<String> nameservers, String resourceGroup) {
    // Look up the parent zone (e.g. matih.ai) that will hold the delegation.
    DnsZone parentZone = azureResourceManager.dnsZones()
            .getByResourceGroup(resourceGroup, parentDomain);
    if (parentZone == null) {
        throw new IllegalStateException(
                "Parent DNS zone not found: " + parentDomain);
    }

    // Create or update the NS record set named after the child label (e.g. "acme")
    // so that it lists the nameservers Azure assigned to the child zone.
    parentZone.update()
            .defineNSRecordSet(childName)
            .withNameServers(nameservers)
            .attach()
            .apply();
}
```
A Record Management
Once the LoadBalancer IP is assigned (from Phase 6: DEPLOY_INGRESS_CONTROLLER), A records are created in the tenant zone.
Records Created
| Record | Type | Value | Purpose |
|---|---|---|---|
| @ | A | 20.85.123.45 | Root domain (acme.matih.ai) |
| * | A | 20.85.123.45 | Wildcard for all subdomains |
| api | A | 20.85.123.45 | API gateway endpoint |
| bi | A | 20.85.123.45 | BI workbench |
| ml | A | 20.85.123.45 | ML workbench |
| data | A | 20.85.123.45 | Data workbench |
The wildcard record ensures that any subdomain resolves to the tenant's LoadBalancer, allowing service-specific routing through the Ingress controller.
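Exact records and the wildcard interact by specificity. An explicit name such as bi answers directly, and the wildcard synthesizes an answer only for names that do not exist in the zone. The simplified sketch below shows that lookup using a hypothetical in-memory TenantARecords class populated with the records from the table. It follows the RFC 4592 closest-encloser rule, but treats a name as existing only if it has its own A record.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory model of a tenant zone's A records (not the AzureDnsService API). */
final class TenantARecords {
    private final String zone;
    private final Map<String, String> records = new LinkedHashMap<>();

    TenantARecords(String zone, String loadBalancerIp) {
        this.zone = zone.toLowerCase(Locale.ROOT);
        // Record names from the "Records Created" table, all pointing at the LoadBalancer IP.
        for (String name : new String[] {"@", "*", "api", "bi", "ml", "data"}) {
            records.put(name, loadBalancerIp);
        }
    }

    Map<String, String> records() {
        return Collections.unmodifiableMap(records);
    }

    /** Looks up an A record, synthesizing wildcard answers along the lines of RFC 4592. */
    Optional<String> resolve(String fqdn) {
        String name = fqdn.toLowerCase(Locale.ROOT);
        if (name.equals(zone)) {
            return Optional.ofNullable(records.get("@"));
        }
        if (!name.endsWith("." + zone)) {
            return Optional.empty(); // not in this zone
        }
        String rel = name.substring(0, name.length() - zone.length() - 1);
        if (records.containsKey(rel)) {
            return Optional.of(records.get(rel)); // an explicit record beats the wildcard
        }
        // Walk up to the closest existing ancestor; a wildcard only applies directly under it.
        String encloser = rel;
        do {
            int dot = encloser.indexOf('.');
            encloser = dot < 0 ? "@" : encloser.substring(dot + 1);
        } while (!encloser.equals("@") && !records.containsKey(encloser));
        String wildcard = encloser.equals("@") ? "*" : "*." + encloser;
        return Optional.ofNullable(records.get(wildcard));
    }
}
```

In this model, other.acme.matih.ai resolves through the * record. By contrast, x.api.acme.matih.ai does not resolve, because api exists and has no wildcard beneath it.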
Custom Domain Support
Enterprise tier tenants can configure custom domains (e.g., analytics.acme.com) that point to their MATIH data plane:
Custom Domain Flow
```text
1. Tenant registers custom domain: analytics.acme.com
2. AzureDnsService.createCustomDnsZone("analytics.acme.com", resourceGroup)
   -> Creates DNS zone in Azure
3. MATIH returns nameservers to tenant
   -> ns1-01.azure-dns.com, ns2-01.azure-dns.net, ...
4. Tenant updates NS records at their registrar
   -> analytics.acme.com NS -> ns1-01.azure-dns.com, ...
5. MATIH creates A records in the custom zone
   -> A @ -> 20.85.123.45
6. cert-manager provisions TLS certificate via DNS-01 challenge
7. Ingress resource updated with custom domain host rule
```
Custom Domain Verification
Before activating a custom domain, the service verifies DNS propagation and TLS certificate readiness:
```text
GET /api/v1/tenants/{tenantId}/domains/{domain}/verify
```

Example response:

```json
{
  "domain": "analytics.acme.com",
  "status": "VERIFIED",
  "nsRecordsCorrect": true,
  "aRecordResolved": true,
  "tlsCertificateReady": true
}
```
Dev vs. Production DNS
| Aspect | Development | Production |
|---|---|---|
| Platform domain | matih-dev.example.com or nip.io | matih.ai |
| TLS issuer | letsencrypt-staging-dns01 | letsencrypt-prod-dns01 |
| Child zones | Disabled by default | Created per tenant |
| A records | Point to dev cluster IP | Point to tenant LoadBalancer |
| Custom domains | Not supported | Supported for Enterprise tier |
In development environments, DNS zones are typically disabled and services are accessed via port forwarding or nip.io wildcard DNS.
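The table amounts to a per-environment profile. The hypothetical DnsProfile enum below encodes the table's values directly; how the platform actually loads these settings is not shown in this section. The enum also includes a helper for nip.io-style development hostnames, where the hostname embeds the IPv4 address it should resolve to.

```java
/** Hypothetical per-environment DNS settings; values from the Dev vs. Production table. */
enum DnsProfile {
    // Development may also use nip.io instead of a dedicated domain.
    DEVELOPMENT("matih-dev.example.com", "letsencrypt-staging-dns01", false, false),
    PRODUCTION("matih.ai", "letsencrypt-prod-dns01", true, true);

    final String platformDomain;
    final String clusterIssuer;
    final boolean childZonePerTenant;      // dev: disabled by default
    final boolean customDomainsSupported;  // prod only

    DnsProfile(String platformDomain, String clusterIssuer,
               boolean childZonePerTenant, boolean customDomainsSupported) {
        this.platformDomain = platformDomain;
        this.clusterIssuer = clusterIssuer;
        this.childZonePerTenant = childZonePerTenant;
        this.customDomainsSupported = customDomainsSupported;
    }

    String tenantDomain(String slug) {
        return slug + "." + platformDomain;
    }

    /** Custom domains: only where the profile supports them, and only for Enterprise tier. */
    boolean customDomainAllowed(boolean enterpriseTier) {
        return customDomainsSupported && enterpriseTier;
    }

    /** nip.io resolves any name that embeds an IPv4 address to that address. */
    static String nipIoHost(String label, String ipv4) {
        return label + "." + ipv4 + ".nip.io";
    }
}
```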
TLS Certificate Management
TLS certificates are provisioned by cert-manager using the DNS-01 challenge type, which proves domain ownership by creating a TXT record in the DNS zone.
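Under RFC 8555 (section 8.4), the DNS-01 TXT record is published at _acme-challenge.<domain>. Its value is the unpadded base64url encoding of the SHA-256 digest of the key authorization, which is the challenge token, a period, and the ACME account key thumbprint. For a wildcard name the "*." prefix is dropped, so acme.matih.ai and *.acme.matih.ai are both validated at _acme-challenge.acme.matih.ai. cert-manager computes and publishes these records itself. The sketch below only illustrates their contents, and the token and thumbprint passed to it are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Illustrates RFC 8555 DNS-01 record contents; cert-manager does this for you. */
final class Dns01Challenge {
    private Dns01Challenge() {}

    /** TXT owner name; wildcard identifiers are validated at their base domain. */
    static String recordName(String dnsName) {
        String base = dnsName.startsWith("*.") ? dnsName.substring(2) : dnsName;
        return "_acme-challenge." + base;
    }

    /** TXT value: base64url(SHA-256(token + "." + thumbprint)) without padding. */
    static String recordValue(String token, String accountKeyThumbprint) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(
                    (token + "." + accountKeyThumbprint).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```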
Certificate Resource
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: acme-tls
  namespace: matih-acme
spec:
  secretName: acme-tls-certificate
  issuerRef:
    name: letsencrypt-prod-dns01
    kind: ClusterIssuer
  dnsNames:
    - acme.matih.ai
    - "*.acme.matih.ai"
```
Certificate Lifecycle
| Event | Action |
|---|---|
| Certificate created | cert-manager creates DNS-01 challenge TXT record |
| Challenge verified | Let's Encrypt issues certificate |
| Certificate stored | Saved as Kubernetes Secret in tenant namespace |
| 30 days before expiry | cert-manager auto-renews |
| Tenant deleted | Certificate and secret deleted |
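The renewal row is plain date arithmetic. The minimal sketch below assumes the fixed 30-day window stated in the table. In cert-manager, this window is configurable through the Certificate's spec.renewBefore field, and RenewalWindow is a hypothetical helper.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: computes when a certificate enters its renewal window. */
final class RenewalWindow {
    static final Duration RENEW_BEFORE = Duration.ofDays(30); // value from the lifecycle table

    private RenewalWindow() {}

    /** The instant renewal should start, given the certificate's notAfter time. */
    static Instant renewalTime(Instant notAfter) {
        return notAfter.minus(RENEW_BEFORE);
    }

    /** True once {@code now} has reached the renewal window. */
    static boolean isDue(Instant notAfter, Instant now) {
        return !now.isBefore(renewalTime(notAfter));
    }
}
```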
DNS Cleanup on Tenant Deletion
When a tenant is deprovisioned, its DNS resources are cleaned up in the reverse of their creation order:
1. Delete Ingress resources (removes the certificate dependency)
2. Delete the cert-manager Certificate
3. Delete A records from the tenant zone
4. Delete NS delegation records from the parent zone
5. Delete the tenant DNS zone
The cleanup is performed by the TenantDeletionService as part of the tenant decommissioning workflow.
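Deleting in the reverse of creation order means nothing is removed while a resource created after it still depends on it. One common way to get that ordering is a LIFO stack of cleanup actions. The CleanupStack class below is a hypothetical sketch of the pattern, not the TenantDeletionService implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical LIFO cleanup registry; not the TenantDeletionService implementation. */
final class CleanupStack {
    private static final class Entry {
        final String resource;
        final Runnable delete;

        Entry(String resource, Runnable delete) {
            this.resource = resource;
            this.delete = delete;
        }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();

    /** Call right after a resource is created, passing the action that deletes it. */
    void onCreated(String resource, Runnable delete) {
        stack.push(new Entry(resource, delete));
    }

    /** Deletes last-created first; returns the names in the order they were deleted. */
    List<String> cleanUp() {
        List<String> deleted = new ArrayList<>();
        while (!stack.isEmpty()) {
            Entry top = stack.peek();
            top.delete.run(); // if this throws, the entry stays on the stack for a retry
            stack.pop();
            deleted.add(top.resource);
        }
        return deleted;
    }
}
```

Registering the resources in creation order (zone, NS delegation, A records, Certificate, Ingress) yields exactly the deletion order listed above.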
Troubleshooting DNS Issues
Common DNS issues and their resolution:
| Symptom | Cause | Resolution |
|---|---|---|
| NXDOMAIN for tenant domain | NS delegation missing | Check NS records in parent zone |
| Certificate pending | DNS-01 challenge failing | Verify TXT record propagation |
| Slow resolution | DNS propagation delay | Wait up to 48 hours for global propagation |
| Custom domain not resolving | NS records not updated at registrar | Verify registrar NS configuration |
| Wildcard not working | Missing wildcard A record | Check * record in tenant zone |
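Both NS-related rows come down to comparing two nameserver sets. One is the set Azure assigned to the zone; the other is the set the parent zone or registrar actually publishes, for example as reported by dig NS. DNS names are case-insensitive and often appear with a trailing dot, so this hypothetical helper normalizes both sides before comparing them as sets.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical helper for checking NS delegation; gathering the inputs is out of scope. */
final class NsDelegationCheck {
    private NsDelegationCheck() {}

    /** Lowercases names and strips a trailing root dot so equivalent names compare equal. */
    static Set<String> normalize(Collection<String> nameservers) {
        return nameservers.stream()
                .map(ns -> ns.trim().toLowerCase(Locale.ROOT))
                .map(ns -> ns.endsWith(".") ? ns.substring(0, ns.length() - 1) : ns)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /** Nameservers Azure assigned that the parent zone or registrar does not publish. */
    static Set<String> missing(Collection<String> assigned, Collection<String> published) {
        Set<String> result = normalize(assigned);
        result.removeAll(normalize(published));
        return result;
    }

    /** True when the published NS set matches the assigned one exactly, ignoring order. */
    static boolean delegationCorrect(Collection<String> assigned, Collection<String> published) {
        return normalize(assigned).equals(normalize(published));
    }
}
```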
Validation Script
The platform provides a validation script for DNS troubleshooting:
```bash
./scripts/tools/validate-tenant-ingress.sh --tenant acme
```
This script checks:
- DNS zone existence
- NS delegation correctness
- A record resolution
- TLS certificate status
- Ingress controller health
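Later checks in the list tend to fail as a consequence of earlier ones. An A record cannot resolve publicly without working NS delegation, and DNS-01 issuance needs the zone to be reachable. The hedged sketch below is an ordered checker that reports every result and highlights the earliest failure. The class, the probes, and this reporting style are assumptions, not the internals of validate-tenant-ingress.sh.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BooleanSupplier;

/** Hypothetical ordered checker; the real validation script's internals are not shown. */
final class TenantDnsValidator {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    /** Registers a named probe; probes run in registration order. */
    TenantDnsValidator check(String name, BooleanSupplier probe) {
        checks.put(name, probe);
        return this;
    }

    /** Runs every probe and records pass (true) or fail (false), preserving order. */
    Map<String, Boolean> run() {
        Map<String, Boolean> report = new LinkedHashMap<>();
        checks.forEach((name, probe) -> report.put(name, probe.getAsBoolean()));
        return report;
    }

    /** The earliest failing check, often the layer to investigate first. */
    static Optional<String> firstFailure(Map<String, Boolean> report) {
        return report.entrySet().stream()
                .filter(e -> !e.getValue())
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```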
Next Steps
- Per-Tenant Ingress -- how ingress routes traffic to tenant services
- Provisioning Phases -- DNS in the context of the full provisioning flow
- API Reference -- DNS management endpoints