MATIH Platform is in active MVP development. Documentation reflects current implementation status.

DNS Zone Management

The MATIH platform provides each tenant with a dedicated DNS subdomain, enabling tenant-branded URLs for all data plane services. DNS management is handled by the AzureDnsService, which creates child DNS zones, configures NS delegation, and manages A records pointing to tenant-specific LoadBalancer IP addresses.


DNS Architecture

                     matih.ai (Platform DNS Zone)
                         |
                         |  Terraform-managed
                         |  (infrastructure/terraform/modules/azure/networking)
                         |
              +----------+----------+
              |          |          |
              v          v          v
         acme.matih.ai  beta.matih.ai  gamma.matih.ai
         (Tenant zone)  (Tenant zone)  (Tenant zone)
              |
              +-- A @ -> 20.85.123.45
              +-- A * -> 20.85.123.45
              +-- A api -> 20.85.123.45
              +-- A bi -> 20.85.123.45

DNS Hierarchy

| Level | Zone | Managed By | Example |
| --- | --- | --- | --- |
| Platform | matih.ai | Terraform | Root platform domain |
| Tenant | {slug}.matih.ai | AzureDnsService | acme.matih.ai |
| Service | {service}.{slug}.matih.ai | Kubernetes Ingress | bi.acme.matih.ai |
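The naming scheme in the hierarchy table can be sketched as a small helper. This is a minimal sketch, not the AzureDnsService code: TenantDnsNames and its methods are hypothetical, and the slug check assumes that tenant slugs and service names must each be a single lowercase DNS label (RFC 1123 host-name rules: 1 to 63 characters, letters, digits, and hyphens, no leading or trailing hyphen).

```java
import java.util.regex.Pattern;

/** Hypothetical helper: derives tenant zone and service host names from a slug. */
final class TenantDnsNames {
    // One DNS label: 1-63 chars of lowercase letters, digits, and hyphens,
    // with no leading or trailing hyphen (RFC 1123, restricted to lowercase).
    private static final Pattern LABEL =
            Pattern.compile("^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$");

    private TenantDnsNames() {}

    static String requireLabel(String label) {
        if (label == null || !LABEL.matcher(label).matches()) {
            throw new IllegalArgumentException("Not a valid DNS label: " + label);
        }
        return label;
    }

    /** Tenant level: {slug}.{platformDomain}, e.g. acme.matih.ai */
    static String tenantZone(String slug, String platformDomain) {
        return requireLabel(slug) + "." + platformDomain;
    }

    /** Service level: {service}.{slug}.{platformDomain}, e.g. bi.acme.matih.ai */
    static String serviceHost(String service, String slug, String platformDomain) {
        return requireLabel(service) + "." + tenantZone(slug, platformDomain);
    }
}
```

Validating the slug before any Azure call keeps a malformed tenant slug from surfacing later as a zone-creation failure.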

Key Components

| Component | Location | Description |
| --- | --- | --- |
| AzureDnsService | control-plane/tenant-service/.../service/AzureDnsService.java | Core DNS operations |
| CustomDomainService | control-plane/tenant-service/.../service/CustomDomainService.java | Custom domain management |
| Platform DNS zone | infrastructure/terraform/modules/azure/networking/main.tf | Terraform-managed root zone |
| cert-manager ClusterIssuer | infrastructure/k8s/cert-manager/cluster-issuer-dns01*.yaml | DNS-01 challenge configuration |

Tenant DNS Zone Creation

The AzureDnsService.createTenantDnsZone() method creates a child DNS zone for each tenant:

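import com.azure.core.management.exception.ManagementException;
import com.azure.resourcemanager.dns.models.DnsZone;

public DnsZone createTenantDnsZone(String tenantSlug, String parentDomain,
                                    String resourceGroup) {
    String zoneName = tenantSlug + "." + parentDomain;

    // Idempotent: return the existing zone if a previous attempt already created it.
    // The azure-resourcemanager SDK reports a missing zone as a ManagementException
    // with HTTP status 404 rather than returning null, so 404 means "not created yet".
    try {
        DnsZone existing = azureResourceManager.dnsZones()
                .getByResourceGroup(resourceGroup, zoneName);
        if (existing != null) {
            return existing;
        }
    } catch (ManagementException e) {
        if (e.getResponse() == null || e.getResponse().getStatusCode() != 404) {
            throw e; // other failures (auth, throttling, ...) must reach the caller
        }
    }

    // Create the child zone; Azure assigns its nameservers (see zone.nameServers())
    return azureResourceManager.dnsZones()
            .define(zoneName)
            .withExistingResourceGroup(resourceGroup)
            .create();
}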

Idempotency

The DNS zone creation is idempotent. If the zone already exists (from a previous attempt or retry), it is returned without modification. This is critical for the retry-based error recovery of the provisioning orchestrator.
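The retry behavior can be illustrated with a small in-memory model. This is a sketch only: InMemoryZoneStore and Zone are hypothetical stand-ins for the Azure zone collection. ConcurrentHashMap.computeIfAbsent makes check-and-create atomic here, which is stronger than the check-then-create the real service performs against Azure, but the observable result for sequential retries is the same.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory stand-in for the Azure DNS zone collection. */
final class InMemoryZoneStore {
    /** Minimal zone model; the real DnsZone also carries the Azure-assigned nameservers. */
    static final class Zone {
        final String name;

        Zone(String name) {
            this.name = name;
        }
    }

    private final Map<String, Zone> zones = new ConcurrentHashMap<>();
    private final AtomicInteger creations = new AtomicInteger();

    /** Same contract as createTenantDnsZone: return the existing zone, otherwise create it. */
    Zone createTenantDnsZone(String tenantSlug, String parentDomain) {
        String zoneName = tenantSlug + "." + parentDomain;
        // computeIfAbsent applies the factory at most once per key, so a retry
        // gets back the zone created by the first attempt
        return zones.computeIfAbsent(zoneName, name -> {
            creations.incrementAndGet();
            return new Zone(name);
        });
    }

    int creationCount() {
        return creations.get();
    }
}
```

Calling createTenantDnsZone("acme", "matih.ai") twice returns the same Zone instance and leaves creationCount() at 1, which is the property a retrying orchestrator relies on.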


NS Delegation

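After creating the child zone, the service adds an NS record set for the tenant label to the parent zone, listing the nameservers Azure assigned to the child zone. This delegation is what makes the tenant subdomain resolvable: without it, queries for names under acme.matih.ai are answered by the parent zone, which holds no such records and returns NXDOMAIN.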

Delegation Flow

1. Create child zone: acme.matih.ai
   -> Azure assigns nameservers: ns1-01.azure-dns.com, ns2-01.azure-dns.net, ...

2. Create NS records in parent zone: matih.ai
   -> NS record: acme -> ns1-01.azure-dns.com
   -> NS record: acme -> ns2-01.azure-dns.net
   -> NS record: acme -> ns3-01.azure-dns.org
   -> NS record: acme -> ns4-01.azure-dns.info

3. DNS resolution path:
   Client -> matih.ai NS -> acme.matih.ai NS -> A record -> LoadBalancer IP
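The resolution path above can be modeled as a toy resolver that starts at the platform zone and follows NS delegations down to the tenant zone. DelegationModel is hypothetical and deliberately simplified: it looks child zones up by name instead of querying the listed nameservers, answers only A records by exact name (no wildcard), and ignores caching and TTLs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy model of the chain: platform zone -> NS delegation -> tenant zone -> A record. */
final class DelegationModel {
    static final class Zone {
        final String name;
        final Map<String, List<String>> nsDelegations = new HashMap<>(); // child label -> nameservers
        final Map<String, String> aRecords = new HashMap<>();           // label ("@" = apex) -> IPv4

        Zone(String name) {
            this.name = name;
        }
    }

    private final Map<String, Zone> zones = new HashMap<>();
    private final String rootZone;

    DelegationModel(String rootZone) {
        this.rootZone = rootZone;
        zones.put(rootZone, new Zone(rootZone));
    }

    Zone zoneNamed(String name) {
        return zones.computeIfAbsent(name, Zone::new);
    }

    /** Walks down from the platform zone; an empty result stands in for a failed lookup. */
    Optional<String> resolve(String fqdn) {
        Zone zone = zones.get(rootZone);
        while (true) {
            String relative;
            if (fqdn.equals(zone.name)) {
                relative = "";
            } else if (fqdn.endsWith("." + zone.name)) {
                relative = fqdn.substring(0, fqdn.length() - zone.name.length() - 1);
            } else {
                return Optional.empty(); // name is outside this zone
            }
            if (!relative.isEmpty()) {
                // The label directly below the current zone, e.g. "acme" in "bi.acme"
                String childLabel = relative.substring(relative.lastIndexOf('.') + 1);
                if (zone.nsDelegations.containsKey(childLabel)) {
                    // Real resolvers query the delegated nameservers; here the child is looked up by name
                    Zone child = zones.get(childLabel + "." + zone.name);
                    if (child == null) {
                        return Optional.empty(); // lame delegation (real resolvers report SERVFAIL)
                    }
                    zone = child;
                    continue;
                }
            }
            // No delegation applies: this zone answers authoritatively (empty = NXDOMAIN)
            return Optional.ofNullable(zone.aRecords.get(relative.isEmpty() ? "@" : relative));
        }
    }
}
```

If the platform zone lacks the nsDelegations entry for "acme", every name under acme.matih.ai fails to resolve even though the tenant zone holds the correct A records. This is the "NXDOMAIN for tenant domain" case in the troubleshooting table.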

NS Delegation Code

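import com.azure.resourcemanager.dns.models.DnsZone;
import java.util.List;

public void createNsDelegation(String parentDomain, String childName,
                                List<String> nameservers, String resourceGroup) {
    if (nameservers == null || nameservers.isEmpty()) {
        throw new IllegalArgumentException("At least one nameserver is required");
    }

    DnsZone parentZone = azureResourceManager.dnsZones()
            .getByResourceGroup(resourceGroup, parentDomain);

    if (parentZone == null) {
        throw new IllegalStateException(
            "Parent DNS zone not found: " + parentDomain);
    }

    // Define the NS record set for the child label (e.g. "acme") in the parent zone.
    // The fluent API takes one nameserver per withNameServer(...) call.
    var nsRecordSet = parentZone.update()
            .defineNSRecordSet(childName)
            .withNameServer(nameservers.get(0));
    for (String nameserver : nameservers.subList(1, nameservers.size())) {
        nsRecordSet = nsRecordSet.withNameServer(nameserver);
    }
    nsRecordSet.attach().apply();
}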

A Record Management

Once the LoadBalancer IP is assigned (from Phase 6: DEPLOY_INGRESS_CONTROLLER), A records are created in the tenant zone.

Records Created

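| Record | Type | Value | Purpose |
| --- | --- | --- | --- |
| @ | A | 20.85.123.45 | Root domain (acme.matih.ai) |
| * | A | 20.85.123.45 | Wildcard for all subdomains |
| api | A | 20.85.123.45 | API gateway endpoint |
| bi | A | 20.85.123.45 | BI workbench |
| ml | A | 20.85.123.45 | ML workbench |
| data | A | 20.85.123.45 | Data workbench |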

The wildcard record makes any subdomain without an explicit record (for example, a newly added service) resolve to the tenant's LoadBalancer. The Ingress controller then routes each request to the right service based on its Host header.
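The record table and the wildcard behavior can be sketched together. TenantARecords is hypothetical: plan() builds the six records from the table for one LoadBalancer IP, and lookup() answers with an explicit record first and falls back to the wildcard. The lookup is simplified and ignores the RFC 4592 closest-encloser rules, apart from the fact that a wildcard never covers the zone apex.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: the tenant zone's A-record plan and a wildcard-aware lookup. */
final class TenantARecords {
    // Record names from the table; "@" is the zone apex, "*" the wildcard
    static final List<String> RECORD_NAMES = List.of("@", "*", "api", "bi", "ml", "data");

    private TenantARecords() {}

    /** Every record in the plan points at the same ingress LoadBalancer IP. */
    static Map<String, String> plan(String loadBalancerIp) {
        if (!isIpv4(loadBalancerIp)) {
            throw new IllegalArgumentException("Not an IPv4 address: " + loadBalancerIp);
        }
        Map<String, String> records = new LinkedHashMap<>();
        for (String name : RECORD_NAMES) {
            records.put(name, loadBalancerIp);
        }
        return records;
    }

    /** Explicit record wins; otherwise the wildcard answers (simplified, see lead-in). */
    static Optional<String> lookup(Map<String, String> records, String relativeName) {
        String explicit = records.get(relativeName);
        if (explicit != null) {
            return Optional.of(explicit);
        }
        if (relativeName.equals("@")) {
            return Optional.empty(); // a wildcard never matches the zone apex
        }
        return Optional.ofNullable(records.get("*"));
    }

    private static boolean isIpv4(String s) {
        String[] parts = s.split("\\.", -1);
        if (parts.length != 4) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty() || part.length() > 3
                    || !part.chars().allMatch(c -> c >= '0' && c <= '9')) {
                return false;
            }
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }
}
```

Removing the "*" entry from the plan reproduces the "Wildcard not working" symptom: explicitly listed services still resolve, but any other subdomain does not.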


Custom Domain Support

Enterprise tier tenants can configure custom domains (e.g., analytics.acme.com) that point to their MATIH data plane:

Custom Domain Flow

1. Tenant registers custom domain: analytics.acme.com

2. AzureDnsService.createCustomDnsZone("analytics.acme.com", resourceGroup)
   -> Creates DNS zone in Azure

3. MATIH returns nameservers to tenant
   -> ns1-01.azure-dns.com, ns2-01.azure-dns.net, ...

4. Tenant updates NS records at their registrar
   -> analytics.acme.com NS -> ns1-01.azure-dns.com, ...

5. MATIH creates A records in the custom zone
   -> A @ -> 20.85.123.45

6. cert-manager provisions TLS certificate via DNS-01 challenge

7. Ingress resource updated with custom domain host rule

Custom Domain Verification

Before activating a custom domain, the service verifies DNS propagation:

GET /api/v1/tenants/{tenantId}/domains/{domain}/verify
{
  "domain": "analytics.acme.com",
  "status": "VERIFIED",
  "nsRecordsCorrect": true,
  "aRecordResolved": true,
  "tlsCertificateReady": true
}
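The response fields suggest how verification could be evaluated. In this hypothetical sketch, the status is VERIFIED only when all three checks pass ("PENDING" is an assumed name for every other case), and nsRecordsCorrect compares the nameservers observed for the domain with those Azure assigned, ignoring order, letter case, and the trailing dot that resolvers often return. The class and method names are illustrative, not CustomDomainService's API.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of the checks behind the verify endpoint. */
final class CustomDomainVerification {
    private CustomDomainVerification() {}

    /** "VERIFIED" only when every check passes; "PENDING" is an assumed name for the rest. */
    static String status(boolean nsRecordsCorrect, boolean aRecordResolved,
                         boolean tlsCertificateReady) {
        return nsRecordsCorrect && aRecordResolved && tlsCertificateReady ? "VERIFIED" : "PENDING";
    }

    /** Observed nameservers must be exactly the set Azure assigned to the custom zone. */
    static boolean nameserversMatch(List<String> assigned, List<String> observed) {
        return normalize(assigned).equals(normalize(observed));
    }

    // DNS names compare case-insensitively; "ns1-01.azure-dns.com." equals "ns1-01.azure-dns.com"
    private static Set<String> normalize(List<String> names) {
        return names.stream()
                .map(name -> name.toLowerCase(Locale.ROOT))
                .map(name -> name.endsWith(".") ? name.substring(0, name.length() - 1) : name)
                .collect(Collectors.toSet());
    }
}
```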

Dev vs. Production DNS

| Aspect | Development | Production |
| --- | --- | --- |
| Platform domain | matih-dev.example.com or nip.io | matih.ai |
| TLS issuer | letsencrypt-staging-dns01 | letsencrypt-prod-dns01 |
| Child zones | Disabled by default | Created per tenant |
| A records | Point to dev cluster IP | Point to tenant LoadBalancer |
| Custom domains | Not supported | Supported for Enterprise tier |

In development environments, DNS zones are typically disabled and services are accessed via port forwarding or nip.io wildcard DNS.
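The table's per-environment differences fit naturally into a configuration enum. DnsEnvironment is a hypothetical sketch; the issuer names come from the table, and the nip.io helper relies on nip.io's documented behavior of resolving any name that embeds an IPv4 address to that address, which is why development needs no DNS zone at all.

```java
/** Hypothetical per-environment DNS settings mirroring the Dev vs. Production table. */
enum DnsEnvironment {
    DEVELOPMENT("letsencrypt-staging-dns01", false, false),
    PRODUCTION("letsencrypt-prod-dns01", true, true);

    final String clusterIssuer;          // cert-manager ClusterIssuer name
    final boolean createChildZones;      // per-tenant child zones (dev default: off)
    final boolean customDomainsAllowed;  // production still limits these to the Enterprise tier

    DnsEnvironment(String clusterIssuer, boolean createChildZones, boolean customDomainsAllowed) {
        this.clusterIssuer = clusterIssuer;
        this.createChildZones = createChildZones;
        this.customDomainsAllowed = customDomainsAllowed;
    }

    /** nip.io answers bi.acme.10.0.0.5.nip.io with 10.0.0.5, so no zone is needed in dev. */
    static String nipIoHost(String service, String slug, String clusterIp) {
        return service + "." + slug + "." + clusterIp + ".nip.io";
    }
}
```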


TLS Certificate Management

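TLS certificates are provisioned by cert-manager using the DNS-01 challenge type, which proves control of a domain by publishing a TXT record named _acme-challenge.{domain} in its DNS zone. DNS-01 is required here because Let's Encrypt issues wildcard certificates such as *.acme.matih.ai only through DNS-01 validation.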

Certificate Resource

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: acme-tls
  namespace: matih-acme
spec:
  secretName: acme-tls-certificate
  issuerRef:
    name: letsencrypt-prod-dns01
    kind: ClusterIssuer
  dnsNames:
    - acme.matih.ai
    - "*.acme.matih.ai"
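The TXT record published for each dnsNames entry follows RFC 8555, section 8.4: its name is _acme-challenge followed by the domain (a wildcard is validated at its base domain), and its value is the unpadded base64url SHA-256 digest of the key authorization, token + "." + the account key's JWK thumbprint. cert-manager computes this itself; the Dns01Challenge class below is only an illustration of the RFC, and the token and thumbprint inputs are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Illustration of the DNS-01 TXT record from RFC 8555, section 8.4. */
final class Dns01Challenge {
    private Dns01Challenge() {}

    /** The wildcard label is dropped: *.acme.matih.ai is validated at acme.matih.ai. */
    static String recordName(String dnsName) {
        String base = dnsName.startsWith("*.") ? dnsName.substring(2) : dnsName;
        return "_acme-challenge." + base;
    }

    /**
     * TXT value = base64url(SHA-256(token + "." + thumbprint)) without padding, where
     * thumbprint is the base64url JWK thumbprint (RFC 7638) of the ACME account key.
     */
    static String recordValue(String token, String accountKeyThumbprint) {
        String keyAuthorization = token + "." + accountKeyThumbprint;
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(keyAuthorization.getBytes(StandardCharsets.US_ASCII));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

For the Certificate above, both acme.matih.ai and *.acme.matih.ai map to _acme-challenge.acme.matih.ai, so during issuance that name typically carries two TXT values, one per authorization.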

Certificate Lifecycle

| Event | Action |
| --- | --- |
| Certificate created | cert-manager creates DNS-01 challenge TXT record |
| Challenge verified | Let's Encrypt issues certificate |
| Certificate stored | Saved as Kubernetes Secret in tenant namespace |
| 30 days before expiry | cert-manager auto-renews |
| Tenant deleted | Certificate and secret deleted |
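The "30 days before expiry" row amounts to a renewal instant computed from the certificate's notAfter date. cert-manager schedules this itself (configurable through the Certificate's spec.renewBefore field); the hypothetical RenewalWindow class below only illustrates the arithmetic, assuming the fixed 30-day window from the table.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper mirroring the "30 days before expiry" renewal rule. */
final class RenewalWindow {
    static final Duration RENEW_BEFORE = Duration.ofDays(30);

    private RenewalWindow() {}

    /** The instant at which renewal becomes due. */
    static Instant renewalTime(Instant notAfter) {
        return notAfter.minus(RENEW_BEFORE);
    }

    /** True once "now" has reached the renewal instant. */
    static boolean shouldRenew(Instant now, Instant notAfter) {
        return !now.isBefore(renewalTime(notAfter));
    }
}
```

For a 90-day Let's Encrypt certificate, this places renewal 60 days after issuance.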

DNS Cleanup on Tenant Deletion

When a tenant is deprovisioned, its DNS resources are cleaned up in the reverse order of their creation:

  1. Delete Ingress resources (removes certificate dependency)
  2. Delete cert-manager Certificate
  3. Delete A records from tenant zone
  4. Delete NS delegation records from parent zone
  5. Delete tenant DNS zone

The cleanup is performed by the TenantDeletionService as part of the tenant decommissioning workflow.
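The ordering rule can be sketched by reversing the provisioning sequence. DnsTeardown and its Deleter interface are hypothetical, and the provisioning order is inferred from the "reverse order" statement rather than taken from TenantDeletionService. Treating an already-deleted resource as success lets a retried teardown resume where a failed one stopped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: DNS teardown runs the provisioning steps in reverse. */
final class DnsTeardown {
    // Provisioning order inferred from the cleanup list (zone first, Ingress last)
    static final List<String> PROVISIONING_ORDER = List.of(
            "tenant DNS zone", "NS delegation", "A records", "Certificate", "Ingress");

    /** Hypothetical deleter: returns false when the resource is already gone. */
    interface Deleter {
        boolean delete(String resource);
    }

    private DnsTeardown() {}

    static List<String> deletionOrder() {
        List<String> order = new ArrayList<>(PROVISIONING_ORDER);
        Collections.reverse(order);
        return order;
    }

    /** Runs every step; "already deleted" is not an error. Returns what was actually removed. */
    static List<String> run(Deleter deleter) {
        List<String> deleted = new ArrayList<>();
        for (String resource : deletionOrder()) {
            if (deleter.delete(resource)) {
                deleted.add(resource);
            }
        }
        return deleted;
    }
}
```

Removing the NS delegation before the tenant zone means the parent never points at nameservers that no longer host the zone; dangling delegations of this kind are a known subdomain-takeover risk.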


Troubleshooting DNS Issues

Common DNS issues and their resolution:

| Symptom | Cause | Resolution |
| --- | --- | --- |
| NXDOMAIN for tenant domain | NS delegation missing | Check NS records in parent zone |
| Certificate pending | DNS-01 challenge failing | Verify TXT record propagation |
| Slow resolution | DNS propagation delay | Wait up to 48 hours for global propagation |
| Custom domain not resolving | NS records not updated at registrar | Verify registrar NS configuration |
| Wildcard not working | Missing wildcard A record | Check * record in tenant zone |

Validation Script

The platform provides a validation script for DNS troubleshooting:

./scripts/tools/validate-tenant-ingress.sh --tenant acme

This script checks:

  • DNS zone existence
  • NS delegation correctness
  • A record resolution
  • TLS certificate status
  • Ingress controller health
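The same checks can be expressed as an ordered sequence that stops at the first failure and reports the matching hint from the troubleshooting table. IngressDnsChecks is a hypothetical Java analogue for illustration, not how validate-tenant-ingress.sh is implemented. Stopping early is a design choice: a missing zone or delegation makes the later checks fail for the same underlying reason.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BooleanSupplier;

/** Hypothetical analogue of the script's check sequence; not the script itself. */
final class IngressDnsChecks {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();
    private final Map<String, String> hints = new LinkedHashMap<>();

    /** Registers a check; registration order is execution order. */
    IngressDnsChecks add(String name, BooleanSupplier check, String hint) {
        checks.put(name, check);
        hints.put(name, hint);
        return this;
    }

    /** Runs checks in order and returns "name: hint" for the first one that fails. */
    Optional<String> firstFailure() {
        for (Map.Entry<String, BooleanSupplier> entry : checks.entrySet()) {
            if (!entry.getValue().getAsBoolean()) {
                return Optional.of(entry.getKey() + ": " + hints.get(entry.getKey()));
            }
        }
        return Optional.empty();
    }
}
```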

Next Steps