10-Phase Provisioning Flow

The data plane provisioning flow consists of 10 phases, each responsible for one part of tenant infrastructure setup. The phases execute sequentially: a phase starts only after the previous one has completed successfully. This section walks through each phase's operations, the resources it creates, and its rollback behavior.

Phase Overview

#	Phase	Duration	Description
1	CREATE_NAMESPACE	5-30s	Kubernetes namespace creation with labels and annotations
2	SETUP_DATABASE	30-120s	Tenant database schema and initial data
3	DEPLOY_CORE_SERVICES	60-300s	Essential services (query engine, AI service, catalog)
4	CONFIGURE_NETWORKING	10-30s	Network policies for tenant isolation
5	DEPLOY_DATA_SERVICES	60-300s	Data pipeline, semantic layer, and BI services
6	DEPLOY_INGRESS_CONTROLLER	30-120s	Per-tenant NGINX ingress controller
7	CREATE_DNS_ZONE	10-60s	Azure DNS child zone and NS delegation
8	CREATE_TENANT_INGRESS	10-30s	Kubernetes Ingress resources and TLS certificates
9	DEPLOY_MONITORING	30-60s	Prometheus, Grafana, and alerting rules
10	SETUP_OBSERVABILITY	10-30s	Log aggregation, distributed tracing configuration
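
As a rough illustration of the sequential flow and per-phase rollback, the driver loop below runs phases strictly in order. If one phase keeps failing, it undoes that phase and every completed phase, newest first. It is a minimal sketch, not the ProvisioningOrchestrator: the `Phase` type, `provision`, `ProvisioningFailed`, the three-attempt retry budget, and rolling back the failed phase itself are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Phase:
    """One provisioning phase (hypothetical shape, for illustration only)."""
    name: str
    apply: Callable[[], None]     # performs the phase's operations
    rollback: Callable[[], None]  # undoes them; assumed safe to call on partial work


class ProvisioningFailed(Exception):
    pass


def provision(phases: List[Phase], max_attempts: int = 3) -> List[str]:
    """Run phases strictly in order; roll back if a phase exhausts its retries."""
    completed: List[Phase] = []
    for phase in phases:
        last_error: Optional[Exception] = None
        for _ in range(max_attempts):
            try:
                phase.apply()
                last_error = None
                break
            except Exception as exc:  # retry the same phase
                last_error = exc
        if last_error is not None:
            # Undo the failed phase's partial work, then earlier phases newest-first.
            for undo in [phase, *reversed(completed)]:
                undo.rollback()
            raise ProvisioningFailed(
                f"{phase.name} failed after {max_attempts} attempts"
            ) from last_error
        completed.append(phase)
    return [p.name for p in completed]
```

Rolling back newest-first keeps each undo step ahead of the resources it depends on. For example, Phase 3's Helm releases are uninstalled before the Phase 1 namespace is deleted.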

Phase 1: CREATE_NAMESPACE

The first phase creates the Kubernetes namespace that will contain all tenant resources.

Operations

Create namespace with the naming convention matih-{tenant-slug}
Apply standard labels for resource identification
Apply annotations for cost attribution and monitoring
Create default ServiceAccount
Apply RBAC (Role and RoleBinding) restricting access to the namespace

Namespace Labels

metadata:
  name: matih-acme
  labels:
    matih.ai/tenant-id: "550e8400-e29b-41d4-a716-446655440000"
    matih.ai/tenant-slug: "acme"
    matih.ai/tier: "professional"
    matih.ai/region: "eastus"
    matih.ai/managed-by: "tenant-service"
  annotations:
    matih.ai/provisioned-at: "2026-02-12T10:00:00Z"
    matih.ai/provisioned-by: "admin@matih.ai"
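
The namespace name and labels follow from the tenant record. Below is a minimal sketch of building the Phase 1 Namespace object as a Python dict, ready to serialize to YAML or hand to a Kubernetes client. The `namespace_manifest` helper and its parameters are hypothetical. Only the `matih-{tenant-slug}` convention and the label and annotation keys come from this page.

```python
from datetime import datetime, timezone


def namespace_manifest(tenant_id: str, slug: str, tier: str, region: str,
                       provisioned_by: str, now: datetime) -> dict:
    """Build the Phase 1 Namespace object (hypothetical helper).

    `now` must be timezone-aware so it converts to UTC correctly.
    """
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": f"matih-{slug}",  # naming convention matih-{tenant-slug}
            "labels": {
                "matih.ai/tenant-id": tenant_id,
                "matih.ai/tenant-slug": slug,
                "matih.ai/tier": tier,
                "matih.ai/region": region,
                "matih.ai/managed-by": "tenant-service",
            },
            "annotations": {
                # UTC timestamp in the same form as the example above
                "matih.ai/provisioned-at": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
                "matih.ai/provisioned-by": provisioned_by,
            },
        },
    }
```

Namespace names must be valid DNS labels of at most 63 characters, so the `matih-` prefix leaves 57 characters for the slug. Label values share the same 63-character limit, and the 36-character tenant UUID fits within it.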

Resource Quotas (Applied for Shared Tiers)

Tier	CPU Request	CPU Limit	Memory Request	Memory Limit	Pods
Free	2	4	4Gi	8Gi	20
Professional	8	16	16Gi	32Gi	50
Enterprise	Unlimited	Unlimited	Unlimited	Unlimited	Unlimited
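
The quota table translates into one `ResourceQuota` object per shared-tier namespace. In the sketch below, `QUOTAS` is transcribed from the table. The object name `tenant-quota` and the `resource_quota` helper are illustrative assumptions. Enterprise returns `None` because the sketch reads the "Unlimited" row as "no quota object".

```python
from typing import Optional

# Values transcribed from the quota table; Enterprise has no entry (no quota applied).
QUOTAS = {
    "free": {"requests.cpu": "2", "limits.cpu": "4",
             "requests.memory": "4Gi", "limits.memory": "8Gi", "pods": "20"},
    "professional": {"requests.cpu": "8", "limits.cpu": "16",
                     "requests.memory": "16Gi", "limits.memory": "32Gi", "pods": "50"},
}


def resource_quota(slug: str, tier: str) -> Optional[dict]:
    """Return a ResourceQuota manifest for shared tiers, or None for enterprise."""
    hard = QUOTAS.get(tier)
    if hard is None:
        return None
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": f"matih-{slug}"},
        "spec": {"hard": dict(hard)},  # copy so callers can't mutate QUOTAS
    }
```

Once a quota constrains CPU and memory, Kubernetes may reject pods that do not declare requests and limits for those resources. For that reason, a LimitRange with defaults is commonly applied alongside the quota.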

Rollback

Delete the namespace and all resources within it.

Phase 2: SETUP_DATABASE

Provisions the tenant's database schema and loads initial reference data.

Operations

Create a tenant-specific database schema (or dedicated database for enterprise tier)
Run Flyway migrations to create all required tables
Seed initial reference data (default dashboard templates, system roles, configuration)
Create database credentials and store in Kubernetes secret
Verify database connectivity from within the namespace

Database Isolation Strategy

Tier	Strategy	Description
Free	Schema-per-tenant	Shared PostgreSQL instance, separate schemas
Professional	Schema-per-tenant	Shared instance, dedicated connection pool
Enterprise	Database-per-tenant	Dedicated PostgreSQL instance via Terraform

Kubernetes Secret Created

apiVersion: v1
kind: Secret
metadata:
  name: tenant-database-credentials
  namespace: matih-acme
type: Opaque
data:
  POSTGRES_HOST: <base64>
  POSTGRES_PORT: <base64>
  POSTGRES_DB: <base64>
  POSTGRES_USER: <base64>
  POSTGRES_PASSWORD: <base64>
  DATABASE_URL: <base64>
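
Values under a Secret's `data` field must be base64-encoded. The sketch below produces that map from plain connection parameters. The helper name is hypothetical. So is the `postgresql://user:password@host:port/db` layout assumed for `DATABASE_URL`; a JVM service might expect a `jdbc:postgresql://` URL instead.

```python
import base64
from urllib.parse import quote


def database_secret_data(host: str, port: int, db: str, user: str, password: str) -> dict:
    """Build the base64-encoded `data` map for tenant-database-credentials (hypothetical helper)."""
    plain = {
        "POSTGRES_HOST": host,
        "POSTGRES_PORT": str(port),
        "POSTGRES_DB": db,
        "POSTGRES_USER": user,
        "POSTGRES_PASSWORD": password,
        # Percent-encode credentials so characters like '@' or ':' don't break the URL.
        "DATABASE_URL": f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{db}",
    }
    return {k: base64.b64encode(v.encode("utf-8")).decode("ascii") for k, v in plain.items()}
```

Kubernetes also accepts plain strings under `stringData` and encodes them on write, which avoids the manual encoding step.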

Rollback

Drop the tenant schema (or database) and delete the Kubernetes secret.

Phase 3: DEPLOY_CORE_SERVICES

Deploys the essential services that form the minimum viable data plane.

Services Deployed

Service	Helm Chart	Description
query-engine	`matih/query-engine`	SQL query execution and optimization
ai-service	`matih/ai-service`	Conversational AI and text-to-SQL
catalog-service	`matih/catalog-service`	Data catalog and metadata management

Helm Release Configuration

Each service is deployed as a Helm release with tenant-specific values:

# Example: ai-service tenant values
replicaCount: 1
image:
  repository: matih.azurecr.io/ai-service
  tag: "latest"
env:
  - name: TENANT_ID
    value: "550e8400-e29b-41d4-a716-446655440000"
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: tenant-database-credentials
        key: DATABASE_URL
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi

Health Verification

After deployment, the orchestrator waits for all pods to reach Running state and pass readiness probes before proceeding.
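
The readiness gate can be sketched as a check over pod objects in their API (JSON) shape. A pod passes when `status.phase` is `Running` and its `Ready` condition is `True`. The `list_pods` callable stands in for whatever Kubernetes client the orchestrator uses. The 300-second timeout (the upper bound of this phase's duration in the overview table) and the 5-second interval are assumptions.

```python
import time
from typing import Callable, List


def pod_ready(pod: dict) -> bool:
    """True if the pod is Running and its Ready condition is True."""
    status = pod.get("status", {})
    if status.get("phase") != "Running":
        return False
    return any(c.get("type") == "Ready" and c.get("status") == "True"
               for c in status.get("conditions", []))


def wait_until_ready(list_pods: Callable[[], List[dict]], timeout_s: float = 300.0,
                     interval_s: float = 5.0, clock=time.monotonic, sleep=time.sleep) -> None:
    """Poll until every pod in the namespace is ready, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        pods = list_pods()
        if pods and all(pod_ready(p) for p in pods):
            return
        if clock() >= deadline:
            not_ready = [p["metadata"]["name"] for p in pods if not pod_ready(p)]
            raise TimeoutError(f"pods not ready: {not_ready}")
        sleep(interval_s)
```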

Rollback

Uninstall Helm releases in reverse order.

Phase 4: CONFIGURE_NETWORKING

Applies Kubernetes NetworkPolicies to enforce tenant isolation at the network level.

Network Policies

# Default deny all ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: matih-acme
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow ingress from same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: matih-acme
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector: {}
---
# Allow ingress from ingress controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller
  namespace: matih-acme
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              matih.ai/tenant-slug: "acme"
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
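
NetworkPolicies are additive. Once any policy selects a pod for ingress, traffic to that pod is allowed only if at least one policy's ingress rule matches the source. The sketch below models that rule for the three policies above, showing why another tenant's pods are blocked while the tenant's own pods and its ingress controller get through. It is deliberately simplified: it handles only `matchLabels`, ignores ports, `ipBlock`, and `matchExpressions`, and assumes every policy selects all pods, as `podSelector: {}` does here. The `Source` type and helper names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Source:
    namespace: str
    namespace_labels: Dict[str, str] = field(default_factory=dict)
    pod_labels: Dict[str, str] = field(default_factory=dict)


def _matches(selector: dict, labels: Dict[str, str]) -> bool:
    # An empty selector ({}) matches everything.
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def _peer_matches(peer: dict, policy_ns: str, src: Source) -> bool:
    ns_sel, pod_sel = peer.get("namespaceSelector"), peer.get("podSelector")
    # A peer without namespaceSelector only matches pods in the policy's own namespace.
    ns_ok = (src.namespace == policy_ns) if ns_sel is None else _matches(ns_sel, src.namespace_labels)
    pod_ok = pod_sel is None or _matches(pod_sel, src.pod_labels)
    return ns_ok and pod_ok


def ingress_allowed(policies: List[dict], policy_ns: str, src: Source) -> bool:
    for policy in policies:
        for rule in policy.get("ingress", []):
            peers = rule.get("from")
            if not peers:  # a rule with an empty or missing `from` admits every source
                return True
            if any(_peer_matches(p, policy_ns, src) for p in peers):
                return True
    return False  # default-deny-ingress has no rules, so nothing else is admitted


# The three policies above, flattened to their ingress rules.
POLICIES = [
    {"name": "default-deny-ingress"},
    {"name": "allow-same-namespace", "ingress": [{"from": [{"podSelector": {}}]}]},
    {"name": "allow-ingress-controller", "ingress": [{"from": [{
        "namespaceSelector": {"matchLabels": {"matih.ai/tenant-slug": "acme"}},
        "podSelector": {"matchLabels": {"app.kubernetes.io/name": "ingress-nginx"}},
    }]}]},
]
```

In this layout the controller runs in the tenant's own namespace (Phase 6), so `allow-same-namespace` alone would already admit it.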

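Isolation Guarantees

The NetworkPolicies above cover ingress only; the egress rows in the following table are enforced by controls not shown in this section.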

Traffic Type	Policy
Tenant-to-tenant	Blocked (default deny)
Within tenant namespace	Allowed
From ingress controller	Allowed
To control plane services	Allowed (egress to control plane namespace)
To external databases	Allowed (egress to database endpoints)
To internet	Blocked (except for connector services)

Rollback

Delete all NetworkPolicy resources from the namespace.

Phase 5: DEPLOY_DATA_SERVICES

Deploys the remaining data plane services that provide the full analytics platform.

Services Deployed

Service	Helm Chart	Description
pipeline-service	`matih/pipeline-service`	Data pipeline orchestration
semantic-layer	`matih/semantic-layer`	Business metric definitions
bi-service	`matih/bi-service`	Business intelligence and dashboards
data-quality-service	`matih/data-quality-service`	Data quality monitoring
ml-service	`matih/ml-service`	Machine learning operations
render-service	`matih/render-service`	Chart and visualization rendering

Deployment Order

Services are deployed in dependency order:

Pipeline service (no dependencies beyond database)
Semantic layer (depends on catalog service from Phase 3)
BI service (depends on semantic layer and query engine)
Data quality service (depends on catalog service)
ML service (depends on query engine)
Render service (depends on BI service)
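
The dependency list above forms a directed acyclic graph. A valid install order is therefore any topological order of that graph, and the rollback order is that order reversed. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The `DEPENDS_ON` map is transcribed from the list, with the Phase 3 services treated as already installed. A topological sort only guarantees that each service follows its dependencies. Independent services (for example ml-service and semantic-layer) may therefore come out in a different relative order than the list above.

```python
from graphlib import TopologicalSorter
from typing import List

# Phase 3 services are already running when Phase 5 starts.
PHASE3_SERVICES = {"query-engine", "ai-service", "catalog-service"}

# service -> services it depends on (transcribed from the deployment-order list)
DEPENDS_ON = {
    "pipeline-service": set(),
    "semantic-layer": {"catalog-service"},
    "bi-service": {"semantic-layer", "query-engine"},
    "data-quality-service": {"catalog-service"},
    "ml-service": {"query-engine"},
    "render-service": {"bi-service"},
}


def install_order() -> List[str]:
    # static_order() raises graphlib.CycleError if the dependencies contain a cycle.
    order = TopologicalSorter(DEPENDS_ON).static_order()
    return [svc for svc in order if svc not in PHASE3_SERVICES]


def uninstall_order() -> List[str]:
    return list(reversed(install_order()))
```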

Rollback

Uninstall Helm releases in reverse dependency order.

Phase 6: DEPLOY_INGRESS_CONTROLLER

Deploys a dedicated NGINX ingress controller in the tenant's namespace. This gives each tenant their own LoadBalancer IP address.

Operations

Add the ingress-nginx Helm chart repository
Install the ingress-nginx chart with tenant-specific values
Configure a unique IngressClass for the tenant
Wait for the LoadBalancer IP to be assigned

Helm Values

controller:
  ingressClassResource:
    name: nginx-acme
    controllerValue: k8s.io/ingress-nginx-acme
  ingressClass: nginx-acme
  replicaCount: 2
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
  admissionWebhooks:
    enabled: false
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi

LoadBalancer IP Wait

The orchestrator polls the service until an external IP is assigned:

Attempt 1/60: No external IP yet...
Attempt 2/60: No external IP yet...
Attempt 3/60: External IP assigned: 20.85.123.45

Maximum wait time: 600 seconds. If no IP is assigned within this window, the phase fails and triggers a retry.
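
The polling loop can be sketched as follows. The 60 attempts and the 600-second ceiling come from this page. The 10-second interval is inferred from those two numbers. The `get_external_ip` callable is a placeholder for reading the controller Service's `status.loadBalancer.ingress[0].ip` field through whatever Kubernetes client the orchestrator uses.

```python
import time
from typing import Callable, Optional


def wait_for_external_ip(get_external_ip: Callable[[], Optional[str]],
                         max_attempts: int = 60, interval_s: float = 10.0,
                         sleep: Callable[[float], None] = time.sleep,
                         log: Callable[[str], None] = print) -> str:
    """Poll until the LoadBalancer Service reports an external IP."""
    for attempt in range(1, max_attempts + 1):
        ip = get_external_ip()
        if ip:
            log(f"Attempt {attempt}/{max_attempts}: External IP assigned: {ip}")
            return ip
        log(f"Attempt {attempt}/{max_attempts}: No external IP yet...")
        if attempt < max_attempts:
            sleep(interval_s)
    # Raising lets the orchestrator mark the phase failed and retry it.
    raise TimeoutError(f"no external IP after {max_attempts} attempts")
```

With the defaults, the loop emits log lines in the same form as the example above.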

Rollback

Uninstall the ingress-nginx Helm release. The LoadBalancer and its IP are automatically released.

Phase 7: CREATE_DNS_ZONE

Creates an Azure DNS child zone for the tenant and configures NS delegation from the parent zone.

Operations

Create child DNS zone (e.g., acme.matih.ai)
Retrieve nameservers from the new zone
Create NS delegation records in the parent zone (matih.ai)
Create A records pointing to the tenant's LoadBalancer IP
Create wildcard A record (*.acme.matih.ai) for service subdomains

DNS Record Structure

matih.ai (parent zone)
  |
  +-- NS acme -> ns1-01.azure-dns.com, ns2-01.azure-dns.net, ...

acme.matih.ai (child zone)
  |
  +-- A @ -> 20.85.123.45
  +-- A * -> 20.85.123.45
  +-- A api -> 20.85.123.45
  +-- A bi -> 20.85.123.45

See DNS Zone Management for detailed DNS architecture.
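
The record layout above can also be expressed as data. The sketch below builds both record sets from three inputs: the tenant slug, the LoadBalancer IP from Phase 6, and the nameservers read back from the newly created child zone. That last dependency is why the zone must exist before the delegation can be written. The sketch makes no Azure calls. The function name, the tuple layout, and the 300-second TTL are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, int, List[str]]  # (relative name, type, ttl, values)


def tenant_dns_records(slug: str, parent: str, lb_ip: str,
                       child_nameservers: List[str], ttl: int = 300) -> Dict[str, List[Record]]:
    """Return the records to create in the parent and child zones (hypothetical helper)."""
    child_zone = f"{slug}.{parent}"
    return {
        # Delegation: relative name '<slug>' in the parent zone points at the child zone's NS set.
        parent: [(slug, "NS", ttl, list(child_nameservers))],
        # Apex, wildcard, and explicit service names all resolve to the tenant's LoadBalancer IP.
        child_zone: [(name, "A", ttl, [lb_ip]) for name in ("@", "*", "api", "bi")],
    }
```

The explicit `api` and `bi` records resolve to the same address the wildcard would return. A wildcard only answers for names that have no records of their own.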

Rollback

Delete the child DNS zone and NS delegation records from the parent zone.

Phase 8: CREATE_TENANT_INGRESS

Creates Kubernetes Ingress resources and TLS certificates for the tenant's services.

Operations

Create cert-manager Certificate resource for the tenant domain
Wait for certificate issuance (Let's Encrypt DNS-01 challenge)
Create Ingress resource with TLS termination and routing rules

Ingress Resource

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tenant-ingress
  namespace: matih-acme
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod-dns01
spec:
  ingressClassName: nginx-acme
  tls:
    - hosts:
        - acme.matih.ai
        - "*.acme.matih.ai"
      secretName: acme-tls-certificate
  rules:
    - host: acme.matih.ai
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-gateway
                port:
                  number: 8080
    - host: bi.acme.matih.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: bi-workbench
                port:
                  number: 3000
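
Every host in the Ingress rules must be covered by a name on the certificate. A wildcard such as `*.acme.matih.ai` matches exactly one extra label. It therefore covers `bi.acme.matih.ai` but not the apex `acme.matih.ai` (hence its separate entry) or deeper names such as `x.bi.acme.matih.ai`. The helper below is a hypothetical pre-flight check that applies this rule before the Ingress is created.

```python
from typing import Iterable, List


def covered(host: str, tls_names: Iterable[str]) -> bool:
    """True if host matches a TLS name exactly or via a single-label wildcard."""
    host = host.lower().rstrip(".")
    for name in tls_names:
        name = name.lower().rstrip(".")
        if name.startswith("*."):
            # '*' stands for exactly one label: strip the host's first label and compare.
            _, sep, rest = host.partition(".")
            if sep and rest == name[2:]:
                return True
        elif host == name:
            return True
    return False


def uncovered_hosts(rule_hosts: Iterable[str], tls_names: Iterable[str]) -> List[str]:
    """List rule hosts that no TLS name covers."""
    names = list(tls_names)
    return [h for h in rule_hosts if not covered(h, names)]
```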

Rollback

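Delete the Ingress resource and Certificate. cert-manager deletes the TLS secret only when its controller runs with `--enable-certificate-owner-ref`. Otherwise, the secret remains until it is deleted explicitly or the namespace is removed during the Phase 1 rollback.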

Phase 9: DEPLOY_MONITORING

Deploys tenant-specific monitoring and alerting infrastructure.

Operations

Deploy Prometheus instance with tenant-scoped service discovery
Configure Grafana dashboards for tenant metrics
Set up default alert rules (high CPU, memory, error rate)
Configure alert notification channels (email, Slack)

Default Alert Rules

Alert	Condition	Severity
High CPU	Pod CPU > 80% for 5 min	Warning
High Memory	Pod memory > 85% for 5 min	Warning
Pod Restart	Container restart count > 3 in 15 min	Critical
Error Rate	HTTP 5xx rate > 5% for 5 min	Critical
Latency	P95 latency > 2s for 5 min	Warning
Disk Usage	PVC usage > 85%	Warning
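
The "for 5 min" conditions correspond to the Prometheus `for` clause. An alert becomes pending when its expression first turns true, and it fires only if the expression stays true for the whole duration; any evaluation where the expression is false resets the clock. The function below is a simplified model of that behavior over a series of evaluation results, not Prometheus code. The sample timestamps and values in the test are illustrative.

```python
from typing import Iterable, Optional, Tuple


def firing_time(samples: Iterable[Tuple[float, float]], threshold: float,
                for_s: float) -> Optional[float]:
    """Return the first evaluation time at which `value > threshold` has held
    continuously for at least `for_s` seconds, or None if it never does.

    `samples` are (timestamp_seconds, value) pairs in ascending time order,
    one per rule evaluation.
    """
    pending_since: Optional[float] = None
    for t, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = t          # alert becomes pending
            if t - pending_since >= for_s:
                return t                   # held for the full window: fires
        else:
            pending_since = None           # condition broke: pending state resets
    return None
```

For the High CPU rule, `threshold=0.80` and `for_s=300` model "Pod CPU > 80% for 5 min".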

Rollback

Uninstall monitoring Helm releases and delete custom resources.

Phase 10: SETUP_OBSERVABILITY

Configures log aggregation and distributed tracing for the tenant.

Operations

Configure Loki log collection for the tenant namespace
Set up OpenTelemetry collector with tenant context propagation
Configure Tempo trace storage with tenant-scoped retention
Verify log and trace pipeline connectivity

Observability Stack Per Tenant

Component	Scope	Configuration
Loki	Log aggregation	Label: `namespace=matih-acme`
Tempo	Distributed tracing	Attribute: `tenant_id=acme`
OpenTelemetry Collector	Telemetry pipeline	Processor: tenant context injection
Grafana	Visualization	Data source: tenant-scoped Loki/Tempo
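
Tenant scoping in Grafana comes down to filtering every query by the identifiers in the table. The sketch below derives those identifiers from the namespace name, using the `matih-{tenant-slug}` convention from Phase 1. It then builds a LogQL stream selector and a TraceQL span filter. The helper names are hypothetical. The unscoped `.tenant_id` attribute, which matches either a span or a resource attribute, is an assumption about where the collector stores the tenant context.

```python
import re
from typing import Dict

PREFIX = "matih-"
_SLUG = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")  # RFC 1123 label characters


def tenant_slug_from_namespace(namespace: str) -> str:
    if not namespace.startswith(PREFIX):
        raise ValueError(f"not a tenant namespace: {namespace}")
    slug = namespace[len(PREFIX):]
    if not _SLUG.fullmatch(slug):
        raise ValueError(f"invalid tenant slug: {slug!r}")
    return slug


def tenant_queries(namespace: str) -> Dict[str, str]:
    """Build tenant-scoped Loki and Tempo queries (hypothetical helper)."""
    slug = tenant_slug_from_namespace(namespace)
    # The slug is restricted to [a-z0-9-], so it can be embedded without escaping.
    return {
        "loki": f'{{namespace="{namespace}"}}',
        "tempo": f'{{ .tenant_id = "{slug}" }}',
    }
```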

Rollback

Remove observability configuration. Historical data is retained for the configured retention period.

Phase Transition Diagram

Phase 1         Phase 2         Phase 3         Phase 4         Phase 5
CREATE          SETUP           DEPLOY          CONFIGURE       DEPLOY
NAMESPACE  -->  DATABASE   -->  CORE       -->  NETWORKING -->  DATA
                                SERVICES                        SERVICES
                                                                    |
    +---------------------------------------------------------------+
    |
    v
Phase 6         Phase 7         Phase 8         Phase 9         Phase 10
DEPLOY          CREATE          CREATE          DEPLOY          SETUP
INGRESS    -->  DNS        -->  TENANT     -->  MONITORING -->  OBSERVABILITY
CONTROLLER      ZONE            INGRESS

Provisioning Completion

When all 10 phases complete successfully, the ProvisioningOrchestrator performs the following finalization steps:

Update tenant status from PROVISIONING to ACTIVE
Set provisioningCompletedAt timestamp
Clear any provisioningError from previous attempts
Send completion notification to tenant admin
Publish tenant.provisioned event to Kafka
Log total provisioning duration
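
The finalization steps amount to a few state changes plus one event. Several parts of the sketch below are assumptions standing in for the real persistence layer, Kafka producer, and notification service: the `Tenant` fields mirror `status`, `provisioningCompletedAt`, and `provisioningError` from the list above, while the `provisioning_started_at` field, the event payload, the `publish` and `notify_admin` callables, and treating `tenant.provisioned` as the topic name are all invented for the example.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional

logger = logging.getLogger("provisioning")  # logger name is illustrative


@dataclass
class Tenant:
    id: str
    admin_email: str
    status: str = "PROVISIONING"
    provisioning_started_at: Optional[datetime] = None    # assumed field, used for the duration log
    provisioning_completed_at: Optional[datetime] = None  # provisioningCompletedAt
    provisioning_error: Optional[str] = None              # provisioningError


def finalize(tenant: Tenant,
             publish: Callable[[str, dict], None],
             notify_admin: Callable[[Tenant], None],
             now: Optional[datetime] = None) -> None:
    """Apply the completion steps in the order listed above."""
    now = now or datetime.now(timezone.utc)
    tenant.status = "ACTIVE"                    # PROVISIONING -> ACTIVE
    tenant.provisioning_completed_at = now
    tenant.provisioning_error = None            # clear errors from earlier attempts
    notify_admin(tenant)                        # e.g. completion email to the tenant admin
    publish("tenant.provisioned", {"tenantId": tenant.id, "completedAt": now.isoformat()})
    if tenant.provisioning_started_at is not None:
        elapsed = (now - tenant.provisioning_started_at).total_seconds()
        logger.info("tenant %s provisioned in %.0f s", tenant.id, elapsed)
```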

Typical Provisioning Durations

Tier	Total Duration	Bottleneck Phase
Free	3-5 minutes	DEPLOY_CORE_SERVICES
Professional	5-10 minutes	DEPLOY_DATA_SERVICES
Enterprise	15-45 minutes	PROVISIONING_INFRASTRUCTURE (Terraform)

Monitoring Provisioning Progress

The provisioning progress is available through multiple channels:

Channel	Endpoint/Mechanism	Update Frequency
REST API	`GET /api/v1/tenants/{id}/provisioning/status`	On-demand polling
WebSocket	`ws://tenant-service/ws/provisioning/{id}`	Real-time updates
Admin Dashboard	Control Plane UI provisioning view	Real-time via WebSocket
Notifications	Email on completion/failure	On state change
Audit Log	`provisioning_audit_logs` table	On every phase transition

Next Steps

DNS Zone Management -- deep dive into Phase 7
Per-Tenant Ingress -- deep dive into Phases 6 and 8
API Reference -- provisioning management endpoints
