10-Phase Provisioning Flow
The data plane provisioning flow consists of 10 distinct phases, each responsible for a specific aspect of tenant infrastructure setup. The phases execute sequentially, and each phase depends on the successful completion of the previous one. This section walks through each phase: the operations it performs, the resources it creates, and how it is rolled back.
Phase Overview
| # | Phase | Duration | Description |
|---|---|---|---|
| 1 | CREATE_NAMESPACE | 5-30s | Kubernetes namespace creation with labels and annotations |
| 2 | SETUP_DATABASE | 30-120s | Tenant database schema and initial data |
| 3 | DEPLOY_CORE_SERVICES | 60-300s | Essential services (query engine, AI service, catalog) |
| 4 | CONFIGURE_NETWORKING | 10-30s | Network policies for tenant isolation |
| 5 | DEPLOY_DATA_SERVICES | 60-300s | Data pipeline, semantic layer, and BI services |
| 6 | DEPLOY_INGRESS_CONTROLLER | 30-120s | Per-tenant NGINX ingress controller |
| 7 | CREATE_DNS_ZONE | 10-60s | Azure DNS child zone and NS delegation |
| 8 | CREATE_TENANT_INGRESS | 10-30s | Kubernetes Ingress resources and TLS certificates |
| 9 | DEPLOY_MONITORING | 30-60s | Prometheus, Grafana, and alerting rules |
| 10 | SETUP_OBSERVABILITY | 10-30s | Log aggregation, distributed tracing configuration |
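The sequencing rule described above can be sketched as a small runner. This is a hypothetical illustration, not the ProvisioningOrchestrator itself. The `Phase` type and the `provision` function are invented for the example. The sketch also assumes that a failure rolls back the earlier phases, whereas the real orchestrator may retry a phase first (Phase 6, for instance, retries on timeout).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    """A provisioning phase: a forward action plus its compensating rollback."""
    name: str
    run: Callable[[], None]
    rollback: Callable[[], None]


def provision(phases: List[Phase]) -> List[str]:
    """Run phases strictly in order; each starts only after the previous succeeds.

    On failure, the failed phase's partial work and every completed phase are
    rolled back newest-first, then the original error is re-raised so the
    caller can record it (and, for example, schedule a retry).
    """
    completed: List[Phase] = []
    for phase in phases:
        try:
            phase.run()
        except Exception:
            # Clean up the failing phase first, then undo earlier phases in reverse.
            for done in [phase, *reversed(completed)]:
                done.rollback()
            raise
        completed.append(phase)
    return [p.name for p in completed]
```

Rolling back the failing phase as well covers partial work, such as a Helm release that installed only some of its services.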
Phase 1: CREATE_NAMESPACE
The first phase creates the Kubernetes namespace that will contain all tenant resources.
Operations
- Create the namespace using the naming convention `matih-{tenant-slug}`
- Apply standard labels for resource identification
- Apply annotations for cost attribution and monitoring
- Create default ServiceAccount
- Apply RBAC (Role and RoleBinding) restricting access to the namespace
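Kubernetes namespace names must be valid RFC 1123 labels, so a slug that breaks those rules would fail this phase immediately. The sketch below applies the naming convention and builds the standard label set. The helper names are hypothetical. The label keys are taken from the example below, and the 63-character, lowercase-alphanumeric-or-hyphen rule is the standard Kubernetes constraint.

```python
import re

# Kubernetes namespace names must be RFC 1123 labels: lowercase alphanumerics
# or '-', starting and ending with an alphanumeric, at most 63 characters.
_RFC1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def namespace_for(tenant_slug: str) -> str:
    """Apply the matih-{tenant-slug} convention and validate the result."""
    name = f"matih-{tenant_slug}"
    if len(name) > 63 or not _RFC1123_LABEL.fullmatch(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return name


def tenant_labels(tenant_id: str, slug: str, tier: str, region: str) -> dict:
    """Standard labels, using the key names shown on this page."""
    return {
        "matih.ai/tenant-id": tenant_id,
        "matih.ai/tenant-slug": slug,
        "matih.ai/tier": tier,
        "matih.ai/region": region,
        "matih.ai/managed-by": "tenant-service",
    }
```

Because the prefix `matih-` uses 6 characters, a slug can be at most 57 characters long.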
Namespace Labels
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: matih-acme
  labels:
    matih.ai/tenant-id: "550e8400-e29b-41d4-a716-446655440000"
    matih.ai/tenant-slug: "acme"
    matih.ai/tier: "professional"
    matih.ai/region: "eastus"
    matih.ai/managed-by: "tenant-service"
  annotations:
    matih.ai/provisioned-at: "2026-02-12T10:00:00Z"
    matih.ai/provisioned-by: "admin@matih.ai"
```
Resource Quotas (Applied for Shared Tiers)
| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Pods |
|---|---|---|---|---|---|
| Free | 2 | 4 | 4Gi | 8Gi | 20 |
| Professional | 8 | 16 | 16Gi | 32Gi | 50 |
| Enterprise | Unlimited | Unlimited | Unlimited | Unlimited | Unlimited |
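The quota table maps directly onto a standard Kubernetes ResourceQuota. The keys under `spec.hard` (`requests.cpu`, `limits.memory`, `pods`, and so on) are the built-in quota resource names. The sketch below is an illustration only: the function and the object name `tenant-quota` are hypothetical.

```python
from typing import Optional

# Hard limits per shared tier, copied from the table above.
TIER_QUOTAS = {
    "free": {
        "requests.cpu": "2", "limits.cpu": "4",
        "requests.memory": "4Gi", "limits.memory": "8Gi",
        "pods": "20",
    },
    "professional": {
        "requests.cpu": "8", "limits.cpu": "16",
        "requests.memory": "16Gi", "limits.memory": "32Gi",
        "pods": "50",
    },
}


def resource_quota(namespace: str, tier: str) -> Optional[dict]:
    """Return a ResourceQuota manifest, or None for the unlimited Enterprise tier."""
    if tier == "enterprise":
        return None
    hard = TIER_QUOTAS[tier]  # unknown tiers raise KeyError instead of going unlimited
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},  # hypothetical name
        "spec": {"hard": dict(hard)},
    }
```

Raising on an unknown tier means a typo fails loudly rather than silently producing an unlimited namespace.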
Rollback
Delete the namespace and all resources within it.
Phase 2: SETUP_DATABASE
Provisions the tenant's database schema and loads initial reference data.
Operations
- Create a tenant-specific database schema (or dedicated database for enterprise tier)
- Run Flyway migrations to create all required tables
- Seed initial reference data (default dashboard templates, system roles, configuration)
- Generate database credentials and store them in a Kubernetes Secret
- Verify database connectivity from within the namespace
Database Isolation Strategy
| Tier | Strategy | Description |
|---|---|---|
| Free | Schema-per-tenant | Shared PostgreSQL instance, separate schemas |
| Professional | Schema-per-tenant | Shared instance, dedicated connection pool |
| Enterprise | Database-per-tenant | Dedicated PostgreSQL instance via Terraform |
Kubernetes Secret Created
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: tenant-database-credentials
  namespace: matih-acme
type: Opaque
data:
  POSTGRES_HOST: <base64>
  POSTGRES_PORT: <base64>
  POSTGRES_DB: <base64>
  POSTGRES_USER: <base64>
  POSTGRES_PASSWORD: <base64>
  DATABASE_URL: <base64>
```
Rollback
Drop the tenant schema (or database) and delete the Kubernetes secret.
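A minimal sketch of building the `tenant-database-credentials` Secret, assuming the credentials have already been generated. `database_secret` is a hypothetical helper. The `DATABASE_URL` format (a libpq-style `postgresql://` URL) is an assumption; the services may expect a different form, such as a JDBC URL.

```python
import base64
from urllib.parse import quote


def _b64(value: str) -> str:
    """Kubernetes Secret `data` values must be base64-encoded strings."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def database_secret(namespace: str, host: str, port: int, db: str,
                    user: str, password: str) -> dict:
    # Assumed libpq-style URL. User and password are percent-encoded so that
    # characters such as '@' or ':' do not break the URL.
    url = (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
           f"@{host}:{port}/{db}")
    plain = {
        "POSTGRES_HOST": host,
        "POSTGRES_PORT": str(port),
        "POSTGRES_DB": db,
        "POSTGRES_USER": user,
        "POSTGRES_PASSWORD": password,
        "DATABASE_URL": url,
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "tenant-database-credentials", "namespace": namespace},
        "type": "Opaque",
        "data": {key: _b64(val) for key, val in plain.items()},
    }
```

Kubernetes also accepts plain values under `stringData` and encodes them itself. Encoding explicitly keeps the manifest in the same `data` layout shown above.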
Phase 3: DEPLOY_CORE_SERVICES
Deploys the essential services that form the minimum viable data plane.
Services Deployed
| Service | Helm Chart | Description |
|---|---|---|
| query-engine | matih/query-engine | SQL query execution and optimization |
| ai-service | matih/ai-service | Conversational AI and text-to-SQL |
| catalog-service | matih/catalog-service | Data catalog and metadata management |
Helm Release Configuration
Each service is deployed as a Helm release with tenant-specific values:
```yaml
# Example: ai-service tenant values
replicaCount: 1
image:
  repository: matih.azurecr.io/ai-service
  tag: "latest"
env:
  - name: TENANT_ID
    value: "550e8400-e29b-41d4-a716-446655440000"
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: tenant-database-credentials
        key: DATABASE_URL
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi
```
Health Verification
After deployment, the orchestrator waits for all pods to reach Running state and pass readiness probes before proceeding.
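The readiness gate can be expressed as a predicate over Pod objects. This sketch assumes the orchestrator lists each release's pods (for example, by label selector) and re-checks until the predicate holds. The function names are hypothetical. The `status.phase` and `status.conditions` fields are the standard Pod API fields, and the `Ready` condition only becomes `"True"` once the containers pass their readiness probes.

```python
from typing import Iterable, Mapping


def pod_is_ready(pod: Mapping) -> bool:
    """True when the pod is in the Running phase and its Ready condition is "True".

    `pod` is a dict shaped like a Pod object from the Kubernetes API,
    e.g. one entry of the `items` list from `kubectl get pods -o json`.
    """
    status = pod.get("status", {})
    if status.get("phase") != "Running":
        return False
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )


def release_ready(pods: Iterable[Mapping]) -> bool:
    """All pods of a release are ready; an empty list means 'not yet scheduled'."""
    pods = list(pods)
    return bool(pods) and all(pod_is_ready(p) for p in pods)
```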
Rollback
Uninstall Helm releases in reverse order.
Phase 4: CONFIGURE_NETWORKING
Applies Kubernetes NetworkPolicies to enforce tenant isolation at the network level.
Network Policies
```yaml
# Default deny all ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: matih-acme
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow ingress from same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: matih-acme
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector: {}
---
# Allow ingress from ingress controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller
  namespace: matih-acme
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              matih.ai/tenant-slug: "acme"
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
```
Isolation Guarantees
| Traffic Type | Policy |
|---|---|
| Tenant-to-tenant | Blocked (default deny) |
| Within tenant namespace | Allowed |
| From ingress controller | Allowed |
| To control plane services | Allowed (egress to control plane namespace) |
| To external databases | Allowed (egress to database endpoints) |
| To internet | Blocked (except for connector services) |
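The three NetworkPolicies listed above restrict ingress only. The egress rows of this table therefore imply additional egress policies that this page does not show. Below is a minimal sketch of what those policies might look like, under two assumptions:

- The control plane runs in a namespace named `matih-control-plane` (hypothetical).
- The policies rely on the `kubernetes.io/metadata.name` label, which Kubernetes (v1.21 and later) sets on every namespace.

Egress to external database endpoints and the connector-service internet exception would need extra rules (for example, `ipBlock` entries) whose addresses are not given here, so they are omitted.

```python
def egress_policies(namespace: str, control_plane_ns: str) -> list:
    """Default-deny egress, then allow same-namespace, control plane, and DNS."""

    def to_namespace(name: str) -> dict:
        # Select a namespace by the name label Kubernetes sets automatically.
        return {"namespaceSelector": {
            "matchLabels": {"kubernetes.io/metadata.name": name}}}

    header = {"apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy"}
    deny_all = {
        **header,
        "metadata": {"name": "default-deny-egress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Egress"]},
    }
    allow = {
        **header,
        "metadata": {"name": "allow-egress-internal", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [
                {"to": [{"podSelector": {}}]},             # within the tenant namespace
                {"to": [to_namespace(control_plane_ns)]},  # control plane services
                {   # cluster DNS; without it, service names stop resolving
                    "to": [to_namespace("kube-system")],
                    "ports": [{"protocol": "UDP", "port": 53},
                              {"protocol": "TCP", "port": 53}],
                },
            ],
        },
    }
    return [deny_all, allow]
```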
Rollback
Delete all NetworkPolicy resources from the namespace.
Phase 5: DEPLOY_DATA_SERVICES
Deploys the remaining data plane services that provide the full analytics platform.
Services Deployed
| Service | Helm Chart | Description |
|---|---|---|
| pipeline-service | matih/pipeline-service | Data pipeline orchestration |
| semantic-layer | matih/semantic-layer | Business metric definitions |
| bi-service | matih/bi-service | Business intelligence and dashboards |
| data-quality-service | matih/data-quality-service | Data quality monitoring |
| ml-service | matih/ml-service | Machine learning operations |
| render-service | matih/render-service | Chart and visualization rendering |
Deployment Order
Services are deployed in dependency order:
- Pipeline service (no dependencies beyond database)
- Semantic layer (depends on catalog service from Phase 3)
- BI service (depends on semantic layer and query engine)
- Data quality service (depends on catalog service)
- ML service (depends on query engine)
- Render service (depends on BI service)
Rollback
Uninstall Helm releases in reverse dependency order.
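The deployment order listed above is one valid topological order of the service dependency graph. The sketch below derives an install order, plus the reverse order used for rollback, from the declared dependencies using Python's standard `graphlib`. The dictionary and function names are illustrative. The computed order may differ from the list above while still satisfying every dependency.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each service mapped to the services it depends on (from the list above).
# catalog-service and query-engine come from Phase 3; the database dependency
# is already satisfied by Phase 2 and is not modeled here.
DATA_SERVICE_DEPS = {
    "pipeline-service": set(),
    "semantic-layer": {"catalog-service"},
    "bi-service": {"semantic-layer", "query-engine"},
    "data-quality-service": {"catalog-service"},
    "ml-service": {"query-engine"},
    "render-service": {"bi-service"},
}
CORE_SERVICES = {"query-engine", "ai-service", "catalog-service"}


def install_order(deps: dict, already_deployed: set) -> list:
    """A dependency-respecting install order (raises CycleError on a cycle)."""
    order = TopologicalSorter(deps).static_order()
    return [svc for svc in order if svc not in already_deployed]


def uninstall_order(deps: dict, already_deployed: set) -> list:
    """Rollback removes services in reverse dependency order."""
    return list(reversed(install_order(deps, already_deployed)))
```

Deriving the order from data means a new service only needs a dependency entry, and an accidental cycle is reported as `graphlib.CycleError` instead of producing a broken rollout.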
Phase 6: DEPLOY_INGRESS_CONTROLLER
Deploys a dedicated NGINX ingress controller in the tenant's namespace. This gives each tenant their own LoadBalancer IP address.
Operations
- Add the ingress-nginx Helm chart repository
- Install the ingress-nginx chart with tenant-specific values
- Configure a unique IngressClass for the tenant
- Wait for the LoadBalancer IP to be assigned
Helm Values
```yaml
controller:
  ingressClassResource:
    name: nginx-acme
    controllerValue: k8s.io/ingress-nginx-acme
  ingressClass: nginx-acme
  replicaCount: 2
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
  admissionWebhooks:
    enabled: false
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
```
LoadBalancer IP Wait
The orchestrator polls the service until an external IP is assigned:
```text
Attempt 1/60: No external IP yet...
Attempt 2/60: No external IP yet...
Attempt 3/60: External IP assigned: 20.85.123.45
```
The maximum wait time is 600 seconds. If no IP is assigned within this window, the phase fails and triggers a retry.
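A sketch of the polling loop. It assumes a fixed 10-second interval, which is the 600-second ceiling divided by the 60 attempts shown in the log, and an injected `get_ip` callable. Both helpers are hypothetical. `status.loadBalancer.ingress[].ip` is the standard Service field where the cloud provider publishes the assigned address.

```python
import time
from typing import Callable, Optional


def lb_ip_from_service(service: dict) -> Optional[str]:
    """Read status.loadBalancer.ingress[0].ip from a Service object, if present."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return ingress[0].get("ip") if ingress else None


def wait_for_external_ip(get_ip: Callable[[], Optional[str]],
                         attempts: int = 60,
                         interval_s: float = 10.0,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until an external IP appears; 60 x 10 s gives the 600 s ceiling."""
    for attempt in range(1, attempts + 1):
        ip = get_ip()
        if ip:
            print(f"Attempt {attempt}/{attempts}: External IP assigned: {ip}")
            return ip
        print(f"Attempt {attempt}/{attempts}: No external IP yet...")
        if attempt < attempts:
            sleep(interval_s)
    # The caller marks the phase failed, which triggers a retry.
    raise TimeoutError(f"no external IP after {attempts} attempts")
```

Injecting `sleep` keeps the loop testable without real waiting. In production, `get_ip` would fetch the ingress-nginx controller Service and pass it to `lb_ip_from_service`.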
Rollback
Uninstall the ingress-nginx Helm release. The LoadBalancer and its IP are automatically released.
Phase 7: CREATE_DNS_ZONE
Creates an Azure DNS child zone for the tenant and configures NS delegation from the parent zone.
Operations
- Create the child DNS zone (e.g., `acme.matih.ai`)
- Retrieve the nameservers assigned to the new zone
- Create NS delegation records in the parent zone (`matih.ai`)
- Create A records pointing to the tenant's LoadBalancer IP
- Create a wildcard A record (`*.acme.matih.ai`) for service subdomains
DNS Record Structure
```text
matih.ai (parent zone)
|
+-- NS  acme -> ns1-01.azure-dns.com, ns2-01.azure-dns.net, ...

acme.matih.ai (child zone)
|
+-- A   @    -> 20.85.123.45
+-- A   *    -> 20.85.123.45
+-- A   api  -> 20.85.123.45
+-- A   bi   -> 20.85.123.45
```
See DNS Zone Management for detailed DNS architecture.
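The record layout above can be captured as plain data before any DNS API is called. The helper below is hypothetical and does not call the Azure SDK; applying the records is out of scope here. The `api` and `bi` names would also resolve through the wildcard record. They are listed explicitly only to mirror the structure above.

```python
from typing import List, Sequence, Tuple

# (zone, relative record name, type, values)
Record = Tuple[str, str, str, List[str]]


def tenant_dns_records(slug: str, parent_zone: str, lb_ip: str,
                       child_nameservers: Sequence[str],
                       hosts: Sequence[str] = ("@", "*", "api", "bi")) -> List[Record]:
    """Records for one tenant, mirroring the parent/child layout above."""
    child_zone = f"{slug}.{parent_zone}"
    # Delegation: an NS record named after the slug in the parent zone,
    # pointing at the nameservers assigned to the new child zone.
    records: List[Record] = [(parent_zone, slug, "NS", list(child_nameservers))]
    # Apex, wildcard, and explicit service hosts all resolve to the tenant LB.
    records += [(child_zone, host, "A", [lb_ip]) for host in hosts]
    return records
```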
Rollback
Delete the child DNS zone and NS delegation records from the parent zone.
Phase 8: CREATE_TENANT_INGRESS
Creates Kubernetes Ingress resources and TLS certificates for the tenant's services.
Operations
- Create cert-manager Certificate resource for the tenant domain
- Wait for certificate issuance (Let's Encrypt DNS-01 challenge)
- Create Ingress resource with TLS termination and routing rules
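The Certificate created in the first step could be built as shown below. This sketch assumes the ClusterIssuer name used in the Ingress annotation on this page. The Certificate object name is hypothetical, while `secretName` must match the Ingress `tls.secretName`. The certificate covers a wildcard name, and Let's Encrypt only validates wildcards through a DNS-01 challenge, which is why this phase uses DNS-01.

```python
def tenant_certificate(slug: str, namespace: str,
                       base_domain: str = "matih.ai",
                       issuer: str = "letsencrypt-prod-dns01") -> dict:
    """cert-manager Certificate covering the tenant apex and wildcard names."""
    apex = f"{slug}.{base_domain}"
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": f"{slug}-tls", "namespace": namespace},  # hypothetical name
        "spec": {
            # Must equal the secretName referenced by the Ingress tls block.
            "secretName": f"{slug}-tls-certificate",
            # Wildcard names can only be validated through a DNS-01 challenge.
            "dnsNames": [apex, f"*.{apex}"],
            "issuerRef": {"name": issuer, "kind": "ClusterIssuer"},
        },
    }
```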
Ingress Resource
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tenant-ingress
  namespace: matih-acme
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod-dns01
spec:
  ingressClassName: nginx-acme
  tls:
    - hosts:
        - acme.matih.ai
        - "*.acme.matih.ai"
      secretName: acme-tls-certificate
  rules:
    - host: acme.matih.ai
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-gateway
                port:
                  number: 8080
    - host: bi.acme.matih.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: bi-workbench
                port:
                  number: 3000
```
Rollback
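Delete the Ingress resource and the Certificate. By default, cert-manager does not delete the TLS secret (`acme-tls-certificate`) when its Certificate is removed. The secret is garbage-collected only if cert-manager runs with `--enable-certificate-owner-ref`; otherwise, delete it explicitly.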
Phase 9: DEPLOY_MONITORING
Deploys tenant-specific monitoring and alerting infrastructure.
Operations
- Deploy Prometheus instance with tenant-scoped service discovery
- Configure Grafana dashboards for tenant metrics
- Set up default alert rules (high CPU, memory, error rate)
- Configure alert notification channels (email, Slack)
Default Alert Rules
| Alert | Condition | Severity |
|---|---|---|
| High CPU | Pod CPU > 80% for 5 min | Warning |
| High Memory | Pod memory > 85% for 5 min | Warning |
| Pod Restart | Container restart count > 3 in 15 min | Critical |
| Error Rate | HTTP 5xx rate > 5% for 5 min | Critical |
| Latency | P95 latency > 2s for 5 min | Warning |
| Disk Usage | PVC usage > 85% | Warning |
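The default alerts could be expressed as a Prometheus rule group (the `groups` / `rules` / `alert` / `expr` / `for` format). This is a sketch under several assumptions:

- "Pod CPU" and "Pod memory" are treated as percentages of the pod's limits, using cAdvisor and kube-state-metrics series.
- The HTTP error-rate and latency rules use assumed metric names (`http_requests_total` with a `status` label, and `http_request_duration_seconds`). These need to match whatever the services actually export.
- The alert names are hypothetical.

```python
from typing import List, Optional, Tuple


def tenant_alert_rules(namespace: str) -> dict:
    """Prometheus rule group for the default alerts in the table above."""
    ns = f'namespace="{namespace}"'
    # (alert, PromQL expression, `for` duration or None, severity)
    specs: List[Tuple[str, str, Optional[str], str]] = [
        ("HighCPU",
         f'sum by (pod) (rate(container_cpu_usage_seconds_total{{{ns},container!=""}}[5m]))'
         f' / sum by (pod) (kube_pod_container_resource_limits{{{ns},resource="cpu"}}) > 0.80',
         "5m", "warning"),
        ("HighMemory",
         f'sum by (pod) (container_memory_working_set_bytes{{{ns},container!=""}})'
         f' / sum by (pod) (kube_pod_container_resource_limits{{{ns},resource="memory"}}) > 0.85',
         "5m", "warning"),
        ("PodRestarts",
         f"increase(kube_pod_container_status_restarts_total{{{ns}}}[15m]) > 3",
         None, "critical"),
        # Assumed service metric names; adjust to what the services export.
        ("HighErrorRate",
         f'sum(rate(http_requests_total{{{ns},status=~"5.."}}[5m]))'
         f" / sum(rate(http_requests_total{{{ns}}}[5m])) > 0.05",
         "5m", "critical"),
        ("HighLatencyP95",
         f"histogram_quantile(0.95, sum by (le) "
         f"(rate(http_request_duration_seconds_bucket{{{ns}}}[5m]))) > 2",
         "5m", "warning"),
        ("PVCUsageHigh",
         f"kubelet_volume_stats_used_bytes{{{ns}}}"
         f" / kubelet_volume_stats_capacity_bytes{{{ns}}} > 0.85",
         None, "warning"),
    ]
    rules = []
    for alert, expr, duration, severity in specs:
        rule = {"alert": alert, "expr": expr, "labels": {"severity": severity}}
        if duration:
            rule["for"] = duration  # condition must hold this long before firing
        rules.append(rule)
    return {"groups": [{"name": f"{namespace}-default-alerts", "rules": rules}]}
```

Scoping every expression by `namespace` keeps one tenant's alerts from firing on another tenant's pods, even when the Prometheus instance can see more than one namespace.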
Rollback
Uninstall monitoring Helm releases and delete custom resources.
Phase 10: SETUP_OBSERVABILITY
Configures log aggregation and distributed tracing for the tenant.
Operations
- Configure Loki log collection for the tenant namespace
- Set up OpenTelemetry collector with tenant context propagation
- Configure Tempo trace storage with tenant-scoped retention
- Verify log and trace pipeline connectivity
Observability Stack Per Tenant
| Component | Scope | Configuration |
|---|---|---|
| Loki | Log aggregation | Label: namespace=matih-acme |
| Tempo | Distributed tracing | Attribute: tenant_id=acme |
| OpenTelemetry Collector | Telemetry pipeline | Processor: tenant context injection |
| Grafana | Visualization | Data source: tenant-scoped Loki/Tempo |
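A sketch of the collector side of "tenant context injection", written as a Python dict that mirrors the OpenTelemetry Collector's YAML configuration. It assumes the collector-contrib `resource` processor and an OTLP exporter pointed at Tempo. The Tempo endpoint and the `resource/tenant` instance name are hypothetical, and `tenant_id` is set to the slug to match the table above.

```python
def collector_config(tenant_slug: str, tempo_endpoint: str) -> dict:
    """OpenTelemetry Collector config (as a dict) that tags traces with the tenant."""
    return {
        "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
        "processors": {
            # Upsert the tenant attribute on every resource so Tempo/Grafana
            # queries can filter on tenant_id.
            "resource/tenant": {
                "attributes": [
                    {"key": "tenant_id", "value": tenant_slug, "action": "upsert"},
                ],
            },
            "batch": {},
        },
        "exporters": {
            # Plaintext in-cluster hop to Tempo (assumption).
            "otlp/tempo": {"endpoint": tempo_endpoint, "tls": {"insecure": True}},
        },
        "service": {
            "pipelines": {
                "traces": {
                    "receivers": ["otlp"],
                    "processors": ["resource/tenant", "batch"],
                    "exporters": ["otlp/tempo"],
                },
            },
        },
    }
```

The result can be serialized with any YAML library and mounted as the collector's config file. Running `upsert` rather than `insert` ensures a client-supplied `tenant_id` cannot override the tenant the collector belongs to.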
Rollback
Remove observability configuration. Historical data is retained for the configured retention period.
Phase Transition Diagram
```text
Phase 1        Phase 2        Phase 3        Phase 4        Phase 5
CREATE         SETUP          DEPLOY         CONFIGURE      DEPLOY
NAMESPACE  --> DATABASE   --> CORE       --> NETWORKING --> DATA
                              SERVICES                      SERVICES
                                                               |
  +------------------------------------------------------------+
  |
  v
Phase 6        Phase 7        Phase 8        Phase 9        Phase 10
DEPLOY         CREATE         CREATE         DEPLOY         SETUP
INGRESS    --> DNS        --> TENANT     --> MONITORING --> OBSERVABILITY
CONTROLLER     ZONE           INGRESS
```
Provisioning Completion
When all 10 phases complete successfully, the ProvisioningOrchestrator performs the following finalization steps:
- Update the tenant status from `PROVISIONING` to `ACTIVE`
- Set the `provisioningCompletedAt` timestamp
- Clear any `provisioningError` from previous attempts
- Send a completion notification to the tenant admin
- Publish a `tenant.provisioned` event to Kafka
- Log the total provisioning duration
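The finalization steps can be sketched as follows, applied in the order listed above. The dataclass, its Python-style field names, the callables and the event payload are hypothetical stand-ins. `publish` abstracts the Kafka producer; this page does not specify whether `tenant.provisioned` is a topic name or an event type. The `provisioning_started_at` field is assumed so the duration can be computed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class TenantRecord:
    """Hypothetical subset of the tenant entity touched during finalization."""
    id: str
    status: str
    provisioning_started_at: datetime
    provisioning_completed_at: Optional[datetime] = None
    provisioning_error: Optional[str] = None


def finalize_provisioning(tenant: TenantRecord,
                          publish: Callable[[str, dict], None],
                          notify_admin: Callable[[str, str], None],
                          now: Optional[datetime] = None) -> float:
    """Apply the finalization steps in order; returns total seconds taken."""
    if tenant.status != "PROVISIONING":
        raise ValueError(f"tenant {tenant.id} is {tenant.status}, not PROVISIONING")
    now = now or datetime.now(timezone.utc)
    tenant.status = "ACTIVE"
    tenant.provisioning_completed_at = now
    tenant.provisioning_error = None  # clear leftovers from earlier attempts
    notify_admin(tenant.id, "Provisioning completed")
    publish("tenant.provisioned", {"tenantId": tenant.id,
                                   "completedAt": now.isoformat()})
    duration = (now - tenant.provisioning_started_at).total_seconds()
    print(f"Tenant {tenant.id} provisioned in {duration:.0f}s")
    return duration
```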
Typical Provisioning Durations
| Tier | Total Duration | Bottleneck Phase |
|---|---|---|
| Free | 3-5 minutes | DEPLOY_CORE_SERVICES |
| Professional | 5-10 minutes | DEPLOY_DATA_SERVICES |
| Enterprise | 15-45 minutes | PROVISIONING_INFRASTRUCTURE (Terraform step for dedicated infrastructure; not one of the 10 phases above) |
Monitoring Provisioning Progress
The provisioning progress is available through multiple channels:
| Channel | Endpoint/Mechanism | Update Frequency |
|---|---|---|
| REST API | GET /api/v1/tenants/{id}/provisioning/status | On-demand polling |
| WebSocket | ws://tenant-service/ws/provisioning/{id} | Real-time updates |
| Admin Dashboard | Control Plane UI provisioning view | Real-time via WebSocket |
| Notifications | Email on completion/failure | On state change |
| Audit Log | provisioning_audit_logs table | On every phase transition |
Next Steps
- DNS Zone Management -- deep dive into Phase 7
- Per-Tenant Ingress -- deep dive into Phases 6 and 8
- API Reference -- provisioning management endpoints