MATIH Platform is in active MVP development. Documentation reflects current implementation status.
7. Tenant Lifecycle
Provisioning Phases

10-Phase Provisioning Flow

The data plane provisioning flow consists of 10 distinct phases, each responsible for a specific aspect of tenant infrastructure setup. The phases execute sequentially, with each phase depending on the successful completion of the previous one. This section provides a detailed walkthrough of each phase, its inputs, outputs, failure modes, and rollback behavior.


Phase Overview

| # | Phase | Duration | Description |
|---|-------|----------|-------------|
| 1 | CREATE_NAMESPACE | 5-30s | Kubernetes namespace creation with labels and annotations |
| 2 | SETUP_DATABASE | 30-120s | Tenant database schema and initial data |
| 3 | DEPLOY_CORE_SERVICES | 60-300s | Essential services (query engine, AI service, catalog) |
| 4 | CONFIGURE_NETWORKING | 10-30s | Network policies for tenant isolation |
| 5 | DEPLOY_DATA_SERVICES | 60-300s | Data pipeline, semantic layer, and BI services |
| 6 | DEPLOY_INGRESS_CONTROLLER | 30-120s | Per-tenant NGINX ingress controller |
| 7 | CREATE_DNS_ZONE | 10-60s | Azure DNS child zone and NS delegation |
| 8 | CREATE_TENANT_INGRESS | 10-30s | Kubernetes Ingress resources and TLS certificates |
| 9 | DEPLOY_MONITORING | 30-60s | Prometheus, Grafana, and alerting rules |
| 10 | SETUP_OBSERVABILITY | 10-30s | Log aggregation, distributed tracing configuration |

Phase 1: CREATE_NAMESPACE

The first phase creates the Kubernetes namespace that will contain all tenant resources.

Operations

  1. Create namespace with the naming convention matih-{tenant-slug}
  2. Apply standard labels for resource identification
  3. Apply annotations for cost attribution and monitoring
  4. Create default ServiceAccount
  5. Apply RBAC (Role and RoleBinding) restricting access to the namespace
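
The RBAC objects from step 5 might look like the following sketch. The Role name, resource list, and verbs are illustrative assumptions, not confirmed defaults:

```yaml
# Hypothetical Role/RoleBinding sketch restricting the default ServiceAccount
# to read-only access within its own namespace; names and verbs are assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-namespace-access
  namespace: matih-acme
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "services", "deployments", "configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-namespace-access
  namespace: matih-acme
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: tenant-namespace-access
subjects:
  - kind: ServiceAccount
    name: default
    namespace: matih-acme
```

Because a Role is namespace-scoped, this binding cannot grant access to any other tenant's namespace.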

Namespace Labels

metadata:
  name: matih-acme
  labels:
    matih.ai/tenant-id: "550e8400-e29b-41d4-a716-446655440000"
    matih.ai/tenant-slug: "acme"
    matih.ai/tier: "professional"
    matih.ai/region: "eastus"
    matih.ai/managed-by: "tenant-service"
  annotations:
    matih.ai/provisioned-at: "2026-02-12T10:00:00Z"
    matih.ai/provisioned-by: "admin@matih.ai"

Resource Quotas (Applied for Shared Tiers)

| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Pods |
|------|-------------|-----------|----------------|--------------|------|
| Free | 2 | 4 | 4Gi | 8Gi | 20 |
| Professional | 8 | 16 | 16Gi | 32Gi | 50 |
| Enterprise | Unlimited | Unlimited | Unlimited | Unlimited | Unlimited |
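
The quotas above map directly onto a Kubernetes ResourceQuota object. A sketch for the Professional tier (the object name is an assumption):

```yaml
# Illustrative ResourceQuota for the Professional tier, derived from the
# table above; the resource name "tenant-quota" is an assumption.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: matih-acme
spec:
  hard:
    requests.cpu: "8"
    limits.cpu: "16"
    requests.memory: 16Gi
    limits.memory: 32Gi
    pods: "50"
```

For the Enterprise tier, no ResourceQuota would be applied at all, matching the "Unlimited" entries in the table.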

Rollback

Delete the namespace and all resources within it.


Phase 2: SETUP_DATABASE

Provisions the tenant's database schema and loads initial reference data.

Operations

  1. Create a tenant-specific database schema (or dedicated database for enterprise tier)
  2. Run Flyway migrations to create all required tables
  3. Seed initial reference data (default dashboard templates, system roles, configuration)
  4. Create database credentials and store in Kubernetes secret
  5. Verify database connectivity from within the namespace
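
The migration step could be run as a Kubernetes Job inside the tenant namespace, reusing the credentials secret created in step 4. This is a sketch under assumptions: the Job name, the official `flyway/flyway` image, and the JDBC URL shape are not confirmed by the source:

```yaml
# Hypothetical migration Job; image, name, and URL construction are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: tenant-db-migrate
  namespace: matih-acme
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: flyway
          image: flyway/flyway:10
          args: ["migrate"]
          env:
            # Pull connection details from the secret created in step 4,
            # then assemble the JDBC URL Flyway expects.
            - name: POSTGRES_HOST
              valueFrom:
                secretKeyRef:
                  name: tenant-database-credentials
                  key: POSTGRES_HOST
            - name: POSTGRES_PORT
              valueFrom:
                secretKeyRef:
                  name: tenant-database-credentials
                  key: POSTGRES_PORT
            - name: POSTGRES_DB
              valueFrom:
                secretKeyRef:
                  name: tenant-database-credentials
                  key: POSTGRES_DB
            - name: FLYWAY_URL
              value: jdbc:postgresql://$(POSTGRES_HOST):$(POSTGRES_PORT)/$(POSTGRES_DB)
            - name: FLYWAY_USER
              valueFrom:
                secretKeyRef:
                  name: tenant-database-credentials
                  key: POSTGRES_USER
            - name: FLYWAY_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: tenant-database-credentials
                  key: POSTGRES_PASSWORD
```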

Database Isolation Strategy

| Tier | Strategy | Description |
|------|----------|-------------|
| Free | Schema-per-tenant | Shared PostgreSQL instance, separate schemas |
| Professional | Schema-per-tenant | Shared instance, dedicated connection pool |
| Enterprise | Database-per-tenant | Dedicated PostgreSQL instance via Terraform |

Kubernetes Secret Created

apiVersion: v1
kind: Secret
metadata:
  name: tenant-database-credentials
  namespace: matih-acme
type: Opaque
data:
  POSTGRES_HOST: <base64>
  POSTGRES_PORT: <base64>
  POSTGRES_DB: <base64>
  POSTGRES_USER: <base64>
  POSTGRES_PASSWORD: <base64>
  DATABASE_URL: <base64>

Rollback

Drop the tenant schema (or database) and delete the Kubernetes secret.


Phase 3: DEPLOY_CORE_SERVICES

Deploys the essential services that form the minimum viable data plane.

Services Deployed

| Service | Helm Chart | Description |
|---------|------------|-------------|
| query-engine | matih/query-engine | SQL query execution and optimization |
| ai-service | matih/ai-service | Conversational AI and text-to-SQL |
| catalog-service | matih/catalog-service | Data catalog and metadata management |

Helm Release Configuration

Each service is deployed as a Helm release with tenant-specific values:

# Example: ai-service tenant values
replicaCount: 1
image:
  repository: matih.azurecr.io/ai-service
  tag: "latest"
env:
  - name: TENANT_ID
    value: "550e8400-e29b-41d4-a716-446655440000"
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: tenant-database-credentials
        key: DATABASE_URL
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi

Health Verification

After deployment, the orchestrator waits for all pods to reach Running state and pass readiness probes before proceeding.
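
For this gate to work, each pod must declare a readiness probe. A probe stanza in the container spec might look like the following sketch; the `/healthz` path, port, and timing values are assumptions:

```yaml
# Hypothetical probe configuration for a core-service container;
# endpoint path, port, and thresholds are assumptions.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 15
```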

Rollback

Uninstall Helm releases in reverse order.


Phase 4: CONFIGURE_NETWORKING

Applies Kubernetes NetworkPolicies to enforce tenant isolation at the network level.

Network Policies

# Default deny all ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: matih-acme
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow ingress from same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: matih-acme
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector: {}
---
# Allow ingress from ingress controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller
  namespace: matih-acme
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              matih.ai/tenant-slug: "acme"
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx

Isolation Guarantees

| Traffic Type | Policy |
|--------------|--------|
| Tenant-to-tenant | Blocked (default deny) |
| Within tenant namespace | Allowed |
| From ingress controller | Allowed |
| To control plane services | Allowed (egress to control plane namespace) |
| To external databases | Allowed (egress to database endpoints) |
| To internet | Blocked (except for connector services) |
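
The egress rows could be enforced with a policy like the following sketch. The control-plane namespace label is an assumption, and note that declaring `Egress` in `policyTypes` makes all other egress (including internet traffic) denied by default:

```yaml
# Sketch of an egress policy; the control-plane namespace label is an assumption.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-control-plane
  namespace: matih-acme
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # Allow traffic to control plane services
    - to:
        - namespaceSelector:
            matchLabels:
              matih.ai/managed-by: control-plane
    # Allow DNS resolution from any pod
    - ports:
        - protocol: UDP
          port: 53
```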

Rollback

Delete all NetworkPolicy resources from the namespace.


Phase 5: DEPLOY_DATA_SERVICES

Deploys the remaining data plane services that provide the full analytics platform.

Services Deployed

| Service | Helm Chart | Description |
|---------|------------|-------------|
| pipeline-service | matih/pipeline-service | Data pipeline orchestration |
| semantic-layer | matih/semantic-layer | Business metric definitions |
| bi-service | matih/bi-service | Business intelligence and dashboards |
| data-quality-service | matih/data-quality-service | Data quality monitoring |
| ml-service | matih/ml-service | Machine learning operations |
| render-service | matih/render-service | Chart and visualization rendering |

Deployment Order

Services are deployed in dependency order:

  1. Pipeline service (no dependencies beyond database)
  2. Semantic layer (depends on catalog service from Phase 3)
  3. BI service (depends on semantic layer and query engine)
  4. Data quality service (depends on catalog service)
  5. ML service (depends on query engine)
  6. Render service (depends on BI service)

Rollback

Uninstall Helm releases in reverse dependency order.


Phase 6: DEPLOY_INGRESS_CONTROLLER

Deploys a dedicated NGINX ingress controller in the tenant's namespace, giving each tenant its own LoadBalancer IP address.

Operations

  1. Add the ingress-nginx Helm chart repository
  2. Install the ingress-nginx chart with tenant-specific values
  3. Configure a unique IngressClass for the tenant
  4. Wait for the LoadBalancer IP to be assigned

Helm Values

controller:
  ingressClassResource:
    name: nginx-acme
    controllerValue: k8s.io/ingress-nginx-acme
  ingressClass: nginx-acme
  replicaCount: 2
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
  admissionWebhooks:
    enabled: false
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi

LoadBalancer IP Wait

The orchestrator polls the service until an external IP is assigned:

Attempt 1/60: No external IP yet...
Attempt 2/60: No external IP yet...
Attempt 3/60: External IP assigned: 20.85.123.45

Maximum wait time: 600 seconds. If no external IP is assigned within this window, the phase fails and a retry is triggered.

Rollback

Uninstall the ingress-nginx Helm release. The LoadBalancer and its IP are automatically released.


Phase 7: CREATE_DNS_ZONE

Creates an Azure DNS child zone for the tenant and configures NS delegation from the parent zone.

Operations

  1. Create child DNS zone (e.g., acme.matih.ai)
  2. Retrieve nameservers from the new zone
  3. Create NS delegation records in the parent zone (matih.ai)
  4. Create A records pointing to the tenant's LoadBalancer IP
  5. Create wildcard A record (*.acme.matih.ai) for service subdomains

DNS Record Structure

matih.ai (parent zone)
  |
  +-- NS acme -> ns1-01.azure-dns.com, ns2-01.azure-dns.net, ...

acme.matih.ai (child zone)
  |
  +-- A @ -> 20.85.123.45
  +-- A * -> 20.85.123.45
  +-- A api -> 20.85.123.45
  +-- A bi -> 20.85.123.45

See DNS Zone Management for detailed DNS architecture.

Rollback

Delete the child DNS zone and NS delegation records from the parent zone.


Phase 8: CREATE_TENANT_INGRESS

Creates Kubernetes Ingress resources and TLS certificates for the tenant's services.

Operations

  1. Create cert-manager Certificate resource for the tenant domain
  2. Wait for certificate issuance (Let's Encrypt DNS-01 challenge)
  3. Create Ingress resource with TLS termination and routing rules
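
The Certificate resource from step 1 might look like the following sketch. The `secretName` and issuer match the Ingress resource shown in this section; the Certificate's own name is an assumption:

```yaml
# Illustrative cert-manager Certificate; the name "acme-tls" is an assumption.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: acme-tls
  namespace: matih-acme
spec:
  secretName: acme-tls-certificate
  issuerRef:
    name: letsencrypt-prod-dns01
    kind: ClusterIssuer
  dnsNames:
    - acme.matih.ai
    - "*.acme.matih.ai"
```

A DNS-01 issuer is required here because the wildcard name `*.acme.matih.ai` cannot be validated with an HTTP-01 challenge.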

Ingress Resource

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tenant-ingress
  namespace: matih-acme
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod-dns01
spec:
  ingressClassName: nginx-acme
  tls:
    - hosts:
        - acme.matih.ai
        - "*.acme.matih.ai"
      secretName: acme-tls-certificate
  rules:
    - host: acme.matih.ai
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-gateway
                port:
                  number: 8080
    - host: bi.acme.matih.ai
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: bi-workbench
                port:
                  number: 3000

Rollback

Delete the Ingress resource and Certificate. The TLS secret is cleaned up by cert-manager.


Phase 9: DEPLOY_MONITORING

Deploys tenant-specific monitoring and alerting infrastructure.

Operations

  1. Deploy Prometheus instance with tenant-scoped service discovery
  2. Configure Grafana dashboards for tenant metrics
  3. Set up default alert rules (high CPU, memory, error rate)
  4. Configure alert notification channels (email, Slack)

Default Alert Rules

| Alert | Condition | Severity |
|-------|-----------|----------|
| High CPU | Pod CPU > 80% for 5 min | Warning |
| High Memory | Pod memory > 85% for 5 min | Warning |
| Pod Restart | Container restart count > 3 in 15 min | Critical |
| Error Rate | HTTP 5xx rate > 5% for 5 min | Critical |
| Latency | P95 latency > 2s for 5 min | Warning |
| Disk Usage | PVC usage > 85% | Warning |
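
As one possible encoding, the High CPU rule could be expressed as a prometheus-operator PrometheusRule. The resource name, metric expression, and label names are assumptions:

```yaml
# Sketch of the High CPU alert as a PrometheusRule; expression details
# and the resource name are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: tenant-default-alerts
  namespace: matih-acme
spec:
  groups:
    - name: tenant.rules
      rules:
        - alert: HighCPU
          # Pod CPU usage as a fraction of its CPU limit
          expr: |
            sum(rate(container_cpu_usage_seconds_total{namespace="matih-acme"}[5m])) by (pod)
              / sum(kube_pod_container_resource_limits{namespace="matih-acme", resource="cpu"}) by (pod)
              > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod CPU above 80% of its limit for 5 minutes"
```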

Rollback

Uninstall monitoring Helm releases and delete custom resources.


Phase 10: SETUP_OBSERVABILITY

Configures log aggregation and distributed tracing for the tenant.

Operations

  1. Configure Loki log collection for the tenant namespace
  2. Set up OpenTelemetry collector with tenant context propagation
  3. Configure Tempo trace storage with tenant-scoped retention
  4. Verify log and trace pipeline connectivity

Observability Stack Per Tenant

| Component | Scope | Configuration |
|-----------|-------|---------------|
| Loki | Log aggregation | Label: namespace=matih-acme |
| Tempo | Distributed tracing | Attribute: tenant_id=acme |
| OpenTelemetry Collector | Telemetry pipeline | Processor: tenant context injection |
| Grafana | Visualization | Data source: tenant-scoped Loki/Tempo |
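
The tenant context injection row could be implemented with an attributes processor in the OpenTelemetry Collector configuration. This is a minimal sketch; the pipeline names, Tempo endpoint, and hardcoded tenant value are assumptions:

```yaml
# Sketch of a Collector config that stamps tenant_id onto every span;
# receiver/exporter names and the Tempo endpoint are assumptions.
receivers:
  otlp:
    protocols:
      grpc: {}
processors:
  batch: {}
  attributes/tenant:
    actions:
      - key: tenant_id
        value: acme
        action: upsert
exporters:
  otlp/tempo:
    endpoint: tempo.monitoring:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/tenant, batch]
      exporters: [otlp/tempo]
```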

Rollback

Remove observability configuration. Historical data is retained for the configured retention period.


Phase Transition Diagram

Phase 1         Phase 2         Phase 3         Phase 4         Phase 5
CREATE          SETUP           DEPLOY          CONFIGURE       DEPLOY
NAMESPACE  -->  DATABASE   -->  CORE       -->  NETWORKING -->  DATA
                                SERVICES                        SERVICES
    |               |               |               |               |
    v               v               v               v               v
Phase 6         Phase 7         Phase 8         Phase 9         Phase 10
DEPLOY          CREATE          CREATE          DEPLOY          SETUP
INGRESS    -->  DNS        -->  TENANT     -->  MONITORING -->  OBSERVABILITY
CONTROLLER      ZONE            INGRESS

Provisioning Completion

When all 10 phases complete successfully, the ProvisioningOrchestrator performs the following finalization steps:

  1. Update tenant status from PROVISIONING to ACTIVE
  2. Set provisioningCompletedAt timestamp
  3. Clear any provisioningError from previous attempts
  4. Send completion notification to tenant admin
  5. Publish tenant.provisioned event to Kafka
  6. Log total provisioning duration

Typical Provisioning Durations

| Tier | Total Duration | Bottleneck Phase |
|------|----------------|------------------|
| Free | 3-5 minutes | DEPLOY_CORE_SERVICES |
| Professional | 5-10 minutes | DEPLOY_DATA_SERVICES |
| Enterprise | 15-45 minutes | PROVISIONING_INFRASTRUCTURE (Terraform) |

Monitoring Provisioning Progress

The provisioning progress is available through multiple channels:

| Channel | Endpoint/Mechanism | Update Frequency |
|---------|--------------------|------------------|
| REST API | GET /api/v1/tenants/{id}/provisioning/status | On-demand polling |
| WebSocket | ws://tenant-service/ws/provisioning/{id} | Real-time updates |
| Admin Dashboard | Control Plane UI provisioning view | Real-time via WebSocket |
| Notifications | Email on completion/failure | On state change |
| Audit Log | provisioning_audit_logs table | On every phase transition |

Next Steps