MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Two-Tier Provisioning

MATIH's provisioning architecture separates tenant setup into two distinct tiers: the control plane tier and the data plane tier. This separation is a deliberate design choice that isolates fast, lightweight metadata operations from slow, resource-intensive infrastructure operations. It ensures that control plane responsiveness is never blocked by data plane provisioning, and that each tier can fail and recover independently.


Architectural Overview

                    Tenant Registration Request
                              |
                              v
                 +---------------------------+
                 |   CONTROL PLANE TIER      |
                 |   (Synchronous, fast)     |
                 |                           |
                 |   1. Create tenant record |
                 |   2. Assign admin user    |
                 |   3. Configure billing    |
                 |   4. Set default roles    |
                 |   5. Initialize settings  |
                 +---------------------------+
                              |
                       Responds to user
                       (tenant created)
                              |
                              v
                 +---------------------------+
                 |   DATA PLANE TIER         |
                 |   (Asynchronous, slow)    |
                 |                           |
                 |   6. Create namespace     |
                 |   7. Deploy database      |
                 |   8. Deploy core services |
                 |   9. Configure networking |
                 |  10. Deploy data services |
                 |  11. Deploy ingress       |
                 |  12. Create DNS zone      |
                 |  13. Configure ingress    |
                 |  14. Deploy monitoring    |
                 |  15. Setup observability  |
                 +---------------------------+
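
The handoff between the two tiers can be sketched as follows. This is a minimal illustration under stated assumptions, not MATIH's actual code: the class `TwoTierRegistration`, its methods, and the in-memory status map are hypothetical, a plain `ExecutorService` stands in for the platform's async thread pool, and the `ACTIVE` and `PROVISIONING_FAILED` status names are invented for the example (only `PROVISIONING` appears in this page).

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: synchronous control plane work, then an asynchronous handoff.
final class TwoTierRegistration {
    private final Map<UUID, String> statuses = new ConcurrentHashMap<>();
    private final Map<UUID, CompletableFuture<Void>> jobs = new ConcurrentHashMap<>();
    private final ExecutorService provisioningPool = Executors.newFixedThreadPool(2);

    /** Runs the control plane tier inline and returns as soon as the tenant record exists. */
    UUID register(Runnable dataPlaneWork) {
        UUID tenantId = UUID.randomUUID();
        statuses.put(tenantId, "PROVISIONING"); // control plane tier: metadata only

        // Data plane tier: runs on a worker thread, so the caller never waits for it.
        CompletableFuture<Void> job = CompletableFuture
                .runAsync(dataPlaneWork, provisioningPool)
                .handle((ok, error) -> {
                    // Assumed status names; the real ones are not given in this page.
                    statuses.put(tenantId, error == null ? "ACTIVE" : "PROVISIONING_FAILED");
                    return null;
                });
        jobs.put(tenantId, job);
        return tenantId;
    }

    String statusOf(UUID tenantId) {
        return statuses.get(tenantId);
    }

    /** Blocks until the data plane job for this tenant has finished (for demos and tests). */
    void awaitProvisioning(UUID tenantId) {
        jobs.get(tenantId).join();
    }

    void shutdown() {
        provisioningPool.shutdown();
    }
}
```

Because the data plane outcome is recorded by the worker thread, a failure there changes only the tenant's status; the registration call has already returned.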

Control Plane Tier

The control plane tier handles tenant metadata creation. These operations touch only the control plane PostgreSQL database and the IAM service, and they complete synchronously within the HTTP request lifecycle.

Operations

| Step | Service | Duration | Description |
| --- | --- | --- | --- |
| Create tenant record | TenantService | < 100ms | Insert tenant entity with configuration |
| Create admin user | IamServiceClient | < 200ms | Call IAM service to create initial admin |
| Configure billing | BillingPlanService | < 100ms | Assign default billing plan and subscription |
| Set default roles | RoleService | < 100ms | Create tenant-specific roles (Admin, Analyst, Viewer) |
| Initialize settings | SettingsService | < 100ms | Set default tenant configuration values |
| Create API key | ApiKeyService | < 100ms | Generate initial admin API key |

Characteristics

| Property | Value |
| --- | --- |
| Execution model | Synchronous, within HTTP request |
| Typical duration | 200-500ms |
| Dependencies | PostgreSQL, IAM Service |
| Failure recovery | Database transaction rollback |
| User feedback | Immediate HTTP response |

Tenant Record After Control Plane Tier

After the control plane tier completes, the tenant record looks like this; the data plane fields are still null:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Acme Corporation",
  "slug": "acme",
  "tier": "PROFESSIONAL",
  "status": "PROVISIONING",
  "region": "eastus",
  "adminEmail": "admin@acme.com",
  "provisioningStartedAt": "2026-02-12T10:00:00Z",
  "deploymentType": null,
  "kubernetesNamespace": null,
  "azureAksClusterName": null
}

The tenant is usable for basic control plane operations (viewing settings, managing users) but cannot run queries or access data plane features until provisioning completes.
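
One way to express this restriction is a small gate that checks the tenant status before a data plane operation runs. The sketch below is hypothetical: `TenantAccessGate`, the `Operation` enum, and the `ACTIVE` status name are assumptions made for illustration and are not taken from the MATIH codebase.

```java
// Hypothetical sketch of gating operations on tenant status.
final class TenantAccessGate {
    enum Operation {
        VIEW_SETTINGS(false),
        MANAGE_USERS(false),
        RUN_QUERY(true);

        final boolean needsDataPlane;

        Operation(boolean needsDataPlane) {
            this.needsDataPlane = needsDataPlane;
        }
    }

    /** Control plane operations always pass; data plane operations wait for provisioning. */
    static boolean isAllowed(String tenantStatus, Operation operation) {
        if (!operation.needsDataPlane) {
            return true;
        }
        // "ACTIVE" is an assumed name for the status a tenant has once provisioning completes.
        return "ACTIVE".equals(tenantStatus);
    }
}
```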


Data Plane Tier

The data plane tier handles infrastructure provisioning and service deployment. These operations are asynchronous and managed by the ProvisioningOrchestrator with a state machine that tracks progress.

Operations

| Phase | Service | Duration | Description |
| --- | --- | --- | --- |
| CREATE_NAMESPACE | ProvisioningService | 5-30s | Create Kubernetes namespace and RBAC |
| SETUP_DATABASE | ProvisioningService | 30-120s | Provision tenant database schema |
| DEPLOY_CORE_SERVICES | TenantHelmService | 60-300s | Deploy essential services (query engine, AI service) |
| CONFIGURE_NETWORKING | ProvisioningService | 10-30s | Apply network policies |
| DEPLOY_DATA_SERVICES | TenantHelmService | 60-300s | Deploy data pipeline and catalog services |
| DEPLOY_INGRESS_CONTROLLER | TenantIngressService | 30-120s | Deploy per-tenant NGINX |
| CREATE_DNS_ZONE | AzureDnsService | 10-60s | Create child DNS zone with NS delegation |
| CREATE_TENANT_INGRESS | TenantIngressService | 10-30s | Create Ingress resources and TLS certificates |
| DEPLOY_MONITORING | TenantHelmService | 30-60s | Deploy Prometheus, Grafana dashboards |
| SETUP_OBSERVABILITY | ProvisioningService | 10-30s | Configure log aggregation and tracing |

Characteristics

| Property | Value |
| --- | --- |
| Execution model | Asynchronous (@Async with custom thread pool) |
| Typical duration | 5-15 minutes (shared), 15-45 minutes (dedicated) |
| Dependencies | Kubernetes, Helm, Azure, Terraform |
| Failure recovery | State machine retry with exponential backoff |
| User feedback | WebSocket status updates, polling endpoint |
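
To illustrate how phases run in sequence with progress tracking, here is a minimal sketch. The phase names come from the operations table above; the `DataPlaneRunner` class, its `Result` type, and the `Predicate`-based executor are hypothetical stand-ins for the real orchestrator, whose interfaces are not shown in this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: run data plane phases in order and stop at the first failure.
final class DataPlaneRunner {
    enum Phase {
        CREATE_NAMESPACE, SETUP_DATABASE, DEPLOY_CORE_SERVICES, CONFIGURE_NETWORKING,
        DEPLOY_DATA_SERVICES, DEPLOY_INGRESS_CONTROLLER, CREATE_DNS_ZONE,
        CREATE_TENANT_INGRESS, DEPLOY_MONITORING, SETUP_OBSERVABILITY
    }

    static final class Result {
        final List<Phase> completed;
        final Phase failedAt; // null when every phase succeeded

        Result(List<Phase> completed, Phase failedAt) {
            this.completed = completed;
            this.failedAt = failedAt;
        }
    }

    /** The executor returns true when a phase succeeds and false when it fails. */
    static Result run(Predicate<Phase> executor) {
        List<Phase> completed = new ArrayList<>();
        for (Phase phase : Phase.values()) {
            if (!executor.test(phase)) {
                return new Result(completed, phase); // retry or rollback starts from here
            }
            completed.add(phase);
        }
        return new Result(completed, null);
    }
}
```

Recording the failed phase separately from the completed ones is what lets a later retry resume where the job stopped, not from the beginning.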

Tier-Specific Provisioning Paths

The data plane tier follows different paths depending on the tenant tier:

Free Tier Path

Free tier tenants are provisioned on the shared cluster with resource quotas:

INITIAL
  |
  v
VALIDATING_INPUT
  |
  v
CREATING_TENANT_RECORD
  |
  v
ALLOCATING_SHARED_CLUSTER    <-- Select shared cluster, create namespace
  |
  v
CONFIGURING_QUOTAS           <-- Apply ResourceQuota and LimitRange
  |
  v
DEPLOYING_SERVICES           <-- Deploy subset of services via Helm
  |
  v
VERIFYING_CONNECTIVITY       <-- Health checks on deployed services
  |
  v
COMPLETED

Free tier resource quotas:

| Resource | Limit |
| --- | --- |
| CPU requests | 2 cores |
| CPU limits | 4 cores |
| Memory requests | 4 Gi |
| Memory limits | 8 Gi |
| Pods | 20 |
| Services | 10 |
| PVCs | 5 |

Professional Tier Path

Professional tier tenants get a dedicated namespace on the shared cluster with higher resource quotas and full service deployment:

INITIAL
  |
  v
VALIDATING_INPUT
  |
  v
CREATING_TENANT_RECORD
  |
  v
ALLOCATING_SHARED_CLUSTER    <-- Dedicated namespace in shared cluster
  |
  v
CONFIGURING_QUOTAS           <-- Higher quotas
  |
  v
DEPLOYING_SERVICES           <-- Full service stack
  |
  v
VERIFYING_CONNECTIVITY
  |
  v
COMPLETED

Enterprise Tier Path

Enterprise tier tenants get a fully dedicated Kubernetes cluster provisioned through Terraform:

INITIAL
  |
  v
VALIDATING_INPUT
  |
  v
CREATING_TENANT_RECORD
  |
  v
VALIDATING_SERVICE_PRINCIPAL    <-- Validate Azure credentials
  |
  v
ACQUIRING_TERRAFORM_LOCK        <-- Distributed lock for Terraform state
  |
  v
PROVISIONING_INFRASTRUCTURE     <-- Terraform: AKS, networking, storage
  |
  v
CREATING_KUBERNETES_RESOURCES   <-- Namespace, RBAC, network policies
  |
  v
DEPLOYING_SERVICES              <-- Full service stack + custom config
  |
  v
VERIFYING_CONNECTIVITY
  |
  v
COMPLETED
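
The three paths above can be modeled as ordered state lists keyed by tenant tier. The following is a sketch only: the state names are copied from the diagrams, while the `ProvisioningPaths` class, the `Tier` enum, and the use of plain strings for states are assumptions made to keep the example short.

```java
import java.util.List;

// Hypothetical sketch: choose the state sequence for a tenant tier.
final class ProvisioningPaths {
    enum Tier { FREE, PROFESSIONAL, ENTERPRISE }

    // Free and Professional tiers share the same states; they differ in quotas and services.
    private static final List<String> SHARED_PATH = List.of(
            "INITIAL", "VALIDATING_INPUT", "CREATING_TENANT_RECORD",
            "ALLOCATING_SHARED_CLUSTER", "CONFIGURING_QUOTAS",
            "DEPLOYING_SERVICES", "VERIFYING_CONNECTIVITY", "COMPLETED");

    private static final List<String> DEDICATED_PATH = List.of(
            "INITIAL", "VALIDATING_INPUT", "CREATING_TENANT_RECORD",
            "VALIDATING_SERVICE_PRINCIPAL", "ACQUIRING_TERRAFORM_LOCK",
            "PROVISIONING_INFRASTRUCTURE", "CREATING_KUBERNETES_RESOURCES",
            "DEPLOYING_SERVICES", "VERIFYING_CONNECTIVITY", "COMPLETED");

    static List<String> pathFor(Tier tier) {
        return tier == Tier.ENTERPRISE ? DEDICATED_PATH : SHARED_PATH;
    }

    /** Returns the state that follows {@code current}, or null once the path is finished. */
    static String next(Tier tier, String current) {
        List<String> path = pathFor(tier);
        int index = path.indexOf(current);
        if (index < 0) {
            throw new IllegalArgumentException("State " + current + " is not on the " + tier + " path");
        }
        return index + 1 < path.size() ? path.get(index + 1) : null;
    }
}
```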

Provisioning Job Entity

Each provisioning attempt is tracked by a TenantProvisioningJob entity:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Job identifier |
| tenantId | UUID | Target tenant |
| tier | TenantTier | Tier being provisioned |
| currentState | TenantProvisioningState | Current state machine state |
| initiatedBy | UUID | User who triggered provisioning |
| startedAt | Instant | Job start time |
| completedAt | Instant | Job completion time |
| errorMessage | String | Last error message |
| retryCount | Integer | Number of retry attempts |
| maxRetries | Integer | Maximum allowed retries (default: 3) |
| nextRetryAt | Instant | Scheduled retry time |
| context | Map | Key-value metadata (tenant slug, cluster name, etc.) |
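
The retry-related fields (`retryCount`, `maxRetries`, `nextRetryAt`, `errorMessage`) might be updated on failure roughly as sketched below. This is not the actual entity: `ProvisioningJobRetry` and `recordFailure` are hypothetical names, only the retry fields are modeled, and the doubling formula is an assumption chosen to reproduce the 60s, 120s, 240s backoff schedule this page lists for the data plane tier.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of retry bookkeeping on a provisioning job.
final class ProvisioningJobRetry {
    static final Duration BASE_DELAY = Duration.ofSeconds(60);

    int retryCount = 0;
    int maxRetries = 3; // default from the field table
    String errorMessage;
    Instant nextRetryAt;

    /** Delay before the next retry, doubling each time: 60s, 120s, 240s. */
    static Duration backoff(int retryCount) {
        return BASE_DELAY.multipliedBy(1L << retryCount);
    }

    /** Records a failure and returns true if another retry was scheduled. */
    boolean recordFailure(String message, Instant now) {
        errorMessage = message;
        if (retryCount >= maxRetries) {
            nextRetryAt = null; // retries exhausted; the job stays failed
            return false;
        }
        nextRetryAt = now.plus(backoff(retryCount));
        retryCount++;
        return true;
    }
}
```

Passing the current time in as a parameter keeps the schedule deterministic in tests.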

Provisioning Status API

Clients can track provisioning progress through a polling endpoint:

GET /api/v1/tenants/{tenantId}/provisioning/status
Authorization: Bearer {admin_token}

Example response:

{
  "tenantId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "PROVISIONING",
  "currentPhase": "DEPLOYING_SERVICES",
  "completedPhases": [
    "VALIDATING_INPUT",
    "CREATING_TENANT_RECORD",
    "ALLOCATING_SHARED_CLUSTER",
    "CONFIGURING_QUOTAS"
  ],
  "remainingPhases": [
    "VERIFYING_CONNECTIVITY"
  ],
  "progress": 80,
  "startedAt": "2026-02-12T10:00:00Z",
  "estimatedCompletion": "2026-02-12T10:08:00Z",
  "retryCount": 0
}
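
The `completedPhases` and `remainingPhases` arrays can be derived from an ordered phase list and the current phase, as in the sketch below. `ProvisioningProgress` is a hypothetical helper, not part of the documented API. The `progress` percentage is deliberately not computed here, because this page does not state how phases are weighted.

```java
import java.util.List;

// Hypothetical sketch: split an ordered phase list around the running phase.
final class ProvisioningProgress {
    final List<String> completedPhases;
    final List<String> remainingPhases;

    private ProvisioningProgress(List<String> completedPhases, List<String> remainingPhases) {
        this.completedPhases = completedPhases;
        this.remainingPhases = remainingPhases;
    }

    /** Phases before {@code currentPhase} are completed; phases after it are remaining. */
    static ProvisioningProgress of(List<String> orderedPhases, String currentPhase) {
        int index = orderedPhases.indexOf(currentPhase);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown phase: " + currentPhase);
        }
        return new ProvisioningProgress(
                List.copyOf(orderedPhases.subList(0, index)),
                List.copyOf(orderedPhases.subList(index + 1, orderedPhases.size())));
    }
}
```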

Failure Handling Comparison

| Aspect | Control Plane Tier | Data Plane Tier |
| --- | --- | --- |
| Failure scope | Transaction rollback | State-machine-based retry |
| Retry strategy | Immediate re-attempt | Exponential backoff (60s, 120s, 240s) |
| Max retries | 1 (within transaction) | 3 (configurable) |
| Rollback | Automatic (DB transaction) | Step-by-step reverse operations |
| User notification | Immediate error response | Async notification (email, webhook) |
| Admin visibility | Error in HTTP response | Provisioning dashboard with phase details |
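
The "step-by-step reverse operations" rollback for the data plane tier can be pictured as a stack of undo actions that is unwound when a step fails. The sketch below shows that general compensation pattern under stated assumptions. `CompensatingRollback` and `Step` are hypothetical names, and the real rollback logic in MATIH may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: undo already-applied steps in reverse order after a failure.
final class CompensatingRollback {
    /** One provisioning step plus the action that undoes it. */
    static final class Step {
        final String name;
        final Runnable apply;
        final Runnable undo;

        Step(String name, Runnable apply, Runnable undo) {
            this.name = name;
            this.apply = apply;
            this.undo = undo;
        }
    }

    /**
     * Applies steps in order. If one throws, undoes the steps applied so far,
     * newest first, and returns their names in the order they were undone.
     * Returns an empty list when every step succeeds.
     */
    static List<String> run(List<Step> steps) {
        Deque<Step> applied = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.apply.run();
                applied.push(step);
            } catch (RuntimeException failure) {
                List<String> undone = new ArrayList<>();
                while (!applied.isEmpty()) {
                    Step done = applied.pop(); // most recently applied step first
                    done.undo.run();
                    undone.add(done.name);
                }
                return undone;
            }
        }
        return List.of();
    }
}
```

The failing step itself is not undone in this sketch, on the assumption that a step that throws has left nothing behind; a real implementation would need to handle partially applied steps.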

Benefits of Two-Tier Separation

Responsiveness. The control plane tier typically responds to the user within 200-500ms. The user can start configuring their tenant (settings, users, roles) while infrastructure provisioning runs in the background.

Resilience. A failure in Kubernetes or Terraform does not prevent the tenant record from being created. The data plane tier can retry independently.

Observability. Each tier has its own monitoring. Control plane operations are tracked through standard HTTP metrics. Data plane provisioning has dedicated dashboards with phase-level progress.

Scalability. Control plane operations scale with the database. Data plane operations scale with the number of available provisioning workers. These can be scaled independently.

Testability. Control plane logic can be tested with database integration tests. Data plane logic can be tested with Kubernetes test clusters or mocked clients.

