MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Tenant Service Architecture

Production - Port 8082 - Tenant lifecycle, provisioning state machine

The Tenant Service manages the complete lifecycle of tenants from initial creation through provisioning, configuration, scaling, suspension, and eventual decommissioning. It is the most operationally complex Control Plane service, orchestrating interactions with the IAM service, infrastructure service, Kubernetes API, DNS providers, and all Data Plane services.


2.3.C.1 Provisioning State Machine

Tenant provisioning is implemented as a state machine with 8 numbered phases plus an intermediate DEPLOY_INGRESS phase (5.5). Each phase is idempotent, so it can be safely retried if it fails partway through. The state machine persists its current phase to PostgreSQL, which lets it recover from crashes.

Phase Diagram

   +----------+     +-----------+     +----------+     +----------+
   | VALIDATE |---->| CREATE    |---->| DEPLOY   |---->| DEPLOY   |
   |          |     | NAMESPACE |     | SECRETS  |     | DATABASES|
   +----------+     +-----------+     +----------+     +----------+
                                                            |
   +----------+     +-----------+     +----------+     +----v-----+
   | ACTIVATE |<----| VERIFY    |<----| CONFIGURE|<----| DEPLOY   |
   |          |     |           |     |          |     | SERVICES |
   +----------+     +-----------+     +----------+     +----------+
                                           ^
                                      +----+-----+
                                      | DEPLOY   |
                                      | INGRESS  |
                                      | (5.5)    |
                                      +----------+
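
As a rough illustration of how ordered phases and persisted progress fit together, the sketch below runs the phases in diagram order and records each one only after it succeeds. The phase names come from the diagram; `ProvisioningSketch`, `PhaseStore`, and `PhaseHandler` are hypothetical names, and the in-memory store merely stands in for the PostgreSQL table. It is not the tenant-service implementation.

```java
import java.util.ArrayList;
import java.util.List;

class ProvisioningSketch {
    // Phase names follow the diagram; every other name here is hypothetical.
    enum Phase {
        VALIDATE, CREATE_NAMESPACE, DEPLOY_SECRETS, DEPLOY_DATABASES,
        DEPLOY_SERVICES, DEPLOY_INGRESS, CONFIGURE, VERIFY, ACTIVATE
    }

    // In-memory stand-in for the PostgreSQL-backed provisioning state.
    static class PhaseStore {
        private Phase lastCompleted;   // null until the first phase completes
        Phase lastCompleted() { return lastCompleted; }
        void markCompleted(Phase phase) { lastCompleted = phase; }
    }

    // Does the work of one phase; implementations are expected to be idempotent.
    interface PhaseHandler {
        void execute(Phase phase);
    }

    // Runs every phase after the last persisted one, persisting after each success.
    static void run(PhaseStore store, PhaseHandler handler) {
        Phase last = store.lastCompleted();
        int start = (last == null) ? 0 : last.ordinal() + 1;
        Phase[] phases = Phase.values();
        for (int i = start; i < phases.length; i++) {
            handler.execute(phases[i]);      // a failure leaves earlier progress persisted
            store.markCompleted(phases[i]);
        }
    }

    public static void main(String[] args) {
        PhaseStore store = new PhaseStore();
        List<Phase> executed = new ArrayList<>();
        try {
            // First attempt: simulate a crash while deploying services.
            run(store, phase -> {
                if (phase == Phase.DEPLOY_SERVICES) {
                    throw new IllegalStateException("simulated crash");
                }
                executed.add(phase);
            });
        } catch (IllegalStateException e) {
            System.out.println("Stopped after " + store.lastCompleted());
        }
        // Second attempt picks up at DEPLOY_SERVICES instead of starting over.
        run(store, executed::add);
        System.out.println(executed);
    }
}
```

Persisting only after a phase succeeds means a crash can at worst cause one phase to be repeated, which is why the phases need to be idempotent.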

Phase Details

Phase 1: VALIDATE

Validates the tenant creation request:

  • Tenant name uniqueness check
  • Slug format validation (lowercase, alphanumeric, hyphens)
  • Tier validation against available plans
  • Admin email format and uniqueness
  • Cloud provider availability check
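
The slug format check can be sketched as follows. The exact rule is not given here, so the pattern is an assumption: lowercase alphanumeric runs separated by single hyphens, with no leading or trailing hyphen. The length check is also an assumption; it is derived from the `matih-data-plane-{tenant-slug}` namespace name used in Phase 2 and the 63-character limit Kubernetes places on namespace names. `SlugSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;

class SlugSketch {
    // Prefix taken from the namespace naming scheme used in Phase 2.
    static final String NAMESPACE_PREFIX = "matih-data-plane-";
    // Kubernetes namespace names are DNS labels of at most 63 characters.
    static final int MAX_NAMESPACE_LENGTH = 63;
    // Assumed rule: lowercase alphanumeric runs joined by single hyphens.
    static final Pattern SLUG = Pattern.compile("[a-z0-9]+(-[a-z0-9]+)*");

    static boolean isValid(String slug) {
        if (slug == null) {
            return false;
        }
        if (NAMESPACE_PREFIX.length() + slug.length() > MAX_NAMESPACE_LENGTH) {
            return false;   // the derived namespace name would be too long
        }
        return SLUG.matcher(slug).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("acme"));       // true
        System.out.println(isValid("acme-corp"));  // true
        System.out.println(isValid("Acme"));       // false: uppercase letter
        System.out.println(isValid("-acme"));      // false: leading hyphen
    }
}
```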

Phase 2: CREATE_NAMESPACE

Creates the tenant's Kubernetes namespace with security boundaries:

# Created resources:
# 1. Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: matih-data-plane-{tenant-slug}
  labels:
    matih.io/tenant: "{tenant-slug}"
    matih.io/tier: "{tier}"
    matih.io/managed-by: "tenant-service"
 
# 2. NetworkPolicy (restrict cross-namespace traffic)
# 3. ResourceQuota (CPU, memory, pod limits per tier)
# 4. ServiceAccount (for pod identity)
# 5. RBAC RoleBindings

Phase 3: DEPLOY_SECRETS

Creates Kubernetes secrets for database credentials, service tokens, and external integrations:

| Secret | Contents | Used By |
|---|---|---|
| :tenant-db-credentials | PostgreSQL username, password | All Java services |
| :tenant-redis-credentials | Redis password | All services |
| :tenant-kafka-credentials | Kafka SASL credentials | Event-producing services |
| :tenant-jwt-secret | JWT signing key | IAM service proxy |
| :tenant-llm-api-key | OpenAI/Azure API key | AI service |

Phase 4: DEPLOY_DATABASES

Provisions per-tenant PostgreSQL schemas for each Data Plane service:

-- Schema creation for each service
CREATE SCHEMA IF NOT EXISTS {tenant_slug}_ai;
CREATE SCHEMA IF NOT EXISTS {tenant_slug}_bi;
CREATE SCHEMA IF NOT EXISTS {tenant_slug}_query;
CREATE SCHEMA IF NOT EXISTS {tenant_slug}_catalog;
CREATE SCHEMA IF NOT EXISTS {tenant_slug}_pipeline;
CREATE SCHEMA IF NOT EXISTS {tenant_slug}_ml;
CREATE SCHEMA IF NOT EXISTS {tenant_slug}_quality;
CREATE SCHEMA IF NOT EXISTS {tenant_slug}_ontology;
CREATE SCHEMA IF NOT EXISTS {tenant_slug}_governance;
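
One way to generate these statements from a tenant slug is sketched below. The service suffixes are copied from the list above. Two things are assumptions: hyphens in the slug are mapped to underscores (suggested by the `{tenant_slug}` placeholder), and each schema name is double-quoted so that a slug beginning with a digit still yields a valid PostgreSQL identifier. `TenantSchemaSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TenantSchemaSketch {
    // Suffixes copied from the schema list above.
    static final List<String> SERVICE_SUFFIXES = Arrays.asList(
        "ai", "bi", "query", "catalog", "pipeline",
        "ml", "quality", "ontology", "governance");

    // Assumption: hyphens in the slug become underscores in schema names.
    static String schemaPrefix(String tenantSlug) {
        // Reject anything outside the slug alphabet before it reaches SQL text.
        if (!tenantSlug.matches("[a-z0-9]+(-[a-z0-9]+)*")) {
            throw new IllegalArgumentException("invalid tenant slug: " + tenantSlug);
        }
        return tenantSlug.replace('-', '_');
    }

    // Builds one CREATE SCHEMA statement per Data Plane service.
    static List<String> createStatements(String tenantSlug) {
        String prefix = schemaPrefix(tenantSlug);
        List<String> statements = new ArrayList<>();
        for (String suffix : SERVICE_SUFFIXES) {
            // Quoting keeps names that start with a digit valid.
            statements.add("CREATE SCHEMA IF NOT EXISTS \"" + prefix + "_" + suffix + "\";");
        }
        return statements;
    }

    public static void main(String[] args) {
        for (String statement : createStatements("acme-corp")) {
            System.out.println(statement);
        }
    }
}
```

Because the statements use `IF NOT EXISTS`, re-running them after a partial failure does not raise an error, which fits the idempotency requirement on each phase.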

Phase 5: DEPLOY_SERVICES

Deploys all 14 Data Plane services via Helm:

# Placeholders in {braces} are filled in by the tenant-service.
for service in query-engine catalog-service semantic-layer bi-service \
               pipeline-service ai-service ml-service data-quality-service \
               render-service data-plane-agent ontology-service \
               governance-service ops-agent-service auth-proxy; do
    helm install "{tenant}-${service}" "infrastructure/helm/${service}/" \
        --namespace "matih-data-plane-{tenant-slug}" \
        --values values.yaml \
        --values "values-{environment}.yaml" \
        --set "tenant.id={tenant-slug}" \
        --set "tenant.tier={tier}"
done

Phase 5.5: DEPLOY_INGRESS

Provisions the tenant's ingress infrastructure:

  1. Deploy NGINX ingress controller in tenant namespace (dedicated LoadBalancer IP)
  2. Create Azure DNS child zone (e.g., acme.matih.ai) with NS delegation
  3. Create A records pointing to the tenant's LoadBalancer IP
  4. Create cert-manager Certificate for TLS (DNS01 challenge)
  5. Create Kubernetes Ingress resource with TLS termination

Phase 6: CONFIGURE

Applies tenant-specific configuration overrides through the config-service:

  • AI model preferences (GPT-4, Claude, custom models)
  • Query timeout limits
  • Dashboard branding
  • Feature flag overrides based on tier

Phase 7: VERIFY

Health-checks all deployed services:

for (Service service : deployedServices) {
    String healthUrl = String.format(
        "http://%s.%s.svc.cluster.local:%d/health",
        service.getName(),
        tenantNamespace,
        service.getPort()
    );
    HealthStatus status = healthChecker.check(healthUrl, Duration.ofSeconds(30));
    if (status != HealthStatus.HEALTHY) {
        throw new ProvisioningException(
            "Service " + service.getName() + " failed health check"
        );
    }
}

Phase 8: ACTIVATE

Finalizes provisioning:

  • Creates tenant admin user via IAM service
  • Sets tenant status to ACTIVE
  • Publishes TENANT_PROVISIONED event to Kafka
  • Sends welcome notification to tenant admin

2.3.C.2 State Persistence and Recovery

The provisioning state is persisted to PostgreSQL after each phase transition:

@Entity
@Table(name = "tenant_provisioning_state")
public class ProvisioningState {
    @Id
    private UUID tenantId;
    private ProvisioningPhase currentPhase;
    private ProvisioningStatus status;     // IN_PROGRESS, COMPLETED, FAILED
    private String failureReason;
    private Map<String, Object> phaseOutputs;  // Results from each phase
    private Instant startedAt;
    private Instant lastUpdatedAt;
    private int retryCount;
}

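If the tenant-service crashes during provisioning, the state machine reads the persisted state on restart and resumes from the last persisted phase rather than starting over. Because each phase first checks whether its work has already been done (idempotency), re-running a phase that was interrupted partway through is safe.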
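
The check-before-execute pattern can be illustrated with a minimal sketch. It uses namespace creation as the example and an in-memory `FakeCluster` in place of the Kubernetes API; both `FakeCluster` and `ensureNamespace` are hypothetical names, not tenant-service code.

```java
import java.util.HashSet;
import java.util.Set;

class IdempotentPhaseSketch {
    // In-memory stand-in for the cluster: tracks which namespaces exist.
    static class FakeCluster {
        final Set<String> namespaces = new HashSet<>();
        int createCalls = 0;

        boolean namespaceExists(String name) {
            return namespaces.contains(name);
        }

        void createNamespace(String name) {
            createCalls++;
            namespaces.add(name);
        }
    }

    // Check first, then act: returns true only when work was actually done.
    static boolean ensureNamespace(FakeCluster cluster, String name) {
        if (cluster.namespaceExists(name)) {
            return false;   // an earlier attempt already created it
        }
        cluster.createNamespace(name);
        return true;
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        String namespace = "matih-data-plane-acme";
        System.out.println(ensureNamespace(cluster, namespace));  // true: created
        System.out.println(ensureNamespace(cluster, namespace));  // false: retry is a no-op
        System.out.println(cluster.createCalls);                  // 1
    }
}
```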


2.3.C.3 Tenant Tiers and Quotas

Each tenant is assigned a tier that determines resource quotas and feature access:

| Tier | CPU Quota | Memory Quota | Max Pods | Max Users | Features |
|---|---|---|---|---|---|
| Free | 2 cores | 4Gi | 20 | 5 | Basic analytics, limited AI |
| Professional | 8 cores | 16Gi | 50 | 50 | Full analytics, AI chat, ML |
| Enterprise | Custom | Custom | Custom | Unlimited | All features, custom models, SLA |

Tier-specific quotas are enforced via Kubernetes ResourceQuotas in the tenant namespace. Feature gating is enforced through the config-service, which evaluates tenant tier when resolving feature flags.
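
To make the mapping from tier to quota concrete, the sketch below turns the Free and Professional rows into the entries of a Kubernetes ResourceQuota `spec.hard` map. The numbers come from the tier table. The choice of `limits.cpu` and `limits.memory` (rather than the `requests.*` keys) is an assumption, as is checking the user limit in application code. Enterprise is left out because its quotas are listed as Custom. `TierQuotaSketch` and its members are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TierQuotaSketch {
    // Values copied from the tier table; Enterprise quotas are set per tenant.
    enum Tier {
        FREE("2", "4Gi", 20, 5),
        PROFESSIONAL("8", "16Gi", 50, 50);

        final String cpu;
        final String memory;
        final int maxPods;
        final int maxUsers;

        Tier(String cpu, String memory, int maxPods, int maxUsers) {
            this.cpu = cpu;
            this.memory = memory;
            this.maxPods = maxPods;
            this.maxUsers = maxUsers;
        }
    }

    // Entries for the spec.hard map of a ResourceQuota.
    // Assumption: the quotas cap resource limits rather than requests.
    static Map<String, String> resourceQuotaHard(Tier tier) {
        Map<String, String> hard = new LinkedHashMap<>();
        hard.put("limits.cpu", tier.cpu);
        hard.put("limits.memory", tier.memory);
        hard.put("pods", Integer.toString(tier.maxPods));
        return hard;
    }

    // Assumption: the user limit is enforced in application code.
    static boolean canAddUser(Tier tier, int currentUsers) {
        return currentUsers < tier.maxUsers;
    }

    public static void main(String[] args) {
        System.out.println(resourceQuotaHard(Tier.FREE));
        System.out.println(canAddUser(Tier.FREE, 5));   // false: Free allows 5 users
    }
}
```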


2.3.C.4 Tenant Lifecycle Operations

Suspension

Tenant suspension preserves all data but stops all services:

1. Set tenant status to SUSPENDED
2. Scale all Data Plane deployments to 0 replicas
3. Revoke all active JWT tokens (add to blacklist)
4. Publish TENANT_SUSPENDED event
5. Send notification to tenant admin

Reactivation

1. Scale all Data Plane deployments back to configured replicas
2. Wait for all services to pass health checks
3. Set tenant status to ACTIVE
4. Publish TENANT_ACTIVATED event

Deletion

Tenant deletion is a destructive, multi-step process:

1. Set tenant status to DELETING
2. Export audit logs to long-term storage (compliance requirement)
3. Delete all Helm releases in tenant namespace
4. Delete tenant PostgreSQL schemas
5. Delete tenant Kubernetes namespace (cascades all resources)
6. Delete DNS zone and ingress records
7. Delete tenant record from Control Plane database
8. Publish TENANT_DELETED event

Deletion requires platform_admin role and a confirmation code.
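
The status changes described in this section can be modeled as a small transition table. The statuses ACTIVE, SUSPENDED, and DELETING come from the steps above; the set of allowed transitions is an assumption (in particular, that deletion may start from either ACTIVE or SUSPENDED and that DELETING is terminal). `TenantLifecycleSketch` is a hypothetical name.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class TenantLifecycleSketch {
    // Statuses named in the suspension, reactivation, and deletion steps.
    enum Status { ACTIVE, SUSPENDED, DELETING }

    // Assumed transition table; the real service may allow more or fewer moves.
    private static final Map<Status, Set<Status>> ALLOWED = new EnumMap<>(Status.class);
    static {
        ALLOWED.put(Status.ACTIVE, EnumSet.of(Status.SUSPENDED, Status.DELETING));
        ALLOWED.put(Status.SUSPENDED, EnumSet.of(Status.ACTIVE, Status.DELETING));
        ALLOWED.put(Status.DELETING, EnumSet.noneOf(Status.class));
    }

    static boolean canTransition(Status from, Status to) {
        return ALLOWED.get(from).contains(to);
    }

    // Returns the new status, or throws if the move is not allowed.
    static Status transition(Status from, Status to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("cannot go from " + from + " to " + to);
        }
        return to;
    }

    public static void main(String[] args) {
        Status status = Status.ACTIVE;
        status = transition(status, Status.SUSPENDED);   // suspension
        status = transition(status, Status.ACTIVE);      // reactivation
        status = transition(status, Status.DELETING);    // deletion begins
        System.out.println(status);                                        // DELETING
        System.out.println(canTransition(Status.DELETING, Status.ACTIVE)); // false
    }
}
```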

