MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Tenant Service Architecture

The tenant service (tenant-service) is the largest microservice in the MATIH control plane, responsible for managing the complete lifecycle of tenant organizations. It coordinates infrastructure provisioning, billing, compliance, monitoring, and integration management across the platform.


Service Overview

| Property | Value |
| --- | --- |
| Artifact | com.matih:tenant-service |
| Framework | Spring Boot 3.2 |
| Language | Java 21 |
| Port | 8082 |
| Database | PostgreSQL via Spring Data JPA / Flyway |
| State Machine | Spring Statemachine |
| Azure SDK | Azure Resource Manager |
| Kubernetes | Fabric8 Kubernetes Client |
| Helm | Custom CLI wrapper (HelmClientImpl) |
| Async | Spring @Async with custom thread pools |

Package Structure

com.matih.tenant/
  activity/                  # Activity feed service
  admin/
    health/                  # System health monitoring
    operations/              # Admin operations and configuration
    reports/                 # Operational reporting
  alerting/                  # Alert rules and incident management
  analytics/                 # Cross-tenant analytics
  apikeys/                   # API key management
  audit/
    compliance/              # Compliance reporting
    retention/               # Audit data retention policies
  billing/
    invoice/                 # Invoice generation and payments
    plans/                   # Subscription plans and features
    usage/                   # Usage metering and aggregation
  bookmark/                  # User bookmark service
  branding/                  # White-label customization
  client/                    # IAM service client (Feign)
  compliance/                # Compliance frameworks and evidence
  config/                    # Spring configuration classes
  entity/                    # JPA entities
  provisioning/              # Logging context for provisioning
  repository/                # Spring Data JPA repositories
  security/                  # JWT filter, IP allowlists, encryption
  service/
    chart/                   # Helm chart repository management
    cost/                    # Cost attribution and collection
    deployment/              # Deployment diagnostics
    helm/                    # Helm client abstraction
    migration/               # Tier migration orchestration
    provisioning/            # Provisioning orchestrator
    sp/                      # Service principal management
    terraform/               # Terraform execution
    upgrade/                 # Service upgrade orchestration
  statemachine/              # State machine configurations
  webhooks/                  # Webhook management and delivery
  workspace/                 # Workspace service

Core Entities

Tenant Entity

The Tenant entity is the root aggregate for all tenant-related operations:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| name | String | Display name |
| slug | String | URL-safe identifier (e.g., acme) |
| adminEmail | String | Primary admin email |
| tier | TenantTier | FREE, PROFESSIONAL, or ENTERPRISE |
| status | TenantStatus | Current lifecycle status |
| region | String | Deployment region |
| deploymentType | DeploymentType | SHARED or DEDICATED |
| kubernetesNamespace | String | K8s namespace |
| azureAksClusterName | String | AKS cluster name |
| provisioningStartedAt | Instant | Provisioning start time |
| provisioningCompletedAt | Instant | Provisioning completion time |
| provisioningError | String | Last provisioning error message |

Tenant Status Transitions

PENDING_APPROVAL
       |
       v
   APPROVED
       |
       v
  PROVISIONING -----> FAILED
       |                  |
       v                  v
    ACTIVE          DEPROVISIONING
       |                  |
       +---> SUSPENDED    v
       |         |     DELETED
       v         v
  UPGRADING  REACTIVATING
       |         |
       v         v
    ACTIVE    ACTIVE
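
The diagram above can be read as a table of allowed transitions. The following is a minimal sketch in plain Java, not the service's actual Spring Statemachine configuration. The class name `TenantStatusTransitions` and the method `canTransition` are hypothetical, and the edges reflect one reading of the diagram (for example, that FAILED leads to DEPROVISIONING and SUSPENDED leads to REACTIVATING).

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum TenantStatus {
    PENDING_APPROVAL, APPROVED, PROVISIONING, ACTIVE, SUSPENDED,
    UPGRADING, REACTIVATING, FAILED, DEPROVISIONING, DELETED
}

// Hypothetical helper: encodes the edges as read from the status diagram.
final class TenantStatusTransitions {
    private static final Map<TenantStatus, Set<TenantStatus>> ALLOWED = Map.of(
        TenantStatus.PENDING_APPROVAL, EnumSet.of(TenantStatus.APPROVED),
        TenantStatus.APPROVED, EnumSet.of(TenantStatus.PROVISIONING),
        TenantStatus.PROVISIONING, EnumSet.of(TenantStatus.ACTIVE, TenantStatus.FAILED),
        TenantStatus.ACTIVE, EnumSet.of(TenantStatus.SUSPENDED, TenantStatus.UPGRADING),
        TenantStatus.SUSPENDED, EnumSet.of(TenantStatus.REACTIVATING),
        TenantStatus.UPGRADING, EnumSet.of(TenantStatus.ACTIVE),
        TenantStatus.REACTIVATING, EnumSet.of(TenantStatus.ACTIVE),
        TenantStatus.FAILED, EnumSet.of(TenantStatus.DEPROVISIONING),
        TenantStatus.DEPROVISIONING, EnumSet.of(TenantStatus.DELETED)
    );

    private TenantStatusTransitions() {}

    // DELETED has no entry, so it is treated as terminal.
    static boolean canTransition(TenantStatus from, TenantStatus to) {
        return ALLOWED.getOrDefault(from, Set.of()).contains(to);
    }
}
```

A caller would check `canTransition(current, requested)` before persisting a status change and reject the request otherwise.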

Tenant Tiers

| Tier | Deployment | Infrastructure | Features |
| --- | --- | --- | --- |
| FREE | Shared cluster | Namespace with resource quotas | Basic features, limited users |
| PROFESSIONAL | Shared cluster | Dedicated namespace, higher quotas | Full features, SSO, advanced analytics |
| ENTERPRISE | Dedicated cluster | Full Terraform-provisioned infrastructure | All features, custom domains, SLA |
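
The tier table implies a fixed mapping from tier to deployment type. A minimal sketch of that mapping follows. The enum constants come from the entity table, while the accessor `deploymentType()` and the helper `requiresTerraform()` are hypothetical names introduced for illustration.

```java
enum DeploymentType { SHARED, DEDICATED }

enum TenantTier {
    FREE(DeploymentType.SHARED),
    PROFESSIONAL(DeploymentType.SHARED),
    ENTERPRISE(DeploymentType.DEDICATED);

    private final DeploymentType deploymentType;

    TenantTier(DeploymentType deploymentType) {
        this.deploymentType = deploymentType;
    }

    DeploymentType deploymentType() {
        return deploymentType;
    }

    // Assumption: only dedicated deployments go through Terraform provisioning.
    boolean requiresTerraform() {
        return deploymentType == DeploymentType.DEDICATED;
    }
}
```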

State Machines

The tenant service uses Spring Statemachine to manage long-running operations. Each state machine defines valid states, transitions, guards, and actions.

Provisioning State Machine

Manages the tenant provisioning lifecycle:

| State | Description |
| --- | --- |
| INITIAL | Job created, not yet started |
| VALIDATING_INPUT | Validating tenant configuration |
| CREATING_TENANT_RECORD | Creating tenant record in database |
| ALLOCATING_SHARED_CLUSTER | (Free tier) Allocating shared cluster namespace |
| CONFIGURING_QUOTAS | (Free tier) Setting resource quotas |
| VALIDATING_SERVICE_PRINCIPAL | (Dedicated tier) Validating Azure SP |
| ACQUIRING_TERRAFORM_LOCK | (Dedicated tier) Acquiring Terraform state lock |
| PROVISIONING_INFRASTRUCTURE | (Dedicated tier) Running Terraform |
| CREATING_KUBERNETES_RESOURCES | Creating namespaces, RBAC, network policies |
| DEPLOYING_SERVICES | Deploying Helm releases |
| VERIFYING_CONNECTIVITY | Running health checks |
| COMPLETED | Provisioning successful |
| ROLLING_BACK | Undoing completed steps |
| ROLLED_BACK | Rollback completed |
| FAILED | Unrecoverable failure |
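
The table lists shared-cluster and dedicated steps side by side, so a given job only visits a subset of the states. The sketch below shows one way to derive the success path for each branch. It is plain Java rather than a Spring Statemachine configuration, the method `happyPath` is hypothetical, and the step ordering is assumed to follow the table order.

```java
import java.util.ArrayList;
import java.util.List;

enum ProvisioningState {
    INITIAL, VALIDATING_INPUT, CREATING_TENANT_RECORD,
    ALLOCATING_SHARED_CLUSTER, CONFIGURING_QUOTAS,
    VALIDATING_SERVICE_PRINCIPAL, ACQUIRING_TERRAFORM_LOCK, PROVISIONING_INFRASTRUCTURE,
    CREATING_KUBERNETES_RESOURCES, DEPLOYING_SERVICES, VERIFYING_CONNECTIVITY,
    COMPLETED, ROLLING_BACK, ROLLED_BACK, FAILED;

    // Hypothetical helper: the ordered states a successful job passes through.
    static List<ProvisioningState> happyPath(boolean dedicated) {
        List<ProvisioningState> path = new ArrayList<>(
            List.of(INITIAL, VALIDATING_INPUT, CREATING_TENANT_RECORD));
        if (dedicated) {
            // Dedicated branch: validate the service principal, lock state, run Terraform.
            path.addAll(List.of(VALIDATING_SERVICE_PRINCIPAL,
                                ACQUIRING_TERRAFORM_LOCK,
                                PROVISIONING_INFRASTRUCTURE));
        } else {
            // Shared-cluster branch: allocate a namespace and apply quotas.
            path.addAll(List.of(ALLOCATING_SHARED_CLUSTER, CONFIGURING_QUOTAS));
        }
        path.addAll(List.of(CREATING_KUBERNETES_RESOURCES, DEPLOYING_SERVICES,
                            VERIFYING_CONNECTIVITY, COMPLETED));
        return List.copyOf(path);
    }
}
```

ROLLING_BACK, ROLLED_BACK, and FAILED are left out of the path because they are only reached when a step fails.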

Upgrade State Machine

Manages service version upgrades:

| State | Description |
| --- | --- |
| PENDING | Upgrade scheduled |
| PRE_CHECK | Running pre-upgrade health checks |
| BACKING_UP | Creating backup of current state |
| UPGRADING | Rolling out new versions |
| POST_CHECK | Running post-upgrade verification |
| COMPLETED | Upgrade successful |
| ROLLING_BACK | Reverting to previous version |
| FAILED | Upgrade failed |

Tier Migration State Machine

Manages transitions between tenant tiers:

| State | Description |
| --- | --- |
| REQUESTED | Migration requested |
| VALIDATING | Checking migration compatibility |
| PROVISIONING_TARGET | Creating target infrastructure |
| MIGRATING_DATA | Moving data to new infrastructure |
| SWITCHING_TRAFFIC | Updating DNS and routing |
| VERIFYING | Running validation checks |
| COMPLETED | Migration successful |
| ROLLING_BACK | Reverting migration |
| FAILED | Migration failed |

Integration Points

Inbound Dependencies

| Consumer | Protocol | Purpose |
| --- | --- | --- |
| API Gateway | HTTP | Tenant management API |
| Control Plane UI | HTTP | Admin dashboard |
| IAM Service | HTTP | Tenant context for user operations |
| Billing webhooks | HTTP | Payment processor callbacks |

Outbound Dependencies

| Dependency | Protocol | Purpose |
| --- | --- | --- |
| PostgreSQL | JDBC | Persistent storage |
| IAM Service | HTTP (Feign) | User creation, role assignment |
| Notification Service | HTTP/Kafka | Email and Slack notifications |
| Audit Service | HTTP/Kafka | Audit event publishing |
| Azure Resource Manager | HTTPS | DNS zones, AKS management |
| Kubernetes API | HTTPS (Fabric8) | Namespace and resource management |
| Helm CLI | Process exec | Chart deployment and management |
| Terraform CLI | Process exec | Infrastructure provisioning |
| Key Vault | HTTPS | Secret management |

Helm Client Architecture

The tenant service wraps the Helm CLI with a typed Java abstraction that provides retry logic, timeout management, and structured result parsing:

TenantHelmService (business logic)
       |
       v
RetryableHelmService (retry with exponential backoff)
       |
       v
HelmReleaseManager (release lifecycle)
       |
       v
HelmClientImpl (CLI execution)
       |
       v
helm CLI (process execution)

| Class | Responsibility |
| --- | --- |
| TenantHelmService | Tenant-specific Helm operations (deploy, upgrade, rollback) |
| RetryableHelmService | Wraps operations with configurable retry (3 attempts, exponential backoff) |
| HelmReleaseManager | Manages release lifecycle (install, upgrade, rollback, status) |
| HelmClientImpl | Executes helm CLI commands as subprocesses |
| ChartRepositoryService | Manages chart repository configuration |
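
To illustrate how the retry layer and the CLI layer fit together, here is a minimal sketch. It is not the code of RetryableHelmService or HelmClientImpl. The class `HelmSketch`, both method names, and the base delay are assumptions. The sleeper is injected so the backoff can be observed without actually waiting, and the command is only built, not executed.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class HelmSketch {
    private HelmSketch() {}

    // Builds the argument list for `helm upgrade --install`; a real client
    // would pass it to ProcessBuilder and parse the process output.
    static List<String> upgradeInstallCommand(String release, String chart,
                                              String namespace, Duration timeout) {
        return List.of("helm", "upgrade", "--install", release, chart,
                       "--namespace", namespace,
                       "--timeout", timeout.toSeconds() + "s",
                       "--wait");
    }

    // Retries op up to maxAttempts times, doubling the delay after each failure.
    static <T> T withRetry(Supplier<T> op, int maxAttempts,
                           Duration baseDelay, Consumer<Duration> sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Delay sequence: base, 2 * base, 4 * base, ...
                    sleeper.accept(baseDelay.multipliedBy(1L << (attempt - 1)));
                }
            }
        }
        throw last; // all attempts failed; surface the final error
    }
}
```

Keeping retry in its own layer means the CLI wrapper stays a thin, single-attempt executor, and the retry policy can be tuned or tested independently of process execution.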

Terraform Integration

For enterprise-tier tenants, the service orchestrates Terraform to provision dedicated infrastructure:

| Component | Description |
| --- | --- |
| TerraformExecutor | Executes Terraform commands (init, plan, apply, destroy) |
| TerraformStateManager | Manages remote state backend configuration |
| TenantTemplateGenerator | Generates tenant-specific Terraform configurations |

The Terraform execution acquires a distributed lock to prevent concurrent modifications to the same state file.
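
The locking pattern can be sketched as acquire, run, and release in a finally block. The example below uses an in-memory map as a stand-in for the distributed lock, which works only within a single JVM. The class `StateLockSketch`, its methods, and the state-key format are hypothetical, and the real lock backend is not described here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// In-memory stand-in for a distributed lock keyed by Terraform state file.
final class StateLockSketch {
    private final ConcurrentHashMap<String, String> holders = new ConcurrentHashMap<>();

    // Succeeds only if nobody currently holds the lock for this state key.
    boolean tryAcquire(String stateKey, String owner) {
        return holders.putIfAbsent(stateKey, owner) == null;
    }

    // Removes the lock only when the caller is the current holder.
    void release(String stateKey, String owner) {
        holders.remove(stateKey, owner);
    }

    // Runs the Terraform step under the lock and always releases afterwards.
    <T> T withLock(String stateKey, String owner, Supplier<T> terraformRun) {
        if (!tryAcquire(stateKey, owner)) {
            throw new IllegalStateException("state is locked: " + stateKey);
        }
        try {
            return terraformRun.get();
        } finally {
            release(stateKey, owner);
        }
    }
}
```

Releasing by owner prevents one job from clearing another job's lock, which matters when a retried job overlaps with one that is still running.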


Scheduled Tasks

| Task | Schedule | Description |
| --- | --- | --- |
| Provisioning retry | Every 60s | Retries failed provisioning jobs |
| Stale job cleanup | Hourly | Marks stuck jobs as failed (4-hour timeout) |
| Cost collection | Every 15 min | Collects Kubernetes resource usage |
| Service principal expiry check | Daily | Alerts on expiring Azure SPs |
| Audit retention | Daily | Archives and purges old audit records |
| Trial expiration | Daily | Handles expired free tier trials |
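
As an illustration of the stale job cleanup rule, the check below flags a job once it has been running for more than four hours. This is a sketch under assumptions: the class `StaleJobCheck` is hypothetical, and the timeout is assumed to be measured from the job's start time.

```java
import java.time.Duration;
import java.time.Instant;

final class StaleJobCheck {
    // Timeout taken from the scheduled task table.
    static final Duration STALE_AFTER = Duration.ofHours(4);

    private StaleJobCheck() {}

    // True when the job has been running strictly longer than the timeout.
    static boolean isStale(Instant startedAt, Instant now) {
        return Duration.between(startedAt, now).compareTo(STALE_AFTER) > 0;
    }
}
```

Passing `now` as a parameter, rather than calling `Instant.now()` inside the method, keeps the check deterministic in tests.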

Error Handling and Recovery

The ProvisioningOrchestrator handles failures in layers, escalating from automatic recovery to manual intervention:

  1. Automatic retry -- failed steps are retried up to 3 times with exponential backoff (60s, 120s, 240s)
  2. State-aware recovery -- the orchestrator can resume from any state after a restart
  3. Rollback -- if retries are exhausted, completed steps are rolled back in reverse order
  4. Notification -- failures trigger notifications to platform operators and tenant admins
  5. Manual intervention -- permanently failed jobs can be retried or reset by administrators
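
The retry schedule and rollback order described above can be sketched as follows. The class `RecoverySketch` and its method names are hypothetical. The delays (60s, 120s, 240s) and the reverse-order rollback come from the list, and the step names used in the usage note are placeholders.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class RecoverySketch {
    private static final int MAX_RETRIES = 3;
    private static final long BASE_DELAY_SECONDS = 60;

    private RecoverySketch() {}

    // Delay before retry number `attempt` (1-based): 60s, 120s, 240s.
    static Duration retryDelay(int attempt) {
        if (attempt < 1 || attempt > MAX_RETRIES) {
            throw new IllegalArgumentException("attempt must be between 1 and " + MAX_RETRIES);
        }
        return Duration.ofSeconds(BASE_DELAY_SECONDS << (attempt - 1));
    }

    // Completed steps are undone last-in, first-out.
    static List<String> rollbackOrder(List<String> completedSteps) {
        Deque<String> stack = new ArrayDeque<>();
        completedSteps.forEach(stack::push);
        return new ArrayList<>(stack);
    }
}
```

For example, if the job completed a database record step, then a namespace step, then a Helm deployment step, `rollbackOrder` returns them in the opposite order so the Helm release is removed before the namespace it lives in.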
