Tenant Service Architecture
The tenant service (tenant-service) is the largest microservice in the MATIH control plane, responsible for managing the complete lifecycle of tenant organizations. It coordinates infrastructure provisioning, billing, compliance, monitoring, and integration management across the platform.
Service Overview
| Property | Value |
|---|---|
| Artifact | com.matih:tenant-service |
| Framework | Spring Boot 3.2 |
| Language | Java 21 |
| Port | 8082 |
| Database | PostgreSQL via Spring Data JPA / Flyway |
| State Machine | Spring Statemachine |
| Azure SDK | Azure Resource Manager |
| Kubernetes | Fabric8 Kubernetes Client |
| Helm | Custom CLI wrapper (HelmClientImpl) |
| Async | Spring @Async with custom thread pools |
Package Structure
```
com.matih.tenant/
  activity/              # Activity feed service
  admin/
    health/              # System health monitoring
    operations/          # Admin operations and configuration
    reports/             # Operational reporting
  alerting/              # Alert rules and incident management
  analytics/             # Cross-tenant analytics
  apikeys/               # API key management
  audit/
    compliance/          # Compliance reporting
    retention/           # Audit data retention policies
  billing/
    invoice/             # Invoice generation and payments
    plans/               # Subscription plans and features
    usage/               # Usage metering and aggregation
  bookmark/              # User bookmark service
  branding/              # White-label customization
  client/                # IAM service client (Feign)
  compliance/            # Compliance frameworks and evidence
  config/                # Spring configuration classes
  entity/                # JPA entities
  provisioning/          # Logging context for provisioning
  repository/            # Spring Data JPA repositories
  security/              # JWT filter, IP allowlists, encryption
  service/
    chart/               # Helm chart repository management
    cost/                # Cost attribution and collection
    deployment/          # Deployment diagnostics
    helm/                # Helm client abstraction
    migration/           # Tier migration orchestration
    provisioning/        # Provisioning orchestrator
    sp/                  # Service principal management
    terraform/           # Terraform execution
    upgrade/             # Service upgrade orchestration
  statemachine/          # State machine configurations
  webhooks/              # Webhook management and delivery
  workspace/             # Workspace service
```

Core Entities
Tenant Entity
The Tenant entity is the root aggregate for all tenant-related operations:
| Field | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| name | String | Display name |
| slug | String | URL-safe identifier (e.g., acme) |
| adminEmail | String | Primary admin email |
| tier | TenantTier | FREE, PROFESSIONAL, or ENTERPRISE |
| status | TenantStatus | Current lifecycle status |
| region | String | Deployment region |
| deploymentType | DeploymentType | SHARED or DEDICATED |
| kubernetesNamespace | String | K8s namespace |
| azureAksClusterName | String | AKS cluster name |
| provisioningStartedAt | Instant | Provisioning start time |
| provisioningCompletedAt | Instant | Provisioning completion time |
| provisioningError | String | Last provisioning error message |
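To show how the provisioning fields work together, here is a minimal plain-Java sketch of the aggregate. It assumes the timestamps and error are updated through small state-changing methods; only the field names and enum values come from this page, while the methods `markProvisioningStarted`, `markProvisioningCompleted`, `markProvisioningFailed` and `provisioningDuration` are hypothetical, and the JPA annotations of the real entity are omitted.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.UUID;

// Enum values taken from the field table and the status diagram on this page.
enum TenantTier { FREE, PROFESSIONAL, ENTERPRISE }
enum DeploymentType { SHARED, DEDICATED }
enum TenantStatus {
    PENDING_APPROVAL, APPROVED, PROVISIONING, ACTIVE, FAILED,
    SUSPENDED, UPGRADING, REACTIVATING, DEPROVISIONING, DELETED
}

// Hypothetical plain-Java stand-in for the JPA entity (no persistence annotations).
class Tenant {
    final UUID id = UUID.randomUUID();
    String name;
    String slug;
    String adminEmail;
    TenantTier tier;
    TenantStatus status = TenantStatus.PENDING_APPROVAL;
    String region;
    DeploymentType deploymentType;
    String kubernetesNamespace;
    String azureAksClusterName;
    Instant provisioningStartedAt;
    Instant provisioningCompletedAt;
    String provisioningError;

    // Records the start of a (re)provisioning attempt and clears results of earlier attempts.
    void markProvisioningStarted(Instant now) {
        status = TenantStatus.PROVISIONING;
        provisioningStartedAt = now;
        provisioningCompletedAt = null;
        provisioningError = null;
    }

    void markProvisioningCompleted(Instant now) {
        status = TenantStatus.ACTIVE;
        provisioningCompletedAt = now;
    }

    // Overwrites the previous message, so only the last error is kept.
    void markProvisioningFailed(String error) {
        status = TenantStatus.FAILED;
        provisioningError = error;
    }

    // Empty until both timestamps are present.
    Optional<Duration> provisioningDuration() {
        if (provisioningStartedAt == null || provisioningCompletedAt == null) {
            return Optional.empty();
        }
        return Optional.of(Duration.between(provisioningStartedAt, provisioningCompletedAt));
    }
}
```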
Tenant Status Transitions
```
PENDING_APPROVAL
       |
       v
    APPROVED
       |
       v
  PROVISIONING -----> FAILED
       |                |
       v                v
     ACTIVE      DEPROVISIONING
       |                |
       +---> SUSPENDED  v
       |         |   DELETED
       v         v
   UPGRADING  REACTIVATING
       |         |
       v         v
     ACTIVE    ACTIVE
```

Tenant Tiers
| Tier | Deployment | Infrastructure | Features |
|---|---|---|---|
| FREE | Shared cluster | Namespace with resource quotas | Basic features, limited users |
| PROFESSIONAL | Shared cluster | Dedicated namespace, higher quotas | Full features, SSO, advanced analytics |
| ENTERPRISE | Dedicated cluster | Full Terraform-provisioned infrastructure | All features, custom domains, SLA |
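The status diagram and the tier table can both be encoded as data, which makes invalid lifecycle changes easy to reject. A minimal sketch, assuming transitions are checked in application code; the `canTransitionTo` and `deploymentType()` methods are hypothetical, the edges are read directly off the status diagram (for example, `FAILED` leads only to `DEPROVISIONING`), and deriving `deploymentType` from the tier is an assumption based on the Deployment column above.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum TenantStatus {
    PENDING_APPROVAL, APPROVED, PROVISIONING, ACTIVE, FAILED,
    SUSPENDED, UPGRADING, REACTIVATING, DEPROVISIONING, DELETED;

    // Allowed next states, transcribed from the status diagram.
    private static final Map<TenantStatus, Set<TenantStatus>> NEXT = new EnumMap<>(TenantStatus.class);

    static {
        NEXT.put(PENDING_APPROVAL, EnumSet.of(APPROVED));
        NEXT.put(APPROVED, EnumSet.of(PROVISIONING));
        NEXT.put(PROVISIONING, EnumSet.of(ACTIVE, FAILED));
        NEXT.put(ACTIVE, EnumSet.of(SUSPENDED, UPGRADING));
        NEXT.put(SUSPENDED, EnumSet.of(REACTIVATING));
        NEXT.put(UPGRADING, EnumSet.of(ACTIVE));
        NEXT.put(REACTIVATING, EnumSet.of(ACTIVE));
        NEXT.put(FAILED, EnumSet.of(DEPROVISIONING));
        NEXT.put(DEPROVISIONING, EnumSet.of(DELETED));
        NEXT.put(DELETED, EnumSet.noneOf(TenantStatus.class)); // terminal
    }

    boolean canTransitionTo(TenantStatus target) {
        return NEXT.get(this).contains(target);
    }
}

enum DeploymentType { SHARED, DEDICATED }

enum TenantTier {
    // Per the tier table, only ENTERPRISE tenants get a dedicated cluster.
    FREE(DeploymentType.SHARED),
    PROFESSIONAL(DeploymentType.SHARED),
    ENTERPRISE(DeploymentType.DEDICATED);

    private final DeploymentType deploymentType;

    TenantTier(DeploymentType deploymentType) {
        this.deploymentType = deploymentType;
    }

    DeploymentType deploymentType() {
        return deploymentType;
    }
}
```

A service method would call `current.canTransitionTo(target)` before persisting a new status and reject the request otherwise.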
State Machines
The tenant service uses Spring Statemachine to manage long-running operations. Each state machine defines valid states, transitions, guards, and actions.
Provisioning State Machine
Manages the tenant provisioning lifecycle:
| State | Description |
|---|---|
| INITIAL | Job created, not yet started |
| VALIDATING_INPUT | Validating tenant configuration |
| CREATING_TENANT_RECORD | Creating tenant record in database |
| ALLOCATING_SHARED_CLUSTER | (Free tier) Allocating shared cluster namespace |
| CONFIGURING_QUOTAS | (Free tier) Setting resource quotas |
| VALIDATING_SERVICE_PRINCIPAL | (Dedicated tier) Validating Azure SP |
| ACQUIRING_TERRAFORM_LOCK | (Dedicated tier) Acquiring Terraform state lock |
| PROVISIONING_INFRASTRUCTURE | (Dedicated tier) Running Terraform |
| CREATING_KUBERNETES_RESOURCES | Creating namespaces, RBAC, network policies |
| DEPLOYING_SERVICES | Deploying Helm releases |
| VERIFYING_CONNECTIVITY | Running health checks |
| COMPLETED | Provisioning successful |
| ROLLING_BACK | Undoing completed steps |
| ROLLED_BACK | Rollback completed |
| FAILED | Unrecoverable failure |
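The table mixes a common prefix and suffix with a branch that depends on where the tenant runs. A minimal sketch of the resulting happy path, assuming the table lists states in execution order and that the steps marked (Free tier) apply whenever the tenant is placed on the shared cluster while the (Dedicated tier) steps apply otherwise; `ProvisioningPlan.happyPath` is a hypothetical helper, not the Spring Statemachine configuration itself.

```java
import java.util.ArrayList;
import java.util.List;

enum DeploymentType { SHARED, DEDICATED }

enum ProvisioningState {
    INITIAL, VALIDATING_INPUT, CREATING_TENANT_RECORD,
    ALLOCATING_SHARED_CLUSTER, CONFIGURING_QUOTAS,
    VALIDATING_SERVICE_PRINCIPAL, ACQUIRING_TERRAFORM_LOCK, PROVISIONING_INFRASTRUCTURE,
    CREATING_KUBERNETES_RESOURCES, DEPLOYING_SERVICES, VERIFYING_CONNECTIVITY,
    COMPLETED, ROLLING_BACK, ROLLED_BACK, FAILED
}

final class ProvisioningPlan {
    private ProvisioningPlan() {}

    // Happy-path state sequence; only the middle section depends on the deployment type.
    static List<ProvisioningState> happyPath(DeploymentType type) {
        List<ProvisioningState> path = new ArrayList<>(List.of(
                ProvisioningState.INITIAL,
                ProvisioningState.VALIDATING_INPUT,
                ProvisioningState.CREATING_TENANT_RECORD));
        if (type == DeploymentType.SHARED) {
            path.add(ProvisioningState.ALLOCATING_SHARED_CLUSTER);
            path.add(ProvisioningState.CONFIGURING_QUOTAS);
        } else {
            path.add(ProvisioningState.VALIDATING_SERVICE_PRINCIPAL);
            path.add(ProvisioningState.ACQUIRING_TERRAFORM_LOCK);
            path.add(ProvisioningState.PROVISIONING_INFRASTRUCTURE);
        }
        // Common tail shared by both paths.
        path.add(ProvisioningState.CREATING_KUBERNETES_RESOURCES);
        path.add(ProvisioningState.DEPLOYING_SERVICES);
        path.add(ProvisioningState.VERIFYING_CONNECTIVITY);
        path.add(ProvisioningState.COMPLETED);
        return List.copyOf(path);
    }
}
```

The failure states (`ROLLING_BACK`, `ROLLED_BACK`, `FAILED`) are reached from any step and are therefore not part of either happy path.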
Upgrade State Machine
Manages service version upgrades:
| State | Description |
|---|---|
| PENDING | Upgrade scheduled |
| PRE_CHECK | Running pre-upgrade health checks |
| BACKING_UP | Creating backup of current state |
| UPGRADING | Rolling out new versions |
| POST_CHECK | Running post-upgrade verification |
| COMPLETED | Upgrade successful |
| ROLLING_BACK | Reverting to previous version |
| FAILED | Upgrade failed |
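The upgrade states can be read as a transition function over success and failure events. A minimal plain-Java sketch (not Spring Statemachine's API); the failure routing is an assumption, since this page does not spell it out: a failure before `UPGRADING` fails the upgrade outright because nothing has changed yet, a failure during `UPGRADING` or `POST_CHECK` triggers `ROLLING_BACK`, and a finished rollback is reported as `FAILED`.

```java
enum UpgradeState { PENDING, PRE_CHECK, BACKING_UP, UPGRADING, POST_CHECK, COMPLETED, ROLLING_BACK, FAILED }

enum UpgradeEvent { SUCCEEDED, FAILED }

final class UpgradeTransitions {
    private UpgradeTransitions() {}

    static UpgradeState next(UpgradeState state, UpgradeEvent event) {
        boolean ok = event == UpgradeEvent.SUCCEEDED;
        return switch (state) {
            case PENDING -> ok ? UpgradeState.PRE_CHECK : UpgradeState.FAILED;
            case PRE_CHECK -> ok ? UpgradeState.BACKING_UP : UpgradeState.FAILED;
            case BACKING_UP -> ok ? UpgradeState.UPGRADING : UpgradeState.FAILED;
            // From here on the running version may already have changed, so failures roll back.
            case UPGRADING -> ok ? UpgradeState.POST_CHECK : UpgradeState.ROLLING_BACK;
            case POST_CHECK -> ok ? UpgradeState.COMPLETED : UpgradeState.ROLLING_BACK;
            // Assumed: once the rollback ends, the upgrade itself is reported as failed.
            case ROLLING_BACK -> UpgradeState.FAILED;
            case COMPLETED, FAILED -> throw new IllegalStateException("terminal state: " + state);
        };
    }
}
```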
Tier Migration State Machine
Manages transitions between tenant tiers:
| State | Description |
|---|---|
| REQUESTED | Migration requested |
| VALIDATING | Checking migration compatibility |
| PROVISIONING_TARGET | Creating target infrastructure |
| MIGRATING_DATA | Moving data to new infrastructure |
| SWITCHING_TRAFFIC | Updating DNS and routing |
| VERIFYING | Running validation checks |
| COMPLETED | Migration successful |
| ROLLING_BACK | Reverting migration |
| FAILED | Migration failed |
Integration Points
Inbound Dependencies
| Consumer | Protocol | Purpose |
|---|---|---|
| API Gateway | HTTP | Tenant management API |
| Control Plane UI | HTTP | Admin dashboard |
| IAM Service | HTTP | Tenant context for user operations |
| Billing webhooks | HTTP | Payment processor callbacks |
Outbound Dependencies
| Dependency | Protocol | Purpose |
|---|---|---|
| PostgreSQL | JDBC | Persistent storage |
| IAM Service | HTTP (Feign) | User creation, role assignment |
| Notification Service | HTTP/Kafka | Email and Slack notifications |
| Audit Service | HTTP/Kafka | Audit event publishing |
| Azure Resource Manager | HTTPS | DNS zones, AKS management |
| Kubernetes API | HTTPS (Fabric8) | Namespace and resource management |
| Helm CLI | Process exec | Chart deployment and management |
| Terraform CLI | Process exec | Infrastructure provisioning |
| Key Vault | HTTPS | Secret management |
Helm Client Architecture
The tenant service wraps the Helm CLI with a typed Java abstraction that provides retry logic, timeout management, and structured result parsing:
```
TenantHelmService        (business logic)
        |
        v
RetryableHelmService     (retry with exponential backoff)
        |
        v
HelmReleaseManager       (release lifecycle)
        |
        v
HelmClientImpl           (CLI execution)
        |
        v
helm CLI                 (process execution)
```

| Class | Responsibility |
|---|---|
| TenantHelmService | Tenant-specific Helm operations (deploy, upgrade, rollback) |
| RetryableHelmService | Wraps operations with configurable retry (3 attempts, exponential backoff) |
| HelmReleaseManager | Manages release lifecycle (install, upgrade, rollback, status) |
| HelmClientImpl | Executes helm CLI commands as subprocesses |
| ChartRepositoryService | Manages chart repository configuration |
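The layering is a decorator chain: each layer implements the same contract and delegates to the one below. A minimal sketch of the bottom two layers, assuming a hypothetical `HelmClient` interface; `CliHelmClient` and `RetryingHelmClient` stand in for `HelmClientImpl` and `RetryableHelmService`, and their signatures, the 10-minute timeout and the base delay are illustrative. The command itself uses the standard `helm upgrade --install` form with `--namespace`, `--wait` and `--timeout`.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.List;

// Hypothetical contract shared by every layer in the chain.
interface HelmClient {
    void upgradeInstall(String release, String chart, String namespace) throws Exception;
}

// Bottom layer: runs the helm CLI as a subprocess.
class CliHelmClient implements HelmClient {
    static List<String> upgradeInstallCommand(String release, String chart, String namespace) {
        // Install-or-upgrade in one call; --wait blocks until resources are ready or the timeout expires.
        return List.of("helm", "upgrade", "--install", release, chart,
                "--namespace", namespace, "--wait", "--timeout", "10m");
    }

    @Override
    public void upgradeInstall(String release, String chart, String namespace)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(upgradeInstallCommand(release, chart, namespace))
                .redirectErrorStream(true) // merge stderr so failures carry helm's message
                .start();
        String output = new String(process.getInputStream().readAllBytes());
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("helm exited with " + exit + ": " + output);
        }
    }
}

// Retry layer: up to maxAttempts tries, doubling the delay after each failure.
class RetryingHelmClient implements HelmClient {
    private final HelmClient delegate;
    private final int maxAttempts;
    private final Duration baseDelay;

    RetryingHelmClient(HelmClient delegate, int maxAttempts, Duration baseDelay) {
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
        this.baseDelay = baseDelay;
    }

    @Override
    public void upgradeInstall(String release, String chart, String namespace) throws Exception {
        Duration delay = baseDelay;
        for (int attempt = 1; ; attempt++) {
            try {
                delegate.upgradeInstall(release, chart, namespace);
                return;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // never retry through an interrupt
                throw e;
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // retries exhausted: surface the last failure
                }
                Thread.sleep(delay.toMillis());
                delay = delay.multipliedBy(2);
            }
        }
    }
}
```

Higher layers would wrap the retrying client the same way, so business code only ever sees the `HelmClient` contract.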
Terraform Integration
For enterprise-tier tenants, the service orchestrates Terraform to provision dedicated infrastructure:
| Component | Description |
|---|---|
| TerraformExecutor | Executes Terraform commands (init, plan, apply, destroy) |
| TerraformStateManager | Manages remote state backend configuration |
| TenantTemplateGenerator | Generates tenant-specific Terraform configurations |
The Terraform execution acquires a distributed lock to prevent concurrent modifications to the same state file.
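A minimal in-process sketch of the locking idea, keyed by state file: runs against the same state are serialized while different tenants can proceed in parallel. This is only a stand-in under stated assumptions: a real distributed lock has to live in shared storage (the azurerm backend, for example, locks state with an Azure blob lease), and the `TerraformRunner` class, the state-key naming and the `runCommand` hook are hypothetical. The `-input=false` and `-out` flags are standard Terraform CLI options; applying a saved plan file does not prompt for approval.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

class TerraformRunner {
    // One lock per state key; only guards runs inside this JVM.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Consumer<List<String>> runCommand; // executes one CLI invocation (hypothetical hook)

    TerraformRunner(Consumer<List<String>> runCommand) {
        this.runCommand = runCommand;
    }

    // init -> plan -> apply; the saved plan makes apply execute exactly what was planned.
    static List<List<String>> applySequence() {
        return List.of(
                List.of("terraform", "init", "-input=false"),
                List.of("terraform", "plan", "-input=false", "-out=tfplan"),
                List.of("terraform", "apply", "-input=false", "tfplan"));
    }

    // Returns false if another run still holds the lock for this state key after waitSeconds.
    boolean applyWithLock(String stateKey, long waitSeconds) throws InterruptedException {
        ReentrantLock lock = locks.computeIfAbsent(stateKey, k -> new ReentrantLock());
        if (!lock.tryLock(waitSeconds, TimeUnit.SECONDS)) {
            return false;
        }
        try {
            applySequence().forEach(runCommand);
            return true;
        } finally {
            lock.unlock(); // always release, even if a command throws
        }
    }
}
```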
Scheduled Tasks
| Task | Schedule | Description |
|---|---|---|
| Provisioning retry | Every 60s | Retries failed provisioning jobs |
| Stale job cleanup | Hourly | Marks stuck jobs as failed (4-hour timeout) |
| Cost collection | Every 15 min | Collects Kubernetes resource usage |
| Service principal expiry check | Daily | Alerts on expiring Azure SPs |
| Audit retention | Daily | Archives and purges old audit records |
| Trial expiration | Daily | Handles expired free tier trials |
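The stale-job rule is simple enough to express as a pure function, which keeps it testable apart from the scheduler. A minimal sketch, assuming a hypothetical `ProvisioningJob` record that carries only a start time and an in-progress flag; in the service, the hourly task would call something like `findStale` with the 4-hour timeout from the table and mark the returned jobs as failed.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.UUID;

// Hypothetical job shape: only the fields the cleanup rule needs.
record ProvisioningJob(UUID id, Instant startedAt, boolean inProgress) {}

final class StaleJobCleanup {
    static final Duration TIMEOUT = Duration.ofHours(4);

    private final Clock clock; // injected so the cutoff can be tested deterministically

    StaleJobCleanup(Clock clock) {
        this.clock = clock;
    }

    // Jobs still running after TIMEOUT are considered stuck.
    List<ProvisioningJob> findStale(List<ProvisioningJob> jobs) {
        Instant cutoff = clock.instant().minus(TIMEOUT);
        return jobs.stream()
                .filter(ProvisioningJob::inProgress)
                .filter(job -> job.startedAt().isBefore(cutoff))
                .toList();
    }
}
```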
Error Handling and Recovery
The ProvisioningOrchestrator handles failures with the following strategy:
- Automatic retry -- failed steps are retried up to 3 times with exponential backoff (60s, 120s, 240s)
- State-aware recovery -- the orchestrator can resume from any state after a restart
- Rollback -- if retries are exhausted, completed steps are rolled back in reverse order
- Notification -- failures trigger notifications to platform operators and tenant admins
- Manual intervention -- permanently failed jobs can be retried or reset by administrators
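The retry delays listed above double from a 60-second base, so the delay before retry n is 60 · 2^(n-1) seconds. A minimal sketch of that schedule combined with reverse-order rollback, assuming a hypothetical `Step` interface with `execute` and `compensate` methods (the orchestrator's real API is not shown on this page). Retries are counted per step, and the sleep is injected so the schedule can be checked without waiting.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical step contract: a forward action plus its compensating (undo) action.
interface Step {
    void execute() throws Exception;
    void compensate();
}

final class ProvisioningRun {
    static final int MAX_RETRIES = 3;
    static final Duration BASE_DELAY = Duration.ofSeconds(60);

    private final Consumer<Duration> sleeper; // injected so tests need not wait

    ProvisioningRun(Consumer<Duration> sleeper) {
        this.sleeper = sleeper;
    }

    // Delay before retry n (1-based): 60s, 120s, 240s.
    static Duration retryDelay(int retry) {
        return BASE_DELAY.multipliedBy(1L << (retry - 1));
    }

    // True on success; otherwise undoes completed steps, newest first, and returns false.
    boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            if (!executeWithRetry(step)) {
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                return false;
            }
            completed.push(step);
        }
        return true;
    }

    // One initial attempt plus up to MAX_RETRIES retries.
    private boolean executeWithRetry(Step step) {
        for (int retry = 0; retry <= MAX_RETRIES; retry++) {
            if (retry > 0) {
                sleeper.accept(retryDelay(retry));
            }
            try {
                step.execute();
                return true;
            } catch (Exception e) {
                // swallow and try again; the final failure is reported by returning false
            }
        }
        return false;
    }
}
```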
Next Steps
- Two-Tier Provisioning -- the foundational design pattern
- 10-Phase Provisioning Flow -- detailed phase walkthrough
- API Reference -- complete endpoint catalog