Control Plane Architecture
The Control Plane is the management backbone of the MATIH platform. It consists of 10 Java/Spring Boot 3.2 services deployed in the matih-control-plane Kubernetes namespace. These services collectively handle identity management, tenant provisioning, configuration distribution, billing metering, audit logging, infrastructure orchestration, observability aggregation, and platform registry operations.
All Control Plane services share a common technology stack and coding conventions enforced through the commons-java library. They are tenant-aware -- they know about tenants and manage their lifecycle -- but they never process tenant business data directly.
2.3.1 Technology Stack
Every Control Plane service is built on the same foundation:
| Component | Technology | Version |
|---|---|---|
| Language | Java | 21 (LTS) |
| Framework | Spring Boot | 3.2 |
| Security | Spring Security + custom JWT (commons-java) | |
| Persistence | Spring Data JPA + Hibernate | 6.4 |
| Multi-tenancy | Hibernate schema-based (TenantIdentifierResolver) | |
| Caching | Redis via TenantAwareCacheManager | |
| Messaging | Apache Kafka via KafkaEventStreamingService | |
| Build tool | Gradle | 8.x |
| Container base | Eclipse Temurin / Distroless Java | |
| Health check | Spring Boot Actuator (/api/v1/actuator/health) | |
The homogeneous technology choice for the Control Plane was deliberate. All 10 services use Java/Spring Boot because:
- Spring Security provides a mature JWT validation and filter chain framework
- Hibernate multi-tenancy natively supports schema-based tenant isolation
- Spring Data JPA reduces boilerplate for CRUD-heavy management services
- Consistent deployment -- a single Gradle build, single base image, single debugging toolchain
2.3.2 Complete Service Registry
| Service | Port | Database | Primary Dependencies | Purpose |
|---|---|---|---|---|
| iam-service | 8081 | iam | PostgreSQL, Redis | Identity, authentication, authorization |
| tenant-service | 8082 | tenant | PostgreSQL, Redis, Kafka, iam-service | Tenant lifecycle and provisioning |
| api-gateway | 8080 | none | iam-service, config-service | Request routing, JWT validation, rate limiting |
| config-service | 8888 | config | PostgreSQL, Redis | Centralized configuration, feature flags |
| notification-service | 8085 | notification | PostgreSQL, Redis, Kafka | Multi-channel notification delivery |
| audit-service | 8086 | audit | PostgreSQL, Elasticsearch, Kafka | Immutable audit trail, compliance logging |
| billing-service | 8087 | billing | PostgreSQL, Redis, Kafka | Usage metering, subscription management |
| observability-api | 8088 | none | Prometheus, Elasticsearch | Metrics aggregation, observability API |
| infrastructure-service | 8089 | infrastructure | PostgreSQL, Redis | Infrastructure provisioning, DNS, TLS |
| platform-registry | 8084 | registry | PostgreSQL | Service catalog, schema registry |
2.3.3 Service Descriptions
IAM Service (Port 8081)
The Identity and Access Management service is the security foundation of the entire platform. Every authenticated request on the platform carries a JWT issued by the IAM service.
Core responsibilities:
- User registration, login, and profile management
- JWT access token (15-min expiry) and refresh token (7-day expiry) generation
- Service-to-service authentication token issuance (5-min expiry)
- Role-based access control (RBAC) with hierarchical permissions
- API key lifecycle management with scoped permissions
- Multi-factor authentication (TOTP, email verification)
- Password policies, account lockout, and brute-force protection
- OAuth2 / OIDC integration for SSO
Token architecture:
The IAM service generates four types of JWT tokens through JwtTokenProvider:
// Access token (15-minute expiry) - carries tenant_id, user_id, roles
generateAccessToken(userId, tenantId, roles)
// Refresh token (7-day expiry) - for obtaining new access tokens
generateRefreshToken(userId, tenantId)
// Service-to-service token (5-minute expiry) - inter-service auth
generateServiceToken(serviceName, scopes)
// API key token (configurable expiry) - programmatic access
generateApiKeyToken(keyId, tenantId, permissions, validity)
Every access token carries three critical claims: sub (user ID), tenant_id (tenant scope), and roles (permission set). These claims are extracted by downstream services to establish tenant context without additional round-trips to the IAM service.
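The claim layout described above can be illustrated with a small sketch. It is not the JwtTokenProvider implementation: the class `AccessTokenClaimsSketch` and its methods are hypothetical, signing and encoding are left out, and the `iat`/`exp` entries are the standard JWT registered claims, assumed here to hold the 15-minute expiry.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: shows the claim layout only; it does not sign or encode a JWT. */
final class AccessTokenClaimsSketch {

    // Access token expiry as stated in this section.
    static final Duration ACCESS_TOKEN_TTL = Duration.ofMinutes(15);

    /** Builds the claim set an access token is described as carrying. */
    static Map<String, Object> accessClaims(String userId, String tenantId,
                                            List<String> roles, Instant issuedAt) {
        Map<String, Object> claims = new LinkedHashMap<>();
        claims.put("sub", userId);               // user ID
        claims.put("tenant_id", tenantId);       // tenant scope
        claims.put("roles", List.copyOf(roles)); // permission set
        // Assumption: expiry is expressed via the registered iat/exp claims (epoch seconds).
        claims.put("iat", issuedAt.getEpochSecond());
        claims.put("exp", issuedAt.plus(ACCESS_TOKEN_TTL).getEpochSecond());
        return claims;
    }

    /** What a downstream service would read to establish tenant context. */
    static String tenantOf(Map<String, Object> claims) {
        Object tenant = claims.get("tenant_id");
        if (!(tenant instanceof String) || ((String) tenant).isBlank()) {
            throw new IllegalArgumentException("token has no tenant_id claim");
        }
        return (String) tenant;
    }
}
```

A downstream service that has already verified the token signature would call something like `tenantOf(claims)` once per request and keep the result for the duration of that request, which is what avoids the extra round-trip to the IAM service.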
Key APIs:
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/auth/login | POST | Authenticate user, return JWT token pair |
| /api/v1/auth/refresh | POST | Exchange refresh token for new access token |
| /api/v1/auth/logout | POST | Invalidate active session |
| /api/v1/users | GET/POST | List or create users |
| /api/v1/users/{id} | GET/PUT/DELETE | User CRUD operations |
| /api/v1/roles | GET/POST | Role management |
| /api/v1/permissions | GET | Permission catalog |
| /api/v1/api-keys | GET/POST | API key management |
| /api/v1/auth/mfa/setup | POST | Configure MFA for user |
| /api/v1/auth/mfa/verify | POST | Verify MFA code |
Tenant Service (Port 8082)
The Tenant Service manages the complete lifecycle of tenants from initial provisioning through configuration, scaling, and eventual decommissioning. It orchestrates a multi-phase state machine for provisioning.
Core responsibilities:
- Tenant CRUD operations (create, read, update, suspend, delete)
- Multi-phase provisioning state machine (8 numbered phases plus an intermediate Phase 5.5 for ingress)
- Namespace creation with RBAC, NetworkPolicies, ResourceQuotas
- DNS zone creation and ingress controller deployment
- Data plane service deployment orchestration via Helm
- Tenant configuration inheritance and override management
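The multi-phase provisioning state machine named in the list above could be modelled as an ordered enum. The following is a minimal sketch, not the service's actual implementation: the `ProvisioningPhase` and `ProvisioningRun` types and the `PhaseHandler` callback are hypothetical, and the constants use the phase names this section gives for the state machine, with DEPLOY_INGRESS (listed as Phase 5.5) as its own constant.

```java
import java.util.Optional;

/** Hypothetical model of the provisioning phases, in execution order. */
enum ProvisioningPhase {
    VALIDATE,
    CREATE_NAMESPACE,
    DEPLOY_SECRETS,
    DEPLOY_DATABASES,
    DEPLOY_SERVICES,
    DEPLOY_INGRESS,   // listed as "Phase 5.5"
    CONFIGURE,
    VERIFY,
    ACTIVATE;

    /** Next phase in declaration order, or empty once ACTIVATE is reached. */
    Optional<ProvisioningPhase> next() {
        ProvisioningPhase[] all = values();
        int i = ordinal() + 1;
        return i < all.length ? Optional.of(all[i]) : Optional.empty();
    }
}

/** Drives a tenant through the phases, stopping at the first failure. */
final class ProvisioningRun {

    /** Hypothetical callback: returns true when the phase succeeded. */
    interface PhaseHandler {
        boolean execute(ProvisioningPhase phase);
    }

    /** Returns the last phase that completed, or empty if VALIDATE itself failed. */
    static Optional<ProvisioningPhase> run(PhaseHandler handler) {
        Optional<ProvisioningPhase> completed = Optional.empty();
        Optional<ProvisioningPhase> current = Optional.of(ProvisioningPhase.VALIDATE);
        while (current.isPresent() && handler.execute(current.get())) {
            completed = current;
            current = current.get().next();
        }
        return completed;
    }
}
```

Returning the last completed phase, rather than a plain success flag, is one way to let a caller resume or roll back from a known point; whether the real service does this is not stated here.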
Provisioning state machine:
Phase 1: VALIDATE --> Validate tenant details, check slug uniqueness
Phase 2: CREATE_NAMESPACE --> Create K8s namespace with RBAC and NetworkPolicies
Phase 3: DEPLOY_SECRETS --> Create Kubernetes secrets for databases and services
Phase 4: DEPLOY_DATABASES --> Provision per-tenant PostgreSQL schemas
Phase 5: DEPLOY_SERVICES --> Helm install all 14 data plane services
Phase 5.5: DEPLOY_INGRESS --> Ingress controller + DNS zone + TLS certificate
Phase 6: CONFIGURE --> Apply tenant-specific configuration overrides
Phase 7: VERIFY --> Health check all deployed services
Phase 8: ACTIVATE --> Mark tenant as active, create admin user
Config Service (Port 8888)
Centralized, versioned configuration management for all platform services with hierarchical override support.
Configuration hierarchy:
Global defaults
--> Environment overrides (dev, staging, prod)
--> Service-specific overrides
--> Tenant-specific overrides
Higher-specificity configurations override lower ones. Feature flags are evaluated against tenant tier for gradual rollouts. Configuration changes are distributed via Redis Pub/Sub for zero-downtime updates.
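The override rule can be sketched as a layered merge. This is an illustration under assumptions rather than the Config Service's code: `ConfigHierarchySketch` is a hypothetical name, configuration is reduced to flat string key/value pairs, and the property keys in the usage note are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical resolver: merges configuration layers from least to most specific. */
final class ConfigHierarchySketch {

    /**
     * @param layers ordered global, environment, service, tenant;
     *               later (more specific) layers win on key collisions.
     */
    static Map<String, String> resolve(List<Map<String, String>> layers) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            effective.putAll(layer); // a more specific layer overwrites earlier values
        }
        return effective;
    }
}
```

For example, with a global layer of `query.timeout=30, pool.size=10`, an environment layer of `query.timeout=60`, an empty service layer, and a tenant layer of `pool.size=50`, `resolve` yields `query.timeout=60` and `pool.size=50`.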
Notification Service (Port 8085)
Event-driven notification delivery across email, in-app, webhook, and Slack channels. Subscribes to Kafka topics for tenant lifecycle events, query failures, model deployments, billing thresholds, and security alerts.
Audit Service (Port 8086)
Maintains an immutable, append-only audit trail of all security-relevant operations. Stores records in both PostgreSQL (structured queries) and Elasticsearch (full-text search). Supports SOC 2, HIPAA, and GDPR compliance reporting.
Billing Service (Port 8087)
Tracks resource consumption per tenant (queries executed, AI tokens consumed, storage used, API calls, pipeline runs) and manages subscription tiers with feature entitlements.
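As a rough picture of per-tenant metering, the sketch below keeps one counter per tenant and metric. Everything in it is an assumption made for illustration: `UsageMeterSketch`, the `Metric` constants (named after the consumption types listed above), and the in-memory storage are hypothetical, and a real service would persist these values. Storage used is treated as a running total here for simplicity, although in practice it would more likely be a point-in-time gauge.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical in-memory usage meter keyed by tenant and metric. */
final class UsageMeterSketch {

    /** Consumption types named in this section. */
    enum Metric { QUERIES, AI_TOKENS, STORAGE, API_CALLS, PIPELINE_RUNS }

    private final Map<String, Map<Metric, LongAdder>> usage = new ConcurrentHashMap<>();

    /** Adds {@code amount} to the tenant's running total for the metric. */
    void record(String tenantId, Metric metric, long amount) {
        usage.computeIfAbsent(tenantId, t -> new ConcurrentHashMap<>())
             .computeIfAbsent(metric, m -> new LongAdder())
             .add(amount);
    }

    /** Current total for the tenant and metric, or 0 if nothing was recorded. */
    long total(String tenantId, Metric metric) {
        Map<Metric, LongAdder> perTenant = usage.get(tenantId);
        if (perTenant == null) {
            return 0L;
        }
        LongAdder adder = perTenant.get(metric);
        return adder == null ? 0L : adder.sum();
    }
}
```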
Observability API (Port 8088)
Unified interface for querying metrics (Prometheus), traces (Tempo), and logs (Loki) across all services and namespaces.
Infrastructure Service (Port 8089)
Orchestrates Kubernetes resource creation, database provisioning, DNS zone lifecycle, TLS certificate management, and cloud provider API interactions for tenant infrastructure.
Platform Registry (Port 8084)
Maintains the catalog of all services, API schemas (OpenAPI), event schemas, service versions, and dependency information.
2.3.4 Resource Allocation
All Control Plane services follow a consistent resource allocation pattern:
| Environment | CPU Request | CPU Limit | Memory Request | Memory Limit | Replicas |
|---|---|---|---|---|---|
| Development | 100m | 500m | 256Mi | 512Mi | 1 |
| Staging | 100m | 500m | 256Mi | 512Mi | 2 |
| Production | 100m | 500m | 256Mi | 512Mi | 3 |
Production deployments use 3 replicas for high availability, distributed across availability zones via pod anti-affinity rules defined in Helm values.
2.3.5 Inter-Service Communication
Control Plane services communicate through two channels:
Synchronous REST -- Used for request-response patterns where the caller needs an immediate result. The RetryableRestClient from commons-java handles retries with exponential backoff and circuit breaking.
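Retrying with exponential backoff can be sketched as follows. This is not the RetryableRestClient API, whose signatures are not shown in this section: `BackoffSketch`, `delayFor`, and `callWithRetry` are hypothetical names, the doubling factor and the cap are assumptions, and circuit breaking is left out to keep the example short.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical retry helper illustrating exponential backoff only. */
final class BackoffSketch {

    /** Delay before retry number {@code attempt} (0-based): base * 2^attempt, capped at max. */
    static Duration delayFor(int attempt, Duration base, Duration max) {
        int shift = Math.max(0, Math.min(attempt, 20)); // bound the exponent
        long millis = base.toMillis() * (1L << shift);
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }

    /** Invokes {@code call} up to {@code maxAttempts} times, sleeping between failures. */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts,
                               Duration base, Duration max) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayFor(attempt, base, max).toMillis());
                }
            }
        }
        throw last; // every attempt failed; surface the final error
    }
}
```

With a base of 100 ms and a cap of 1 s, the delays are 100, 200, 400, 800 ms and then 1 s for every later retry. Production clients usually add random jitter to these delays so that callers do not retry in lockstep.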
Asynchronous Kafka -- Used for event notifications that do not require an immediate response. Events flow through well-defined Kafka topics with tenant_id as the partition key, ensuring ordering within a tenant.
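Why keying by tenant_id preserves per-tenant ordering can be shown with a simplified partition function. The sketch is a stand-in, not Kafka's code: Kafka's default partitioner hashes the serialized key with murmur2, whereas this hypothetical `TenantPartitionSketch` uses `String.hashCode()` purely to show that equal keys always map to the same partition.

```java
/** Simplified stand-in for key-based partition selection. */
final class TenantPartitionSketch {

    /**
     * Maps a tenant ID to a partition index in [0, numPartitions).
     * Assumption: String.hashCode() replaces Kafka's murmur2 hash for illustration.
     */
    static int partitionFor(String tenantId, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(tenantId.hashCode(), numPartitions);
    }
}
```

Because every event for a given tenant lands on one partition, and Kafka preserves order within a partition, consumers see that tenant's events in the order they were produced. Events for different tenants carry no ordering guarantee relative to each other.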
iam-service <---------> api-gateway (token validation)
|
tenant-service -----+--> notification-service (lifecycle events)
+--> audit-service (provisioning audit)
+--> billing-service (subscription creation)
+--> infrastructure-service (resource provisioning)
|
config-service <----+--> all services (configuration distribution via Redis)
Related Sections
- Service Interactions -- Detailed inter-service communication flows
- IAM Architecture -- IAM service deep dive
- Tenant Architecture -- Provisioning state machine
- Shared Patterns -- Spring Boot patterns across all services
- Data Plane -- The 14 workload services