MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Control Plane Architecture

Production - 10 Java/Spring Boot 3.2 services in matih-control-plane namespace

The Control Plane is the management backbone of the MATIH platform. It consists of 10 Java/Spring Boot 3.2 services deployed in the matih-control-plane Kubernetes namespace. These services collectively handle identity management, tenant provisioning, configuration distribution, billing metering, audit logging, infrastructure orchestration, observability aggregation, and platform registry operations.

All Control Plane services share a common technology stack and coding conventions enforced through the commons-java library. They are tenant-aware -- they know about tenants and manage their lifecycle -- but they never process tenant business data directly.


2.3.1 Technology Stack

Every Control Plane service is built on the same foundation:

| Component | Technology | Version |
|---|---|---|
| Language | Java | 21 (LTS) |
| Framework | Spring Boot | 3.2 |
| Security | Spring Security + custom JWT (commons-java) | |
| Persistence | Spring Data JPA + Hibernate | 6.4 |
| Multi-tenancy | Hibernate schema-based (TenantIdentifierResolver) | |
| Caching | Redis via TenantAwareCacheManager | |
| Messaging | Apache Kafka via KafkaEventStreamingService | |
| Build tool | Gradle | 8.x |
| Container base | Eclipse Temurin / Distroless Java | |
| Health check | Spring Boot Actuator (/api/v1/actuator/health) | |

The homogeneous technology choice for the Control Plane was deliberate. All 10 services use Java/Spring Boot because:

  1. Spring Security provides the most mature JWT validation and filter chain framework
  2. Hibernate multi-tenancy natively supports schema-based tenant isolation
  3. Spring Data JPA reduces boilerplate for CRUD-heavy management services
  4. Consistent deployment -- a single Gradle build, single base image, single debugging toolchain
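
Schema-based isolation depends on resolving the current tenant for each request. The sketch below illustrates the idea in plain Java, assuming a thread-local holder. `TenantContext`, `SchemaResolverSketch`, the `tenant_` schema prefix, and the `public` fallback are hypothetical and are not the commons-java API; in a real service this logic would typically sit behind Hibernate's `CurrentTenantIdentifierResolver` interface.

```java
import java.util.Optional;

/** Hypothetical per-thread holder for the tenant of the current request. */
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() {}

    static void set(String tenantId) { CURRENT.set(tenantId); }
    static Optional<String> get() { return Optional.ofNullable(CURRENT.get()); }
    static void clear() { CURRENT.remove(); }
}

/** Maps the current tenant to a database schema name (the naming scheme is assumed). */
final class SchemaResolverSketch {
    static final String DEFAULT_SCHEMA = "public"; // assumed fallback when no tenant is set

    private SchemaResolverSketch() {}

    static String resolveCurrentSchema() {
        return TenantContext.get()
                .map(id -> "tenant_" + id.replace('-', '_'))
                .orElse(DEFAULT_SCHEMA);
    }

    public static void main(String[] args) {
        TenantContext.set("acme-corp");
        try {
            System.out.println(resolveCurrentSchema());
        } finally {
            TenantContext.clear(); // always reset so pooled threads do not leak tenants
        }
        System.out.println(resolveCurrentSchema());
    }
}
```

Clearing the holder in a `finally` block matters because request threads are usually pooled; a stale tenant ID would otherwise carry over into the next request handled by the same thread.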

2.3.2 Complete Service Registry

| Service | Port | Database | Primary Dependencies | Purpose |
|---|---|---|---|---|
| iam-service | 8081 | iam | PostgreSQL, Redis | Identity, authentication, authorization |
| tenant-service | 8082 | tenant | PostgreSQL, Redis, Kafka, iam-service | Tenant lifecycle and provisioning |
| api-gateway | 8080 | none | iam-service, config-service | Request routing, JWT validation, rate limiting |
| config-service | 8888 | config | PostgreSQL, Redis | Centralized configuration, feature flags |
| notification-service | 8085 | notification | PostgreSQL, Redis, Kafka | Multi-channel notification delivery |
| audit-service | 8086 | audit | PostgreSQL, Elasticsearch, Kafka | Immutable audit trail, compliance logging |
| billing-service | 8087 | billing | PostgreSQL, Redis, Kafka | Usage metering, subscription management |
| observability-api | 8088 | none | Prometheus, Elasticsearch | Metrics aggregation, observability API |
| infrastructure-service | 8089 | infrastructure | PostgreSQL, Redis | Infrastructure provisioning, DNS, TLS |
| platform-registry | 8084 | registry | PostgreSQL | Service catalog, schema registry |

2.3.3 Service Descriptions

IAM Service (Port 8081)

The Identity and Access Management service is the security foundation of the entire platform. Every authenticated request in the platform carries a JWT that was issued by the IAM service.

Core responsibilities:

  • User registration, login, and profile management
  • JWT access token (15-min expiry) and refresh token (7-day expiry) generation
  • Service-to-service authentication token issuance (5-min expiry)
  • Role-based access control (RBAC) with hierarchical permissions
  • API key lifecycle management with scoped permissions
  • Multi-factor authentication (TOTP, email verification)
  • Password policies, account lockout, and brute-force protection
  • OAuth2 / OIDC integration for SSO

Token architecture:

The IAM service generates four types of JWT tokens through JwtTokenProvider:

// Access token (15-minute expiry) - carries tenant_id, user_id, roles
generateAccessToken(userId, tenantId, roles)
 
// Refresh token (7-day expiry) - for obtaining new access tokens
generateRefreshToken(userId, tenantId)
 
// Service-to-service token (5-minute expiry) - inter-service auth
generateServiceToken(serviceName, scopes)
 
// API key token (configurable expiry) - programmatic access
generateApiKeyToken(keyId, tenantId, permissions, validity)

Every access token carries three critical claims: sub (user ID), tenant_id (tenant scope), and roles (permission set). These claims are extracted by downstream services to establish tenant context without additional round-trips to the IAM service.
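
The expiry values and claims described above can be summarized in a short sketch. `TokenType`, `AccessClaims`, and `fromClaims` are hypothetical names used for illustration; only the expiry durations and the claim names `sub`, `tenant_id`, and `roles` come from this section. JWT parsing and signature verification are omitted and assumed to have happened before this point.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;

/** Fixed lifetimes from this section; API key tokens are configurable, so they are omitted. */
enum TokenType {
    ACCESS(Duration.ofMinutes(15)),
    REFRESH(Duration.ofDays(7)),
    SERVICE(Duration.ofMinutes(5));

    final Duration ttl;

    TokenType(Duration ttl) { this.ttl = ttl; }

    Instant expiresAt(Instant issuedAt) { return issuedAt.plus(ttl); }
}

/** Hypothetical view of the three claims carried by every access token. */
record AccessClaims(String userId, String tenantId, List<String> roles) {

    /** Builds the view from an already verified claim map. */
    static AccessClaims fromClaims(Map<String, Object> claims) {
        Object sub = claims.get("sub");
        Object tenant = claims.get("tenant_id");
        Object roles = claims.get("roles");
        if (!(sub instanceof String userId) || !(tenant instanceof String tenantId)) {
            throw new IllegalArgumentException("missing sub or tenant_id claim");
        }
        // A missing or malformed roles claim is treated as "no roles" in this sketch.
        List<String> roleNames = (roles instanceof List<?> list)
                ? list.stream().map(String::valueOf).toList()
                : List.of();
        return new AccessClaims(userId, tenantId, roleNames);
    }

    public static void main(String[] args) {
        AccessClaims claims = fromClaims(
                Map.of("sub", "user-1", "tenant_id", "tenant-1", "roles", List.of("ADMIN")));
        System.out.println(claims);
        System.out.println(TokenType.ACCESS.expiresAt(Instant.now()));
    }
}
```

Because the tenant scope travels inside the token, a downstream service can populate its tenant context from `tenantId()` without calling the IAM service again.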

Key APIs:

| Endpoint | Method | Description |
|---|---|---|
| /api/v1/auth/login | POST | Authenticate user, return JWT token pair |
| /api/v1/auth/refresh | POST | Exchange refresh token for new access token |
| /api/v1/auth/logout | POST | Invalidate active session |
| /api/v1/users | GET/POST | List or create users |
| /api/v1/users/{id} | GET/PUT/DELETE | User CRUD operations |
| /api/v1/roles | GET/POST | Role management |
| /api/v1/permissions | GET | Permission catalog |
| /api/v1/api-keys | GET/POST | API key management |
| /api/v1/auth/mfa/setup | POST | Configure MFA for user |
| /api/v1/auth/mfa/verify | POST | Verify MFA code |

Tenant Service (Port 8082)

The Tenant Service manages the complete lifecycle of tenants from initial provisioning through configuration, scaling, and eventual decommissioning. It orchestrates a multi-phase state machine for provisioning.

Core responsibilities:

  • Tenant CRUD operations (create, read, update, suspend, delete)
  • Multi-phase provisioning state machine (8 numbered phases plus an intermediate Phase 5.5)
  • Namespace creation with RBAC, NetworkPolicies, ResourceQuotas
  • DNS zone creation and ingress controller deployment
  • Data plane service deployment orchestration via Helm
  • Tenant configuration inheritance and override management

Provisioning state machine:

Phase 1: VALIDATE          --> Validate tenant details, check slug uniqueness
Phase 2: CREATE_NAMESPACE  --> Create K8s namespace with RBAC and NetworkPolicies
Phase 3: DEPLOY_SECRETS    --> Create Kubernetes secrets for databases and services
Phase 4: DEPLOY_DATABASES  --> Provision per-tenant PostgreSQL schemas
Phase 5: DEPLOY_SERVICES   --> Helm install all 14 data plane services
Phase 5.5: DEPLOY_INGRESS  --> Ingress controller + DNS zone + TLS certificate
Phase 6: CONFIGURE         --> Apply tenant-specific configuration overrides
Phase 7: VERIFY            --> Health check all deployed services
Phase 8: ACTIVATE          --> Mark tenant as active, create admin user
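
The phase sequence above maps naturally onto an enum whose declaration order is the execution order. This is a minimal sketch under that assumption; `ProvisioningPhase`, `next()`, and `runAll` are hypothetical names, and `DEPLOY_INGRESS` is its own constant to represent Phase 5.5. Failure handling is reduced to stopping at the first failing phase, since retry and rollback behavior is not described in this section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

enum ProvisioningPhase {
    VALIDATE, CREATE_NAMESPACE, DEPLOY_SECRETS, DEPLOY_DATABASES, DEPLOY_SERVICES,
    DEPLOY_INGRESS, CONFIGURE, VERIFY, ACTIVATE;

    /** Next phase in declaration order, or empty once ACTIVATE has completed. */
    Optional<ProvisioningPhase> next() {
        ProvisioningPhase[] all = values();
        return ordinal() + 1 < all.length ? Optional.of(all[ordinal() + 1]) : Optional.empty();
    }

    /** Runs phases in order and stops at the first phase whose step returns false. */
    static List<ProvisioningPhase> runAll(Predicate<ProvisioningPhase> step) {
        List<ProvisioningPhase> completed = new ArrayList<>();
        Optional<ProvisioningPhase> current = Optional.of(VALIDATE);
        while (current.isPresent() && step.test(current.get())) {
            completed.add(current.get());
            current = current.get().next();
        }
        return completed;
    }

    public static void main(String[] args) {
        // Simulate a failing health check: every phase before VERIFY completes.
        System.out.println(runAll(phase -> phase != VERIFY));
    }
}
```

Returning the list of completed phases gives the caller enough information to decide where to resume or what to clean up after a failure.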

Config Service (Port 8888)

Centralized, versioned configuration management for all platform services with hierarchical override support.

Configuration hierarchy:

Global defaults
  --> Environment overrides (dev, staging, prod)
    --> Service-specific overrides
      --> Tenant-specific overrides

More specific levels override less specific ones, so a tenant-specific value takes precedence over a service, environment, or global value for the same key. Feature flags are evaluated against tenant tier for gradual rollouts. Configuration changes are distributed via Redis Pub/Sub for zero-downtime updates.
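
The override order can be illustrated with layered maps merged from least to most specific. This is a sketch only: `ConfigResolverSketch`, `resolve`, and the example keys are hypothetical, and the actual config-service storage model is not described here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConfigResolverSketch {

    private ConfigResolverSketch() {}

    /**
     * Merges layers in the order given (global, environment, service, tenant).
     * Later layers are more specific, so their values replace earlier ones.
     */
    static Map<String, String> resolve(List<Map<String, String>> layersLeastSpecificFirst) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> layer : layersLeastSpecificFirst) {
            effective.putAll(layer);
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> global = Map.of("query.timeout", "30s", "cache.enabled", "true");
        Map<String, String> environment = Map.of("query.timeout", "60s");
        Map<String, String> service = Map.of();
        Map<String, String> tenant = Map.of("cache.enabled", "false");

        Map<String, String> effective = resolve(List.of(global, environment, service, tenant));
        System.out.println(effective.get("query.timeout")); // 60s, from the environment layer
        System.out.println(effective.get("cache.enabled")); // false, from the tenant layer
    }
}
```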

Notification Service (Port 8085)

Event-driven notification delivery across email, in-app, webhook, and Slack channels. Subscribes to Kafka topics for tenant lifecycle events, query failures, model deployments, billing thresholds, and security alerts.

Audit Service (Port 8086)

Maintains an immutable, append-only audit trail of all security-relevant operations. Stores records in both PostgreSQL (structured queries) and Elasticsearch (full-text search). Supports SOC 2, HIPAA, and GDPR compliance reporting.

Billing Service (Port 8087)

Tracks resource consumption per tenant (queries executed, AI tokens consumed, storage used, API calls, pipeline runs) and manages subscription tiers with feature entitlements.

Observability API (Port 8088)

Unified interface for querying metrics (Prometheus), traces (Tempo), and logs (Loki) across all services and namespaces.

Infrastructure Service (Port 8089)

Orchestrates Kubernetes resource creation, database provisioning, DNS zone lifecycle, TLS certificate management, and cloud provider API interactions for tenant infrastructure.

Platform Registry (Port 8084)

Maintains the catalog of all services, API schemas (OpenAPI), event schemas, service versions, and dependency information.


2.3.4 Resource Allocation

All Control Plane services follow a consistent resource allocation pattern:

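| Environment | CPU Request | CPU Limit | Memory Request | Memory Limit | Replicas |
|---|---|---|---|---|---|
| Development | 100m | 500m | 256Mi | 512Mi | 1 |
| Staging | 100m | 500m | 256Mi | 512Mi | 2 |
| Production | 100m | 500m | 256Mi | 512Mi | 3 |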

Production deployments use 3 replicas for high availability, distributed across availability zones via pod anti-affinity rules defined in Helm values.


2.3.5 Inter-Service Communication

Control Plane services communicate through two channels:

Synchronous REST -- Used for request-response patterns where the caller needs an immediate result. The RetryableRestClient from commons-java handles retries with exponential backoff and circuit breaking.
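
Exponential backoff doubles the wait between attempts up to a cap. The sketch below shows only the retry loop; `BackoffSketch`, its method names, and its parameters are hypothetical and are not the `RetryableRestClient` API. Circuit breaking and jitter are left out for brevity.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

final class BackoffSketch {

    private BackoffSketch() {}

    /** Delay before retry number attempt (1-based): base * 2^(attempt - 1), capped at max. */
    static Duration delayFor(int attempt, Duration base, Duration max) {
        int exponent = Math.max(0, Math.min(attempt - 1, 30)); // bound the shift to avoid overflow
        Duration delay = base.multipliedBy(1L << exponent);
        return delay.compareTo(max) > 0 ? max : delay;
    }

    /** Calls the action until it succeeds or maxAttempts is reached, sleeping between tries. */
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, Duration base, Duration max)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // retries exhausted, surface the last failure
                }
                Thread.sleep(delayFor(attempt, base, max).toMillis());
            }
        }
    }
}
```

With a base of 100 ms and a cap of 2 s, the waits would be 100 ms, 200 ms, 400 ms, and so on until the cap is reached. A production client would usually retry only on transient failures (timeouts, 5xx responses) rather than on every exception as this sketch does.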

Asynchronous Kafka -- Used for event notifications that do not require an immediate response. Events flow through well-defined Kafka topics with tenant_id as the partition key, ensuring ordering within a tenant.
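
Keying events by `tenant_id` preserves per-tenant ordering because a keyed record's partition is a deterministic function of its key, and Kafka guarantees order within a partition. The sketch below illustrates that property with `String.hashCode()` as a stand-in; Kafka's default partitioner uses a murmur2 hash of the serialized key, so real partition numbers will differ. `PartitionSketch` and `partitionFor` are hypothetical names.

```java
final class PartitionSketch {

    private PartitionSketch() {}

    /** Maps a tenant ID to a partition index in [0, numPartitions). */
    static int partitionFor(String tenantId, int numPartitions) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(tenantId.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 12; // assumed topic partition count
        // Every event for the same tenant lands on the same partition.
        System.out.println(partitionFor("tenant-a", partitions) == partitionFor("tenant-a", partitions));
    }
}
```

The trade-off is that one very active tenant concentrates its traffic on a single partition, so ordering is guaranteed per tenant but load is not necessarily balanced across partitions.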

iam-service <---------> api-gateway (token validation)
                    |
tenant-service -----+--> notification-service (lifecycle events)
                    +--> audit-service (provisioning audit)
                    +--> billing-service (subscription creation)
                    +--> infrastructure-service (resource provisioning)
                    |
config-service <----+--> all services (configuration distribution via Redis)
