Part I: Platform Foundations

Chapter 2: Architecture Deep Dive

The MATIH Enterprise Platform is built on a two-plane architecture that cleanly separates platform management concerns from tenant workload execution. This chapter provides a comprehensive examination of every architectural layer, from high-level design philosophy down to individual service responsibilities, inter-service communication patterns, data store selection, and the decision rationale behind each structural choice.

Learning Objectives

Understand the two-plane architecture and why platform management is separated from tenant workloads
Map the complete service topology across 24 microservices and their dependencies
Trace request flows from browser through gateway, backend services, data stores, and back
Evaluate multi-tenancy isolation strategies at network, database, application, and event layers
Navigate the event-driven architecture including Kafka topology, CDC, and WebSocket patterns
Understand data store selection criteria across PostgreSQL, Redis, Kafka, Trino, and vector stores

Details

Estimated Read Time: 4-6 hours

Prerequisites:

Microservices architecture fundamentals
Kubernetes namespace and networking concepts
Event-driven architecture patterns
SQL and distributed query engines

Related Chapters:

Ch. 3: Security and Multi-Tenancy
Ch. 17: Kubernetes and Helm
Ch. 19: Observability
Ch. 18: CI/CD and Build System

Production - 24 microservices, 55+ Helm charts, 7 Kubernetes namespaces

What This Chapter Covers

This chapter is organized into eleven sections, each addressing a distinct architectural dimension of the platform. Read them in order for the full picture, or jump to any one for reference; each section is self-contained.

Section	Focus Area	Pages
Design Philosophy	Core principles, trade-offs, constraints, and the reasoning behind key architectural choices	1
Control Plane	All 10 Java/Spring Boot 3.2 services that manage platform operations, with internal architecture details	7
Data Plane	All 14 polyglot services (Java, Python, Node.js) that execute tenant workloads	7
Service Topology	Service discovery, dependency graphs, communication patterns, and failure propagation	4
Multi-Tenancy Architecture	Namespace isolation, TenantContext propagation, per-tenant databases, network policies	6
Event-Driven Architecture	Kafka event streaming, Redis Pub/Sub, WebSocket, CDC, and event schemas	6
API Gateway	Kong 3.5.0 gateway, custom Lua plugins, routing and rate limiting	1
Request Lifecycle and Data Flow	End-to-end request tracing from browser to database and back across five key flows	6
Data Stores	PostgreSQL, Redis, Kafka, Trino, Qdrant, Neo4j, MinIO, ClickHouse architecture	9
API Design	REST conventions, error handling, authentication patterns, rate limiting	5
Architecture Decision Records	ADRs documenting the rationale for major architectural decisions	1

Architecture at a Glance

The MATIH platform consists of 24 microservices distributed across 7 Kubernetes namespaces, communicating through a combination of synchronous REST APIs and asynchronous event streams. The platform processes natural language questions through a multi-agent AI pipeline that generates SQL, executes queries via Trino, and renders visualizations -- the "Intent to Insights" workflow.

                          +------------------+
                          |   Browser / CLI  |
                          +--------+---------+
                                   |
                          +--------v---------+
                          | Kong API Gateway |
                          |   (Port 8080)    |
                          +--------+---------+
                                   |
                    +--------------+--------------+
                    |                             |
           +--------v---------+         +---------v--------+
           |  Control Plane   |         |   Data Plane     |
           |  (10 services)   |         |  (14 services)   |
           |  matih-control-  |         |  matih-data-     |
           |  plane namespace |         |  plane namespace |
           +--------+---------+         +---------+--------+
                    |                             |
           +--------v---------+         +---------v--------+
           | PostgreSQL, Redis|         | PostgreSQL, Redis|
           | Kafka, ES        |         | Trino, Kafka     |
           +------------------+         | Qdrant, Neo4j    |
                                        | ClickHouse, MinIO|
                                        +------------------+
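
The "Intent to Insights" workflow described above can be read as three stages chained together: generate SQL from a question, execute it, and pick a visualization. The following is a minimal sketch of that composition only. The `Insight` type, the `answer` function, and all three stage callables are hypothetical stand-ins, not the platform's actual agent or Trino interfaces.

```python
from dataclasses import dataclass
from typing import Callable

Row = tuple


@dataclass
class Insight:
    """Result of one pass through the (hypothetical) pipeline."""
    question: str
    sql: str
    rows: list[Row]
    chart_type: str


def answer(
    question: str,
    generate_sql: Callable[[str], str],
    run_query: Callable[[str], list[Row]],
    choose_chart: Callable[[list[Row]], str],
) -> Insight:
    """Chain the three stages; each stage is injected so it can be swapped."""
    sql = generate_sql(question)      # stage 1: natural language -> SQL
    rows = run_query(sql)             # stage 2: execute (Trino on the platform)
    chart_type = choose_chart(rows)   # stage 3: pick a visualization
    return Insight(question, sql, rows, chart_type)


# Stub stages for illustration only; real stages would call an LLM and Trino.
def fake_generate_sql(question: str) -> str:
    return "SELECT region, SUM(revenue) FROM sales GROUP BY region"


def fake_run_query(sql: str) -> list[Row]:
    return [("EMEA", 10.0), ("APAC", 7.5)]


def fake_choose_chart(rows: list[Row]) -> str:
    # Categorical label plus one numeric column: a bar chart is a safe default.
    return "bar" if rows and len(rows[0]) == 2 else "table"


insight = answer("Total revenue by region?", fake_generate_sql, fake_run_query, fake_choose_chart)
```

Passing the stages in as callables keeps the orchestration independent of any particular model or query engine, which is one way to test the flow without live infrastructure.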

Frontend Layer

bi-workbench:3000
ml-workbench:3001
data-workbench:3002
agentic-workbench:3003
control-plane-ui:3004
data-plane-ui:3005
onboarding-ui
ops-workbench

API Gateway

Kong 3.5.0:8080
JWT Plugin
Rate Limit Plugin
Validation Plugin

Control Plane (Java/Spring Boot 3.2)

iam-service:8081
tenant-service:8082
config-service:8888
notification-service:8085
audit-service:8086
billing-service:8087
observability-api:8088
infrastructure-service:8089
api-gateway:8080
platform-registry:8084

Data Plane (Polyglot)

query-engine:8080
catalog-service:8086
semantic-layer:8086
bi-service:8084
pipeline-service:8092
ai-service:8000
ml-service:8000
data-quality-service:8000
render-service:8098
data-plane-agent:8085
ontology-service:8101
governance-service:8080
ops-agent-service:8080

Data Infrastructure

PostgreSQL 16
Redis 7
Kafka (Strimzi)
Trino
ClickHouse
Qdrant
Neo4j/Dgraph
MinIO
Elasticsearch

ML/AI Infrastructure

MLflow
Ray
vLLM
Triton
Feast
JupyterHub
Flink
Spark

Key Numbers

Metric	Value
Total microservices	24
Control Plane services	10 (all Java/Spring Boot 3.2)
Data Plane services	14 (Java, Python, Node.js)
Frontend applications	8 (React/Vite)
Kubernetes namespaces	7
Helm charts	55+
Kafka topics	20+ event categories
Commons libraries	4 (Java, Python, TypeScript, AI)
Data stores	9 distinct technologies

Two-Plane Architecture

The platform is divided into two distinct operational planes, each with its own deployment model, scaling strategy, and failure domain.

Control Plane

The Control Plane manages platform-level concerns that are shared across all tenants. It handles identity and access management, tenant provisioning, configuration distribution, billing, auditing, and infrastructure orchestration. All 10 Control Plane services are built with Java 21 and Spring Boot 3.2 and are deployed in the matih-control-plane namespace.

The Control Plane is tenant-aware but not tenant-specific -- it operates on metadata about tenants rather than on tenant data itself. When the Control Plane writes to its database, it stores tenant configuration, user profiles, and billing records -- never customer business data.

Data Plane

The Data Plane executes tenant-specific workloads including query execution, AI/ML inference, data pipeline orchestration, dashboard rendering, and data governance. Its 14 services span three technology stacks: Java/Spring Boot for data-intensive services, Python/FastAPI for AI/ML workloads, and Node.js for rendering. Data Plane services are deployed into per-tenant namespaces, providing namespace-level isolation.

The Data Plane processes, transforms, and analyzes actual customer data. Every query, every AI conversation, every ML training job runs within the tenant's Data Plane namespace, isolated from other tenants by Kubernetes NetworkPolicies and ResourceQuotas.
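
To make the isolation mechanism concrete, the sketch below builds the two kinds of Kubernetes objects named above for one tenant namespace. The object kinds, API versions, and field names are standard Kubernetes. The function name, object names, and quota values are illustrative assumptions, not the platform's actual manifests.

```python
import json
from typing import Any


def tenant_isolation_manifests(namespace: str) -> list[dict[str, Any]]:
    """Build a ResourceQuota and a NetworkPolicy for a tenant namespace.

    Hypothetical helper: the limits below are placeholder values.
    """
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        # Caps the aggregate resources all pods in the namespace may request.
        "spec": {"hard": {"requests.cpu": "8", "requests.memory": "32Gi", "pods": "50"}},
    }
    policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            # An empty podSelector applies the policy to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A bare podSelector in "from" matches pods in this namespace only,
            # so traffic from other tenants' namespaces is denied.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    return [quota, policy]


manifests = tenant_isolation_manifests("matih-data-plane-acme")
print(json.dumps(manifests, indent=2))
```

A policy this strict would also block the API gateway and Control Plane, so a real deployment would need additional ingress rules (for example, a namespaceSelector for the gateway's namespace). Those rules are omitted here because the source does not specify them.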

Namespace Organization

The platform uses seven Kubernetes namespaces to enforce logical and security boundaries:

Namespace	Purpose	Services
`matih-system`	Core platform infrastructure	Operators, CRDs, shared controllers, Strimzi, cert-manager
`matih-control-plane`	Platform management services	All 10 Control Plane services
`matih-data-plane`	Default tenant workload services	All 14 Data Plane services (per-tenant namespaces in production)
`matih-observability`	Monitoring and tracing	Prometheus, Grafana, Tempo, Loki
`matih-monitoring-control-plane`	Control Plane monitoring	Service-specific monitors and alerts
`matih-monitoring-data-plane`	Data Plane monitoring	Service-specific monitors and alerts
`matih-frontend`	Frontend applications	React workbench applications

In production, each tenant receives a dedicated namespace: matih-data-plane-{tenant-slug}. This namespace contains the tenant's Data Plane services, secrets, and resource quotas.
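
Because the tenant slug becomes part of a namespace name, it has to satisfy Kubernetes naming rules: a namespace name must be a valid RFC 1123 DNS label (lowercase alphanumerics and hyphens, alphanumeric at both ends, at most 63 characters). The sketch below derives and validates the name. The `tenant_namespace` helper is hypothetical, and how the platform itself validates slugs is not stated in this chapter.

```python
import re

# RFC 1123 DNS label, as required for Kubernetes namespace names.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
_MAX_LABEL_LENGTH = 63
_PREFIX = "matih-data-plane-"  # from the naming pattern above


def tenant_namespace(tenant_slug: str) -> str:
    """Return the per-tenant Data Plane namespace name for a slug.

    Hypothetical helper: rejects bad slugs rather than rewriting them,
    so two different slugs can never collapse into the same namespace.
    """
    if not tenant_slug:
        raise ValueError("tenant slug must not be empty")
    name = f"{_PREFIX}{tenant_slug}"
    if len(name) > _MAX_LABEL_LENGTH:
        raise ValueError(f"namespace name exceeds {_MAX_LABEL_LENGTH} characters: {name!r}")
    if _DNS_LABEL.fullmatch(name) is None:
        raise ValueError(f"slug does not yield a valid DNS label: {tenant_slug!r}")
    return name


namespace = tenant_namespace("acme")
```

With a 17-character prefix, the slug itself can be at most 46 characters long under this scheme.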

Commons Libraries

Shared functionality is extracted into four commons libraries that enforce consistency across service boundaries:

Library	Language	Key Modules
`commons-java`	Java	Security (JWT, RBAC), multi-tenancy (TenantContext), persistence, caching, observability, event streaming
`commons-python`	Python	Authentication middleware, tenant context, structured logging, health checks
`commons-typescript`	TypeScript	API client utilities, authentication hooks, shared UI components
`commons-ai`	Python	LLM abstractions, prompt management, RAG utilities, agent framework

The commons-java library alone provides over 100 classes spanning API versioning, billing context, cache management, CDN integration, circuit breakers, database optimization, event streaming, exception handling, Kafka messaging, observability, and security -- all designed with multi-tenancy as a first-class concern.
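
As an illustration of what treating multi-tenancy as a first-class concern can look like in code, the sketch below binds a tenant identifier to the current execution context and derives tenant-scoped cache keys from it. It uses Python's standard `contextvars` module. The names `tenant_scope`, `require_tenant`, and `cache_key` are hypothetical and do not reflect the actual API of the TenantContext module in commons-java or commons-python.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Optional

# Holds the tenant for the current thread or asyncio task; None means unbound.
_current_tenant: ContextVar[Optional[str]] = ContextVar("current_tenant", default=None)


@contextmanager
def tenant_scope(tenant_id: str) -> Iterator[None]:
    """Bind tenant_id for the duration of the block, then restore the previous value."""
    token = _current_tenant.set(tenant_id)
    try:
        yield
    finally:
        _current_tenant.reset(token)


def require_tenant() -> str:
    """Return the bound tenant, failing loudly if none is set."""
    tenant_id = _current_tenant.get()
    if tenant_id is None:
        raise RuntimeError("no tenant bound to the current context")
    return tenant_id


def cache_key(key: str) -> str:
    """Prefix a cache key with the tenant so entries never collide across tenants."""
    return f"tenant:{require_tenant()}:{key}"


with tenant_scope("acme"):
    acme_key = cache_key("dashboard:42")
```

Failing when no tenant is bound, instead of falling back to a default, is a deliberate choice in this sketch: an unscoped read or write is treated as a bug rather than as shared data.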

Technology Stack Summary

Backend Technologies

Layer	Technology	Version
Control Plane	Java 21 + Spring Boot 3.2	LTS
Data Plane (Java services)	Java 21 + Spring Boot 3.2	LTS
Data Plane (AI/ML services)	Python 3.11 + FastAPI	Latest
Data Plane (Rendering)	Node.js 20 + Express	LTS

Data Infrastructure

Component	Technology	Purpose
Primary database	PostgreSQL 16	Transactional data, metadata, tenant schemas
Caching and sessions	Redis 7	Session store, pub/sub, rate limiting
Event streaming	Kafka (Strimzi)	Asynchronous communication, event sourcing
Federated SQL	Trino	Distributed query execution across data sources
Full-text search	Elasticsearch 8.11	Audit log search, ontology search
OLAP analytics	ClickHouse / StarRocks	Fast analytical queries on large datasets
Vector embeddings	Qdrant / LanceDB	RAG embeddings, semantic search
Knowledge graphs	Neo4j / Dgraph	Context graphs, data lineage
Object storage	MinIO	S3-compatible artifact storage

ML/AI Infrastructure

Component	Technology	Purpose
Experiment tracking	MLflow	Model versioning, metrics, artifacts
Distributed compute	Ray	Model training, hyperparameter tuning
LLM inference	vLLM	High-throughput LLM serving
Model serving	Triton	GPU-optimized inference server
Feature store	Feast	Feature engineering and serving
Notebooks	JupyterHub	Interactive development environment
Stream processing	Apache Flink	Real-time data transformations
Batch processing	Apache Spark	Large-scale data processing

How to Read This Chapter

For architects and tech leads, start with Design Philosophy to understand the reasoning behind structural choices, then proceed to Service Topology for the interaction map, and Data Stores for storage architecture.

For backend developers, begin with Control Plane or Data Plane depending on which services you work with, then read Multi-Tenancy Architecture to understand context propagation and API Design for REST conventions.

For platform engineers, focus on API Gateway, Event-Driven Architecture, Data Stores, and Request Lifecycle to understand the infrastructure layer.

For anyone evaluating the platform, the Architecture Decision Records section provides the rationale behind every major technical decision, and Design Philosophy explains the trade-offs.

Related Chapters

Security and Multi-Tenancy -- Authentication, authorization, and tenant isolation in depth
Kubernetes and Helm -- Cluster topology, Helm chart structure, deployment patterns
Observability -- Monitoring, distributed tracing, and alerting architecture
CI/CD and Build System -- Pipeline stages and deployment workflow
AI Service -- Multi-agent orchestrator deep dive
Tenant Lifecycle -- Complete provisioning workflow
