Chapter 2: Architecture Deep Dive
The MATIH Enterprise Platform is built on a two-plane architecture that cleanly separates platform management concerns from tenant workload execution. This chapter provides a comprehensive examination of every architectural layer, from high-level design philosophy down to individual service responsibilities, inter-service communication patterns, data store selection, and the decision rationale behind each structural choice.
Learning Objectives
- Understand the two-plane architecture and why platform management is separated from tenant workloads
- Map the complete service topology across 24 microservices and their dependencies
- Trace request flows from browser through gateway, backend services, data stores, and back
- Evaluate multi-tenancy isolation strategies at network, database, application, and event layers
- Navigate the event-driven architecture including Kafka topology, CDC, and WebSocket patterns
- Understand data store selection criteria across PostgreSQL, Redis, Kafka, Trino, and vector stores
Prerequisites
- Microservices architecture fundamentals
- Kubernetes namespace and networking concepts
- Event-driven architecture patterns
- SQL and distributed query engines
Related Chapters
- Ch. 3: Security and Multi-Tenancy
- Ch. 17: Kubernetes and Helm
- Ch. 19: Observability
- Ch. 18: CI/CD and Build System
What This Chapter Covers
This chapter is organized into eleven sections, each addressing a distinct architectural dimension of the platform. Read them in order for a complete picture, or jump to any one of them as a self-contained reference.
| Section | Focus Area | Pages |
|---|---|---|
| Design Philosophy | Core principles, trade-offs, constraints, and the reasoning behind key architectural choices | 1 |
| Control Plane | All 10 Java/Spring Boot 3.2 services that manage platform operations, with internal architecture details | 7 |
| Data Plane | All 14 polyglot services (Java, Python, Node.js) that execute tenant workloads | 7 |
| Service Topology | Service discovery, dependency graphs, communication patterns, and failure propagation | 4 |
| Multi-Tenancy Architecture | Namespace isolation, TenantContext propagation, per-tenant databases, network policies | 6 |
| Event-Driven Architecture | Kafka event streaming, Redis Pub/Sub, WebSocket, CDC, and event schemas | 6 |
| API Gateway | Kong 3.5.0 gateway, custom Lua plugins, routing and rate limiting | 1 |
| Request Lifecycle and Data Flow | End-to-end request tracing from browser to database and back across five key flows | 6 |
| Data Stores | PostgreSQL, Redis, Kafka, Trino, Qdrant, Neo4j, MinIO, ClickHouse architecture | 9 |
| API Design | REST conventions, error handling, authentication patterns, rate limiting | 5 |
| Architecture Decision Records | ADRs documenting the rationale for major architectural decisions | 1 |
Architecture at a Glance
The MATIH platform consists of 24 microservices distributed across 7 Kubernetes namespaces, communicating through a combination of synchronous REST APIs and asynchronous event streams. The platform processes natural language questions through a multi-agent AI pipeline that generates SQL, executes queries via Trino, and renders visualizations -- the "Intent to Insights" workflow.

                   +------------------+
                   |  Browser / CLI   |
                   +--------+---------+
                            |
                   +--------v---------+
                   | Kong API Gateway |
                   |   (Port 8080)    |
                   +--------+---------+
                            |
             +--------------+--------------+
             |                             |
    +--------v---------+         +---------v--------+
    |  Control Plane   |         |    Data Plane    |
    |  (10 services)   |         |  (14 services)   |
    |  matih-control-  |         |   matih-data-    |
    |  plane namespace |         |  plane namespace |
    +--------+---------+         +---------+--------+
             |                             |
    +--------v---------+         +---------v--------+
    | PostgreSQL, Redis|         | PostgreSQL, Redis|
    | Kafka, ES        |         | Trino, Kafka     |
    +------------------+         | Qdrant, Neo4j    |
                                 | ClickHouse, MinIO|
                                 +------------------+

Key Numbers
| Metric | Value |
|---|---|
| Total microservices | 24 |
| Control Plane services | 10 (all Java/Spring Boot 3.2) |
| Data Plane services | 14 (Java, Python, Node.js) |
| Frontend applications | 8 (React/Vite) |
| Kubernetes namespaces | 7 |
| Helm charts | 55+ |
| Kafka topics | 20+ event categories |
| Commons libraries | 4 (Java, Python, TypeScript, AI) |
| Data stores | 9 distinct technologies |
Two-Plane Architecture
The platform is divided into two distinct operational planes, each with its own deployment model, scaling strategy, and failure domain.
Control Plane
The Control Plane manages platform-level concerns that are shared across all tenants. It handles identity and access management, tenant provisioning, configuration distribution, billing, auditing, and infrastructure orchestration. All 10 Control Plane services are built with Java 21 and Spring Boot 3.2, deployed in the matih-control-plane namespace.
The Control Plane is tenant-aware but not tenant-specific -- it operates on metadata about tenants rather than on tenant data itself. When the Control Plane writes to its database, it stores tenant configuration, user profiles, and billing records -- never customer business data.
Data Plane
The Data Plane executes tenant-specific workloads including query execution, AI/ML inference, data pipeline orchestration, dashboard rendering, and data governance. Its 14 services span three technology stacks: Java/Spring Boot for data-intensive services, Python/FastAPI for AI/ML workloads, and Node.js for rendering. Data Plane services are deployed into per-tenant namespaces, providing namespace-level isolation.
The Data Plane processes, transforms, and analyzes actual customer data. Every query, every AI conversation, every ML training job runs within the tenant's Data Plane namespace, isolated from other tenants by Kubernetes NetworkPolicies and ResourceQuotas.
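As a rough illustration of the isolation described above, the sketch below builds the two Kubernetes objects as plain Python dictionaries: a NetworkPolicy that admits ingress only from pods in the same namespace, and a ResourceQuota that caps the namespace's total resource requests. The function names, object names, and quota values are assumptions made for this example, not the platform's actual manifests, which would also need a rule admitting traffic from the API gateway.

```python
def tenant_network_policy(namespace: str) -> dict:
    """Build a NetworkPolicy allowing ingress only from the same namespace.

    Hypothetical policy for illustration; the object name is an assumption.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            # An empty podSelector applies the policy to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A bare podSelector under "from" matches pods in this namespace only,
            # so traffic from other tenants' namespaces is not admitted.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


def tenant_resource_quota(namespace: str, cpu: str = "8", memory: str = "16Gi") -> dict:
    """Build a ResourceQuota capping total CPU and memory requests.

    The default limits are placeholder values, not platform settings.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory}},
    }
```

Serializing either dictionary to YAML or JSON yields a manifest that could be applied to a tenant namespace. Keeping the namespace as the only required argument reflects the idea that the same isolation template is stamped out once per tenant.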
Namespace Organization
The platform uses seven Kubernetes namespaces to enforce logical and security boundaries:
| Namespace | Purpose | Services |
|---|---|---|
| matih-system | Core platform infrastructure | Operators, CRDs, shared controllers, Strimzi, cert-manager |
| matih-control-plane | Platform management services | All 10 Control Plane services |
| matih-data-plane | Default tenant workload services | All 14 Data Plane services (per-tenant namespaces in production) |
| matih-observability | Monitoring and tracing | Prometheus, Grafana, Tempo, Loki |
| matih-monitoring-control-plane | Control Plane monitoring | Service-specific monitors and alerts |
| matih-monitoring-data-plane | Data Plane monitoring | Service-specific monitors and alerts |
| matih-frontend | Frontend applications | React workbench applications |
In production, each tenant receives a dedicated namespace: matih-data-plane-{tenant-slug}. This namespace contains the tenant's Data Plane services, secrets, and resource quotas.
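The naming convention above can be sketched as a small helper that derives the namespace from a tenant slug. Kubernetes namespace names must be valid DNS labels (at most 63 characters, lowercase alphanumerics and hyphens, starting and ending with an alphanumeric), so the helper validates the slug before composing the name. The function name and the choice to validate the slug as its own label are assumptions for illustration; the platform's provisioning code may apply different rules.

```python
import re

NAMESPACE_PREFIX = "matih-data-plane-"
MAX_LABEL_LENGTH = 63  # Kubernetes namespace names are RFC 1123 DNS labels

# Lowercase alphanumerics and hyphens, starting and ending with an alphanumeric.
_LABEL_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def tenant_namespace(tenant_slug: str) -> str:
    """Return the per-tenant Data Plane namespace for a slug (hypothetical helper)."""
    if not _LABEL_RE.fullmatch(tenant_slug):
        raise ValueError(f"invalid tenant slug: {tenant_slug!r}")
    name = f"{NAMESPACE_PREFIX}{tenant_slug}"
    if len(name) > MAX_LABEL_LENGTH:
        raise ValueError(f"namespace name exceeds {MAX_LABEL_LENGTH} characters: {name!r}")
    return name


print(tenant_namespace("acme-corp"))  # matih-data-plane-acme-corp
```

Because the prefix already takes 17 characters, a slug can be at most 46 characters long under this scheme.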
Commons Libraries
Shared functionality is extracted into four commons libraries that enforce consistency across service boundaries:
| Library | Language | Key Modules |
|---|---|---|
| commons-java | Java | Security (JWT, RBAC), multi-tenancy (TenantContext), persistence, caching, observability, event streaming |
| commons-python | Python | Authentication middleware, tenant context, structured logging, health checks |
| commons-typescript | TypeScript | API client utilities, authentication hooks, shared UI components |
| commons-ai | Python | LLM abstractions, prompt management, RAG utilities, agent framework |
The commons-java library alone provides over 100 classes spanning API versioning, billing context, cache management, CDN integration, circuit breakers, database optimization, event streaming, exception handling, Kafka messaging, observability, and security -- all designed with multi-tenancy as a first-class concern.
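To make the idea of a tenant context concrete, here is a minimal Python sketch built on the standard library's `contextvars` module. It binds a tenant identifier to the current request and lets downstream code read it without passing it through every call. This is not the commons-python API: the names `tenant_scope`, `current_tenant`, and `outbound_headers`, and the `X-Tenant-ID` header, are all hypothetical.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Dict, Iterator, Optional

# Holds the tenant bound to the current thread or asyncio task (None when unbound).
_current_tenant: ContextVar[Optional[str]] = ContextVar("current_tenant", default=None)


@contextmanager
def tenant_scope(tenant_id: str) -> Iterator[None]:
    """Bind a tenant for the duration of a request, then restore the previous value."""
    token = _current_tenant.set(tenant_id)
    try:
        yield
    finally:
        _current_tenant.reset(token)


def current_tenant() -> str:
    """Return the bound tenant, failing loudly if none is set."""
    tenant_id = _current_tenant.get()
    if tenant_id is None:
        raise RuntimeError("no tenant bound to the current context")
    return tenant_id


def outbound_headers() -> Dict[str, str]:
    """Headers to attach to downstream calls (header name is an assumption)."""
    return {"X-Tenant-ID": current_tenant()}


with tenant_scope("acme"):
    print(outbound_headers())  # {'X-Tenant-ID': 'acme'}
```

Raising when no tenant is bound, rather than falling back to a default, is a deliberate choice in this sketch: a missing tenant then surfaces as an error instead of silently running against the wrong tenant's data.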
Technology Stack Summary
Backend Technologies
| Layer | Technology | Release Track |
|---|---|---|
| Control Plane | Java 21 + Spring Boot 3.2 | LTS |
| Data Plane (Java services) | Java 21 + Spring Boot 3.2 | LTS |
| Data Plane (AI/ML services) | Python 3.11 + FastAPI | Latest |
| Data Plane (Rendering) | Node.js 20 + Express | LTS |
Data Infrastructure
| Component | Technology | Purpose |
|---|---|---|
| Primary database | PostgreSQL 16 | Transactional data, metadata, tenant schemas |
| Caching and sessions | Redis 7 | Session store, pub/sub, rate limiting |
| Event streaming | Kafka (Strimzi) | Asynchronous communication, event sourcing |
| Federated SQL | Trino | Distributed query execution across data sources |
| Full-text search | Elasticsearch 8.11 | Audit log search, ontology search |
| OLAP analytics | ClickHouse / StarRocks | Fast analytical queries on large datasets |
| Vector embeddings | Qdrant / LanceDB | RAG embeddings, semantic search |
| Knowledge graphs | Neo4j / Dgraph | Context graphs, data lineage |
| Object storage | MinIO | S3-compatible artifact storage |
ML/AI Infrastructure
| Component | Technology | Purpose |
|---|---|---|
| Experiment tracking | MLflow | Model versioning, metrics, artifacts |
| Distributed compute | Ray | Model training, hyperparameter tuning |
| LLM inference | vLLM | High-throughput LLM serving |
| Model serving | Triton | GPU-optimized inference server |
| Feature store | Feast | Feature engineering and serving |
| Notebooks | JupyterHub | Interactive development environment |
| Stream processing | Apache Flink | Real-time data transformations |
| Batch processing | Apache Spark | Large-scale data processing |
How to Read This Chapter
For architects and tech leads, start with Design Philosophy to understand the reasoning behind structural choices, then proceed to Service Topology for the interaction map, and Data Stores for storage architecture.
For backend developers, begin with Control Plane or Data Plane depending on which services you work with, then read Multi-Tenancy Architecture to understand context propagation and API Design for REST conventions.
For platform engineers, focus on API Gateway, Event-Driven Architecture, Data Stores, and Request Lifecycle and Data Flow to understand the infrastructure layer.
For anyone evaluating the platform, the Architecture Decision Records section provides the rationale behind every major technical decision, and Design Philosophy explains the trade-offs.
Related Chapters
- Security and Multi-Tenancy -- Authentication, authorization, and tenant isolation in depth
- Kubernetes and Helm -- Cluster topology, Helm chart structure, deployment patterns
- Observability -- Monitoring, distributed tracing, and alerting architecture
- CI/CD and Build System -- Pipeline stages and deployment workflow
- AI Service -- Multi-agent orchestrator deep dive
- Tenant Lifecycle -- Complete provisioning workflow