Data Plane Architecture
The Data Plane is the execution engine of the MATIH platform. It consists of 14 services, spread across three technology stacks, that process tenant-specific workloads: executing queries, orchestrating AI agents, training ML models, managing data pipelines, rendering visualizations, and enforcing data governance. Unlike the homogeneous Control Plane, the Data Plane is deliberately polyglot so that each service's technology matches its problem domain.
2.4.1 Technology Distribution
| Stack | Count | Services | Rationale |
|---|---|---|---|
| Java / Spring Boot 3.2 | 6 | query-engine, catalog-service, semantic-layer, bi-service, pipeline-service, data-plane-agent | JDBC, Hibernate multi-tenancy, Trino integration |
| Python / FastAPI | 7 | ai-service, ml-service, data-quality-service, ontology-service, governance-service, ops-agent-service, auth-proxy | LangChain, PyTorch, pandas, LLM libraries |
| Node.js / Express | 1 | render-service | Puppeteer/Playwright for chart rendering |
2.4.2 Complete Service Registry
| Service | Stack | Port | Database | Key Dependencies |
|---|---|---|---|---|
| query-engine | Java | 8080 | query | Trino, PostgreSQL, Redis |
| catalog-service | Java | 8086 | catalog | PostgreSQL, OpenMetadata |
| semantic-layer | Java | 8086 | semantic | PostgreSQL, Redis, Trino |
| bi-service | Java | 8084 | bi | PostgreSQL, Redis, semantic-layer |
| pipeline-service | Java | 8092 | pipeline | PostgreSQL, Kafka, Temporal |
| data-plane-agent | Java | 8085 | none | Redis, Kafka |
| ai-service | Python | 8000 | ai | PostgreSQL, Redis, Qdrant, vLLM, Dgraph |
| ml-service | Python | 8000 | ml | PostgreSQL, Redis, Ray, MLflow |
| data-quality-service | Python | 8000 | quality | PostgreSQL, Trino |
| ontology-service | Python | 8101 | ontology | PostgreSQL, Elasticsearch |
| governance-service | Python | 8080 | governance | PostgreSQL, OpenMetadata, Polaris |
| ops-agent-service | Python | 8080 | ops_agent | PostgreSQL, Redis, Kafka, Prometheus, ChromaDB |
| auth-proxy | Python | 5000 | none | IAM service |
| render-service | Node.js | 8098 | none | Redis |
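
As a rough illustration, the registry can be held as plain data and checked against the stack counts in 2.4.1. The sketch below copies service names, stacks, and ports from the table; the function names (`services_per_stack`, `shared_ports`) are hypothetical and are not part of the platform.

```python
from collections import Counter, defaultdict

# (service, stack, port) copied from the service registry table.
REGISTRY = [
    ("query-engine", "Java", 8080),
    ("catalog-service", "Java", 8086),
    ("semantic-layer", "Java", 8086),
    ("bi-service", "Java", 8084),
    ("pipeline-service", "Java", 8092),
    ("data-plane-agent", "Java", 8085),
    ("ai-service", "Python", 8000),
    ("ml-service", "Python", 8000),
    ("data-quality-service", "Python", 8000),
    ("ontology-service", "Python", 8101),
    ("governance-service", "Python", 8080),
    ("ops-agent-service", "Python", 8080),
    ("auth-proxy", "Python", 5000),
    ("render-service", "Node.js", 8098),
]


def services_per_stack(registry):
    """Count how many services use each technology stack."""
    return Counter(stack for _, stack, _ in registry)


def shared_ports(registry):
    """Return {port: [service, ...]} for ports listed by more than one service."""
    by_port = defaultdict(list)
    for name, _, port in registry:
        by_port[port].append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}


if __name__ == "__main__":
    print(dict(services_per_stack(REGISTRY)))
    for port, names in sorted(shared_ports(REGISTRY).items()):
        print(port, names)
```

Several services list the same container port (for example 8000 and 8080). That is harmless when each service runs in its own Kubernetes pod, but it matters if more than one is started on a single host.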
2.4.3 Resource Allocation
| Service | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| query-engine | 200m | 1000m | 512Mi | 1Gi |
| catalog-service | 100m | 500m | 256Mi | 512Mi |
| semantic-layer | 200m | 1000m | 512Mi | 1Gi |
| bi-service | 100m | 500m | 256Mi | 512Mi |
| ai-service | 200m | 1000m | 512Mi | 2Gi |
| ml-service | 200m | 1000m | 512Mi | 2Gi |
| pipeline-service | 100m | 500m | 256Mi | 512Mi |
| data-quality-service | 100m | 500m | 256Mi | 512Mi |
| data-plane-agent | 100m | 500m | 256Mi | 512Mi |
| render-service | 100m | 500m | 256Mi | 512Mi |
| ontology-service | 100m | 500m | 256Mi | 512Mi |
| governance-service | 100m | 500m | 256Mi | 512Mi |
| ops-agent-service | 500m | 2000m | 1Gi | 4Gi |
The ops-agent-service has the largest allocation (500m/2000m CPU, 1Gi/4Gi memory) because of its AI workloads and observability data processing. The ai-service and ml-service also have elevated memory limits (2Gi) for LLM context windows and model loading.
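
For capacity planning, it can help to total the table. The sketch below assumes one replica per service and covers only the 13 services listed above; the parser handles just the quantity suffixes that appear in the table (`m`, `Mi`, `Gi`), and the helper names are hypothetical.

```python
# (cpu_request, cpu_limit, memory_request, memory_limit) copied from the table.
ALLOCATIONS = {
    "query-engine": ("200m", "1000m", "512Mi", "1Gi"),
    "catalog-service": ("100m", "500m", "256Mi", "512Mi"),
    "semantic-layer": ("200m", "1000m", "512Mi", "1Gi"),
    "bi-service": ("100m", "500m", "256Mi", "512Mi"),
    "ai-service": ("200m", "1000m", "512Mi", "2Gi"),
    "ml-service": ("200m", "1000m", "512Mi", "2Gi"),
    "pipeline-service": ("100m", "500m", "256Mi", "512Mi"),
    "data-quality-service": ("100m", "500m", "256Mi", "512Mi"),
    "data-plane-agent": ("100m", "500m", "256Mi", "512Mi"),
    "render-service": ("100m", "500m", "256Mi", "512Mi"),
    "ontology-service": ("100m", "500m", "256Mi", "512Mi"),
    "governance-service": ("100m", "500m", "256Mi", "512Mi"),
    "ops-agent-service": ("500m", "2000m", "1Gi", "4Gi"),
}


def cpu_millicores(quantity):
    """Parse a CPU quantity such as '200m' (millicores) or '2' (whole cores)."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_mib(quantity):
    """Parse a memory quantity with an Mi or Gi suffix into MiB."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def totals(allocations):
    """Sum requests and limits across all services (one replica each)."""
    result = {"cpu_request_m": 0, "cpu_limit_m": 0,
              "mem_request_mi": 0, "mem_limit_mi": 0}
    for cpu_req, cpu_lim, mem_req, mem_lim in allocations.values():
        result["cpu_request_m"] += cpu_millicores(cpu_req)
        result["cpu_limit_m"] += cpu_millicores(cpu_lim)
        result["mem_request_mi"] += memory_mib(mem_req)
        result["mem_limit_mi"] += memory_mib(mem_lim)
    return result


if __name__ == "__main__":
    print(totals(ALLOCATIONS))
```

Under the single-replica assumption, the listed services request 2100m CPU and 5Gi memory in total, with limits of 10 cores and 14Gi.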
2.4.4 Health Check Patterns
Data Plane services use different health check patterns based on their technology stack:
| Stack | Health Path | Framework | Kubernetes Probe |
|---|---|---|---|
| Java / Spring Boot | /api/v1/actuator/health | Spring Boot Actuator | httpGet |
| Python / FastAPI | /health | Custom endpoint | httpGet |
| Node.js | /health | Custom endpoint | httpGet |
Java services provide deep health checks via Spring Actuator that verify database connectivity, Redis connectivity, and Kafka broker availability. Python services implement simpler health endpoints that verify the application is running and can handle requests.
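
The health paths above map directly onto Kubernetes `httpGet` probes. The sketch below builds the probe stanza as a Python dictionary that could be serialized into a pod spec. The paths come from the table; the function name and the timing values (`periodSeconds`, `failureThreshold`) are illustrative assumptions, not the platform's actual settings.

```python
# Health paths per stack, copied from the health check table.
HEALTH_PATHS = {
    "java": "/api/v1/actuator/health",
    "python": "/health",
    "node.js": "/health",
}


def http_get_probe(stack, port, period_seconds=10, failure_threshold=3):
    """Build a Kubernetes httpGet probe definition for a service.

    period_seconds and failure_threshold are placeholder values chosen
    for illustration; tune them per service.
    """
    path = HEALTH_PATHS[stack.lower()]
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }


if __name__ == "__main__":
    # Ports taken from the service registry in 2.4.2.
    print(http_get_probe("Java", 8080))     # query-engine
    print(http_get_probe("Python", 8000))   # ai-service
    print(http_get_probe("Node.js", 8098))  # render-service
```

Because the Java health check also verifies downstream systems, using it as a liveness probe could restart a healthy pod when a dependency is down; whether the platform separates liveness from readiness is not stated here.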
2.4.5 Cross-Service Dependencies
The Data Plane services form an analytical pipeline where services chain together:

                      +-------------+
                      | ai-service  |
                      +------+------+
                             |
             +---------------+----------------+
             |               |                |
    +--------v------+  +-----v--------+  +----v----------+
    | query-engine  |  |semantic-layer|  |catalog-service|
    +--------+------+  +-----+--------+  +----+----------+
             |               |                |
             +-------+-------+                |
                     |                        |
            +--------v------+          +------v----------+
            |     Trino     |          | ontology-service|
            +---------------+          +-----------------+

The core "Intent to Insights" flow traverses: ai-service --> catalog-service (schema context) + semantic-layer (metric definitions) --> query-engine --> Trino (execution) --> back to ai-service (analysis) --> optionally render-service (visualization).
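
The diagram can also be read as a directed graph, which makes it easy to ask which components a service ultimately relies on, or which services are affected when one component is unavailable. The sketch below encodes only the edges drawn in the diagram; the names `CALLS`, `downstream`, and `impacted_by` are hypothetical.

```python
from collections import deque

# Edges copied from the dependency diagram: caller -> callees.
CALLS = {
    "ai-service": ["query-engine", "semantic-layer", "catalog-service"],
    "query-engine": ["Trino"],
    "semantic-layer": ["Trino"],
    "catalog-service": ["ontology-service"],
    "Trino": [],
    "ontology-service": [],
}


def downstream(service, calls=CALLS):
    """Return every component reachable from `service`, in breadth-first order."""
    seen = []
    queue = deque(calls.get(service, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.append(node)
        queue.extend(calls.get(node, []))
    return seen


def impacted_by(component, calls=CALLS):
    """Return the services that depend, directly or transitively, on `component`."""
    return [name for name in calls if component in downstream(name, calls)]


if __name__ == "__main__":
    print(downstream("ai-service"))
    print(impacted_by("Trino"))
```

For example, `impacted_by("Trino")` lists query-engine, semantic-layer, and ai-service, which reflects that both query paths in the diagram converge on Trino.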
Sub-Pages
- Query Architecture -- Query engine and Trino integration
- AI Architecture -- Multi-agent orchestrator
- ML Architecture -- MLOps pipeline
- Catalog Architecture -- Metadata and semantic layer
- Pipeline Architecture -- Data pipelines and CDC
- Agent Architecture -- Data plane agent coordination