Data Infrastructure Overview
MATIH deploys 12+ data infrastructure components in the `matih-data-plane` namespace, covering SQL engines, streaming, relational and object storage, caching, OLAP analytics, graph databases, and vector search. Each component is configured for production use with replication, monitoring, and security, plus persistent storage wherever the component holds state.
Component Summary
| Component | Category | Deployment Type | Persistence |
|---|---|---|---|
| Trino | SQL Engine | Deployment (Coordinator + Workers) | Spill disk |
| Kafka (Strimzi) | Streaming | Strimzi CRD (KRaft mode) | 10Gi per broker |
| PostgreSQL | RDBMS | StatefulSet (Primary + Replicas) | 50Gi |
| Redis | Cache | StatefulSet (Master + Sentinel) | 10Gi |
| MinIO | Object Storage | StatefulSet | Configurable |
| ClickHouse | OLAP | StatefulSet | SSD |
| Flink | Stream Processing | Deployment (JM + TM) | Checkpoints |
| Spark | Batch/Interactive | Deployment + Spark Connect | None (stateless) |
| Dgraph | Graph Database | StatefulSet (Alpha + Zero) | 10Gi |
| Qdrant | Vector Database | StatefulSet | SSD |
| Neo4j | Graph Database | StatefulSet | 10Gi |
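As an illustration only, the summary table can be treated as a small inventory and grouped by workload kind, which makes it easy to see which components hold state. The sketch below is not part of MATIH. The `Component` type and the helper names `workload_kind` and `group_by_kind` are hypothetical, and the rows are copied from the table above.

```python
from collections import defaultdict
from typing import NamedTuple


class Component(NamedTuple):
    """One row of the Component Summary table (hypothetical helper type)."""
    name: str
    category: str
    deployment_type: str
    persistence: str


# Rows copied verbatim from the Component Summary table.
COMPONENTS = [
    Component("Trino", "SQL Engine", "Deployment (Coordinator + Workers)", "Spill disk"),
    Component("Kafka (Strimzi)", "Streaming", "Strimzi CRD (KRaft mode)", "10Gi per broker"),
    Component("PostgreSQL", "RDBMS", "StatefulSet (Primary + Replicas)", "50Gi"),
    Component("Redis", "Cache", "StatefulSet (Master + Sentinel)", "10Gi"),
    Component("MinIO", "Object Storage", "StatefulSet", "Configurable"),
    Component("ClickHouse", "OLAP", "StatefulSet", "SSD"),
    Component("Flink", "Stream Processing", "Deployment (JM + TM)", "Checkpoints"),
    Component("Spark", "Batch/Interactive", "Deployment + Spark Connect", "None (stateless)"),
    Component("Dgraph", "Graph Database", "StatefulSet (Alpha + Zero)", "10Gi"),
    Component("Qdrant", "Vector Database", "StatefulSet", "SSD"),
    Component("Neo4j", "Graph Database", "StatefulSet", "10Gi"),
]


def workload_kind(component: Component) -> str:
    """Return the first word of the deployment type, e.g. 'StatefulSet'."""
    return component.deployment_type.split()[0]


def group_by_kind(components: list[Component]) -> dict[str, list[str]]:
    """Map each workload kind to the component names that use it, in table order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for component in components:
        groups[workload_kind(component)].append(component.name)
    return dict(groups)


if __name__ == "__main__":
    for kind, names in group_by_kind(COMPONENTS).items():
        print(f"{kind}: {', '.join(names)}")
```

Grouping on the first word of the deployment type is a simplification chosen for this sketch. Kafka therefore appears under `Strimzi`, because it is managed through a Strimzi custom resource and not through a plain Deployment or StatefulSet.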
Section Contents
| Page | Description |
|---|---|
| Trino | Federated SQL engine with Polaris Iceberg catalog |
| Kafka / Strimzi | Event streaming with KRaft mode and TLS |
| PostgreSQL | Relational database with HA replication |
| Redis | Caching, sessions, and pub/sub |
| MinIO | S3-compatible object storage |
| ClickHouse | OLAP columnar analytics |
| Flink | Real-time stream processing |
| Spark | Batch and interactive computing |
| Dgraph | Graph database for ontologies |
| Qdrant | Vector database for AI embeddings |
| Neo4j | Graph database for lineage |