Data Infrastructure Overview

MATIH deploys 12+ data infrastructure components in the matih-data-plane namespace, providing SQL engines, streaming, storage, graph databases, and vector search. The components are configured for production use, with persistence for every stateful component, plus replication, monitoring, and security. The tables below cover the eleven components documented in this section.

Component Summary

Component	Category	Deployment Type	Persistence
Trino	SQL Engine	Deployment (Coordinator + Workers)	Spill disk
Kafka (Strimzi)	Streaming	Strimzi CRD (KRaft mode)	10Gi per broker
PostgreSQL	RDBMS	StatefulSet (Primary + Replicas)	50Gi
Redis	Cache	StatefulSet (Master + Sentinel)	10Gi
MinIO	Object Storage	StatefulSet	Configurable
ClickHouse	OLAP	StatefulSet	SSD
Flink	Stream Processing	Deployment (JM + TM)	Checkpoints
Spark	Batch/Interactive	Deployment + Spark Connect	None (stateless)
Dgraph	Graph Database	StatefulSet (Alpha + Zero)	10Gi
Qdrant	Vector Database	StatefulSet	SSD
Neo4j	Graph Database	StatefulSet	10Gi
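
The summary table can also be read as a small inventory keyed by workload kind. The following is a minimal Python sketch of that reading, not part of MATIH itself: the `Component` type and the `workload_kind` and `group_by_kind` helpers are hypothetical names introduced here for illustration, and the data is copied directly from the table above.

```python
from collections import defaultdict
from typing import NamedTuple


class Component(NamedTuple):
    """One row of the Component Summary table (hypothetical helper type)."""
    name: str
    category: str
    deployment_type: str
    persistence: str


# Rows copied verbatim from the Component Summary table.
COMPONENTS = [
    Component("Trino", "SQL Engine", "Deployment (Coordinator + Workers)", "Spill disk"),
    Component("Kafka (Strimzi)", "Streaming", "Strimzi CRD (KRaft mode)", "10Gi per broker"),
    Component("PostgreSQL", "RDBMS", "StatefulSet (Primary + Replicas)", "50Gi"),
    Component("Redis", "Cache", "StatefulSet (Master + Sentinel)", "10Gi"),
    Component("MinIO", "Object Storage", "StatefulSet", "Configurable"),
    Component("ClickHouse", "OLAP", "StatefulSet", "SSD"),
    Component("Flink", "Stream Processing", "Deployment (JM + TM)", "Checkpoints"),
    Component("Spark", "Batch/Interactive", "Deployment + Spark Connect", "None (stateless)"),
    Component("Dgraph", "Graph Database", "StatefulSet (Alpha + Zero)", "10Gi"),
    Component("Qdrant", "Vector Database", "StatefulSet", "SSD"),
    Component("Neo4j", "Graph Database", "StatefulSet", "10Gi"),
]


def workload_kind(component: Component) -> str:
    """Return the first word of the Deployment Type column, e.g. 'StatefulSet'."""
    return component.deployment_type.split()[0]


def group_by_kind(components: list[Component]) -> dict[str, list[str]]:
    """Group component names by workload kind, preserving table order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for component in components:
        groups[workload_kind(component)].append(component.name)
    return dict(groups)


if __name__ == "__main__":
    for kind, names in group_by_kind(COMPONENTS).items():
        print(f"{kind}: {', '.join(names)}")
```

Grouping this way makes it easy to see which components run as StatefulSets and therefore need persistent volumes planned up front, and which run as Deployments.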

Section Contents

Page	Description
Trino	Federated SQL engine with Polaris Iceberg catalog
Kafka / Strimzi	Event streaming with KRaft mode and TLS
PostgreSQL	Relational database with HA replication
Redis	Caching, sessions, and pub/sub
MinIO	S3-compatible object storage
ClickHouse	OLAP columnar analytics
Flink	Real-time stream processing
Spark	Batch and interactive computing
Dgraph	Graph database for ontologies
Qdrant	Vector database for AI embeddings
Neo4j	Graph database for lineage
