Data Stores Overview
The MATIH Platform uses nine distinct data store technologies, each selected for specific performance characteristics, query patterns, and workload requirements. This section documents the architecture of each data store, its role in the platform, multi-tenancy strategy, and operational configuration.
Data Store Inventory
| Technology | Category | Primary Purpose | Services Using It |
|---|---|---|---|
| PostgreSQL 16 | Relational | Transactional data, metadata, tenant schemas | All 24 services |
| Redis 7 | Key-value | Caching, sessions, pub/sub, rate limiting | 16 services |
| Kafka (Strimzi) | Event streaming | Asynchronous messaging, event sourcing | 10 services |
| Trino | Query federation | Distributed SQL across multiple sources | 3 services |
| Vector Stores (Qdrant) | Vector DB | RAG embeddings, semantic search | AI Service |
| Graph Stores (Neo4j / Dgraph) | Graph DB | Knowledge graphs, data lineage | Context graph services |
| Object Storage (MinIO) | Object store | ML artifacts, exports, file storage | ML, Pipeline, Render |
| OLAP Engines (ClickHouse) | Columnar | Fast analytical queries | Query Engine, BI Service |
Selection Criteria
Each data store was selected based on these criteria:
| Criterion | Requirement |
|---|---|
| Workload fit | Matches the access pattern (OLTP, OLAP, search, streaming) |
| Kubernetes-native | Deploys on Kubernetes via Helm chart or Operator |
| Multi-tenancy | Supports tenant isolation at some level (schema, key prefix, topic, catalog, bucket, or row) |
| Operational maturity | Proven in production and backed by an active community |
| Cost efficiency | Efficient resource utilization for the workload |
Multi-Tenancy Strategy per Store
| Data Store | Isolation Strategy |
|---|---|
| PostgreSQL | Schema-per-tenant via Hibernate multi-tenancy |
| Redis | Tenant-prefixed key namespacing |
| Kafka | Tenant ID as partition key, tenant-prefixed topics |
| Trino | Per-tenant catalog configuration |
| Qdrant | Tenant ID in metadata filter |
| Neo4j / Dgraph | Tenant-scoped subgraphs |
| MinIO | Per-tenant bucket or prefix |
| ClickHouse | Tenant ID column in all tables |
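The naming-based strategies in the table above can be sketched as a few helper functions. This is a minimal illustration, not the platform's actual code: the function names, the `tenant_` / `tenant:` / `tenants/` prefixes, the separator characters, and the tenant ID validation rule are all assumptions made for the example.

```python
import re

# Assumed rule: a lowercase letter, then up to 39 lowercase letters, digits, or underscores.
_TENANT_ID = re.compile(r"[a-z][a-z0-9_]{0,39}")


def check_tenant_id(tenant_id: str) -> str:
    """Reject IDs containing separators, quotes, or spaces that could escape a namespace."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return tenant_id


def postgres_schema(tenant_id: str) -> str:
    """Schema-per-tenant: one schema name per tenant (prefix is hypothetical)."""
    return f"tenant_{check_tenant_id(tenant_id)}"


def redis_key(tenant_id: str, *parts: str) -> str:
    """Tenant-prefixed key namespacing, using ':' as the separator."""
    return ":".join(["tenant", check_tenant_id(tenant_id), *parts])


def kafka_topic(tenant_id: str, event_type: str) -> str:
    """Tenant-prefixed topic name."""
    return f"{check_tenant_id(tenant_id)}.{event_type}"


def kafka_partition_key(tenant_id: str) -> bytes:
    """Tenant ID as partition key, so one tenant's events share a partition."""
    return check_tenant_id(tenant_id).encode("utf-8")


def object_prefix(tenant_id: str, path: str) -> str:
    """Per-tenant prefix inside a shared bucket."""
    return f"tenants/{check_tenant_id(tenant_id)}/{path.lstrip('/')}"


def tenant_filter(tenant_id: str) -> dict:
    """Equality condition on a tenant ID field, as used for metadata or column filters."""
    return {"tenant_id": check_tenant_id(tenant_id)}
```

Validating the tenant ID once, before it is interpolated into any name, keeps a malformed ID from crossing into another tenant's schema, key space, or object prefix. For example, `redis_key("acme", "session", "42")` yields `tenant:acme:session:42`.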
Deployment Architecture
All data stores are deployed on Kubernetes, split across the matih-system and matih-data-plane namespaces:
| Data Store | Deployment Method | Namespace |
|---|---|---|
| PostgreSQL | Helm chart (StatefulSet) | matih-system |
| Redis | Helm chart (StatefulSet) | matih-system |
| Kafka | Strimzi Operator | matih-system |
| Elasticsearch | Helm chart (StatefulSet) | matih-system |
| Trino | Helm chart | matih-data-plane |
| ClickHouse | Operator or StatefulSet | matih-data-plane |
| Qdrant | Helm chart | matih-data-plane |
| Neo4j | Helm chart | matih-data-plane |
| MinIO | Helm chart (StatefulSet) | matih-system |
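To see at a glance which stores share a namespace, the deployment table can be transcribed into a small lookup structure and grouped. The sketch below copies the store names, deployment methods, and namespaces from the table; the `DEPLOYMENTS` mapping and the `stores_by_namespace` function are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Transcribed from the deployment table: store -> (deployment method, namespace).
DEPLOYMENTS = {
    "PostgreSQL": ("Helm chart (StatefulSet)", "matih-system"),
    "Redis": ("Helm chart (StatefulSet)", "matih-system"),
    "Kafka": ("Strimzi Operator", "matih-system"),
    "Elasticsearch": ("Helm chart (StatefulSet)", "matih-system"),
    "Trino": ("Helm chart", "matih-data-plane"),
    "ClickHouse": ("Operator or StatefulSet", "matih-data-plane"),
    "Qdrant": ("Helm chart", "matih-data-plane"),
    "Neo4j": ("Helm chart", "matih-data-plane"),
    "MinIO": ("Helm chart (StatefulSet)", "matih-system"),
}


def stores_by_namespace(deployments: dict) -> dict:
    """Group store names by Kubernetes namespace, sorted alphabetically within each group."""
    grouped = defaultdict(list)
    for store, (_method, namespace) in deployments.items():
        grouped[namespace].append(store)
    return {namespace: sorted(stores) for namespace, stores in grouped.items()}
```

Grouping this way shows that the transactional, caching, messaging, search, and object stores sit in matih-system, while the query and analytics stores (Trino, ClickHouse, Qdrant, Neo4j) sit in matih-data-plane.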
Related Sections
- Service Topology -- Service dependencies on data stores
- Multi-Tenancy -- Tenant isolation across all stores
- Data Infrastructure -- Technology selection rationale