MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Data Infrastructure

The MATIH Platform uses nine distinct data store technologies, each selected for specific workload characteristics. This section documents each data store, its role in the platform, and the services that depend on it.


Data Store Overview

| Technology | Version | Category | Primary Purpose |
| --- | --- | --- | --- |
| PostgreSQL | 16 | Relational | Primary transactional database for all services |
| Redis | 7 | Key-value | Caching, sessions, pub/sub, rate limiting |
| Apache Kafka | Strimzi 0.38+ | Event streaming | Asynchronous communication, event sourcing |
| Elasticsearch | 8.11 | Search | Full-text search, audit log indexing |
| MinIO | Latest | Object storage | S3-compatible artifact and file storage |
| Trino | Latest | Query federation | Federated SQL across multiple data sources |
| ClickHouse | Latest | OLAP | Fast analytical queries on large datasets |
| Qdrant | Latest | Vector database | RAG embeddings, semantic search |
| Neo4j / Dgraph | 5.x | Graph database | Knowledge graphs, data lineage |

PostgreSQL

PostgreSQL 16 is the primary relational database for every service in the platform.

| Aspect | Details |
| --- | --- |
| Multi-tenancy | Schema-per-tenant via Hibernate multi-tenancy |
| Services using it | All 24 services |
| Connection pooling | HikariCP with per-service pool configuration |
| Deployment | Kubernetes StatefulSet via Helm |
| Backup | pg_dump with schema filter for per-tenant backup |

Each service has its own database within the PostgreSQL instance, and each tenant has its own schema within the service database.
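
The database-per-service, schema-per-tenant layout can be sketched as a pair of naming helpers. This is a minimal illustration, not the platform's actual code: the `_db` suffix, the `tenant_` prefix, and all function names are hypothetical, since the naming convention is not documented here. The only external detail relied on is that `pg_dump` accepts `--dbname`, `--schema`, and `--file`.

```python
import re

# Hypothetical rule: tenant and service names must be plain lowercase identifiers.
_IDENT = re.compile(r"^[a-z][a-z0-9_]{0,30}$")


def _check(name: str) -> str:
    """Reject anything that is not a plain lowercase identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def service_database(service: str) -> str:
    """One database per service (the '_db' suffix is an assumption)."""
    return f"{_check(service)}_db"


def tenant_schema(tenant_id: str) -> str:
    """One schema per tenant (the 'tenant_' prefix is an assumption)."""
    return f"tenant_{_check(tenant_id)}"


def search_path_sql(tenant_id: str) -> str:
    """SQL that points a connection at a single tenant's schema."""
    return f'SET search_path TO "{tenant_schema(tenant_id)}"'


def backup_command(service: str, tenant_id: str, out_file: str) -> list[str]:
    """pg_dump arguments that restrict the dump to one tenant's schema."""
    return [
        "pg_dump",
        f"--dbname={service_database(service)}",
        f"--schema={tenant_schema(tenant_id)}",
        f"--file={out_file}",
    ]
```

Validating identifiers before interpolating them matters here because schema names cannot be passed as bind parameters, so they end up inside the SQL text itself.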


Redis

Redis 7 provides low-latency caching, session management, and real-time messaging.

| Aspect | Details |
| --- | --- |
| Caching | Tenant-aware key namespacing via TenantAwareCacheManager |
| Sessions | JWT session storage with configurable TTL |
| Pub/Sub | Configuration change propagation, AI response streaming |
| Rate limiting | Per-tenant API rate limiting counters |
| Services using it | 16 services |
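
Tenant-aware key namespacing and per-tenant rate limiting counters can be illustrated with a short sketch. The key format below is hypothetical (the actual format used by TenantAwareCacheManager is not documented here), and a plain dictionary stands in for Redis so the example runs without a server. In Redis the same counter would typically be an `INCR` followed by an `EXPIRE` on the window key.

```python
import time


def tenant_key(tenant_id: str, namespace: str, key: str) -> str:
    """Prefix a key with its tenant so tenants never share entries."""
    return f"tenant:{tenant_id}:{namespace}:{key}"


class FixedWindowLimiter:
    """Fixed-window request counter, one counter per tenant per window."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # Stand-in for Redis; real counters would expire with the window.
        self.counters: dict[str, int] = {}

    def allow(self, tenant_id: str) -> bool:
        """Count one request and report whether it is within the limit."""
        window_index = int(self.clock() // self.window)
        key = tenant_key(tenant_id, "ratelimit", str(window_index))
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.limit
```

A fixed window is the simplest scheme to reason about; it allows short bursts at window boundaries, which a sliding window or token bucket would smooth out.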

Apache Kafka

Kafka provides durable event streaming for asynchronous communication between services.

| Aspect | Details |
| --- | --- |
| Deployment | Strimzi Kafka Operator on Kubernetes |
| Brokers | 1 (dev) / 3 (production) |
| Replication factor | 1 (dev) / 3 (production), with a minimum of 2 in-sync replicas |
| Partitioning | Tenant ID as partition key |
| Retention | 7 days (dev) / 30 days (production) |
| Compression | Snappy |
| Services using it | 10 services |
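
The effect of these settings can be sketched with a few helpers. Using the tenant ID as the partition key means every event for a tenant hashes to the same partition, which preserves per-tenant ordering. The sketch below uses CRC32 purely as a stand-in: Kafka's default partitioner uses a murmur2 hash, so real partition numbers will differ. The function names and the partition count in the usage note are hypothetical.

```python
import zlib


def partition_for(tenant_id: str, num_partitions: int) -> int:
    """Map a tenant ID to a partition with a stable hash (CRC32 stand-in)."""
    return zlib.crc32(tenant_id.encode("utf-8")) % num_partitions


def retention_ms(days: int) -> int:
    """Convert a retention period in days to a retention.ms value."""
    return days * 24 * 60 * 60 * 1000


def tolerated_replica_losses(replication_factor: int, min_insync: int) -> int:
    """Replicas that can be offline while acks=all writes still succeed."""
    if min_insync > replication_factor:
        raise ValueError("min in-sync replicas exceeds replication factor")
    return replication_factor - min_insync
```

For example, `retention_ms(7)` gives 604800000 and `retention_ms(30)` gives 2592000000, and a replication factor of 3 with 2 in-sync replicas tolerates the loss of one replica. With a hypothetical 12 partitions, `partition_for("tenant-a", 12)` always returns the same value for that tenant.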

Elasticsearch

Elasticsearch provides full-text search and log indexing.

| Aspect | Details |
| --- | --- |
| Version | 8.11 |
| Use cases | Audit log search, ontology search, analytics |
| Services using it | Audit Service, Observability API, Ontology Service |
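
As an illustration of the audit log search use case, the sketch below builds an Elasticsearch Query DSL body that combines a full-text `match` clause with an exact `term` filter. The field names (`message`, `tenant_id`, `@timestamp`) and the idea of filtering by tenant are assumptions made for the example; the actual index mapping is not documented here.

```python
import json


def audit_search_body(tenant_id: str, text: str, size: int = 20) -> dict:
    """Build a Query DSL body: full-text match scoped to one tenant."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text clause; "message" is a hypothetical field.
                "must": [{"match": {"message": text}}],
                # Unscored exact-match filter; "tenant_id" is hypothetical.
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        },
        # Newest entries first; "@timestamp" is a hypothetical field.
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


print(json.dumps(audit_search_body("acme", "login failed"), indent=2))
```

Placing the tenant restriction under `filter` rather than `must` keeps it out of relevance scoring and lets Elasticsearch cache it.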

MinIO

MinIO provides S3-compatible object storage for artifacts and files.

| Aspect | Details |
| --- | --- |
| Use cases | ML model artifacts, pipeline outputs, exported reports, dashboard assets |
| API compatibility | Amazon S3 API |
| Authentication | Access key / secret key via Kubernetes secrets |
| Services using it | ML Service, Pipeline Service, Render Service |
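
A service might consume these credentials roughly as follows, assuming the Kubernetes secret is exposed to the pod as environment variables. The variable names (`MINIO_ENDPOINT`, `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`), the tenant-prefixed key layout, and the helper names are all hypothetical; none of them is confirmed by this page.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical environment variable names populated from a Kubernetes secret.
_REQUIRED = ("MINIO_ENDPOINT", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY")


@dataclass(frozen=True)
class ObjectStoreConfig:
    endpoint: str
    access_key: str
    secret_key: str


def load_config(env: Mapping[str, str]) -> ObjectStoreConfig:
    """Read connection settings, failing fast if any are missing."""
    missing = [name for name in _REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return ObjectStoreConfig(
        endpoint=env["MINIO_ENDPOINT"],
        access_key=env["MINIO_ACCESS_KEY"],
        secret_key=env["MINIO_SECRET_KEY"],
    )


def object_key(tenant_id: str, category: str, filename: str) -> str:
    """Compose a tenant-prefixed object key (layout is an assumption)."""
    return "/".join((tenant_id, category, filename))
```

In a running service, `load_config(os.environ)` would supply the values. Because MinIO implements the Amazon S3 API, the resulting settings can be handed to any S3-compatible client.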

Selection Rationale

| Technology | Why Selected Over Alternatives |
| --- | --- |
| PostgreSQL | Mature, excellent Hibernate support, schema-per-tenant multi-tenancy |
| Redis | Best-in-class latency for caching, built-in pub/sub |
| Kafka (Strimzi) | Kubernetes-native operator, durable event streaming, wide ecosystem |
| Elasticsearch | Industry standard for full-text search and log analytics |
| MinIO | S3-compatible without cloud vendor lock-in, Kubernetes-native |
