Object Storage
MinIO provides S3-compatible object storage for the MATIH Platform. It stores ML model artifacts, pipeline outputs, exported reports, dashboard thumbnails, and any other file-based data that does not fit in relational or key-value stores.
Role in the Platform
| Aspect | Details |
|---|---|
| Technology | MinIO |
| API compatibility | Amazon S3 API |
| Deployment | Kubernetes StatefulSet via Helm |
| Authentication | Access key / secret key via Kubernetes secrets |
| Multi-tenancy | Per-tenant bucket or prefix-based isolation |
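
Because MinIO speaks the Amazon S3 API, a service can reach it with any S3 client by overriding the endpoint. The following is a minimal sketch, assuming the Kubernetes secret is exposed to the pod as environment variables; the variable names (`MINIO_ENDPOINT`, `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`) and the choice of `boto3` are illustrative assumptions, not names defined by the platform.

```python
import os
from typing import Mapping

# Hypothetical environment variable names; a Kubernetes secret would
# typically be mapped onto variables like these in the pod spec.
ENDPOINT_VAR = "MINIO_ENDPOINT"
ACCESS_KEY_VAR = "MINIO_ACCESS_KEY"
SECRET_KEY_VAR = "MINIO_SECRET_KEY"


def s3_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for boto3.client("s3", ...) from the environment."""
    required = (ENDPOINT_VAR, ACCESS_KEY_VAR, SECRET_KEY_VAR)
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing MinIO settings: {', '.join(missing)}")
    return {
        "endpoint_url": env[ENDPOINT_VAR],
        "aws_access_key_id": env[ACCESS_KEY_VAR],
        "aws_secret_access_key": env[SECRET_KEY_VAR],
    }


def make_s3_client(env: Mapping[str, str] = os.environ):
    """Create an S3 client pointed at MinIO (requires the boto3 package)."""
    import boto3  # imported lazily so s3_client_kwargs works without boto3

    return boto3.client("s3", **s3_client_kwargs(env))
```

Keeping the settings lookup separate from client creation lets the configuration be validated at startup without opening a connection.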
Use Cases
| Use Case | Bucket Pattern | Written By | Read By |
|---|---|---|---|
| ML model artifacts | mlflow-artifacts/{tenant_id}/ | ML Service | ML Service, Ray Serve |
| Pipeline outputs | pipeline-outputs/{tenant_id}/ | Pipeline Service | Data Workbench, BI Service |
| Exported reports | exports/{tenant_id}/ | Render Service | BI Workbench |
| Dashboard thumbnails | thumbnails/{tenant_id}/ | Render Service | BI Workbench |
| Data lake (Iceberg) | lakehouse/{tenant_id}/ | Pipeline Service, Spark | Trino (Iceberg connector) |
| Backup archives | backups/{tenant_id}/ | Backup jobs | Recovery procedures |
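
The bucket patterns above can be turned into a small helper that maps a use case and tenant to a bucket and object key. This is a sketch only: the use-case identifiers, the tenant ID format, and the `upload` helper are hypothetical, and `client` stands for any S3 client exposing a boto3-style `put_object(Bucket=..., Key=..., Body=...)` method.

```python
import re
from typing import Tuple

# Bucket names come from the use-case table; the dictionary keys are
# hypothetical identifiers chosen for this example.
BUCKETS = {
    "ml_artifacts": "mlflow-artifacts",
    "pipeline_outputs": "pipeline-outputs",
    "exports": "exports",
    "thumbnails": "thumbnails",
    "lakehouse": "lakehouse",
    "backups": "backups",
}

# Assumed tenant ID format: lowercase letters, digits, and hyphens.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9-]*")


def object_location(use_case: str, tenant_id: str, relative_path: str) -> Tuple[str, str]:
    """Return (bucket, key) with the key scoped under the tenant prefix."""
    if use_case not in BUCKETS:
        raise ValueError(f"unknown use case: {use_case!r}")
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    # Drop empty segments and reject traversal so a key cannot leave the prefix.
    parts = [p for p in relative_path.split("/") if p]
    if not parts or any(p in (".", "..") for p in parts):
        raise ValueError(f"invalid object path: {relative_path!r}")
    return BUCKETS[use_case], f"{tenant_id}/" + "/".join(parts)


def upload(client, use_case: str, tenant_id: str, relative_path: str, data: bytes) -> str:
    """Upload bytes under the tenant prefix and return the s3:// URI."""
    bucket, key = object_location(use_case, tenant_id, relative_path)
    client.put_object(Bucket=bucket, Key=key, Body=data)
    return f"s3://{bucket}/{key}"
```

For example, `object_location("exports", "acme-corp", "reports/q1.pdf")` yields the bucket `exports` and the key `acme-corp/reports/q1.pdf`.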
Bucket Organization
MinIO Instance
+-- mlflow-artifacts/
| +-- acme-corp/
| | +-- experiment-1/
| | +-- experiment-2/
| +-- globex/
|
+-- pipeline-outputs/
| +-- acme-corp/
| +-- globex/
|
+-- lakehouse/
| +-- acme-corp/
| | +-- orders/
| | +-- customers/
| +-- globex/
|
+-- exports/
  +-- acme-corp/
  +-- globex/
Multi-Tenancy
Tenant isolation is enforced through bucket policies and prefix-based access control:
| Strategy | Implementation |
|---|---|
| Prefix isolation | Objects scoped to {tenant_id}/ prefix within shared buckets |
| Access control | Service accounts with IAM policies restricting to tenant prefix |
| Encryption | Server-side encryption with per-tenant keys (SSE-KMS) |
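
To illustrate the access-control row, the sketch below generates an S3-style IAM policy document that limits a service account to one tenant's prefix in a shared bucket. The function name and the exact set of allowed actions are assumptions made for this example; the platform's real policies may grant more or fewer permissions.

```python
import json


def tenant_policy(bucket: str, tenant_id: str) -> dict:
    """Build a policy allowing access only under {tenant_id}/ in the bucket."""
    prefix = f"{tenant_id}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is allowed only for keys under the tenant prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Object reads and writes are limited to the same prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(tenant_policy("pipeline-outputs", "acme-corp"), indent=2))
```

The resulting JSON could be saved to a file and registered through MinIO's policy administration tooling, then attached to the tenant's service account.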
Configuration
| Parameter | Development | Production |
|---|---|---|
| Storage class | Local PV | Cloud block storage |
| Erasure coding | Disabled | Enabled (EC:4) |
| Replication | 1 drive | 4+ drives across nodes |
| Lifecycle rules | 30-day expiry for temp objects | Configurable per bucket |
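
The 30-day expiry rule for temporary objects could be expressed as an S3 lifecycle configuration along these lines. This is a sketch under assumptions: the `tmp/` prefix for temporary objects and the helper names are hypothetical, and `client` is assumed to be a boto3 S3 client, whose `put_bucket_lifecycle_configuration` call accepts this structure.

```python
def temp_expiry_lifecycle(prefix: str = "tmp/", days: int = 30) -> dict:
    """Build a lifecycle configuration that expires objects under a prefix."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}d",
                "Filter": {"Prefix": prefix},  # assumed location of temp objects
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle(client, bucket: str, config: dict) -> None:
    """Apply the configuration to a bucket via a boto3-style S3 client."""
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=config,
    )
```

In production, where rules are configurable per bucket, the same builder can be called with a different prefix and retention period for each bucket.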
Integration with Trino
The Iceberg connector in Trino reads data from MinIO:
Trino --> Iceberg Connector --> MinIO (S3 API)
|
+-- Catalog: iceberg
+-- Schema: sales
+-- Table: orders
+-- Data files: s3://lakehouse/acme-corp/orders/data/*.parquet
Related Pages
- PostgreSQL -- Primary relational store
- Trino -- Query federation using MinIO data
- MLflow -- ML artifact storage