# ML Infrastructure Overview
MATIH deploys a comprehensive ML infrastructure stack for experiment tracking, feature management, distributed training, and model serving. These components integrate with the ML Service and AI Service to provide end-to-end ML lifecycle management.
## Component Summary
| Component | Purpose | GPU |
|---|---|---|
| MLflow | Experiment tracking, model registry, artifact store | No |
| Feast | Feature store (online + offline) | No |
| Ray | Distributed training, hyperparameter tuning | Optional |
| vLLM | High-throughput LLM inference server | Yes |
| Triton | Multi-framework model serving | Yes |
| JupyterHub | Interactive notebook environment | Optional |
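As an illustration of how the CPU-only components might be wired together, here is a minimal Docker Compose sketch. The service names, images, ports, and the choice of Redis as Feast's online store are assumptions for illustration, not MATIH's actual deployment configuration.

```yaml
# Hypothetical sketch — images, ports, and volumes are illustrative assumptions.
services:
  mlflow:
    image: ghcr.io/mlflow/mlflow:latest        # official MLflow image
    command: mlflow server --host 0.0.0.0 --port 5000
    ports:
      - "5000:5000"
    volumes:
      - mlflow-artifacts:/mlartifacts          # persist the artifact store

  redis:
    image: redis:7                             # assumed Feast online store backend
    ports:
      - "6379:6379"

volumes:
  mlflow-artifacts:
```

The GPU-backed servers (vLLM, Triton) would typically be deployed separately on GPU nodes rather than alongside these services.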
## Section Contents
| Page | Description |
|---|---|
| MLflow | Experiment tracking and model registry |
| Feast | Feature store with online and offline stores |
| Ray | Distributed computing cluster |
| vLLM | LLM inference server |
| Triton | Multi-framework inference server |
| JupyterHub | Notebook environment |