# ML Infrastructure Overview
MATIH deploys a comprehensive ML infrastructure stack for experiment tracking, feature management, distributed training, and model serving. These components integrate with the ML Service and AI Service to provide end-to-end ML lifecycle management.
## Component Summary
| Component | Purpose | GPU |
|---|---|---|
| MLflow | Experiment tracking, model registry, artifact store | No |
| Feast | Feature store (online + offline) | No |
| Ray | Distributed training, hyperparameter tuning | Optional |
| vLLM | High-throughput LLM inference server | Yes |
| Triton | Multi-framework model serving | Yes |
| JupyterHub | Interactive notebook environment | Optional |
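As an illustration of how the CPU-only components might be wired together, here is a minimal Docker Compose sketch. The service names, images, ports, and the choice of Redis as Feast's online store are assumptions for illustration, not MATIH's actual deployment configuration.

```yaml
# Hypothetical sketch — images, ports, and volumes are illustrative assumptions.
services:
  mlflow:
    image: ghcr.io/mlflow/mlflow:latest        # official MLflow image
    command: mlflow server --host 0.0.0.0 --port 5000
    ports:
      - "5000:5000"
    volumes:
      - mlflow-artifacts:/mlartifacts          # persist the artifact store

  redis:
    image: redis:7                             # assumed Feast online store backend
    ports:
      - "6379:6379"

volumes:
  mlflow-artifacts:
```

The GPU-backed servers (vLLM, Triton) would typically be deployed separately on GPU nodes rather than alongside these services.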
## Section Contents
| Page | Description |
|---|---|
| MLflow | Experiment tracking and model registry |
| Feast | Feature store with online and offline stores |
| Ray | Distributed computing cluster |
| vLLM | LLM inference server |
| Triton | Multi-framework inference server |
| JupyterHub | Notebook environment |