MATIH Platform is in active MVP development. Documentation reflects current implementation status.
ML & AI Infrastructure

1. Introduction

The MATIH Platform provides a complete ML/AI infrastructure stack covering experiment tracking, distributed training, model serving, LLM inference, feature engineering, and vector search. This section documents each ML/AI technology, its role, and how it integrates with the platform.


ML/AI Stack Overview

| Technology | Category | Purpose |
|---|---|---|
| MLflow | Experiment tracking | Model versioning, metrics logging, artifact storage |
| Ray | Distributed compute | Model training, hyperparameter tuning, serving |
| vLLM | LLM serving | High-throughput large language model inference |
| Triton | Model serving | GPU-optimized inference server |
| LangChain | LLM orchestration | LLM integration, chain composition |
| LangGraph | Agent orchestration | Multi-agent state machine for the AI Service |
| Qdrant | Vector database | RAG embeddings, semantic similarity search |
| LanceDB | Vector database | Lightweight vector search for development |
| Feast | Feature store | Feature engineering and serving |
| JupyterHub | Notebooks | Interactive development environment |

MLflow

MLflow provides the experiment tracking and model registry backbone:

| Feature | Details |
|---|---|
| Experiment tracking | Log parameters, metrics, and artifacts per training run |
| Model registry | Version, stage, and promote models (Staging, Production, Archived) |
| Artifact storage | Model files stored in MinIO (S3-compatible) |
| Run comparison | Side-by-side metric comparison across experiments |
| REST API | Programmatic access for automation |
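As a sketch of the tracking workflow, the snippet below logs one training run's parameters and final metrics through MLflow's Python API. The tracking URI, experiment name, and the example parameter values are illustrative assumptions, not platform configuration.

```python
# Hedged sketch: logging a single training run to MLflow.
# The tracking URI and experiment name below are assumptions.
def log_training_run(params: dict, metrics: dict,
                     tracking_uri: str = "http://mlflow:5000") -> None:
    """Log parameters and final metrics for one run."""
    import mlflow  # imported lazily; requires an MLflow install

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment("demo-experiment")  # hypothetical experiment name
    with mlflow.start_run():
        mlflow.log_params(params)    # e.g. learning_rate, epochs
        mlflow.log_metrics(metrics)  # e.g. final validation RMSE

# Example inputs a caller might pass:
run_params = {"learning_rate": 0.01, "epochs": 10}
run_metrics = {"rmse": 0.42}
```

Because artifact storage is backed by MinIO, any files saved with the run land in the S3-compatible bucket rather than on local disk.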

Ray

Ray provides the distributed compute layer for ML workloads:

| Component | Purpose |
|---|---|
| Ray Train | Distributed training across GPU/CPU nodes |
| Ray Tune | Hyperparameter optimization with scheduling algorithms |
| Ray Serve | Model serving with request batching and auto-scaling |
| Ray Data | Distributed data preprocessing pipelines |

Ray runs as a Kubernetes-native cluster using the KubeRay operator, scaling dynamically based on workload demand.
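To make the Ray Tune role concrete, here is a minimal grid-search sketch. The objective function and search space are invented for illustration; on the platform, `run_search` would execute against the KubeRay-managed cluster.

```python
# Hedged sketch: a Ray Tune hyperparameter search over a toy objective.
def trainable(config: dict) -> dict:
    # Toy objective: loss is the squared distance from an arbitrary
    # "best" learning rate of 0.1 (purely illustrative).
    return {"loss": (config["lr"] - 0.1) ** 2}

def run_search():
    from ray import tune  # lazy import; requires Ray and a cluster

    tuner = tune.Tuner(
        trainable,
        param_space={"lr": tune.grid_search([0.01, 0.1, 1.0])},
    )
    results = tuner.fit()
    return results.get_best_result(metric="loss", mode="min")
```

Swapping `grid_search` for one of Tune's schedulers (e.g. ASHA) is how the "scheduling algorithms" in the table come into play.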


vLLM

vLLM provides high-throughput LLM inference:

| Feature | Details |
|---|---|
| PagedAttention | Efficient memory management for long sequences |
| Continuous batching | Dynamic request batching for throughput optimization |
| Model support | Llama, Mistral, GPT-family, and other open models |
| API compatibility | OpenAI-compatible REST API |
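Because the API is OpenAI-compatible, any OpenAI-style client can talk to the server. The stdlib-only sketch below builds a chat-completion request; the host, port, and model name are assumptions for illustration.

```python
# Hedged sketch: calling a vLLM server via its OpenAI-compatible API.
# The endpoint URL and model name are illustrative, not platform values.
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "meta-llama/Llama-3-8b") -> urllib.request.Request:
    # vLLM serves POST /v1/chat/completions, mirroring the OpenAI schema.
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        "http://vllm:8000/v1/chat/completions",  # assumed host/port
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Summarize last month's sales.")
# resp = urllib.request.urlopen(req)  # uncomment against a live server
```

Continuous batching happens server-side: concurrent requests like this one are merged into shared GPU batches automatically.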

LangChain and LangGraph

The AI Service uses LangChain and LangGraph for its multi-agent orchestrator:

| Technology | Purpose |
|---|---|
| LangChain | LLM provider abstraction, prompt templates, RAG pipelines |
| LangGraph | State machine orchestration for multi-agent workflows |

The agent orchestrator manages four specialized agents: RouterAgent, SQLAgent, AnalysisAgent, and VizAgent.
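The routing step can be pictured with a pure-Python stand-in for the RouterAgent. The agent names come from the text above; the keyword-based dispatch logic is entirely illustrative and not the platform's actual routing policy.

```python
# Hedged sketch: a RouterAgent stand-in choosing the next agent.
# Dispatch rules are invented for illustration only.
def route(question: str) -> str:
    """Return the name of the agent that should handle the question."""
    q = question.lower()
    if "chart" in q or "plot" in q:
        return "VizAgent"       # visualization requests
    if "why" in q or "trend" in q:
        return "AnalysisAgent"  # explanatory/analytical questions
    return "SQLAgent"           # default: answerable with a query
```

In the real orchestrator, this decision is a node in a LangGraph state machine, with conditional edges carrying shared state to the chosen agent.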


Vector Databases

Two vector database options serve different deployment scenarios:

| Technology | Use Case | Features |
|---|---|---|
| Qdrant | Production vector search | Distributed, HNSW indexing, filtered search, REST/gRPC API |
| LanceDB | Development and testing | Embedded, no server required, lightweight |

Vector stores power the RAG (Retrieval-Augmented Generation) pipeline, storing embeddings of schema metadata, query examples, and business terminology to improve text-to-SQL accuracy.
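At its core, the retrieval step ranks stored embeddings by similarity to the query embedding. The toy sketch below does this with cosine similarity in pure Python; the corpus entries and three-dimensional vectors are invented (real embeddings have hundreds of dimensions and the ranking runs inside Qdrant or LanceDB).

```python
# Hedged sketch: the similarity ranking a vector store performs.
# Vectors and document names are toy values for illustration.
import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

corpus = {
    "orders table schema": [0.9, 0.1, 0.0],
    "refund policy note": [0.1, 0.8, 0.2],
}
query = [0.85, 0.15, 0.05]  # embedding of e.g. "list recent orders"

best = max(corpus, key=lambda name: cosine(query, corpus[name]))
# best -> "orders table schema"
```

The retrieved schema snippets and query examples are then injected into the LLM prompt, which is what improves text-to-SQL accuracy.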


Feast

Feast manages the feature engineering and serving lifecycle:

| Feature | Details |
|---|---|
| Feature definitions | Declarative feature specifications in Python |
| Online store | Low-latency feature retrieval for real-time inference |
| Offline store | Historical feature access for training dataset creation |
| Point-in-time joins | Temporally correct feature values at training time |
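The point-in-time join is the subtlest row in that table, so here is a toy version of the logic in plain Python: for each training label, take the latest feature value recorded at or before the label's timestamp, never a later one. Entity names, timestamps, and values are invented; Feast performs this join against the offline store.

```python
# Hedged sketch: point-in-time join as done when building training data.
# Using a later value would leak future information into training.
def point_in_time_join(label_rows, feature_log):
    """label_rows: [(entity, ts)]; feature_log: [(entity, ts, value)], ts ascending."""
    joined = []
    for entity, ts in label_rows:
        # Only values observed at or before the label timestamp qualify.
        candidates = [v for e, t, v in feature_log if e == entity and t <= ts]
        joined.append((entity, ts, candidates[-1] if candidates else None))
    return joined

rows = point_in_time_join(
    [("u1", 10), ("u1", 25)],
    [("u1", 5, 0.2), ("u1", 20, 0.7), ("u1", 30, 0.9)],
)
# rows -> [("u1", 10, 0.2), ("u1", 25, 0.7)]; the 0.9 at t=30 is never used
```

At serving time the online store skips this machinery entirely and simply returns the most recent value per entity, which is why the same feature definitions can back both paths.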

Related Pages