MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Chapter 13: ML Service & MLOps

Overview

The ML Service provides a comprehensive machine learning operations (MLOps) platform within MATIH, enabling end-to-end model lifecycle management from training through deployment and monitoring. Built on Python and FastAPI with 50+ API routers, it integrates with MLflow for model registry, Ray AIR for distributed training, Feast for feature management, and ONNX Runtime for optimized inference. This chapter covers the ML Service architecture, training pipelines, serving infrastructure, and operational monitoring.


What You Will Learn

By the end of this chapter, you will understand:

  • The ML Service architecture including its 50+ router module organization, dependency graph, and integration with external ML frameworks
  • Training pipelines with distributed training via Ray AIR, hyperparameter tuning, GPU management, and checkpoint management
  • Model registry using MLflow for versioning, experiment tracking, model stage management, and artifact storage
  • Model serving through ONNX Runtime optimization, Ray Serve for online inference, Triton Inference Server integration, and batch prediction
  • Ensemble pipelines supporting voting, stacking, blending, bagging, and boosting methods with parallel inference
  • A/B testing and canary deployments for safe model rollouts with automated traffic splitting and statistical analysis
  • Feature store using Feast for online/offline feature serving, feature versioning, materialization, and streaming features
  • Model monitoring with drift detection, performance tracking, retraining triggers, and automated alerting
  • Distributed training with Ray AIR for data-parallel, model-parallel, and pipeline-parallel training strategies
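The A/B testing and canary bullet above rests on deterministic traffic splitting: each request is routed to a model variant by hashing a stable key, so the same user always sees the same variant while the canary receives a fixed fraction of traffic overall. A minimal sketch of the idea (function and variant names are illustrative, not the service's actual API):

```python
import hashlib

def assign_variant(user_id: str, canary_fraction: float = 0.1) -> str:
    """Deterministically route a user to 'canary' or 'stable'.

    Hash the user id into [0, 1); users below the canary fraction
    see the candidate model, everyone else sees the current one.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"
```

In a real rollout this split would be paired with per-variant metric collection and statistical analysis before the canary is promoted or rolled back.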

Chapter Structure

| Section | Description | Audience |
|---|---|---|
| Architecture | Service architecture, module layout, framework integrations, and deployment topology | ML engineers, architects |
| Training Pipelines | Distributed training with Ray, hyperparameter tuning, GPU management, and checkpointing | ML engineers, data scientists |
| Model Registry | MLflow integration for experiment tracking, model versioning, and artifact management | ML engineers, MLOps engineers |
| Model Serving | ONNX Runtime, Ray Serve, Triton, batch prediction, and model loading | ML engineers, backend developers |
| Ensemble Pipelines | Ensemble methods, configuration, parallel inference, and meta-learning | ML engineers, data scientists |
| A/B Testing | Traffic splitting, canary deployments, statistical analysis, and safe rollout | ML engineers, product managers |
| Feature Store | Feast integration, online/offline stores, materialization, and streaming features | Data engineers, ML engineers |
| Model Monitoring | Drift detection, performance tracking, retraining triggers, and alerting | MLOps engineers |
| Distributed Training | Ray AIR orchestration, data-parallel and model-parallel strategies | ML engineers |
| API Reference | Complete REST API documentation for all ML Service endpoints | All developers |

ML Service at a Glance

The ML Service is a Python/FastAPI application that provides the machine learning infrastructure layer for the MATIH platform.

```
                    +--------------------------+
                    |   ML Workbench (UI)      |
                    |   API Consumers          |
                    +-----------+--------------+
                                |
                    +-----------v--------------+
                    |      ML Service          |
                    |     (Port 8000)          |
                    +-----------+--------------+
                                |
        +-----------+-----------+-----------+-----------+
        |           |           |           |           |
  +-----v-----+ +--v------+ +-v--------+ +v--------+ +v---------+
  | Training  | | Model   | | Feature  | | Model   | | Active   |
  | Pipeline  | | Registry| | Store    | | Serving | | Learning |
  +-----+-----+ +--+------+ +--+-------+ +-+-------+ +----------+
        |           |           |           |
  +-----v-----+ +--v------+ +-v--------+ +v---------+
  | Ray AIR   | | MLflow  | | Feast    | | ONNX RT  |
  | Cluster   | | Server  | | Server   | | / Triton |
  +-----------+ +---------+ +----------+ +----------+
```
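The request path in the diagram (client → ML Service → feature store → serving runtime) can be sketched as a plain function pipeline. Every name here is an illustrative stand-in, not the service's real interface:

```python
from typing import Callable

def make_predict_handler(
    fetch_features: Callable[[str], dict],   # stand-in for the Feast online store
    load_model: Callable[[str], Callable],   # stand-in for the model loader/cache
):
    """Wire the diagram's components into one prediction request path."""
    def handle(model_name: str, entity_id: str) -> dict:
        features = fetch_features(entity_id)  # 1. online feature lookup
        model = load_model(model_name)        # 2. resolve a (cached) model
        score = model(features)               # 3. run inference
        return {"model": model_name, "entity": entity_id, "score": score}
    return handle

# Wiring with stub components in place of Feast and the model loader:
handle = make_predict_handler(
    fetch_features=lambda eid: {"clicks_7d": 3},
    load_model=lambda name: (lambda f: f["clicks_7d"] * 2),
)
```

The real service performs these steps behind its FastAPI routers, with the feature lookup and model resolution backed by Feast and the serving runtimes shown in the diagram.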

Key Numbers

| Metric | Value |
|---|---|
| Technology | Python 3.11+, FastAPI, uvicorn |
| Service port | 8000 |
| API routers | 50+ |
| Training framework | Ray AIR (Ray Train + Ray Tune) |
| Model registry | MLflow |
| Feature store | Feast |
| Model formats | scikit-learn, ONNX, PyTorch, TensorFlow |
| Inference runtime | ONNX Runtime, Ray Serve, Triton |
| Ensemble methods | Voting, stacking, blending, bagging, boosting |
| Monitoring | Drift detection, performance tracking, retraining triggers |

Key Source Files

The ML Service implementation is organized under data-plane/ml-service/src/:

| Path | Purpose |
|---|---|
| main.py | FastAPI application entry point |
| training/distributed_trainer.py | Ray Train distributed training |
| training/hyperparameter_tuner.py | Ray Tune hyperparameter search |
| training/gpu_manager.py | GPU resource management |
| training/checkpoint_manager.py | Training checkpoint persistence |
| training/job_manager.py | Training job lifecycle |
| training/job_scheduler.py | Job scheduling and queuing |
| training/validation_pipeline.py | Model validation |
| registry/model_registry.py | MLflow model registry integration |
| serving/prediction_service.py | Online prediction service |
| serving/model_loader.py | Model loading and caching |
| serving/ray_serve.py | Ray Serve deployment |
| serving/triton_inference_service.py | Triton Inference Server |
| models/ensemble.py | Ensemble configuration and methods |
| models/prediction.py | Prediction request/response models |
| features/feature_store.py | Feast feature store integration |
| features/feast_online_store.py | Online feature serving |
| features/feast_offline_store.py | Offline feature retrieval |
| features/streaming_feature_service.py | Streaming feature ingestion |
| monitoring/drift_detection_service.py | Data/concept drift detection |
| monitoring/model_monitoring_service.py | Model performance monitoring |
| monitoring/retraining_trigger_service.py | Automated retraining |
| testing/model_testing_service.py | A/B testing and canary |
| ray_air/orchestrator.py | Ray AIR orchestration |
| active_learning/ | Active learning for label selection |
| compliance/ | Model compliance and governance |
| labeling/ | Data labeling management |
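As a concrete monitoring primitive, drift detectors like the one behind monitoring/drift_detection_service.py typically compare a feature's live distribution against its training baseline; the Population Stability Index (PSI) is one common measure. The service's own method is not shown in this chapter, so treat this self-contained version as illustrative:

```python
import math

def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    Inputs are per-bin proportions that each sum to 1. A common rule of
    thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant drift.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor bins to avoid log(0)
        total += (a - e) * math.log(a / e)
    return total
```

A monitoring loop would compute this per feature on a schedule and, past a threshold, raise an alert or fire a retraining trigger of the kind handled by monitoring/retraining_trigger_service.py.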

Integration Points

| Integration | Protocol | Purpose |
|---|---|---|
| MLflow | HTTP REST | Experiment tracking, model registry |
| Ray Cluster | Ray Client | Distributed training, hyperparameter tuning |
| Feast | gRPC/HTTP | Feature serving and materialization |
| ONNX Runtime | In-process | Optimized model inference |
| Triton | gRPC | High-performance inference server |
| MinIO/S3 | S3 API | Model artifact storage |
| PostgreSQL | SQL | Metadata storage |
| Redis | Redis protocol | Feature caching, model caching |
| Kafka | Kafka protocol | Training events, monitoring alerts |
| Prometheus | HTTP | Metrics export |

Getting Started

```bash
# Navigate to service directory
cd data-plane/ml-service

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export DATABASE_URL=postgresql://matih:password@localhost:5432/matih_ml
export MLFLOW_TRACKING_URI=http://localhost:5000
export RAY_ADDRESS=ray://localhost:10001
export FEAST_REPO_PATH=/path/to/feast/repo
export S3_ENDPOINT=http://localhost:9000
export S3_ACCESS_KEY=minioadmin
export S3_SECRET_KEY=minioadmin

# Start the service
uvicorn src.main:app --host 0.0.0.0 --port 8000 --reload
```
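The exported variables above would typically be read once at startup. A sketch of such a settings loader, where the field names mirror a subset of the exports but the loader itself is hypothetical rather than the service's real configuration module:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class MLServiceSettings:
    """Environment-driven configuration, mirroring the exports above."""
    database_url: str
    mlflow_tracking_uri: str
    ray_address: str
    feast_repo_path: str
    s3_endpoint: str

    @classmethod
    def from_env(cls) -> "MLServiceSettings":
        return cls(
            # No sensible default for the database: fail fast if unset.
            database_url=os.environ["DATABASE_URL"],
            # Local-development defaults matching the export list.
            mlflow_tracking_uri=os.environ.get(
                "MLFLOW_TRACKING_URI", "http://localhost:5000"
            ),
            ray_address=os.environ.get("RAY_ADDRESS", "ray://localhost:10001"),
            feast_repo_path=os.environ.get("FEAST_REPO_PATH", ""),
            s3_endpoint=os.environ.get("S3_ENDPOINT", "http://localhost:9000"),
        )
```

Freezing the dataclass keeps configuration immutable after startup; the S3 credentials are left out here but would follow the same pattern.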