13. ML Service & MLOps
Service Architecture

[Diagram: ML Service Architecture]

The ML Service is a Python/FastAPI application with 50+ API routers covering the full machine learning lifecycle. It follows a modular architecture where each subsystem (training, serving, features, monitoring) is independently deployable and configurable. This section examines the module organization, framework integrations, deployment topology, and configuration management.
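
A minimal sketch of how the entry point composes these routers. The module paths and router names below are hypothetical placeholders, not the actual src/api layout:

from fastapi import FastAPI

# Hypothetical imports; src/api defines 50+ routers along these lines.
from src.api.models import router as models_router
from src.api.training import router as training_router
from src.api.predictions import router as predictions_router

app = FastAPI(title="ml-service")

# Each domain router is mounted under a versioned prefix.
app.include_router(models_router, prefix="/api/v1/models", tags=["models"])
app.include_router(training_router, prefix="/api/v1/training", tags=["training"])
app.include_router(predictions_router, prefix="/api/v1/predictions", tags=["predictions"])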


Module Organization

src/
  main.py                     # FastAPI entry point
  main/                       # Application bootstrap
  api/                        # API router definitions
  auth/                       # Authentication middleware
  middleware/                 # Request middleware
  models/                     # Data models
    ensemble.py               # Ensemble configuration
    model_metadata.py         # Model metadata
    prediction.py             # Prediction request/response
  training/                   # Training subsystem
    distributed_trainer.py    # Ray Train integration
    distributed_workflow_service.py  # Training workflows
    deepspeed_fsdp_trainer.py # DeepSpeed/FSDP training
    hyperparameter_tuner.py   # Ray Tune integration
    gpu_manager.py            # GPU resource management
    gpu_tracker.py            # GPU utilization tracking
    checkpoint_manager.py     # Checkpoint persistence
    enhanced_checkpoint_service.py  # Advanced checkpointing
    job_manager.py            # Training job lifecycle
    job_scheduler.py          # Job scheduling
    job_monitoring_service.py # Job health monitoring
    metrics_collector.py      # Training metrics collection
    cost_calculator.py        # Training cost estimation
    validation_pipeline.py    # Model validation
  serving/                    # Inference subsystem
    prediction_service.py     # Online predictions
    model_loader.py           # Model loading/caching
    ray_serve.py              # Ray Serve deployments
    advanced_ray_serve.py     # Advanced serving features
    triton_inference_service.py  # Triton integration
  registry/                   # Model registry
    model_registry.py         # MLflow integration
  features/                   # Feature store subsystem
    feature_store.py          # Feast core integration
    unified_feature_store.py  # Unified feature API
    feast_online_store.py     # Online feature serving
    feast_offline_store.py    # Offline feature retrieval
    feast_registry_service.py # Feature registry
    feature_group_service.py  # Feature group management
    feature_serving.py        # Feature serving endpoints
    feature_versioning_service.py  # Feature versioning
    feature_materialization_service.py  # Materialization
    streaming_feature_service.py  # Streaming features
    iceberg_offline_store.py  # Iceberg-backed offline store
    aerospike_online_store.py # Aerospike-backed online store
    embedding_feature_service.py  # Embedding features
    agentic_feature_interface.py  # Agent-accessible features
    registry_state_machine.py # Feature lifecycle FSM
  monitoring/                 # Model monitoring
    drift_detection_service.py    # Data/concept drift
    model_monitoring_service.py   # Performance monitoring
    performance_monitoring_service.py  # Detailed perf tracking
    retraining_trigger_service.py # Automated retraining
  testing/                    # Model testing
    model_testing_service.py  # A/B testing, canary
  ray_air/                    # Ray AIR integration
    orchestrator.py           # Ray AIR orchestration
    ray_data_service.py       # Ray Data integration
    ray_serve_service.py      # Ray Serve management
  ray_cluster/                # Ray cluster management
  active_learning/            # Active learning
  automl/                     # AutoML pipelines
  batch/                      # Batch prediction
  caching/                    # Model/feature caching
  compliance/                 # Model compliance
  compression/                # Model compression
  cost/                       # Cost tracking
  datasets/                   # Dataset management
  debugging/                  # Model debugging
  embeddings/                 # Embedding generation
  explainability/             # Model explainability
  fairness/                   # Fairness evaluation
  governance/                 # Model governance
  inference/                  # Inference optimization
  labeling/                   # Data labeling
  lifecycle/                  # Model lifecycle
  observability/              # Metrics/tracing
  pipeline/                   # ML pipeline orchestration
  pipelines/                  # Pre-built pipelines
  reproducibility/            # Reproducibility tools
  safety/                     # Model safety
  scheduler/                  # Job scheduling
  shadow/                     # Shadow deployments
  storage/                    # Artifact storage
  templates/                  # ML templates
  tracking/                   # Experiment tracking
  validation/                 # Data validation
  versioning/                 # Model versioning
  vllm/                       # vLLM optimization
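
The features/ subsystem wraps Feast. As a point of reference, this is what direct online retrieval looks like with the standard Feast client; the feature view, field, and entity names here are hypothetical:

from feast import FeatureStore

# Point the client at the repo the service is configured with (see Configuration).
store = FeatureStore(repo_path="/opt/feast/repo")

# Fetch online features for a single entity row.
features = store.get_online_features(
    features=["user_stats:session_count", "user_stats:avg_order_value"],
    entity_rows=[{"user_id": 1001}],
).to_dict()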

Framework Integration Map

The ML Service integrates with multiple external ML frameworks:

ML Service (FastAPI)
    |
    +-- Ray AIR ---------> Ray Cluster
    |   +-- Ray Train         (distributed training)
    |   +-- Ray Tune          (hyperparameter tuning)
    |   +-- Ray Serve         (model serving)
    |   +-- Ray Data          (data processing)
    |
    +-- MLflow ----------> MLflow Server
    |   +-- Tracking          (experiment tracking)
    |   +-- Registry          (model registry)
    |   +-- Artifacts         (model artifact storage)
    |
    +-- Feast ------------> Feature Store
    |   +-- Online Store      (Redis/Aerospike)
    |   +-- Offline Store     (Iceberg/S3)
    |   +-- Registry          (feature definitions)
    |
    +-- ONNX Runtime -----> (in-process inference)
    |
    +-- Triton ----------> Triton Inference Server
    |
    +-- scikit-learn -----> (training/inference)
    |
    +-- PyTorch ----------> (training via Ray)
    |
    +-- TensorFlow -------> (training via Ray)
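
A sketch of how these integrations are wired at startup, using the standard Ray and MLflow client calls with the default addresses from the Configuration section below:

import ray
import mlflow

# Connect to the Ray cluster as a client; training and serving jobs
# run inside the service's namespace.
ray.init(address="ray://localhost:10001", namespace="matih-ml")

# Point the MLflow client at the tracking server backing experiments,
# the model registry, and artifact storage.
mlflow.set_tracking_uri("http://localhost:5000")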

API Router Architecture

The 50+ API routers are organized by domain:

Router Group       Endpoints                  Key Operations
Models             /api/v1/models             CRUD, versioning, metadata
Ensembles          /api/v1/ensembles          Create, configure, predict
Features           /api/v1/features           Feature groups, serving, materialization
Predictions        /api/v1/predictions        Single, batch, streaming
Training           /api/v1/training           Jobs, status, metrics
Tuning             /api/v1/tuning             Hyperparameter search
Deployment         /api/v1/deployments        Deploy, scale, rollback
Experiments        /api/v1/experiments        Tracking, comparison
Feature Store      /api/v1/feature-store      Feast operations
Monitoring         /api/v1/monitoring         Drift, performance, alerts
Performance        /api/v1/performance        Benchmarks, profiling
Drift              /api/v1/drift              Detection, analysis
Retraining         /api/v1/retraining         Triggers, scheduling
vLLM               /api/v1/vllm               LLM optimization
Reproducibility    /api/v1/reproducibility    Experiment reproducibility
Compliance         /api/v1/compliance         Model cards, auditing
Labeling           /api/v1/labeling           Data labeling workflows
Caching            /api/v1/caching            Cache management
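
As an example of this surface, a single online prediction request might look like the following; the payload shape is an assumption for illustration, the actual schema lives in models/prediction.py:

import httpx

payload = {
    "model_id": "churn-classifier",   # hypothetical model
    "model_version": "3",
    "inputs": [{"tenure_months": 12, "plan": "pro"}],
}

resp = httpx.post(
    "http://localhost:8000/api/v1/predictions",
    json=payload,
    timeout=30.0,
)
resp.raise_for_status()
print(resp.json())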

Configuration

from pydantic import BaseSettings  # pydantic v1; in pydantic v2 BaseSettings moves to pydantic-settings


class MLServiceSettings(BaseSettings):
    """ML Service configuration."""
 
    # Service
    service_name: str = "ml-service"
    service_port: int = 8000
 
    # Database
    database_url: str = "postgresql+asyncpg://..."
 
    # MLflow
    mlflow_tracking_uri: str = "http://localhost:5000"
    mlflow_artifact_root: str = "s3://mlflow-artifacts"
 
    # Ray
    ray_address: str = "ray://localhost:10001"
    ray_namespace: str = "matih-ml"
    ray_dashboard_port: int = 8265
 
    # Feast
    feast_repo_path: str = "/opt/feast/repo"
    feast_online_store_type: str = "redis"
    feast_offline_store_type: str = "file"
 
    # Object Storage
    s3_endpoint: str = "http://localhost:9000"
    s3_bucket: str = "ml-artifacts"
 
    # GPU
    gpu_enabled: bool = False
    max_gpu_per_job: int = 4
    gpu_memory_fraction: float = 0.9
 
    # Monitoring
    drift_check_interval_minutes: int = 60
    performance_alert_threshold: float = 0.1
 
    class Config:
        env_file = ".env"
        env_prefix = "MATIH_ML_"
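
Every field can be overridden per environment through the MATIH_ML_ prefix, which is standard pydantic BaseSettings behavior. A small usage sketch:

import os

# Environment variables take precedence over values from .env.
os.environ["MATIH_ML_SERVICE_PORT"] = "9000"
os.environ["MATIH_ML_GPU_ENABLED"] = "true"

settings = MLServiceSettings()
assert settings.service_port == 9000
assert settings.gpu_enabled is True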

Multi-Tenancy

All ML Service operations are tenant-scoped:

Resource        Isolation Mechanism
Models          Stored with tenant_id in metadata
Experiments     MLflow experiments prefixed with tenant ID
Features        Feast feature views scoped by tenant
Training jobs   Ray namespaces per tenant
Artifacts       S3 paths prefixed with tenant/{tenant_id}/
Predictions     Request validation ensures tenant access
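
A sketch of what two of these mechanisms look like in practice; the helper names are illustrative, not the service's actual API:

def tenant_experiment_name(tenant_id: str, experiment: str) -> str:
    """MLflow experiment names are prefixed with the tenant ID."""
    return f"{tenant_id}/{experiment}"


def tenant_artifact_key(tenant_id: str, artifact_path: str) -> str:
    """S3 artifact keys are prefixed with tenant/{tenant_id}/."""
    return f"tenant/{tenant_id}/{artifact_path}"


assert tenant_artifact_key("acme", "models/churn/v3/model.onnx") == (
    "tenant/acme/models/churn/v3/model.onnx"
)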

Deployment

# Kubernetes deployment (Helm values)
replicaCount: 2
 
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2000m
    memory: 4Gi
 
# GPU nodes for training
nodeSelector:
  gpu: "true"
 
tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"

Health Checks

Endpoint             Purpose
GET /health          Liveness check
GET /health/ready    Readiness (DB, MLflow, Ray connectivity)
GET /health/ray      Ray cluster health
GET /health/mlflow   MLflow server health
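
A minimal sketch of how liveness and readiness can differ; the MLflow connectivity check is a stand-in for the service's real dependency tests, and the DB and Ray checks are elided:

import httpx
from fastapi import FastAPI, Response

app = FastAPI()


async def check_mlflow() -> bool:
    # The MLflow tracking server exposes its own /health endpoint.
    try:
        async with httpx.AsyncClient(timeout=2.0) as client:
            resp = await client.get("http://localhost:5000/health")
        return resp.status_code == 200
    except httpx.HTTPError:
        return False


@app.get("/health")
async def liveness() -> dict:
    # Liveness only proves the process is up and serving requests.
    return {"status": "ok"}


@app.get("/health/ready")
async def readiness(response: Response) -> dict:
    checks = {"mlflow": await check_mlflow()}  # DB and Ray checks elided
    ready = all(checks.values())
    if not ready:
        response.status_code = 503
    return {"status": "ready" if ready else "degraded", "checks": checks}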