ML Service Architecture
Production - Python/FastAPI - Port 8000 - Ray, MLflow, Feast integration
The ML Service provides machine learning operations (MLOps) capabilities including model training orchestration, experiment tracking, model versioning, feature store integration, and model serving. It bridges the gap between data scientists working in notebooks and production ML deployments.
2.4.C.1 MLOps Pipeline
Data Preparation Training Deployment
+------------------+ +-------------------+ +-------------------+
| Feature Store | | Ray Cluster | | Model Registry |
| (Feast) |--->| (distributed |--->| (MLflow) |
| | | training) | | |
| - Feature defs | | - Hyperparameter | | - Version control |
| - Point-in-time | | tuning | | - Stage promotion |
| - Online serving | | - Distributed | | - Artifact store |
+------------------+ | data parallel | +--------+----------+
+-------------------+ |
v
+-------------------+
| Model Serving |
| (Ray Serve / |
| Triton) |
| |
| - A/B testing |
| - Canary deploy |
| - Auto-scaling |
+-------------------+

2.4.C.2 Infrastructure Integration
| Component | Technology | Purpose |
|---|---|---|
| Distributed training | Ray Train | Scales training across multiple workers |
| Hyperparameter tuning | Ray Tune | Bayesian optimization, grid/random search |
| Experiment tracking | MLflow | Metrics, parameters, artifacts per run |
| Model registry | MLflow Registry | Model versioning with stage gates |
| Feature store | Feast | Feature engineering and point-in-time joins |
| Model serving | Ray Serve / Triton | Low-latency inference endpoints |
| Artifact storage | MinIO (S3-compatible) | Model files, datasets, checkpoints |
| GPU scheduling | Kubernetes device plugin | GPU allocation for training and inference |
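The data-prep, training, and registration stages can be sketched in plain Python. This is an illustrative outline only: the function names below are hypothetical stand-ins for the real Feast, Ray Train, and MLflow integrations.

```python
from dataclasses import dataclass

@dataclass
class TrainedModel:
    name: str
    metrics: dict

def fetch_features(feature_refs: list[str]) -> list[dict]:
    """Stand-in for a Feast point-in-time feature retrieval."""
    return [{"ref": ref, "value": 0.0} for ref in feature_refs]

def train_distributed(features: list[dict], num_workers: int) -> TrainedModel:
    """Stand-in for a Ray Train job spread across num_workers."""
    return TrainedModel(name="demo-model", metrics={"rmse": 0.12, "workers": num_workers})

def register_model(model: TrainedModel) -> str:
    """Stand-in for logging the model to the MLflow registry; returns a version id."""
    return f"{model.name}/1"

def run_pipeline(feature_refs: list[str], num_workers: int = 2) -> str:
    """Wire the three stages together: features -> training -> registry."""
    features = fetch_features(feature_refs)
    model = train_distributed(features, num_workers)
    return register_model(model)
```

Each stub returns immediately here; in the real pipeline each stage is an asynchronous job tracked per-run in MLflow.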
Ray Cluster Configuration
| Environment | Head Node | Worker Nodes | GPU Workers |
|---|---|---|---|
| Development | 1 (2 CPU, 4Gi) | 0 | 0 |
| Production | 1 (4 CPU, 8Gi) | 2-8 (auto-scale) | 1-4 (on demand) |
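The sizing table maps to a configuration like the following. The field names are illustrative, not the service's actual schema:

```python
# Illustrative Ray cluster sizing, mirroring the environments table.
RAY_CLUSTER_CONFIG = {
    "development": {
        "head": {"cpu": 2, "memory": "4Gi"},
        "workers": {"min": 0, "max": 0},
        "gpu_workers": {"min": 0, "max": 0},
    },
    "production": {
        "head": {"cpu": 4, "memory": "8Gi"},
        "workers": {"min": 2, "max": 8},      # auto-scaled
        "gpu_workers": {"min": 1, "max": 4},  # provisioned on demand
    },
}
```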
2.4.C.3 Model Lifecycle
Models progress through defined stages:
Development --> Staging --> Production --> Archived
| | |
| | +--> Monitoring (drift detection)
| |
| +--> Validation (automated testing)
|
+--> Experiment tracking (MLflow)

Stage Gates
| Transition | Requirements |
|---|---|
| Development --> Staging | All unit tests pass, metrics meet baseline |
| Staging --> Production | Integration tests pass, A/B test shows improvement, manual approval |
| Production --> Archived | New model version promoted, old version traffic drained |
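The stage gates can be modeled as an explicit transition table. This is a sketch of the logic, not the service's actual implementation (which records stages in the MLflow registry):

```python
# Allowed lifecycle transitions and the gates each one must pass.
ALLOWED_TRANSITIONS = {
    ("Development", "Staging"): ["unit_tests_pass", "metrics_meet_baseline"],
    ("Staging", "Production"): ["integration_tests_pass", "ab_test_improved", "manual_approval"],
    ("Production", "Archived"): ["successor_promoted", "traffic_drained"],
}

def promote(current: str, target: str, gates: dict[str, bool]) -> str:
    """Return the new stage if every required gate is satisfied, else raise."""
    required = ALLOWED_TRANSITIONS.get((current, target))
    if required is None:
        raise ValueError(f"illegal transition {current} -> {target}")
    missing = [g for g in required if not gates.get(g)]
    if missing:
        raise ValueError(f"gates not satisfied: {missing}")
    return target
```

For example, `promote("Development", "Staging", {"unit_tests_pass": True, "metrics_meet_baseline": True})` succeeds, while skipping a stage (Development straight to Production) raises.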
2.4.C.4 Key APIs
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/ml/experiments | GET/POST | Experiment management |
| /api/v1/ml/experiments/{id}/runs | GET/POST | Training run management |
| /api/v1/ml/models | GET/POST | Model registry |
| /api/v1/ml/models/{id}/versions | GET | Model version history |
| /api/v1/ml/models/{id}/deploy | POST | Deploy model for serving |
| /api/v1/ml/models/{id}/promote | POST | Promote model to next stage |
| /api/v1/ml/predict | POST | Run inference against deployed model |
| /api/v1/ml/features | GET | Feature store catalog |
| /api/v1/ml/features/serve | POST | Get features for inference |
| /api/v1/ml/training/submit | POST | Submit distributed training job |
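A minimal client for the inference endpoint might look like this (stdlib only; the payload shape is an assumption, not the service's documented schema):

```python
import json
import urllib.request

API_PREFIX = "/api/v1/ml"

def build_predict_request(base_url: str, model_name: str, inputs: list) -> urllib.request.Request:
    """Build (but do not send) a POST request against /api/v1/ml/predict."""
    payload = json.dumps({"model": model_name, "inputs": inputs}).encode()
    return urllib.request.Request(
        url=f"{base_url}{API_PREFIX}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending is then urllib.request.urlopen(req) -- omitted here because it
# requires a running service on port 8000.
```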
Related Sections
- ML Flow -- Training to serving pipeline
- Object Storage -- MinIO artifact storage
- ML Service Deep Dive -- Complete ML documentation