MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Stage 14: ML Infrastructure

Stage 14 deploys the machine learning infrastructure stack: Ray (the KubeRay operator and a RayCluster), MLflow for experiment tracking and the model registry, Feast as the feature store, and JupyterHub for notebook environments.

Source file: scripts/stages/14-ml-infrastructure.sh


Components Deployed

| Component | Chart | Purpose |
|-----------|-------|---------|
| KubeRay Operator | kuberay/kuberay-operator (bundled in matih-ray) | Manages RayCluster CRDs |
| RayCluster | matih-ray subchart | Distributed ML training and serving |
| MLflow | matih-mlflow | Experiment tracking, model registry |
| Feast | Custom chart | Feature store for online/offline serving |
| JupyterHub | jupyterhub/jupyterhub | Notebook environments for data scientists |

Ray Deployment

The matih-ray chart bundles the KubeRay operator as a subchart dependency. Legacy standalone operator releases are cleaned up automatically:

# Remove the legacy standalone kuberay-operator release if it exists
# (status output is discarded; only the exit code matters)
if helm status kuberay-operator -n matih-data-plane >/dev/null 2>&1; then
    helm uninstall kuberay-operator -n matih-data-plane --wait
fi

# Deploy bundled chart
helm upgrade --install matih-ray \
    infrastructure/helm/ray \
    --namespace matih-data-plane \
    --values infrastructure/helm/ray/values-dev.yaml
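
The cleanup-then-deploy sequence above can be wrapped in functions so that it is safe to re-run. The following is a minimal sketch, not the contents of 14-ml-infrastructure.sh: the function names and the HELM_BIN and RAY_NAMESPACE variables are hypothetical, introduced only so the logic can be exercised without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: function names, HELM_BIN, and RAY_NAMESPACE are illustrative
# assumptions, not taken from scripts/stages/14-ml-infrastructure.sh.
HELM_BIN="${HELM_BIN:-helm}"
RAY_NAMESPACE="${RAY_NAMESPACE:-matih-data-plane}"

# Uninstall the standalone operator release only when it is present.
remove_legacy_operator() {
    if "$HELM_BIN" status kuberay-operator -n "$RAY_NAMESPACE" >/dev/null 2>&1; then
        echo "removing legacy kuberay-operator release"
        "$HELM_BIN" uninstall kuberay-operator -n "$RAY_NAMESPACE" --wait
    else
        echo "no legacy kuberay-operator release found"
    fi
}

# Install or upgrade the bundled chart (operator subchart plus RayCluster).
deploy_ray() {
    "$HELM_BIN" upgrade --install matih-ray \
        infrastructure/helm/ray \
        --namespace "$RAY_NAMESPACE" \
        --values infrastructure/helm/ray/values-dev.yaml
}
```

Calling remove_legacy_operator and then deploy_ray reproduces the two steps. Because helm upgrade --install is idempotent and the uninstall is guarded by a status check, running the pair a second time should change nothing.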

MLflow Configuration

MLflow stores experiment metadata in PostgreSQL and artifacts in MinIO (dev) or cloud object storage (production):

| Setting | Dev | Production |
|---------|-----|------------|
| Backend store | PostgreSQL via K8s Secret | PostgreSQL via K8s Secret |
| Artifact store | MinIO (S3-compatible) | Azure Blob / S3 |
| Credentials | secretKeyRef from dev secrets | External Secrets Operator (ESO) from Key Vault |
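
One practical consequence of the dev/production split is that an S3-compatible store such as MinIO needs an explicit endpoint, which MLflow reads from the MLFLOW_S3_ENDPOINT_URL environment variable. The sketch below illustrates that switch under stated assumptions: the function name, the MINIO_ENDPOINT variable, and the default MinIO service URL are hypothetical and do not come from the matih-mlflow chart.

```shell
#!/usr/bin/env bash
# Sketch only: configure_artifact_endpoint, MINIO_ENDPOINT, and the default
# service URL are hypothetical. MLFLOW_S3_ENDPOINT_URL is the variable
# MLflow reads to target a custom S3-compatible endpoint.
configure_artifact_endpoint() {
    case "$1" in
        dev)
            # MinIO speaks the S3 API but is not AWS, so name the endpoint.
            export MLFLOW_S3_ENDPOINT_URL="${MINIO_ENDPOINT:-http://minio.matih-data-plane.svc:9000}"
            ;;
        prod|production)
            # Cloud object storage resolves its own endpoint.
            unset MLFLOW_S3_ENDPOINT_URL
            ;;
        *)
            echo "unknown environment: $1" >&2
            return 1
            ;;
    esac
}
```

In a chart, the same switch would more likely live in per-environment values files; the function form is used here only to make the difference between the two columns concrete.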

Libraries Used

| Library | Purpose |
|---------|---------|
| core/config.sh | Terraform output access |
| k8s/namespace.sh | Namespace management |
| helm/repo.sh | Repository management |
| helm/deploy.sh | Deployment functions |
| k8s/dev-secrets.sh | Dev secrets |
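
A stage script typically sources its libraries before doing any work and fails early if one is missing. The following is a rough sketch of that pattern: load_stage_libs and its lib_root argument are hypothetical, since the directory that holds these libraries is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only: load_stage_libs and lib_root are hypothetical; the actual
# location of the libraries is not given in this document.
load_stage_libs() {
    local lib_root="$1"
    local lib
    shift
    for lib in "$@"; do
        if [ ! -f "$lib_root/$lib" ]; then
            echo "missing library: $lib" >&2
            return 1
        fi
        # Source into the current shell so the library's functions are visible.
        . "$lib_root/$lib"
    done
}

# Example call (lib_root value is an assumption):
#   load_stage_libs "$LIB_ROOT" core/config.sh k8s/namespace.sh \
#       helm/repo.sh helm/deploy.sh k8s/dev-secrets.sh
```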

Dependencies

  • Requires: 05b-data-plane-infrastructure, 11-compute-engines
  • Required by: 15-ai-infrastructure
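
The ordering constraint above can be checked before the stage runs. This is a minimal sketch assuming completed stages are available as a space-separated list; missing_prerequisites and STAGE14_REQUIRES are hypothetical names, and how the platform actually records completed stages is not described here.

```shell
#!/usr/bin/env bash
# Sketch only: missing_prerequisites and STAGE14_REQUIRES are hypothetical.
# The stage names are the ones listed under "Requires".
STAGE14_REQUIRES="05b-data-plane-infrastructure 11-compute-engines"

# Print each required stage that is absent from the space-separated
# list of completed stages passed as $1.
missing_prerequisites() {
    local completed=" $1 "
    local stage
    for stage in $STAGE14_REQUIRES; do
        case "$completed" in
            *" $stage "*) ;;          # prerequisite satisfied
            *) echo "$stage" ;;       # prerequisite missing
        esac
    done
}
```

An empty result means both prerequisites are satisfied; any printed stage name should be run before Stage 14.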

Dependency Verification

kubectl get pods -n matih-data-plane -l app.kubernetes.io/name=kuberay-operator
kubectl get raycluster -n matih-data-plane
kubectl get pods -n matih-data-plane -l app=mlflow
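
The commands above list resources but do not block until they are healthy. For scripted checks, kubectl wait can be used instead. The sketch below assumes the same namespace and label selectors; wait_for_pods and KUBECTL_BIN are hypothetical names added for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_pods and KUBECTL_BIN are hypothetical; the namespace
# and label selectors are the ones used in the verification commands.
KUBECTL_BIN="${KUBECTL_BIN:-kubectl}"

# Block until pods matching the selector report Ready, or time out.
wait_for_pods() {
    local selector="$1"
    local timeout="${2:-120s}"
    "$KUBECTL_BIN" wait pods \
        --for=condition=Ready \
        --selector="$selector" \
        --namespace=matih-data-plane \
        --timeout="$timeout"
}

# Example calls:
#   wait_for_pods app.kubernetes.io/name=kuberay-operator
#   wait_for_pods app=mlflow 300s
```

Note that kubectl wait exits with an error when no resources match the selector, so a failure here can mean either "not ready in time" or "nothing deployed".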