MATIH Platform is in active MVP development. Documentation reflects current implementation status.

MLflow Integration

The MATIH ML Service uses MLflow as the backend for experiment tracking and model registry. The ExperimentTracker class provides a high-level wrapper that adds tenant isolation, structured logging, and integration with the broader MATIH ecosystem.


Configuration

| Environment Variable | Default | Description |
|---|---|---|
| MLFLOW_TRACKING_URI | http://localhost:5000 | MLflow tracking server endpoint |
| MLFLOW_ARTIFACT_ROOT | s3://matih-mlflow-artifacts | Default artifact storage location |
| MLFLOW_REGISTRY_URI | Same as tracking URI | Model registry endpoint |
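
As a rough illustration of how these defaults could be resolved, the sketch below reads the three variables and falls back to the documented values. The names `MlflowSettings` and `load_mlflow_settings` are hypothetical and are not part of the MATIH codebase. Only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MlflowSettings:
    """Hypothetical container for the three documented settings."""
    tracking_uri: str
    artifact_root: str
    registry_uri: str


def load_mlflow_settings(env: Optional[Mapping[str, str]] = None) -> MlflowSettings:
    """Read the documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    tracking_uri = env.get("MLFLOW_TRACKING_URI", "http://localhost:5000")
    artifact_root = env.get("MLFLOW_ARTIFACT_ROOT", "s3://matih-mlflow-artifacts")
    # The registry falls back to the tracking URI when it is unset or empty.
    registry_uri = env.get("MLFLOW_REGISTRY_URI") or tracking_uri
    return MlflowSettings(tracking_uri, artifact_root, registry_uri)
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the function easy to test.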

ExperimentTracker Initialization

from src.tracking.experiment_tracker import ExperimentTracker
 
tracker = ExperimentTracker(
    tracking_uri="http://mlflow:5000",
    default_artifact_root="s3://matih-mlflow-artifacts",
    registry_uri=None,  # Defaults to tracking_uri
)

Under the hood, the tracker configures the global MLflow client:

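import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri(self.tracking_uri)
# Set a separate registry URI only when it differs from the tracking URI
if self.registry_uri != self.tracking_uri:
    mlflow.set_registry_uri(self.registry_uri)
self._client = MlflowClient(tracking_uri=self.tracking_uri)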

Context Manager Pattern

The recommended pattern uses start_run as a context manager, which handles the run lifecycle automatically:

with tracker.start_run(
    experiment_name="my-experiment",
    run_config=RunConfig(
        run_name="training-run-1",
        tenant_id="acme-corp",
        user_id="alice@acme.com",
        job_id="job-123",
    ),
) as run:
    run.log_params({"learning_rate": 0.01, "epochs": 100})
 
    for epoch in range(100):
        loss = train_epoch()  # placeholder for your own training step
        run.log_metrics({"loss": loss}, step=epoch)

The context manager creates the MLflow run with appropriate tags including matih.tenant_id, matih.user_id, and matih.job_id.
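
To make the tagging and lifecycle behaviour concrete, here is a minimal sketch that does not call MLflow at all. `RunConfigSketch`, `build_matih_tags`, and `tracked_run` are hypothetical names used for illustration. The tag keys come from the text above, while the failure handling (marking a run FAILED when the block raises) is an assumption about how such a wrapper would typically behave, not a statement about the MATIH implementation.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class RunConfigSketch:
    """Hypothetical stand-in for RunConfig with only the fields shown above."""
    run_name: str
    tenant_id: str
    user_id: Optional[str] = None
    job_id: Optional[str] = None


def build_matih_tags(config: RunConfigSketch) -> Dict[str, str]:
    """Map config fields to matih.* tags, skipping values that are not set."""
    candidates = {
        "matih.tenant_id": config.tenant_id,
        "matih.user_id": config.user_id,
        "matih.job_id": config.job_id,
    }
    return {key: value for key, value in candidates.items() if value is not None}


@contextmanager
def tracked_run(
    config: RunConfigSketch, events: List[Tuple[str, str]]
) -> Iterator[Dict[str, str]]:
    """Record start and end events as a stand-in for creating and ending a run."""
    tags = build_matih_tags(config)
    events.append(("start", config.run_name))
    try:
        yield tags
    except Exception:
        events.append(("end", "FAILED"))  # assumed behaviour on error
        raise
    else:
        events.append(("end", "FINISHED"))
```

The point of the pattern is that the caller never has to end the run explicitly: leaving the `with` block, normally or through an exception, closes it.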


ActiveRun Wrapper

The ActiveRun class wraps the native MLflow run with fluent methods:

class ActiveRun:
    @property
    def run_id(self) -> str: ...
    @property
    def experiment_id(self) -> str: ...
    @property
    def artifact_uri(self) -> str: ...
 
    def log_params(self, params) -> "ActiveRun": ...
    def log_metrics(self, metrics, step=None) -> "ActiveRun": ...
    def log_artifact(self, local_path, artifact_path=None) -> "ActiveRun": ...
    def log_dict(self, dictionary, artifact_file) -> "ActiveRun": ...
    def set_tag(self, key, value) -> "ActiveRun": ...
    def get_elapsed_time(self) -> float: ...

All logging and tagging methods return self for chaining (the properties and get_elapsed_time return plain values):

run.log_params(params).log_metrics(metrics).set_tag("status", "complete")
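
The chaining style can be tried without an MLflow server using a small in-memory stand-in. `InMemoryRun` below is hypothetical and mirrors only three of the method signatures listed above. It is a sketch of the fluent pattern, not the real ActiveRun.

```python
from typing import Any, Dict, List, Optional, Tuple


class InMemoryRun:
    """Hypothetical in-memory stand-in for ActiveRun (makes no MLflow calls)."""

    def __init__(self) -> None:
        self.params: Dict[str, str] = {}
        self.metrics: Dict[str, List[Tuple[Optional[int], float]]] = {}
        self.tags: Dict[str, str] = {}

    def log_params(self, params: Dict[str, Any]) -> "InMemoryRun":
        self.params.update({key: str(value) for key, value in params.items()})
        return self  # returning self is what makes chaining possible

    def log_metrics(
        self, metrics: Dict[str, float], step: Optional[int] = None
    ) -> "InMemoryRun":
        for key, value in metrics.items():
            self.metrics.setdefault(key, []).append((step, value))
        return self

    def set_tag(self, key: str, value: str) -> "InMemoryRun":
        self.tags[key] = value
        return self


run = InMemoryRun()
result = (
    run.log_params({"learning_rate": 0.01})
    .log_metrics({"loss": 0.5}, step=0)
    .set_tag("status", "complete")
)
```

Because every call returns the same object, `result` and `run` refer to one instance holding all three updates.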

Singleton Access

A global singleton is available for shared access:

from src.tracking.experiment_tracker import get_experiment_tracker
 
tracker = get_experiment_tracker()
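
How get_experiment_tracker builds its instance is not shown here. One common way to get this behaviour is a cached factory function, sketched below with hypothetical names (`TrackerSketch`, `get_tracker_sketch`) and the assumption that configuration comes from MLFLOW_TRACKING_URI.

```python
import functools
import os


class TrackerSketch:
    """Hypothetical stand-in for ExperimentTracker."""

    def __init__(self, tracking_uri: str) -> None:
        self.tracking_uri = tracking_uri


@functools.lru_cache(maxsize=1)
def get_tracker_sketch() -> TrackerSketch:
    """Build the tracker on the first call and reuse it afterwards."""
    uri = os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000")
    return TrackerSketch(uri)
```

In this sketch, `get_tracker_sketch.cache_clear()` resets the cached instance, which can be handy in tests that change the environment.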

MLflow Compatibility

The MATIH experiment tracking API is compatible with MLflow's standard interfaces, enabling migration from existing MLflow setups:

| MLflow Concept | MATIH Equivalent |
|---|---|
| Experiment | Experiment (with tenant prefix) |
| Run | Run (with tenant/user/job tags) |
| Parameters | Parameters (string-converted) |
| Metrics | Metrics (with step tracking) |
| Artifacts | Artifacts (S3 storage) |
| Model Registry | Model lifecycle manager |
| Tags | Tags (with matih.* namespace) |
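
Two of these mappings, the tenant prefix on experiment names and string conversion of parameters, can be illustrated with the helpers below. Both function names are hypothetical, and the "/" separator is an assumption, since the actual prefix format is not specified here. The string conversion reflects the fact that MLflow stores parameter values as strings.

```python
from typing import Any, Dict


def tenant_experiment_name(
    tenant_id: str, experiment_name: str, separator: str = "/"
) -> str:
    """Prefix an experiment name with its tenant (separator is an assumption)."""
    prefix = f"{tenant_id}{separator}"
    # Avoid double-prefixing names that are already tenant-scoped.
    if experiment_name.startswith(prefix):
        return experiment_name
    return prefix + experiment_name


def stringify_params(params: Dict[str, Any]) -> Dict[str, str]:
    """Convert parameter values to strings before logging them."""
    return {key: str(value) for key, value in params.items()}
```

When migrating an existing MLflow setup, helpers like these would sit between the caller and the tracker so that existing experiment names keep working under a tenant namespace.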

Source Files

| Component | Path |
|---|---|
| ExperimentTracker | data-plane/ml-service/src/tracking/experiment_tracker.py |
| MLflow Deployment Service | data-plane/ml-service/src/tracking/mlflow_deployment_service.py |
| Model Registry Client | data-plane/ml-service/src/tracking/model_registry_client.py |