MLflow Integration

The MATIH ML Service uses MLflow as the backend for experiment tracking and model registry. The ExperimentTracker class provides a high-level wrapper that adds tenant isolation, structured logging, and integration with the broader MATIH ecosystem.

Configuration

Environment Variable	Default	Description
`MLFLOW_TRACKING_URI`	`http://localhost:5000`	MLflow tracking server endpoint
`MLFLOW_ARTIFACT_ROOT`	`s3://matih-mlflow-artifacts`	Default artifact storage location
`MLFLOW_REGISTRY_URI`	Same as tracking URI	Model registry endpoint
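
As a rough illustration of how the defaults in the table could be resolved, the sketch below reads the three variables and falls back to the documented values. `MlflowSettings` and `load_mlflow_settings` are hypothetical names introduced for this example, not part of the ML Service.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MlflowSettings:
    """Hypothetical container for the three settings in the table above."""

    tracking_uri: str
    artifact_root: str
    registry_uri: str


def load_mlflow_settings(env: Optional[Mapping[str, str]] = None) -> MlflowSettings:
    """Resolve the settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    tracking_uri = env.get("MLFLOW_TRACKING_URI", "http://localhost:5000")
    artifact_root = env.get("MLFLOW_ARTIFACT_ROOT", "s3://matih-mlflow-artifacts")
    # The registry endpoint falls back to the tracking URI when unset or empty.
    registry_uri = env.get("MLFLOW_REGISTRY_URI") or tracking_uri
    return MlflowSettings(tracking_uri, artifact_root, registry_uri)
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to exercise in tests.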

ExperimentTracker Initialization

from src.tracking.experiment_tracker import ExperimentTracker
 
tracker = ExperimentTracker(
    tracking_uri="http://mlflow:5000",
    default_artifact_root="s3://matih-mlflow-artifacts",
    registry_uri=None,  # Defaults to tracking_uri
)

Under the hood, the tracker sets the global MLflow tracking URI, sets a registry URI only when it differs from the tracking URI, and creates its own MlflowClient:

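import mlflow
from mlflow.tracking import MlflowClient

# Excerpt from the tracker's initialization; `self` is the ExperimentTracker.
mlflow.set_tracking_uri(self.tracking_uri)
if self.registry_uri != self.tracking_uri:
    mlflow.set_registry_uri(self.registry_uri)
self._client = MlflowClient(tracking_uri=self.tracking_uri)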

Context Manager Pattern

The recommended pattern uses `start_run` as a context manager, which starts the run on entry and ends it on exit:

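# `tracker` is the ExperimentTracker created above; `train_epoch` stands in
# for your own training step and is expected to return the epoch's loss.
with tracker.start_run(
    experiment_name="my-experiment",
    run_config=RunConfig(
        run_name="training-run-1",
        tenant_id="acme-corp",
        user_id="alice@acme.com",
        job_id="job-123",
    ),
) as run:
    run.log_params({"learning_rate": 0.01, "epochs": 100})

    for epoch in range(100):
        loss = train_epoch()
        # ActiveRun exposes log_metrics, which takes a dict of metric values.
        run.log_metrics({"loss": loss}, step=epoch)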

The context manager creates the MLflow run with the appropriate tags, including `matih.tenant_id`, `matih.user_id`, and `matih.job_id`.
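
One way to picture the tagging step is a small function that maps run configuration fields to `matih.*` tags. This is a sketch under assumptions: the `RunConfig` dataclass here is a stand-in limited to the fields used in the example above, and `build_matih_tags` is a hypothetical helper, not the service's actual code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RunConfig:
    """Stand-in for the service's RunConfig (only the fields shown above)."""

    run_name: Optional[str] = None
    tenant_id: Optional[str] = None
    user_id: Optional[str] = None
    job_id: Optional[str] = None


def build_matih_tags(config: RunConfig) -> Dict[str, str]:
    """Build the matih.* tags for a run, skipping fields that are not set."""
    candidates = {
        "matih.tenant_id": config.tenant_id,
        "matih.user_id": config.user_id,
        "matih.job_id": config.job_id,
    }
    # MLflow tag values are strings, so convert anything that is present.
    return {key: str(value) for key, value in candidates.items() if value is not None}
```

A dictionary of this shape could then be passed as the `tags` argument when the underlying MLflow run is started.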

ActiveRun Wrapper

The ActiveRun class wraps the native MLflow run with fluent methods:

class ActiveRun:
    @property
    def run_id(self) -> str: ...
    @property
    def experiment_id(self) -> str: ...
    @property
    def artifact_uri(self) -> str: ...
 
    def log_params(self, params) -> "ActiveRun": ...
    def log_metrics(self, metrics, step=None) -> "ActiveRun": ...
    def log_artifact(self, local_path, artifact_path=None) -> "ActiveRun": ...
    def log_dict(self, dictionary, artifact_file) -> "ActiveRun": ...
    def set_tag(self, key, value) -> "ActiveRun": ...
    def get_elapsed_time(self) -> float: ...

The logging and tagging methods return `self`, so calls can be chained (`get_elapsed_time` returns a float instead):

run.log_params(params).log_metrics(metrics).set_tag("status", "complete")
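
The chaining style comes down to each method returning the object it was called on. The sketch below shows the idea with `RecordingRun`, a hypothetical in-memory stand-in that records calls instead of contacting MLflow; it mirrors three of the method names listed above but is not the real `ActiveRun`.

```python
from typing import Any, Dict, List, Optional, Tuple


class RecordingRun:
    """Hypothetical stand-in for ActiveRun that keeps everything in memory."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}
        self.metrics: List[Tuple[str, float, Optional[int]]] = []
        self.tags: Dict[str, str] = {}

    def log_params(self, params: Dict[str, Any]) -> "RecordingRun":
        self.params.update(params)
        return self  # returning self is what makes chaining possible

    def log_metrics(
        self, metrics: Dict[str, float], step: Optional[int] = None
    ) -> "RecordingRun":
        for name, value in metrics.items():
            self.metrics.append((name, float(value), step))
        return self

    def set_tag(self, key: str, value: Any) -> "RecordingRun":
        self.tags[key] = str(value)
        return self


run = (
    RecordingRun()
    .log_params({"learning_rate": 0.01})
    .log_metrics({"loss": 0.42}, step=0)
    .set_tag("status", "complete")
)
```

A stand-in like this can also be handy in unit tests that should not depend on a running tracking server.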

Singleton Access

A global singleton is available for shared access:

from src.tracking.experiment_tracker import get_experiment_tracker
 
tracker = get_experiment_tracker()
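
For readers unfamiliar with the pattern, a module-level singleton accessor is commonly written along the following lines. This is a generic sketch, not the service's implementation: the `ExperimentTracker` class here is a placeholder, and the lock-based lazy initialization is an assumption about how such an accessor might be made thread-safe.

```python
import threading
from typing import Optional


class ExperimentTracker:
    """Placeholder for the real tracker class."""

    def __init__(self, tracking_uri: str = "http://localhost:5000") -> None:
        self.tracking_uri = tracking_uri


_tracker: Optional[ExperimentTracker] = None
_tracker_lock = threading.Lock()


def get_experiment_tracker() -> ExperimentTracker:
    """Return the shared tracker, creating it on first use."""
    global _tracker
    if _tracker is None:
        with _tracker_lock:
            # Re-check inside the lock so only one thread creates the instance.
            if _tracker is None:
                _tracker = ExperimentTracker()
    return _tracker
```

Every caller receives the same instance, so configuration applied once is visible everywhere the accessor is used.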

MLflow Compatibility

The MATIH experiment tracking API is compatible with MLflow's standard interfaces, enabling migration from existing MLflow setups:

MLflow Concept	MATIH Equivalent
Experiment	Experiment (with tenant prefix)
Run	Run (with tenant/user/job tags)
Parameters	Parameters (string-converted)
Metrics	Metrics (with step tracking)
Artifacts	Artifacts (S3 storage)
Model Registry	Model lifecycle manager
Tags	Tags (with `matih.*` namespace)
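
Two rows of the table, the tenant prefix on experiments and string-converted parameters, can be illustrated with small helpers. Both function names are hypothetical, and the `tenant/name` prefix format in particular is an assumption made for this sketch; the actual naming scheme is defined in the ML Service source.

```python
from typing import Any, Dict


def tenant_experiment_name(tenant_id: str, experiment_name: str) -> str:
    """Prefix an experiment name with its tenant (assumed 'tenant/name' format)."""
    prefix = f"{tenant_id}/"
    # Leave names that already carry the tenant prefix unchanged.
    if experiment_name.startswith(prefix):
        return experiment_name
    return prefix + experiment_name


def stringify_params(params: Dict[str, Any]) -> Dict[str, str]:
    """Convert parameter values to strings, the form in which MLflow stores them."""
    return {key: str(value) for key, value in params.items()}
```

Making the prefixing function idempotent avoids double prefixes when an already-qualified name is passed back in.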

Source Files

File	Path
ExperimentTracker	`data-plane/ml-service/src/tracking/experiment_tracker.py`
MLflow Deployment Service	`data-plane/ml-service/src/tracking/mlflow_deployment_service.py`
Model Registry Client	`data-plane/ml-service/src/tracking/model_registry_client.py`
