MATIH Platform is in active MVP development. Documentation reflects current implementation status.
13. ML Service & MLOps
Experiment Tracking
Comparing Runs

The ML Service provides multi-run comparison for identifying the best model configuration across training iterations. A single request can compare between 2 and 10 runs, with per-metric analysis.


Compare Runs API

POST /api/v1/experiments/runs/compare
Content-Type: application/json
X-Tenant-ID: acme-corp
 
{
  "run_ids": [
    "run-001-xgboost-baseline",
    "run-002-xgboost-tuned",
    "run-003-lightgbm-baseline"
  ],
  "metrics": ["val_accuracy", "val_f1", "val_loss"]
}
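The request above can be issued with a minimal Python client. This is a sketch using only the standard library; the host name is a placeholder, not the actual service address:

```python
import json
import urllib.request

# Build the compare request documented above. The host is a placeholder.
payload = {
    "run_ids": [
        "run-001-xgboost-baseline",
        "run-002-xgboost-tuned",
        "run-003-lightgbm-baseline",
    ],
    "metrics": ["val_accuracy", "val_f1", "val_loss"],
}
req = urllib.request.Request(
    "http://ml-service.local/api/v1/experiments/runs/compare",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "X-Tenant-ID": "acme-corp"},
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment against a live service
```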

Response

{
  "runs": [
    {
      "run_id": "run-001-xgboost-baseline",
      "name": "xgboost-baseline",
      "status": "finished",
      "params": {"learning_rate": "0.01", "max_depth": "6"},
      "metrics": {"val_accuracy": 0.876, "val_f1": 0.851, "val_loss": 0.342}
    },
    {
      "run_id": "run-002-xgboost-tuned",
      "name": "xgboost-tuned",
      "status": "finished",
      "params": {"learning_rate": "0.005", "max_depth": "8"},
      "metrics": {"val_accuracy": 0.912, "val_f1": 0.893, "val_loss": 0.287}
    }
  ],
  "comparison": {
    "best_run": null,
    "metric_comparison": {
      "val_loss": {
        "best_run_id": "run-002-xgboost-tuned",
        "best_value": 0.287,
        "all_values": {
          "run-001-xgboost-baseline": 0.342,
          "run-002-xgboost-tuned": 0.287,
          "run-003-lightgbm-baseline": 0.315
        }
      },
      "val_accuracy": {
        "best_run_id": "run-002-xgboost-tuned",
        "best_value": 0.912,
        "all_values": { "...": "..." }
      }
    }
  }
}

Comparison Logic

The comparison engine identifies the best run for each metric. For loss metrics, lower values are preferred (the API selects the minimum):

# `values` holds (run_id, metric_value) pairs for a single metric.
if values:
    # Runs that did not log this metric (None) are pushed to the end of the ordering.
    best = min(values, key=lambda x: x[1] if x[1] is not None else float("inf"))
    comparison["metric_comparison"][metric] = {
        "best_run_id": best[0],
        "best_value": best[1],
        "all_values": dict(values),
    }
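The fragment above covers loss metrics. As the response example suggests, metrics like val_accuracy pick the maximum instead. A self-contained sketch under that assumption (the helper name is illustrative, not the actual service code):

```python
def compare_metric(metric, values):
    """values: list of (run_id, metric_value) pairs; None means the run did not log it.

    Assumption: metrics whose names contain "loss" are minimized,
    all other metrics are maximized.
    """
    minimize = "loss" in metric
    usable = [(run_id, v) for run_id, v in values if v is not None]
    if not usable:
        return None
    pick = min if minimize else max
    best = pick(usable, key=lambda x: x[1])
    return {
        "best_run_id": best[0],
        "best_value": best[1],
        "all_values": dict(values),
    }
```

Feeding in the val_loss values from the response above selects run-002-xgboost-tuned (0.287), matching the documented output.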

Searching Across Experiments

The ExperimentTracker SDK supports cross-experiment run search with MLflow filter expressions:

runs = tracker.search_runs(
    experiment_names=["fraud-detection-v3", "fraud-detection-v4"],
    filter_string="metrics.val_accuracy > 0.9 AND params.model_type = 'xgboost'",
    max_results=50,
    order_by=["metrics.val_accuracy DESC"],
    tenant_id="acme-corp",
)

This translates to MLflow search queries with automatic tenant filtering via the matih.tenant_id tag.
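As a sketch, the automatic tenant scoping could be implemented by appending a tag clause to the caller's filter string (the helper name is hypothetical; tag keys containing dots must be quoted per MLflow's search filter grammar):

```python
def scoped_filter(filter_string, tenant_id):
    """Append the tenant tag clause to an MLflow filter expression.

    Illustrative helper, not the actual SDK code. MLflow requires quoting
    tag keys that contain special characters such as dots.
    """
    clause = f'tags."matih.tenant_id" = \'{tenant_id}\''
    return f"({filter_string}) AND {clause}" if filter_string else clause
```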


Constraints

Constraint                 Value
Minimum runs to compare    2
Maximum runs to compare    10
Metric filtering           Optional (empty returns all metrics)
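A sketch of client-side validation for these constraints (the function name and error message are illustrative, not the actual service code):

```python
# Bounds documented in the constraints table above.
MIN_RUNS, MAX_RUNS = 2, 10

def validate_compare_request(run_ids, metrics=None):
    """Validate and normalize a compare-runs payload (illustrative helper)."""
    if not (MIN_RUNS <= len(run_ids) <= MAX_RUNS):
        raise ValueError(
            f"run_ids must contain between {MIN_RUNS} and {MAX_RUNS} entries"
        )
    # An empty metrics list means "return all logged metrics".
    return {"run_ids": list(run_ids), "metrics": list(metrics or [])}
```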

Source Files

File                Path
Compare Endpoint    data-plane/ml-service/src/api/experiments.py
Search Runs         data-plane/ml-service/src/tracking/experiment_tracker.py