MATIH Platform is in active MVP development. Documentation reflects current implementation status.
13. ML Service & MLOps
Experiment Tracking
Comparing Runs

The ML Service provides multi-run comparison for identifying the best model configuration across training iterations. A single request can compare between 2 and 10 runs, with per-metric analysis.


Compare Runs API

POST /api/v1/experiments/runs/compare
Content-Type: application/json
X-Tenant-ID: acme-corp
 
{
  "run_ids": [
    "run-001-xgboost-baseline",
    "run-002-xgboost-tuned",
    "run-003-lightgbm-baseline"
  ],
  "metrics": ["val_accuracy", "val_f1", "val_loss"]
}
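The request above can be issued with a minimal Python client. This is a sketch using only the standard library; the host name is a placeholder, not the actual service address:

```python
import json
import urllib.request

# Build the compare request documented above. The host is a placeholder.
payload = {
    "run_ids": [
        "run-001-xgboost-baseline",
        "run-002-xgboost-tuned",
        "run-003-lightgbm-baseline",
    ],
    "metrics": ["val_accuracy", "val_f1", "val_loss"],
}
req = urllib.request.Request(
    "http://ml-service.local/api/v1/experiments/runs/compare",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "X-Tenant-ID": "acme-corp"},
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment against a live service
```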

Response

{
  "runs": [
    {
      "run_id": "run-001-xgboost-baseline",
      "name": "xgboost-baseline",
      "status": "finished",
      "params": {"learning_rate": "0.01", "max_depth": "6"},
      "metrics": {"val_accuracy": 0.876, "val_f1": 0.851, "val_loss": 0.342}
    },
    {
      "run_id": "run-002-xgboost-tuned",
      "name": "xgboost-tuned",
      "status": "finished",
      "params": {"learning_rate": "0.005", "max_depth": "8"},
      "metrics": {"val_accuracy": 0.912, "val_f1": 0.893, "val_loss": 0.287}
    }
  ],
  "comparison": {
    "best_run": null,
    "metric_comparison": {
      "val_loss": {
        "best_run_id": "run-002-xgboost-tuned",
        "best_value": 0.287,
        "all_values": {
          "run-001-xgboost-baseline": 0.342,
          "run-002-xgboost-tuned": 0.287,
          "run-003-lightgbm-baseline": 0.315
        }
      },
      "val_accuracy": {
        "best_run_id": "run-002-xgboost-tuned",
        "best_value": 0.912,
        "all_values": { "...": "..." }
      }
    }
  }
}

Comparison Logic

The comparison engine identifies the best run for each metric. For loss metrics, lower values are preferred (the API selects the minimum):

# `values` holds (run_id, metric_value) pairs for a single metric.
if values:
    # Runs that did not log this metric (None) are pushed to the end of the ordering.
    best = min(values, key=lambda x: x[1] if x[1] is not None else float("inf"))
    comparison["metric_comparison"][metric] = {
        "best_run_id": best[0],
        "best_value": best[1],
        "all_values": dict(values),
    }
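The fragment above covers loss metrics. As the response example suggests, metrics like val_accuracy pick the maximum instead. A self-contained sketch under that assumption (the helper name is illustrative, not the actual service code):

```python
def compare_metric(metric, values):
    """values: list of (run_id, metric_value) pairs; None means the run did not log it.

    Assumption: metrics whose names contain "loss" are minimized,
    all other metrics are maximized.
    """
    minimize = "loss" in metric
    usable = [(run_id, v) for run_id, v in values if v is not None]
    if not usable:
        return None
    pick = min if minimize else max
    best = pick(usable, key=lambda x: x[1])
    return {
        "best_run_id": best[0],
        "best_value": best[1],
        "all_values": dict(values),
    }
```

Feeding in the val_loss values from the response above selects run-002-xgboost-tuned (0.287), matching the documented output.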

Searching Across Experiments

The ExperimentTracker SDK supports cross-experiment run search with MLflow filter expressions:

runs = tracker.search_runs(
    experiment_names=["fraud-detection-v3", "fraud-detection-v4"],
    filter_string="metrics.val_accuracy > 0.9 AND params.model_type = 'xgboost'",
    max_results=50,
    order_by=["metrics.val_accuracy DESC"],
    tenant_id="acme-corp",
)

This translates to MLflow search queries with automatic tenant filtering via the matih.tenant_id tag.
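As a sketch, the automatic tenant scoping could be implemented by appending a tag clause to the caller's filter string (the helper name is hypothetical; tag keys containing dots must be quoted per MLflow's search filter grammar):

```python
def scoped_filter(filter_string, tenant_id):
    """Append the tenant tag clause to an MLflow filter expression.

    Illustrative helper, not the actual SDK code. MLflow requires quoting
    tag keys that contain special characters such as dots.
    """
    clause = f'tags."matih.tenant_id" = \'{tenant_id}\''
    return f"({filter_string}) AND {clause}" if filter_string else clause
```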


Constraints

Constraint                 Value
Minimum runs to compare    2
Maximum runs to compare    10
Metric filtering           Optional (empty returns all metrics)
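A sketch of client-side validation for these constraints (the function name and error message are illustrative, not the actual service code):

```python
# Bounds documented in the constraints table above.
MIN_RUNS, MAX_RUNS = 2, 10

def validate_compare_request(run_ids, metrics=None):
    """Validate and normalize a compare-runs payload (illustrative helper)."""
    if not (MIN_RUNS <= len(run_ids) <= MAX_RUNS):
        raise ValueError(
            f"run_ids must contain between {MIN_RUNS} and {MAX_RUNS} entries"
        )
    # An empty metrics list means "return all logged metrics".
    return {"run_ids": list(run_ids), "metrics": list(metrics or [])}
```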

Source Files

File                Path
Compare Endpoint    data-plane/ml-service/src/api/experiments.py
Search Runs         data-plane/ml-service/src/tracking/experiment_tracker.py