MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Model Training

The Model Training integration lets users submit, monitor, and evaluate training jobs through the AI Service. The AI Service forwards each training request to the ML Service, which runs it on Ray AIR with distributed compute. The AI Service itself provides the conversational interface, progress tracking, and result visualization.


Training Workflow

  1. Configuration: User specifies algorithm, dataset, target, and hyperparameters
  2. Validation: AI Service validates the configuration against the schema context
  3. Submission: Training job is submitted to the ML Service
  4. Monitoring: Progress is tracked via polling or WebSocket streaming
  5. Evaluation: Results are returned with metrics, feature importance, and model artifacts
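The polling variant of the monitoring step can be sketched as below. This is a minimal illustration, not the AI Service's actual client: `wait_for_job`, the `fetch_status` callback, and the `failed` and `cancelled` status values are assumptions (this page only shows `running` and `completed`). The default timeout borrows the 4-hour maximum training duration listed under Resource Management. WebSocket streaming is not covered here.

```python
import time
from typing import Callable, Dict

# Assumed terminal states; only "running" and "completed" appear in this page.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(
    fetch_status: Callable[[str], Dict],
    job_id: str,
    interval_seconds: float = 5.0,
    timeout_seconds: float = 4 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll fetch_status(job_id) until the job reaches a terminal state.

    fetch_status is a caller-supplied function (hypothetical) that returns
    the job document, for example by calling the status endpoint.
    """
    waited = 0.0
    while True:
        job = fetch_status(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_seconds:.0f}s")
        sleep(interval_seconds)
        waited += interval_seconds
```

Passing `sleep` as a parameter keeps the loop testable: a test can supply a no-op sleep and a fake `fetch_status` that returns canned responses.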

Supported Algorithms

| Algorithm | Task | Framework |
|---|---|---|
| XGBoost | Classification, Regression | XGBoost + Ray |
| LightGBM | Classification, Regression | LightGBM + Ray |
| Random Forest | Classification, Regression | scikit-learn + Ray |
| Linear/Logistic | Regression, Classification | scikit-learn + Ray |
| Neural Network | Classification, Regression, NLP | PyTorch + Ray Train |
| Prophet | Time Series Forecasting | Prophet |
| ARIMA | Time Series Forecasting | statsmodels |

Training Configuration

{
  "name": "churn-predictor-v2",
  "algorithm": "xgboost",
  "task_type": "classification",
  "dataset": {
    "source": "sql",
    "query": "SELECT * FROM ml_features.customer_churn",
    "tenant_id": "acme-corp"
  },
  "target_column": "churned",
  "feature_columns": ["tenure", "monthly_charges", "total_charges"],
  "split": {
    "strategy": "stratified",
    "test_size": 0.2,
    "random_state": 42
  },
  "hyperparameters": {
    "n_estimators": 100,
    "max_depth": 6,
    "learning_rate": 0.1
  }
}
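A pre-submission check on a configuration like the one above might look like the following sketch. It is an assumption about what validation against the schema context could involve, not the AI Service's actual rules: the `validate_config` function, the set of required fields, and the `known_columns` argument (standing in for the columns the schema context reports) are all hypothetical.

```python
from typing import Dict, List, Set

# Assumed required fields, taken from the example configuration.
REQUIRED_KEYS = ("name", "algorithm", "task_type", "dataset",
                 "target_column", "feature_columns")


def validate_config(config: Dict, known_columns: Set[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing required field: {key}"
              for key in REQUIRED_KEYS if key not in config]
    if errors:
        return errors

    target = config["target_column"]
    features = config["feature_columns"]
    if target not in known_columns:
        errors.append(f"unknown target column: {target}")
    for column in features:
        if column not in known_columns:
            errors.append(f"unknown feature column: {column}")
    if target in features:
        errors.append("target column must not also be a feature")

    # test_size is treated as a fraction of the dataset, as in the example.
    test_size = config.get("split", {}).get("test_size", 0.2)
    if not 0.0 < test_size < 1.0:
        errors.append(f"test_size must be between 0 and 1, got {test_size}")
    return errors
```

Returning every problem at once, rather than raising on the first, suits a conversational interface that needs to report all issues to the user in one reply.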

Training API

Submit Training Job

POST /api/v1/ml/train

Get Training Status

GET /api/v1/ml/train/:job_id

Cancel Training Job

DELETE /api/v1/ml/train/:job_id

List Training Jobs

GET /api/v1/ml/train?tenant_id=acme-corp&status=running
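The four endpoints above could be wrapped in small request builders, as in this sketch using Python's standard `urllib`. The `BASE_URL` host, the function names, and the choice to send the training configuration as the JSON body of the POST are assumptions; authentication headers are omitted because this page does not describe them.

```python
import json
from typing import Dict, Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical host, not given in this page
TRAIN_PATH = "/api/v1/ml/train"


def submit_request(config: Dict) -> Request:
    """POST /api/v1/ml/train with the configuration as JSON (assumed body)."""
    body = json.dumps(config).encode("utf-8")
    return Request(BASE_URL + TRAIN_PATH, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def status_request(job_id: str) -> Request:
    """GET /api/v1/ml/train/:job_id"""
    return Request(f"{BASE_URL}{TRAIN_PATH}/{quote(job_id, safe='')}", method="GET")


def cancel_request(job_id: str) -> Request:
    """DELETE /api/v1/ml/train/:job_id"""
    return Request(f"{BASE_URL}{TRAIN_PATH}/{quote(job_id, safe='')}", method="DELETE")


def list_request(tenant_id: str, status: Optional[str] = None) -> Request:
    """GET /api/v1/ml/train with tenant_id and an optional status filter."""
    params = {"tenant_id": tenant_id}
    if status is not None:
        params["status"] = status
    return Request(f"{BASE_URL}{TRAIN_PATH}?{urlencode(params)}", method="GET")
```

Each builder returns an unsent `Request`; passing it to `urllib.request.urlopen` performs the call. Keeping construction separate from sending makes the URLs and methods easy to check without a running service.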

Training Results

A completed training job returns evaluation metrics, feature importance, the model artifact ID, the training duration, and a summary of the data split:

{
  "job_id": "train-abc123",
  "status": "completed",
  "metrics": {
    "accuracy": 0.94,
    "precision": 0.91,
    "recall": 0.88,
    "f1_score": 0.895,
    "auc_roc": 0.96,
    "confusion_matrix": [[450, 30], [25, 95]]
  },
  "feature_importance": {
    "tenure": 0.35,
    "monthly_charges": 0.28,
    "total_charges": 0.22,
    "contract_type": 0.15
  },
  "model_artifact_id": "model-xyz789",
  "training_duration_seconds": 842,
  "data_summary": {
    "total_samples": 5000,
    "train_samples": 4000,
    "test_samples": 1000
  }
}
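A consumer of this payload might rank the features and sanity-check the reported scores, as in the sketch below. The helper names `rank_features` and `f1_from` are illustrative and not part of the API. F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values.

```python
from typing import Dict, List, Tuple


def rank_features(feature_importance: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort features from most to least important."""
    return sorted(feature_importance.items(), key=lambda item: item[1], reverse=True)


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the example response above.
result = {
    "metrics": {"precision": 0.91, "recall": 0.88, "f1_score": 0.895},
    "feature_importance": {
        "tenure": 0.35,
        "monthly_charges": 0.28,
        "total_charges": 0.22,
        "contract_type": 0.15,
    },
    "data_summary": {"total_samples": 5000, "train_samples": 4000, "test_samples": 1000},
}

ranked = rank_features(result["feature_importance"])
recomputed_f1 = f1_from(result["metrics"]["precision"], result["metrics"]["recall"])
test_fraction = result["data_summary"]["test_samples"] / result["data_summary"]["total_samples"]
```

With the example values, the recomputed F1 rounds to the reported 0.895, and the held-out fraction of 0.2 matches the `test_size` in the example configuration.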

Resource Management

Training jobs are resource-bounded per tenant:

| Resource | Default Limit | Configurable |
|---|---|---|
| Max concurrent jobs | 5 | Yes |
| Max training duration | 4 hours | Yes |
| Max dataset size | 10 GB | Yes |
| CPU per job | 4 cores | Yes |
| Memory per job | 16 GB | Yes |
| GPU per job | 0 (CPU only) | Yes |
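One way to picture these limits is as a per-tenant settings object checked before a job is admitted. The sketch below is hypothetical: `TenantLimits`, `admission_errors`, and the field names are invented for illustration, and only the default values come from the table. How the platform actually enforces or overrides limits is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TenantLimits:
    """Defaults mirror the table above; every field can be overridden per tenant."""
    max_concurrent_jobs: int = 5
    max_duration_hours: float = 4.0
    max_dataset_gb: float = 10.0
    cpu_cores_per_job: int = 4
    memory_gb_per_job: int = 16
    gpus_per_job: int = 0  # CPU only by default


def admission_errors(limits: TenantLimits, running_jobs: int,
                     dataset_gb: float, gpus_requested: int = 0) -> List[str]:
    """Return the reasons a new job would be rejected; empty means admissible."""
    errors = []
    if running_jobs >= limits.max_concurrent_jobs:
        errors.append(f"concurrent job limit reached ({limits.max_concurrent_jobs})")
    if dataset_gb > limits.max_dataset_gb:
        errors.append(f"dataset of {dataset_gb} GB exceeds {limits.max_dataset_gb} GB limit")
    if gpus_requested > limits.gpus_per_job:
        errors.append(f"requested {gpus_requested} GPU(s), limit is {limits.gpus_per_job}")
    return errors
```

For example, `admission_errors(TenantLimits(), running_jobs=5, dataset_gb=2.0)` reports that the concurrency limit is reached, while the same call with `TenantLimits(max_concurrent_jobs=10)` returns an empty list.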