MATIH Platform is in active MVP development. Documentation reflects current implementation status.
13. ML Service & MLOps
Inference & Serving
Batch Inference

The Batch Inference module enables high-throughput model scoring across large datasets using distributed compute via Ray. Batch jobs are submitted through the API, processed in parallel across worker nodes, and results are written to the data warehouse or object store.


Batch Pipeline Architecture

API Request --> Batch Processor --> Ray Data Pipeline --> Model Scoring --> Output Store
                     |                    |
                Job Manager          Data Partitioning
                     |                    |
               Status Tracking       Parallel Workers

Submit Batch Job

POST /api/v1/inference/batch
{
  "model_id": "model-xyz789",
  "input": {
    "source": "sql",
    "query": "SELECT customer_id, tenure, monthly_charges, total_charges FROM customers"
  },
  "output": {
    "destination": "table",
    "table_name": "predictions.churn_scores",
    "mode": "overwrite"
  },
  "config": {
    "batch_size": 1000,
    "parallelism": 4,
    "timeout_minutes": 60
  }
}
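The request above can be assembled and submitted from Python. This is a minimal sketch using only the standard library; the base URL, the bearer-token auth scheme, and the helper names are assumptions, not part of the documented API.

```python
import json
import urllib.request

# Hypothetical deployment values; replace with your own.
BASE_URL = "https://matih.example.com"
API_TOKEN = "YOUR_API_TOKEN"

def build_batch_job(model_id: str, query: str, table_name: str) -> dict:
    """Assemble a request body in the shape shown above."""
    return {
        "model_id": model_id,
        "input": {"source": "sql", "query": query},
        "output": {
            "destination": "table",
            "table_name": table_name,
            "mode": "overwrite",
        },
        "config": {"batch_size": 1000, "parallelism": 4, "timeout_minutes": 60},
    }

def submit_batch_job(job: dict) -> dict:
    """POST the job to /api/v1/inference/batch and return the parsed response."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/inference/batch",
        data=json.dumps(job).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",  # auth scheme assumed
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A successful submission returns the job descriptor shown in the response example below, including the `job_id` used for status polling.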

Response

{
  "job_id": "batch-abc123",
  "status": "submitted",
  "estimated_duration_minutes": 15,
  "input_row_count": 500000
}

Get Job Status

GET /api/v1/inference/batch/:job_id
{
  "job_id": "batch-abc123",
  "status": "running",
  "progress": {
    "total_rows": 500000,
    "processed_rows": 325000,
    "percentage": 65
  },
  "started_at": "2025-03-15T10:00:00Z",
  "estimated_completion": "2025-03-15T10:12:00Z"
}

Job States

State        Description
submitted    Job accepted and queued
preparing    Loading model and partitioning data
running      Actively scoring records
completing   Writing results to output store
completed    Job finished successfully
failed       Job encountered an error
cancelled    Job was cancelled by user
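A client typically polls the status endpoint until the job leaves the non-terminal states above. The sketch below assumes an unauthenticated GET for brevity and a fixed polling interval; both are illustrative choices, not documented behavior.

```python
import json
import time
import urllib.request

# States from the table above in which a job will make no further progress.
TERMINAL_STATES = {"completed", "failed", "cancelled"}

def is_terminal(state: str) -> bool:
    return state in TERMINAL_STATES

def wait_for_job(base_url: str, job_id: str, poll_seconds: float = 10.0) -> dict:
    """Poll GET /api/v1/inference/batch/:job_id until the job reaches a terminal state."""
    while True:
        with urllib.request.urlopen(f"{base_url}/api/v1/inference/batch/{job_id}") as resp:
            status = json.load(resp)
        if is_terminal(status["status"]):
            return status
        time.sleep(poll_seconds)  # back off between polls
```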

Output Destinations

Destination   Format          Description
table         SQL table       Write predictions to data warehouse table
parquet       Parquet files   Write to object store as Parquet
csv           CSV files       Write to object store as CSV
kafka         Kafka topic     Stream predictions to Kafka topic
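A client can sanity-check an output block before submitting. The per-destination required fields below are an assumption inferred from the request example (only `table` with `table_name` is shown above); the actual schema may differ.

```python
# Destinations from the table above. The required-field mapping is a guess
# for illustration, not a documented contract.
DESTINATIONS = {"table", "parquet", "csv", "kafka"}
REQUIRED_FIELDS = {
    "table": {"table_name"},
    "parquet": {"path"},
    "csv": {"path"},
    "kafka": {"topic"},
}

def validate_output(output: dict) -> list:
    """Return a list of problems with an output config (empty list means valid)."""
    errors = []
    dest = output.get("destination")
    if dest not in DESTINATIONS:
        errors.append(f"unknown destination: {dest!r}")
        return errors
    for field in sorted(REQUIRED_FIELDS[dest] - output.keys()):
        errors.append(f"{dest} output requires {field!r}")
    return errors
```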

Data Partitioning

Large datasets are automatically partitioned for parallel processing:

import ray

class BatchInferenceEngine:
    async def run(self, job: BatchJob):
        # Read the input query into a distributed Ray Dataset. read_sql
        # requires a connection factory that returns a DB-API connection
        # to the warehouse (provided elsewhere on the engine).
        dataset = ray.data.read_sql(job.input_query, self._connection_factory)

        # Score records in parallel; each worker task receives up to
        # batch_size rows per call.
        predictions = dataset.map_batches(
            self._predict_batch,
            batch_size=job.batch_size,
            num_gpus=0,   # CPU-only scoring by default
            num_cpus=1,   # one CPU core per worker task
        )

        # Persist results to the object store as Parquet.
        predictions.write_parquet(job.output_path)

Resource Management

Setting           Default   Description
batch_size        1000      Records per scoring batch
parallelism       4         Number of parallel workers
max_memory_gb     16        Maximum memory per worker
timeout_minutes   60        Job timeout
retry_count       3         Retries on transient failures
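The settings above map naturally onto a small config object with the table's defaults. This is a sketch of how a client might model them; the class name and validation rules are illustrative, not part of the platform.

```python
from dataclasses import dataclass

@dataclass
class BatchResourceConfig:
    """Defaults mirror the Resource Management table above."""
    batch_size: int = 1000
    parallelism: int = 4
    max_memory_gb: int = 16
    timeout_minutes: int = 60
    retry_count: int = 3

    def __post_init__(self):
        # Reject nonsensical values before the job is ever submitted.
        if self.batch_size <= 0 or self.parallelism <= 0:
            raise ValueError("batch_size and parallelism must be positive")
        if self.retry_count < 0:
            raise ValueError("retry_count must be non-negative")
```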

Scheduling

Batch jobs can be scheduled for recurring execution:

{
  "schedule": {
    "cron": "0 2 * * *",
    "timezone": "UTC",
    "enabled": true
  }
}
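The example schedule `0 2 * * *` fires once a day at 02:00 UTC. A production scheduler would use a full cron parser, but for a simple daily expression the next fire time can be sketched with the standard library alone (the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone

def next_daily_run(minute: int, hour: int, now: datetime) -> datetime:
    """Next fire time for a daily cron like '0 2 * * *' (minute=0, hour=2).

    Handles only the daily case, where the day/month/weekday fields are all '*'.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed; fire tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For instance, with `now` at 2025-03-15 10:00 UTC, `next_daily_run(0, 2, now)` falls on 2025-03-16 at 02:00 UTC.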

Configuration

Environment Variable        Default     Description
BATCH_MAX_CONCURRENT_JOBS   3           Max concurrent batch jobs per tenant
BATCH_DEFAULT_PARALLELISM   4           Default worker parallelism
BATCH_MAX_INPUT_ROWS        10000000    Maximum input dataset rows
BATCH_RESULT_TTL_HOURS      168         Result retention (7 days)
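A service reading these variables might resolve them against the table's defaults like so. This is a sketch; the loader function is hypothetical and only the variable names and defaults come from the table above.

```python
import os

# Defaults mirror the Configuration table above.
DEFAULTS = {
    "BATCH_MAX_CONCURRENT_JOBS": 3,
    "BATCH_DEFAULT_PARALLELISM": 4,
    "BATCH_MAX_INPUT_ROWS": 10_000_000,
    "BATCH_RESULT_TTL_HOURS": 168,
}

def load_batch_config(env=os.environ) -> dict:
    """Read each setting from the environment, falling back to its default."""
    return {name: int(env.get(name, default)) for name, default in DEFAULTS.items()}
```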