MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Triton Inference Server

NVIDIA Triton Inference Server provides multi-framework model serving with support for TensorFlow, PyTorch, ONNX, and custom backends.


Architecture

Triton loads models from a shared model repository (S3/MinIO) and serves them via HTTP/gRPC:

+------------------+     +------------------+
| Triton Server    |<----| Model Repository |
| HTTP: 8000       |     | (S3 / MinIO)     |
| gRPC: 8001       |     +------------------+
| Metrics: 8002    |
+------------------+
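The HTTP port above speaks the KServe v2 inference protocol, so health and inference endpoints follow fixed paths. A minimal sketch of how those URLs are composed (host names and the model name are illustrative, not from this deployment):

```python
# Sketch of Triton's KServe v2 HTTP endpoint paths. The helper names
# below are hypothetical; only the /v2/... paths come from the protocol.
def triton_url(host: str, path: str, port: int = 8000) -> str:
    """Build a URL against Triton's HTTP port (8000 in the diagram)."""
    return f"http://{host}:{port}{path}"

def server_ready_url(host: str) -> str:
    # GET; HTTP 200 means the server is ready to accept requests.
    return triton_url(host, "/v2/health/ready")

def model_ready_url(host: str, model: str) -> str:
    # GET; HTTP 200 means this model is loaded from the repository.
    return triton_url(host, f"/v2/models/{model}/ready")

def infer_url(host: str, model: str) -> str:
    # POST the v2 JSON inference request body to this URL.
    return triton_url(host, f"/v2/models/{model}/infer")
```

For example, `model_ready_url("triton.ml.svc", "resnet")` yields `http://triton.ml.svc:8000/v2/models/resnet/ready`, which a Kubernetes readiness probe can poll.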

Supported Backends

Backend       | Model Format      | Use Case
--------------+-------------------+------------------------------
TensorRT      | Optimized ONNX/TF | Low-latency inference
PyTorch       | TorchScript       | PyTorch models
TensorFlow    | SavedModel        | TensorFlow models
ONNX Runtime  | ONNX              | Cross-framework models
Python        | Custom scripts    | Preprocessing/postprocessing
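Each model in the repository is a directory containing a `config.pbtxt` and one numbered version directory per model version. A sketch for an ONNX model (the model name, tensor names, and shapes here are assumptions for illustration):

```
model_repository/
  resnet_onnx/
    config.pbtxt
    1/
      model.onnx
```

```
# config.pbtxt (illustrative values)
name: "resnet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Triton scans this layout on startup (and on reload, depending on the configured model control mode) and exposes each ready version over HTTP/gRPC.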

GPU Configuration

resources:
  limits:
    nvidia.com/gpu: 1
    cpu: "8"
    memory: "16Gi"
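The Kubernetes resource limit above only grants the container a GPU; which device Triton actually uses per model is controlled by `instance_group` in that model's `config.pbtxt`. A sketch, assuming a single GPU instance (counts and device indices are illustrative):

```
# In config.pbtxt: run one execution instance of this model on GPU 0.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

Keeping the instance count and `gpus` list consistent with the `nvidia.com/gpu` limit avoids scheduling models onto devices the container was never allocated.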