# Ray
Ray provides distributed computing for ML training, hyperparameter tuning, and model serving through Ray Serve.
## Architecture
```
+------------------+     +------------------+
|  Ray Head Node   |     |   Ray Workers    |
| Port: 6379, 8265 |---->|   (Autoscaled)   |
+------------------+     +------------------+
```

## Configuration
```yaml
config:
  ray:
    headAddress: "ray-head:6379"
    dashboardUrl: "http://ray-head:8265"
```

## Components
| Component | Port | Purpose |
|---|---|---|
| Head Node | 6379 | Cluster coordination (GCS) |
| Dashboard | 8265 | Web UI for job monitoring |
| Client | 10001 | Ray client connection |
| Serve | 8000 | Model serving endpoint |
## Autoscaling
Ray workers autoscale based on pending task queue depth: the Ray autoscaler requests additional worker pods when tasks cannot be placed on existing workers. If those pods are unschedulable, the Kubernetes cluster autoscaler in turn provisions new GPU nodes to fit them.
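Scaling bounds are typically declared per worker group. A hypothetical sketch; the `minReplicas`/`maxReplicas` field names follow KubeRay's RayCluster conventions and may differ in this chart:

```yaml
worker:
  replicas: 1      # initial pool size
  minReplicas: 1   # autoscaler never scales below this
  maxReplicas: 8   # upper bound; caps GPU node provisioning
```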
## GPU Support
Ray workers can request GPU resources for training:
```yaml
worker:
  resources:
    limits:
      nvidia.com/gpu: 1
  nodeSelector:
    nvidia.com/gpu.present: "true"
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
```