# Stage 15: AI Infrastructure
Stage 15 deploys local AI inference components: vLLM for open-source model serving, Triton Inference Server, and the MATIH Copilot service. Cloud AI providers (Azure OpenAI, AWS Bedrock, GCP Vertex AI) are provisioned on-demand by the TenantService, not at deploy time.
**Source file:** `scripts/stages/15-ai-infrastructure.sh`
## Components Deployed
| Component | Purpose |
|---|---|
| vLLM | Open-source LLM inference server |
| Triton Inference Server | Multi-framework model serving |
| MATIH Copilot | Code and query assistance service |
## CPU Mode
In environments without GPU nodes, the stage automatically enables CPU mode:
| Setting | Default | Description |
|---|---|---|
| `FORCE_CPU_MODE` | `true` | Deploy vLLM/Triton on CPU nodes (slower but functional) |
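As a minimal sketch of this decision (the function name and the `nvidia.com/gpu.present` node label are illustrative assumptions, not taken from the stage script), CPU mode can be derived from the number of GPU nodes visible to the scheduler:

```shell
#!/usr/bin/env bash
# Sketch: derive FORCE_CPU_MODE from a count of GPU nodes.
# The helper name and GPU node label are hypothetical.
resolve_cpu_mode() {
  local gpu_node_count="$1"
  if [[ "$gpu_node_count" -eq 0 ]]; then
    echo "true"    # no GPU nodes: run vLLM/Triton on CPU
  else
    echo "false"   # GPU nodes present: use accelerated inference
  fi
}

# Usage against a live cluster (label is an assumption):
#   FORCE_CPU_MODE=$(resolve_cpu_mode "$(kubectl get nodes \
#     -l nvidia.com/gpu.present=true --no-headers 2>/dev/null | wc -l)")
```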
## Image Tag Resolution
The stage reads image tags from build metadata, matching the pattern used by Stage 08:
```bash
# Priority: metadata JSON > tag file > IMAGE_TAG env > "latest"
if [[ -f "$METADATA_FILE" ]]; then
  IMAGE_TAG=$(jq -r '.imageTag // "latest"' "$METADATA_FILE")
elif [[ -f "$TAG_FILE" ]]; then
  IMAGE_TAG=$(cat "$TAG_FILE")
fi
```

## Cloud AI Provisioning
Cloud AI providers are not deployed in this stage. They are provisioned per-tenant by the TenantService:
| Provider | Provisioned By | When |
|---|---|---|
| Azure OpenAI | TenantService / InfrastructureService | Tenant creation |
| AWS Bedrock | TenantService | Tenant creation |
| GCP Vertex AI | TenantService | Tenant creation |
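For illustration only, the routing in the table above can be restated as a lookup; the function name and provider identifiers below are hypothetical, not part of the TenantService API:

```shell
#!/usr/bin/env bash
# Sketch restating the table: which service provisions each cloud AI
# provider at tenant creation. All identifiers here are hypothetical.
provisioner_for() {
  case "$1" in
    azure-openai)  echo "TenantService/InfrastructureService" ;;
    aws-bedrock)   echo "TenantService" ;;
    gcp-vertex-ai) echo "TenantService" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}
```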
## Libraries Used
| Library | Purpose |
|---|---|
| `core/config.sh` | Terraform output access |
| `k8s/namespace.sh` | Namespace management |
| `k8s/secrets.sh` | Secret management |
| `helm/deploy.sh` | Deployment functions |
| `azure/aks.sh` | AKS node pool operations |
| `acr/deploy.sh` | ACR image operations |
## Dependencies

- Requires: `11-compute-engines`, `14-ml-infrastructure`
- Required by: `16-data-plane-services`
## Dependency Verification

```bash
kubectl get pods -n matih-data-plane -l app=vllm
kubectl get pods -n matih-data-plane -l app=matih-copilot
```
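The checks above can be scripted: a small helper (a sketch; the function name is an assumption) reads `NAME STATUS` lines on stdin and succeeds only if every pod reports `Running`:

```shell
#!/usr/bin/env bash
# Sketch: succeed only when every pod in a "NAME STATUS" listing
# is in the Running phase.
all_running() {
  local name status rc=0
  while read -r name status; do
    [[ "$status" == "Running" ]] || rc=1
  done
  return $rc
}

# Usage with the verification commands above:
#   kubectl get pods -n matih-data-plane -l app=vllm --no-headers \
#     -o custom-columns=NAME:.metadata.name,STATUS:.status.phase \
#     | all_running && echo "vLLM pods ready"
```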