# Stage 15: AI Infrastructure
Stage 15 deploys local AI inference components: vLLM for open-source model serving, Triton Inference Server, and the MATIH Copilot service. Cloud AI providers (Azure OpenAI, AWS Bedrock, GCP Vertex AI) are provisioned on-demand by the TenantService, not at deploy time.
**Source file:** `scripts/stages/15-ai-infrastructure.sh`
## Components Deployed
| Component | Purpose |
|---|---|
| vLLM | Open-source LLM inference server |
| Triton Inference Server | Multi-framework model serving |
| MATIH Copilot | Code and query assistance service |
## CPU Mode
In environments without GPU nodes, the stage automatically enables CPU mode:
| Setting | Default | Description |
|---|---|---|
| `FORCE_CPU_MODE` | `true` | Deploy vLLM/Triton on CPU nodes (slower but functional) |
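As a minimal sketch of this decision (the function name and the `nvidia.com/gpu.present` node label are illustrative assumptions, not taken from the stage script), CPU mode can be derived from the number of GPU nodes visible to the scheduler:

```shell
#!/usr/bin/env bash
# Sketch: derive FORCE_CPU_MODE from a count of GPU nodes.
# The helper name and GPU node label are hypothetical.
resolve_cpu_mode() {
  local gpu_node_count="$1"
  if [[ "$gpu_node_count" -eq 0 ]]; then
    echo "true"    # no GPU nodes: run vLLM/Triton on CPU
  else
    echo "false"   # GPU nodes present: use accelerated inference
  fi
}

# Usage against a live cluster (label is an assumption):
#   FORCE_CPU_MODE=$(resolve_cpu_mode "$(kubectl get nodes \
#     -l nvidia.com/gpu.present=true --no-headers 2>/dev/null | wc -l)")
```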
## Image Tag Resolution
The stage reads image tags from build metadata, matching the pattern used by Stage 08:
```bash
# Priority: metadata JSON > tag file > IMAGE_TAG env > "latest"
if [[ -f "$METADATA_FILE" ]]; then
  IMAGE_TAG=$(jq -r '.imageTag // "latest"' "$METADATA_FILE")
elif [[ -f "$TAG_FILE" ]]; then
  IMAGE_TAG=$(cat "$TAG_FILE")
fi
```

## Cloud AI Provisioning
Cloud AI providers are not deployed in this stage. They are provisioned per-tenant by the TenantService:
| Provider | Provisioned By | When |
|---|---|---|
| Azure OpenAI | TenantService / InfrastructureService | Tenant creation |
| AWS Bedrock | TenantService | Tenant creation |
| GCP Vertex AI | TenantService | Tenant creation |
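For illustration only, the routing in the table above can be restated as a lookup; the function name and provider identifiers below are hypothetical, not part of the TenantService API:

```shell
#!/usr/bin/env bash
# Sketch restating the table: which service provisions each cloud AI
# provider at tenant creation. All identifiers here are hypothetical.
provisioner_for() {
  case "$1" in
    azure-openai)  echo "TenantService/InfrastructureService" ;;
    aws-bedrock)   echo "TenantService" ;;
    gcp-vertex-ai) echo "TenantService" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}
```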
## Libraries Used
| Library | Purpose |
|---|---|
| `core/config.sh` | Terraform output access |
| `k8s/namespace.sh` | Namespace management |
| `k8s/secrets.sh` | Secret management |
| `helm/deploy.sh` | Deployment functions |
| `azure/aks.sh` | AKS node pool operations |
| `acr/deploy.sh` | ACR image operations |
## Dependencies

- Requires: `11-compute-engines`, `14-ml-infrastructure`
- Required by: `16-data-plane-services`
## Dependency Verification

```bash
kubectl get pods -n matih-data-plane -l app=vllm
kubectl get pods -n matih-data-plane -l app=matih-copilot
```
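The checks above can be scripted: a small helper (a sketch; the function name is an assumption) reads `NAME STATUS` lines on stdin and succeeds only if every pod reports `Running`:

```shell
#!/usr/bin/env bash
# Sketch: succeed only when every pod in a "NAME STATUS" listing
# is in the Running phase.
all_running() {
  local name status rc=0
  while read -r name status; do
    [[ "$status" == "Running" ]] || rc=1
  done
  return $rc
}

# Usage with the verification commands above:
#   kubectl get pods -n matih-data-plane -l app=vllm --no-headers \
#     -o custom-columns=NAME:.metadata.name,STATUS:.status.phase \
#     | all_running && echo "vLLM pods ready"
```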