Cloud Deployment
This guide covers deploying the MATIH Platform to production Kubernetes clusters on Azure AKS, AWS EKS, and GCP GKE. The deployment uses Terraform for infrastructure provisioning and the MATIH CD pipeline for service deployment.
Deployment Architecture
A production MATIH deployment consists of three layers:
| Layer | Components | Provisioning |
|---|---|---|
| Cloud Infrastructure | Kubernetes cluster, databases, key vault, networking | Terraform |
| Platform Services | Helm charts for all MATIH services | CD Pipeline (scripts/cd-new.sh) |
| Tenant Configuration | Per-tenant namespaces, secrets, ingress | Tenant provisioning API |
Terraform (infrastructure/terraform/)
|
|-- Creates: Kubernetes cluster, databases, networking, key vaults
v
CD Pipeline (scripts/cd-new.sh)
|
|-- Deploys: Helm charts for all services (55+ charts)
v
Tenant Provisioning (control-plane/tenant-service)
|
|-- Creates: Namespace, secrets, ingress, DNS per tenant
v
Running Platform
Terraform Modules
MATIH provides Terraform modules for each cloud provider in infrastructure/terraform/modules/:
Azure Modules
| Module | Path | Purpose |
|---|---|---|
| Kubernetes (AKS) | modules/azure/kubernetes/control-plane/ | AKS cluster provisioning |
| Kubernetes (Data Plane) | modules/azure/kubernetes/data-plane/ | Data plane node pools |
| PostgreSQL | modules/azure/postgres/ | Azure Database for PostgreSQL |
| Key Vault | modules/azure/keyvault/ | Secret management |
| Networking | modules/azure/networking/ | VNet, subnets, DNS zones |
| Storage | modules/azure/storage/ | Blob storage for artifacts |
| Monitoring | modules/azure/monitoring/ | Log Analytics, Azure Monitor |
| Cognitive Services | modules/azure/cognitive-services/ | Azure OpenAI |
AWS Modules
| Module | Path | Purpose |
|---|---|---|
| Kubernetes (EKS) | modules/aws/kubernetes/control-plane/ | EKS cluster provisioning |
| RDS | modules/aws/rds/ | PostgreSQL on RDS |
| S3 | modules/aws/s3/ | Object storage |
| Networking | modules/aws/networking/ | VPC, subnets, security groups |
| Bedrock | modules/aws/bedrock/ | AWS Bedrock AI services |
| Governance | modules/aws/governance/ | IAM roles, policies |
GCP Modules
| Module | Path | Purpose |
|---|---|---|
| Kubernetes (GKE) | modules/gcp/kubernetes/control-plane/ | GKE cluster provisioning |
| Cloud SQL | modules/gcp/cloudsql/ | PostgreSQL on Cloud SQL |
| Storage | modules/gcp/storage/ | Cloud Storage buckets |
| Networking | modules/gcp/networking/ | VPC, subnets, firewall rules |
| Vertex AI | modules/gcp/vertex-ai/ | Google AI platform |
| Governance | modules/gcp/governance/ | IAM, organization policies |
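Before running Terraform, it can help to confirm that the module directories listed above are present in your checkout. The following is a minimal sketch, not part of the MATIH repository: the function names `modules_for` and `check_modules` are hypothetical, and the module lists are copied from the tables above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MATIH): reports module directories
# from the tables above that are missing from a checkout.

# Print the module subpaths documented for a provider.
modules_for() {
  case "$1" in
    azure) echo "kubernetes/control-plane kubernetes/data-plane postgres keyvault networking storage monitoring cognitive-services" ;;
    aws)   echo "kubernetes/control-plane rds s3 networking bedrock governance" ;;
    gcp)   echo "kubernetes/control-plane cloudsql storage networking vertex-ai governance" ;;
    *)     echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Usage: check_modules <provider> [modules_root]
# Prints one line per missing directory; returns 1 if any is missing,
# 2 if the provider is unknown.
check_modules() {
  local provider="$1"
  local root="${2:-infrastructure/terraform/modules}"
  local mods m missing=0
  mods="$(modules_for "$provider")" || return 2
  for m in $mods; do
    if [ ! -d "$root/$provider/$m" ]; then
      echo "missing: $root/$provider/$m"
      missing=1
    fi
  done
  return "$missing"
}
```

Run from the repository root, `check_modules azure` prints nothing and returns 0 when every documented Azure module directory exists.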
Azure AKS Deployment
Step 1: Authenticate
# Login to Azure
az login
# Set subscription
az account set --subscription "your-subscription-id"
# Verify
az account show
Step 2: Initialize Terraform
cd infrastructure/terraform/environments/azure-matihlabs
# Initialize Terraform
terraform init
# Review the plan
terraform plan -var-file="terraform.tfvars"
Step 3: Provision Infrastructure
# Apply Terraform configuration
terraform apply -var-file="terraform.tfvars"
# This creates:
# - AKS cluster with system and user node pools
# - Azure Database for PostgreSQL Flexible Server
# - Azure Key Vault with secrets
# - Virtual Network with subnets
# - Azure DNS zone (matih.ai)
# - Azure Container Registry
# - Log Analytics workspace
Step 4: Configure kubectl
# Get AKS credentials
az aks get-credentials \
  --resource-group matih-rg \
  --name matih-aks-cluster
# Verify connection
kubectl get nodes
Step 5: Deploy Platform Services
# Run the full CD pipeline
./scripts/cd-new.sh all dev
# Or deploy in stages
./scripts/cd-new.sh infra dev # Infrastructure services first
./scripts/cd-new.sh services dev # Application services second
AWS EKS Deployment
Step 1: Authenticate
# Configure AWS credentials
aws configure
# Enter: Access Key ID, Secret Access Key, Region, Output format
# Verify
aws sts get-caller-identity
Step 2: Initialize Terraform
cd infrastructure/terraform/environments/aws-production
terraform init
terraform plan -var-file="terraform.tfvars"
Step 3: Provision Infrastructure
terraform apply -var-file="terraform.tfvars"
# This creates:
# - EKS cluster with managed node groups
# - RDS PostgreSQL instances
# - AWS Secrets Manager secrets
# - VPC with public/private subnets
# - S3 buckets for artifacts
# - IAM roles and policies
Step 4: Configure kubectl
# Update kubeconfig for EKS
aws eks update-kubeconfig \
  --region us-east-1 \
  --name matih-eks-cluster
# Verify
kubectl get nodes
Step 5: Deploy Platform Services
./scripts/cd-new.sh all dev
GCP GKE Deployment
Step 1: Authenticate
# Login to GCP
gcloud auth login
gcloud config set project matih-production
# Verify
gcloud config list
Step 2: Initialize Terraform
cd infrastructure/terraform/environments/gcp-production
terraform init
terraform plan -var-file="terraform.tfvars"
Step 3: Provision Infrastructure
terraform apply -var-file="terraform.tfvars"
# This creates:
# - GKE Autopilot or Standard cluster
# - Cloud SQL PostgreSQL instances
# - GCP Secret Manager secrets
# - VPC with subnets
# - Cloud Storage buckets
# - IAM service accounts
Step 4: Configure kubectl
gcloud container clusters get-credentials matih-gke-cluster \
  --region us-central1
# Verify
kubectl get nodes
Step 5: Deploy Platform Services
./scripts/cd-new.sh all dev
The CD Pipeline
The MATIH CD pipeline (scripts/cd-new.sh) automates deployment in 12 stages, numbered 00 through 11.
Pipeline Stages
| Stage | Name | Description |
|---|---|---|
| 00 | Preflight | Verify cluster connectivity, Helm version, namespace existence |
| 01 | Namespaces | Create matih-system, matih-shared, tenant namespaces |
| 02 | Secrets | Deploy secrets (ESO ExternalSecrets or dev-secrets.sh) |
| 03 | Infrastructure | Deploy databases, message brokers, caches |
| 04 | Observability | Deploy Prometheus, Grafana, Loki, Tempo |
| 05 | Control Plane | Deploy IAM, tenant, config, and other CP services |
| 06 | Data Plane | Deploy AI, ML, query engine, and other DP services |
| 07 | Frontend | Deploy workbench applications |
| 08 | Ingress | Configure ingress controllers and TLS certificates |
| 09 | DNS | Configure DNS records (production only) |
| 10 | Verification | Health checks across all services |
| 11 | Notifications | Send deployment notifications |
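As a quick reference for which stages a given pipeline target covers, the sketch below maps targets to stage numbers and names. It is illustrative only and does not reflect how cd-new.sh is implemented. The `infra` (00-04) and `services` (05-07) ranges come from this guide; treating `all` as stages 00-11 is an assumption, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative lookup only; cd-new.sh itself may be implemented differently.

# Print the stage numbers covered by a pipeline target.
stages_for_target() {
  case "$1" in
    all)      echo "00 01 02 03 04 05 06 07 08 09 10 11" ;;  # assumption
    infra)    echo "00 01 02 03 04" ;;
    services) echo "05 06 07" ;;
    *)        echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Print the stage name for a two-digit stage number (names from the table above).
stage_name() {
  case "$1" in
    00) echo "Preflight" ;;
    01) echo "Namespaces" ;;
    02) echo "Secrets" ;;
    03) echo "Infrastructure" ;;
    04) echo "Observability" ;;
    05) echo "Control Plane" ;;
    06) echo "Data Plane" ;;
    07) echo "Frontend" ;;
    08) echo "Ingress" ;;
    09) echo "DNS" ;;
    10) echo "Verification" ;;
    11) echo "Notifications" ;;
    *)  echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

# Print one "NN Name" line per stage covered by a target.
describe_target() {
  local stages s
  stages="$(stages_for_target "$1")" || return 1
  for s in $stages; do
    echo "$s $(stage_name "$s")"
  done
}
```

For example, `describe_target services` lists the Control Plane, Data Plane, and Frontend stages.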
Running the Pipeline
# Full deployment
./scripts/cd-new.sh all dev
# Infrastructure only (stages 00-04)
./scripts/cd-new.sh infra dev
# Services only (stages 05-07)
./scripts/cd-new.sh services dev
# Check pipeline status
./scripts/cd-new.sh status
Pipeline for Different Environments
# Development environment
./scripts/cd-new.sh all dev
# Staging environment
./scripts/cd-new.sh all staging
# Production environment (requires additional confirmation)
./scripts/cd-new.sh all prod
Helm Charts
MATIH includes 55+ Helm charts in infrastructure/helm/:
Chart Structure
Each service has its own Helm chart with environment-specific values:
infrastructure/helm/ai-service/
├── Chart.yaml # Chart metadata and dependencies
├── values.yaml # Base default values
├── values-dev.yaml # Development environment overrides
├── values-prod.yaml # Production environment overrides
└── templates/
    ├── _helpers.tpl # Template helper functions
    ├── deployment.yaml # Pod deployment
    ├── service.yaml # Kubernetes service
    ├── configmap.yaml # Configuration
    ├── hpa.yaml # Horizontal Pod Autoscaler
    └── networkpolicy.yaml # Network isolation rules
Key Chart Values
# values.yaml (base defaults)
replicaCount: 1
image:
  repository: matih/ai-service
  tag: latest
  pullPolicy: IfNotPresent
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi
# values-dev.yaml (development overrides)
replicaCount: 1
resources:
  requests:
    cpu: 250m
    memory: 256Mi
# values-prod.yaml (production overrides)
replicaCount: 3
resources:
  requests:
    cpu: 1000m
    memory: 1Gi
  limits:
    cpu: 4000m
    memory: 4Gi
DNS and Ingress Configuration
Development
In development environments, services are accessed via port-forward or NodePort:
# Port-forward to a service
kubectl port-forward svc/ai-service 8000:8000 -n matih-system
Production
Production deployments use dedicated ingress controllers with TLS:
| Component | Configuration |
|---|---|
| Ingress controller | NGINX Ingress Controller (per-tenant) |
| TLS certificates | cert-manager with Let's Encrypt (DNS01 challenge) |
| DNS | Azure DNS / Route53 / Cloud DNS managed by Terraform |
| Domain | Per-tenant subdomains (e.g., acme.matih.ai) |
# Validate tenant ingress configuration
./scripts/tools/validate-tenant-ingress.sh --tenant acme-corp
Post-Deployment Checklist
After deploying, verify the following:
| Check | Command | Expected |
|---|---|---|
| Nodes ready | kubectl get nodes | All nodes Ready |
| System pods | kubectl get pods -n matih-system | All pods Running |
| Shared infra | kubectl get pods -n matih-shared | All pods Running |
| Platform status | ./scripts/tools/platform-status.sh | All services green |
| Health check | ./scripts/disaster-recovery/health-check.sh | All checks pass |
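The pod checks above can be scripted rather than read by eye. The following is a minimal sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, ...) with its header line. The function name `unhealthy_pods` is hypothetical, and treating `Completed` pods (finished jobs) as healthy is an assumption that goes beyond the "All pods Running" expectation in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pods` output (including the header
# line) on stdin and prints "NAME READY STATUS" for every pod that is not
# Running with all containers ready. Completed pods are skipped (assumption).
# Returns 1 if any unhealthy pod is found, 0 otherwise.
unhealthy_pods() {
  awk 'NR == 1 { next }                 # skip the header line
       $3 == "Completed" { next }       # finished jobs are treated as healthy
       {
         split($2, ready, "/")          # READY column, e.g. "1/2"
         if ($3 != "Running" || ready[1] != ready[2]) {
           print $1, $2, $3
           bad = 1
         }
       }
       END { exit (bad ? 1 : 0) }'
}
```

For example, `kubectl get pods -n matih-system | unhealthy_pods` prints nothing and returns 0 when every pod in the namespace is healthy; the same pipe works for the `matih-shared` namespace.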
Next Steps
After deployment, proceed to:
- First-Time Configuration to set up your first tenant and admin user
- Verifying the Deployment for comprehensive validation