Scaling Procedures
This runbook covers horizontal and vertical scaling of MATIH services in response to increased load, resource alerts, or tenant growth.
Symptoms
- `HighCPUUsage` or `HighMemoryUsage` alerts
- Elevated request latency
- Request queue depth increasing
- New tenant onboarding requiring additional capacity
Horizontal Scaling (Replicas)
Scale a Service
Horizontal scaling is managed through Helm values. Update the replica count in the appropriate values file and redeploy:
```bash
./scripts/tools/service-build-deploy.sh <service-name>
```
Recommended Replica Counts
| Service | Dev | Staging | Production |
|---|---|---|---|
| AI Service | 1 | 2 | 3-5 |
| Query Engine | 1 | 2 | 3-5 |
| API Gateway | 1 | 2 | 3 |
| IAM Service | 1 | 2 | 2 |
| Tenant Service | 1 | 1 | 2 |
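The replica update in "Scale a Service" can be scripted. The sketch below is a set of shell functions meant to be sourced before you run the deploy script. It encodes the table above and rewrites the count in a values file. It makes two assumptions, neither confirmed for MATIH charts: the chart exposes a top-level `replicaCount` key (a common Helm convention), and services use the hypothetical kebab-case identifiers shown. Check both against the actual charts.

```bash
# Hypothetical helper: print the recommended replica count from the table.
# Service identifiers are assumed; only ai-service appears elsewhere in this runbook.
recommended_replicas() {
  local service="$1" env_name="$2" row
  case "$service" in
    ai-service)     row="1 2 3-5" ;;
    query-engine)   row="1 2 3-5" ;;
    api-gateway)    row="1 2 3" ;;
    iam-service)    row="1 2 2" ;;
    tenant-service) row="1 1 2" ;;
    *) echo "unknown service: $service" >&2; return 1 ;;
  esac
  # Unquoted on purpose: split the row into dev, staging, production columns.
  set -- $row
  case "$env_name" in
    dev)        echo "$1" ;;
    staging)    echo "$2" ;;
    production) echo "$3" ;;   # may be a range such as "3-5"
    *) echo "unknown environment: $env_name" >&2; return 1 ;;
  esac
}

# Hypothetical helper: rewrite a top-level `replicaCount:` key in a Helm values file.
# Assumes the chart uses the `replicaCount` convention; verify in the chart's values.yaml.
set_replica_count() {
  local values_file="$1" count="$2"
  # Accept whole numbers only, so a range like "3-5" is rejected.
  case "$count" in
    ''|*[!0-9]*) echo "replica count must be a whole number: $count" >&2; return 1 ;;
  esac
  if ! grep -q '^replicaCount:' "$values_file"; then
    echo "no top-level replicaCount key in $values_file" >&2
    return 1
  fi
  # -i.bak works with both GNU and BSD sed; the backup is removed on success.
  sed -i.bak "s/^replicaCount:.*/replicaCount: $count/" "$values_file" && rm -f "$values_file.bak"
}
```

For example, run `set_replica_count <values-file> 2` and then `./scripts/tools/service-build-deploy.sh <service-name>`. Production entries such as `3-5` are ranges, so `set_replica_count` deliberately refuses them. Choose a value within the range instead.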
Horizontal Pod Autoscaler
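To predict how the `ai-service-hpa` manifest in this section reacts to load, it helps to know the rule from the Kubernetes HPA documentation:

- For each metric, desired replicas = ceil(current replicas × current utilization ÷ target utilization).
- When several metrics are configured, the largest recommendation wins.
- The result is then clamped to `minReplicas`/`maxReplicas`.

Utilization is measured against the pod's resource requests, not its limits. The sketch below hard-codes the manifest's values. It ignores the controller's default 10% tolerance and its scale-down stabilization window, so treat it as an approximation rather than the controller's exact behavior.

```bash
# Per-metric HPA rule with an integer ceiling:
#   ceil(a / b) == (a + b - 1) / b for positive integers.
hpa_desired_for_metric() {
  local current="$1" utilization="$2" target="$3"
  echo $(( (current * utilization + target - 1) / target ))
}

# Take the larger of the CPU and memory recommendations, then clamp.
# Tolerance and stabilization windows are deliberately omitted.
hpa_desired() {
  local current="$1" cpu_util="$2" mem_util="$3"
  local min_replicas=2 max_replicas=10   # minReplicas / maxReplicas from ai-service-hpa
  local cpu_target=70 mem_target=80      # averageUtilization targets from ai-service-hpa
  local by_cpu by_mem desired
  by_cpu=$(hpa_desired_for_metric "$current" "$cpu_util" "$cpu_target")
  by_mem=$(hpa_desired_for_metric "$current" "$mem_util" "$mem_target")
  desired=$by_cpu
  [ "$by_mem" -gt "$desired" ] && desired=$by_mem
  [ "$desired" -lt "$min_replicas" ] && desired=$min_replicas
  [ "$desired" -gt "$max_replicas" ] && desired=$max_replicas
  echo "$desired"
}
```

For example, `hpa_desired 3 90 60` models three pods at 90% CPU and 60% memory and prints `4`. CPU recommends ceil(270/70) = 4, memory recommends ceil(180/80) = 3, and the larger value wins.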
For production, configure HPA for automatic scaling:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
Vertical Scaling (Resources)
Increase Resource Limits
Update resource requests and limits in the Helm values:
```yaml
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2000m
    memory: 4Gi
```
Then redeploy:
```bash
./scripts/tools/service-build-deploy.sh <service-name>
```
Resource Guidelines
| Service | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| AI Service | 500m | 2000m | 1Gi | 4Gi |
| Query Engine | 250m | 1000m | 512Mi | 2Gi |
| API Gateway | 100m | 500m | 256Mi | 1Gi |
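Kubernetes quantities mix units: `500m` is half a core and `1Gi` is 1024 Mi. A quick sanity check of a proposed change can therefore help before redeploying. This sketch does three things:

- Converts the table's quantities to millicores and MiB.
- Rejects any request larger than its limit, which Kubernetes would refuse as an invalid pod spec.
- Prints the limit-to-request ratio.

It only understands whole cores, `m` millicores, and `Mi`/`Gi` memory suffixes. The function names are illustrative.

```bash
# Convert a CPU quantity ("500m" or a whole number of cores like "2") to millicores.
cpu_to_millicores() {
  local q="$1" digits
  digits=${q%m}
  case "$digits" in
    ''|*[!0-9]*) echo "unsupported CPU quantity: $q" >&2; return 1 ;;
  esac
  if [ "$digits" != "$q" ]; then echo "$digits"; else echo $(( digits * 1000 )); fi
}

# Convert a memory quantity with a Mi or Gi suffix to MiB.
mem_to_mib() {
  local q="$1" digits mult
  case "$q" in
    *Gi) digits=${q%Gi}; mult=1024 ;;
    *Mi) digits=${q%Mi}; mult=1 ;;
    *) echo "unsupported memory quantity: $q (expected Mi or Gi)" >&2; return 1 ;;
  esac
  case "$digits" in
    ''|*[!0-9]*) echo "unsupported memory quantity: $q" >&2; return 1 ;;
  esac
  echo $(( digits * mult ))
}

# Usage: check_resources <cpu-request> <cpu-limit> <mem-request> <mem-limit>
# Ratios are integer division, rounded down.
check_resources() {
  local cpu_req cpu_lim mem_req mem_lim
  cpu_req=$(cpu_to_millicores "$1") || return 1
  cpu_lim=$(cpu_to_millicores "$2") || return 1
  mem_req=$(mem_to_mib "$3") || return 1
  mem_lim=$(mem_to_mib "$4") || return 1
  if [ "$cpu_req" -eq 0 ] || [ "$mem_req" -eq 0 ]; then
    echo "requests must be non-zero" >&2; return 1
  fi
  if [ "$cpu_req" -gt "$cpu_lim" ] || [ "$mem_req" -gt "$mem_lim" ]; then
    echo "invalid: a request exceeds its limit" >&2; return 1
  fi
  echo "cpu ${cpu_req}m/${cpu_lim}m (x$(( cpu_lim / cpu_req ))) memory ${mem_req}Mi/${mem_lim}Mi (x$(( mem_lim / mem_req )))"
}
```

For the AI Service row, `check_resources 500m 2000m 1Gi 4Gi` prints `cpu 500m/2000m (x4) memory 1024Mi/4096Mi (x4)`.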
Database Scaling
PostgreSQL
- Vertical: Increase pod resource limits
- Read replicas: Add read replicas for read-heavy workloads
- Connection pooling: Deploy PgBouncer for connection management
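Horizontal scaling also multiplies database connections, because each replica typically opens its own pool. A rough budget check can catch this before it becomes a problem. The sketch below assumes a fixed pool size per pod; the sizes in the example are hypothetical. It compares the total against PostgreSQL's `max_connections` minus the superuser reserve (PostgreSQL defaults: 100 and 3). When the total approaches the limit, PgBouncer helps by letting many client connections share a smaller set of server connections.

```bash
# Usage: connection_budget <max_connections> <reserved> <replicas:pool_size>...
# Each trailing argument describes one service; pool sizes are hypothetical inputs.
connection_budget() {
  local max_connections="$1" reserved="$2"
  shift 2
  local total=0 pair replicas pool available
  for pair in "$@"; do
    replicas=${pair%%:*}
    pool=${pair##*:}
    total=$(( total + replicas * pool ))
  done
  available=$(( max_connections - reserved ))
  if [ "$total" -le "$available" ]; then
    echo "ok: $total of $available connections"
  else
    echo "over budget: $total of $available connections"
    return 1
  fi
}
```

For example, `connection_budget 100 3 5:10 5:10` models two services at five replicas with ten connections each. It reports `over budget: 100 of 97 connections`.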
Redis
- Vertical: Increase memory limits
- Clustering: Enable Redis Cluster for horizontal scaling
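Redis Cluster shards keys across a fixed space of 16384 hash slots, where a key maps to CRC16(key) mod 16384. Adding masters therefore means moving slots, for example with `redis-cli --cluster reshard`. This sketch prints an even split for a given number of masters to help plan a reshard. It is an approximation: remainder slots go to the first masters here, and `redis-cli` may place boundaries slightly differently. Redis documentation recommends at least three masters.

```bash
# Print an even split of the 16384 Redis Cluster hash slots across N masters.
redis_slot_ranges() {
  local masters="$1" total=16384
  if [ "$masters" -lt 1 ]; then
    echo "need at least one master" >&2; return 1
  fi
  local base=$(( total / masters )) extra=$(( total % masters ))
  local start=0 i=0 size
  while [ "$i" -lt "$masters" ]; do
    size=$base
    # Spread any remainder one slot at a time over the first masters.
    [ "$i" -lt "$extra" ] && size=$(( size + 1 ))
    echo "master-$i: $start-$(( start + size - 1 ))"
    start=$(( start + size ))
    i=$(( i + 1 ))
  done
}
```

For example, `redis_slot_ranges 4` prints four ranges of 4096 slots each, ending with `master-3: 12288-16383`.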
Verification
After scaling:
- Run the platform status and health-check scripts below
- Verify all new replicas are healthy
- Monitor resource utilization to confirm the scaling resolved the issue
- Check that response latencies have improved
```bash
./scripts/tools/platform-status.sh
./scripts/disaster-recovery/health-check.sh
```
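The "verify all new replicas are healthy" step can also be scripted with `kubectl`. The sketch waits for the rollout, then compares ready replicas with desired replicas using a JSONPath query. `replicas_ready` is a pure parser, so it can be tested without a cluster. The function names and the `default` namespace fallback are assumptions; pass the namespace the MATIH service actually runs in.

```bash
# Parse "ready/desired" and succeed only when every desired replica is ready.
replicas_ready() {
  local ready=${1%%/*} desired=${1##*/}
  # readyReplicas is omitted from Deployment status when zero, so empty means 0.
  [ -n "$ready" ] || ready=0
  # A Deployment scaled to zero is reported as not ready here.
  [ "$desired" -gt 0 ] && [ "$ready" -eq "$desired" ]
}

# Hypothetical wrapper; the namespace fallback "default" is a placeholder.
check_deployment() {
  local name="$1" namespace="${2:-default}" counts
  kubectl rollout status "deployment/$name" -n "$namespace" --timeout=300s || return 1
  counts=$(kubectl get deployment "$name" -n "$namespace" \
    -o jsonpath='{.status.readyReplicas}/{.spec.replicas}') || return 1
  if replicas_ready "$counts"; then
    echo "$name: $counts replicas ready"
  else
    echo "$name: only $counts replicas ready" >&2
    return 1
  fi
}
```

For example, run `check_deployment ai-service <namespace>` alongside the status and health-check scripts.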