Service Build and Deploy
For rapid iteration during development and targeted hotfixes in production, MATIH provides scripts that build and deploy individual services without running the full CD pipeline. The two primary scripts, service-build-deploy.sh and full-service-rebuild.sh, handle the complete workflow from source code to a running Kubernetes pod.
Single Service Deployment
service-build-deploy.sh
./scripts/tools/service-build-deploy.sh <service-name> [options]
This script handles the complete lifecycle for a single service:
- Build the Docker image from source
- Tag the image with version and git SHA
- Push the image to ACR
- Deploy via Helm upgrade to the correct namespace
- Validate the deployment is healthy
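The tagging step can be illustrated with a small helper. This is a minimal sketch, not the script's actual code: `compose_image_tag` is a hypothetical name, the version is assumed to be passed in by the caller, and the short SHA is assumed to come from `git rev-parse --short HEAD` when not supplied.

```shell
#!/usr/bin/env bash
# Sketch only: compose an image tag of the form <version>-<git-sha>.
# compose_image_tag is a hypothetical helper, not part of service-build-deploy.sh.
compose_image_tag() {
  local version="$1"
  # Use the supplied SHA if given; otherwise ask git for the short commit hash.
  local sha="${2:-$(git rev-parse --short HEAD 2>/dev/null)}"
  if [ -z "$version" ] || [ -z "$sha" ]; then
    echo "usage: compose_image_tag <version> [git-sha]" >&2
    return 1
  fi
  printf '%s-%s\n' "$version" "$sha"
}

# Example: compose_image_tag 1.0.0 abc1234   prints 1.0.0-abc1234
```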
Usage Examples
# Build and deploy ai-service
./scripts/tools/service-build-deploy.sh ai-service
# Build and deploy with a specific tag
./scripts/tools/service-build-deploy.sh ai-service --tag 1.0.0-hotfix1
# Build only (no deploy)
./scripts/tools/service-build-deploy.sh ai-service --build-only
# Deploy only (skip build, use existing image)
./scripts/tools/service-build-deploy.sh ai-service --deploy-only --tag 1.0.0-abc1234
# Deploy with dev values
./scripts/tools/service-build-deploy.sh ai-service --environment dev
Script Options
| Option | Description | Default |
|---|---|---|
| --tag <tag> | Override image tag | <version>-<git-sha> |
| --build-only | Build and push, do not deploy | false |
| --deploy-only | Deploy existing image, do not build | false |
| --environment <env> | Target environment (dev/staging/prod) | dev |
| --namespace <ns> | Override target namespace | Auto-detected |
| --no-wait | Do not wait for rollout completion | false |
| --dry-run | Show commands without executing | false |
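One way these options could be parsed and the --dry-run flag honored is sketched below. The function and variable names (`parse_options`, `run_cmd`, `TAG`, `DRY_RUN`, and so on) are hypothetical, and the rule that --build-only and --deploy-only cannot be combined is an assumption rather than documented behavior.

```shell
#!/usr/bin/env bash
# Sketch only: option parsing with the defaults from the table above.
# All names here are hypothetical; the real script may differ.
parse_options() {
  TAG=""
  BUILD_ONLY=false
  DEPLOY_ONLY=false
  ENVIRONMENT=dev
  NAMESPACE=""
  WAIT=true
  DRY_RUN=false
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --tag|--environment|--namespace)
        # These options take a value; fail early if it is missing.
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        case "$1" in
          --tag)         TAG="$2" ;;
          --environment) ENVIRONMENT="$2" ;;
          --namespace)   NAMESPACE="$2" ;;
        esac
        shift 2 ;;
      --build-only)  BUILD_ONLY=true; shift ;;
      --deploy-only) DEPLOY_ONLY=true; shift ;;
      --no-wait)     WAIT=false; shift ;;
      --dry-run)     DRY_RUN=true; shift ;;
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
  done
  # Assumption: the two "only" modes are mutually exclusive.
  if [ "$BUILD_ONLY" = "true" ] && [ "$DEPLOY_ONLY" = "true" ]; then
    echo "--build-only and --deploy-only cannot be combined" >&2
    return 1
  fi
}

# Print the command in dry-run mode, execute it otherwise.
run_cmd() {
  if [ "$DRY_RUN" = "true" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}
```

Routing every docker, helm, and kubectl call through a wrapper like `run_cmd` keeps the dry-run logic in one place instead of scattering conditionals through the script.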
Execution Flow
service-build-deploy.sh ai-service
  |
  +-- 1. Detect service type (java/python/node)
  |      Source: scripts/config/components.yaml
  |
  +-- 2. Determine namespace
  |      ai-service -> matih-data-plane
  |
  +-- 3. Build Docker image
  |      docker build -t matihlabsacr.azurecr.io/matih/ai-service:1.0.0-abc1234
  |        -f data-plane/ai-service/Dockerfile .
  |
  +-- 4. Push to ACR
  |      docker push matihlabsacr.azurecr.io/matih/ai-service:1.0.0-abc1234
  |
  +-- 5. Helm upgrade
  |      helm upgrade --install ai-service
  |        infrastructure/helm/ai-service
  |        -f infrastructure/helm/ai-service/values.yaml
  |        -f infrastructure/helm/ai-service/values-dev.yaml
  |        --set image.tag=1.0.0-abc1234
  |        --namespace matih-data-plane
  |        --timeout 5m --wait
  |
  +-- 6. Validate rollout
         kubectl rollout status deployment/ai-service
           -n matih-data-plane --timeout=300s
Service-to-Namespace Mapping
The script automatically determines the correct namespace based on the service name:
| Service | Namespace | Chart Location |
|---|---|---|
| iam-service | matih-control-plane | infrastructure/helm/iam-service |
| tenant-service | matih-control-plane | infrastructure/helm/tenant-service |
| config-service | matih-control-plane | infrastructure/helm/config-service |
| audit-service | matih-control-plane | infrastructure/helm/audit-service |
| notification-service | matih-control-plane | infrastructure/helm/notification-service |
| ai-service | matih-data-plane | infrastructure/helm/ai-service |
| bi-service | matih-data-plane | infrastructure/helm/bi-service |
| ml-service | matih-data-plane | infrastructure/helm/ml-service |
| query-engine | matih-data-plane | infrastructure/helm/query-engine |
| catalog-service | matih-data-plane | infrastructure/helm/catalog-service |
| pipeline-service | matih-data-plane | infrastructure/helm/pipeline-service |
| semantic-layer | matih-data-plane | infrastructure/helm/semantic-layer |
| render-service | matih-data-plane | infrastructure/helm/render-service |
| data-quality-service | matih-data-plane | infrastructure/helm/data-quality-service |
| bi-workbench | matih-frontend | infrastructure/helm/frontend |
| ml-workbench | matih-frontend | infrastructure/helm/frontend |
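The mapping in this table can be expressed as two lookup functions. This is a sketch under assumptions: `namespace_for` and `chart_dir_for` are hypothetical names, and the real script may read the mapping from a configuration file rather than hard-coding it.

```shell
#!/usr/bin/env bash
# Sketch only: resolve a service name to its namespace, per the table above.
namespace_for() {
  case "$1" in
    iam-service|tenant-service|config-service|audit-service|notification-service)
      echo "matih-control-plane" ;;
    ai-service|bi-service|ml-service|query-engine|catalog-service)
      echo "matih-data-plane" ;;
    pipeline-service|semantic-layer|render-service|data-quality-service)
      echo "matih-data-plane" ;;
    bi-workbench|ml-workbench)
      echo "matih-frontend" ;;
    *)
      echo "unknown service: $1" >&2
      return 1 ;;
  esac
}

# Sketch only: resolve a service name to its Helm chart directory.
chart_dir_for() {
  # Validate the name first so unknown services fail here too.
  namespace_for "$1" >/dev/null || return 1
  case "$1" in
    bi-workbench|ml-workbench) echo "infrastructure/helm/frontend" ;;
    *)                         echo "infrastructure/helm/$1" ;;
  esac
}
```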
Full Service Rebuild
full-service-rebuild.sh
For situations where a complete rebuild from base images is needed:
./scripts/tools/full-service-rebuild.sh [options]
This script:
- Rebuilds base images (optional)
- Rebuilds all service images without Docker cache
- Pushes all images to ACR
- Deploys all services via umbrella charts
Options
| Option | Description |
|---|---|
| --include-base | Also rebuild base images before service images |
| --services <list> | Comma-separated list of services to rebuild |
| --environment <env> | Target environment |
| --parallel | Build services in parallel |
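The comma-separated value passed to --services has to be split into individual names before each service is rebuilt. A minimal sketch of that step follows; `split_services` is a hypothetical helper, and tolerating spaces and empty entries is an assumption about desirable behavior, not documented behavior.

```shell
#!/usr/bin/env bash
# Sketch only: expand a comma-separated --services value into one name per line.
split_services() {
  # tr turns commas into newlines, sed trims surrounding whitespace,
  # and grep drops empty entries (and returns non-zero if nothing remains).
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -v '^$'
}

# Example usage:
#   split_services "ai-service, bi-service" | while read -r svc; do
#     echo "would rebuild: $svc"
#   done
```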
When to Use Full Rebuild
| Scenario | Use |
|---|---|
| Base image security update | full-service-rebuild.sh --include-base |
| Dependency version bump | full-service-rebuild.sh |
| Build cache corruption | full-service-rebuild.sh (uses --no-cache) |
| New environment setup | full-service-rebuild.sh --environment staging |
Deployment Strategies
Rolling Update (Default)
All MATIH services use rolling updates by default:
# Deployment strategy in Helm template
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
This means:
- At most 1 extra pod is created during update
- No pods are terminated until new pods are ready
- Zero-downtime deployment
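The effect of these two settings on pod counts can be worked through with simple arithmetic. The sketch below assumes absolute values (not percentages) for maxSurge and maxUnavailable; `rolling_update_bounds` is a hypothetical helper and the replica count of 3 is only an example.

```shell
#!/usr/bin/env bash
# Sketch only: pod-count bounds during a RollingUpdate.
# Handles absolute maxSurge/maxUnavailable values; percentages are not supported.
rolling_update_bounds() {
  local replicas="$1" max_surge="$2" max_unavailable="$3"
  # Upper bound: desired replicas plus the allowed surge.
  # Lower bound: desired replicas minus the allowed unavailable pods.
  echo "max_total=$((replicas + max_surge)) min_available=$((replicas - max_unavailable))"
}

# Example: 3 replicas with maxSurge=1, maxUnavailable=0
rolling_update_bounds 3 1 0   # prints: max_total=4 min_available=3
```

With maxUnavailable set to 0, the lower bound equals the desired replica count, which is what makes the update zero-downtime as long as readiness probes are accurate.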
Blue-Green Deployment (Manual)
For critical services requiring instant rollback:
# Deploy new version alongside existing
helm upgrade --install ai-service-blue \
  infrastructure/helm/ai-service \
  --set image.tag=2.0.0-new \
  --set service.port=8001 \
  --namespace matih-data-plane
# Test the blue deployment
curl http://ai-service-blue.matih-data-plane:8001/api/v1/health
# Switch traffic (update ingress/service)
# ...
# Remove old deployment (this assumes the previous release is named ai-service-green)
helm uninstall ai-service-green --namespace matih-data-plane
Canary Deployment
For gradual rollout with traffic splitting:
# Deploy canary with reduced replicas
helm upgrade --install ai-service-canary \
  infrastructure/helm/ai-service \
  --set image.tag=2.0.0-canary \
  --set replicaCount=1 \
  --set autoscaling.enabled=false \
  --namespace matih-data-plane
# Monitor canary metrics
# If healthy, promote to full deployment
# If unhealthy, remove canary
Rollback Procedures
Helm Rollback
# View release history
helm history ai-service --namespace matih-data-plane
# Output:
# REVISION  STATUS      CHART             APP VERSION  DESCRIPTION
# 1         superseded  ai-service-1.0.0  1.0.0        Install complete
# 2         superseded  ai-service-1.0.0  1.0.0        Upgrade complete
# 3         deployed    ai-service-1.0.0  1.0.0        Upgrade complete
# Rollback to revision 2
helm rollback ai-service 2 --namespace matih-data-plane --timeout 5m
# Verify rollback
kubectl rollout status deployment/ai-service -n matih-data-plane
Image Tag Rollback
If you know the previous working image tag:
# Deploy with known-good image tag
./scripts/tools/service-build-deploy.sh ai-service \
  --deploy-only \
  --tag 1.0.0-previousgoodshahere
Post-Deployment Validation
After every deployment, the script validates:
Health Check Sequence
| Step | Check | Pass Criteria | Timeout |
|---|---|---|---|
| 1 | Rollout status | All replicas ready | 300s |
| 2 | Pod readiness | All readinessProbes pass | 60s |
| 3 | HTTP health endpoint | HTTP 200 response | 30s |
| 4 | Dependency connectivity | Database, Redis, Kafka connected | 30s |
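The per-step timeouts in this table could be enforced with a generic retry helper such as the one sketched here. `retry_until` is a hypothetical function, not part of the documented script, and the polling interval is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: retry a check until it succeeds or a timeout (seconds) elapses.
# Usage: retry_until <timeout-seconds> <interval-seconds> <command> [args...]
retry_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  # Re-run the command until it exits 0 or the deadline passes.
  while ! "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example: allow the HTTP health endpoint 30s to return success.
#   retry_until 30 2 kubectl exec -n matih-data-plane deploy/ai-service -- \
#     curl -sf http://localhost:8000/api/v1/health
```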
# Automated post-deploy validation
echo "Checking deployment status..."
kubectl rollout status deployment/ai-service \
  -n matih-data-plane --timeout=300s
echo "Checking pod health..."
kubectl get pods -l app.kubernetes.io/name=ai-service \
  -n matih-data-plane -o wide
echo "Checking HTTP health endpoint..."
kubectl exec -n matih-data-plane deploy/ai-service -- \
  curl -sf http://localhost:8000/api/v1/health
Development Workflow
Typical Development Cycle
1. Make code changes locally
2. Run local tests: pytest tests/ -v
3. Build and deploy to dev cluster:
   ./scripts/tools/service-build-deploy.sh ai-service --environment dev
4. Verify in dev:
   ./scripts/tools/platform-status.sh
5. Open PR for review
6. After merge, CD pipeline deploys to staging
7. After staging validation, CD pipeline deploys to production
Fast Iteration Loop
For rapid development iteration:
# Build, push, and deploy (takes ~3-5 minutes for Python)
./scripts/tools/service-build-deploy.sh ai-service
# Check logs while deploying
kubectl logs -f deployment/ai-service -n matih-data-plane
# Quick health check
./scripts/tools/platform-status.sh
Troubleshooting
Common Deployment Issues
| Issue | Symptom | Resolution |
|---|---|---|
| Build fails | Docker build error | Check Dockerfile; verify base image exists |
| Push fails | ACR authentication error | Re-authenticate: az acr login --name matihlabsacr |
| Helm upgrade fails | "release in failed state" | Run helm rollback then retry |
| Pod CrashLoopBackOff | Container exits immediately | Check pod logs for application errors |
| ImagePullBackOff | Image not found in registry | Verify image tag was pushed successfully |
| Init container fails | Migration error | Check database connectivity and migration SQL |
| Readiness probe fails | Service not starting | Increase startupProbe failureThreshold |
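For the "release in failed state" case, the revision to roll back to can be picked from the `helm history` output. The sketch below is a hypothetical helper, `previous_good_revision`. It assumes the plain-text layout shown in the Helm Rollback section, with the revision number in the first column and the status word somewhere on the same line. It returns the most recent revision before the latest one whose status is deployed or superseded.

```shell
#!/usr/bin/env bash
# Sketch only: choose a rollback target from `helm history` text on stdin.
# previous_good_revision is a hypothetical helper, not part of the real scripts.
previous_good_revision() {
  awk '
    # Data rows start with a numeric revision; the header row is skipped.
    $1 ~ /^[0-9]+$/ {
      n++
      rev[n] = $1
      ok[n] = ($0 ~ /deployed|superseded/)
    }
    END {
      # Walk backwards from the row before the latest revision.
      for (i = n - 1; i >= 1; i--) {
        if (ok[i]) { print rev[i]; exit 0 }
      }
      exit 1
    }'
}

# Example usage:
#   rev=$(helm history ai-service --namespace matih-data-plane | previous_good_revision) &&
#     helm rollback ai-service "$rev" --namespace matih-data-plane --timeout 5m
```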
Next Steps
- Next: GitHub Actions
- Previous: Base Docker Images