Chapter 18: CI/CD and Build System
The MATIH platform's CI/CD pipeline spans 142 scripts and more than 61,000 lines of Bash. It covers the full lifecycle: source code compilation, container image builds, Terraform infrastructure provisioning, Helm chart deployment, and post-deployment validation. This chapter is a reference for each component of the build and deployment system.
What You Will Learn
By the end of this chapter, you will understand:
- The build system, including the unified build.sh script with multi-language support for Java (Maven), Python (pip/poetry), and TypeScript (npm/Vite)
- The CD pipeline with its 25-stage architecture from Stage 00 (Terraform) through Stage 18 (Validation), including dependency graphs, rollback capabilities, lock management, and state tracking
- The script library with modular libraries for core utilities, Helm operations, Kubernetes management, Azure cloud operations, and validation
- Developer tooling including single-service build and deploy, platform status checking, database setup, and port validation
- GitHub Actions workflows for automated CI/CD on pull requests and merges
- GitOps patterns with ArgoCD for declarative infrastructure management, tenant application sets, and platform version promotion
- Terraform provisioning across six environments spanning Azure, AWS, and GCP with cloud-specific modules for compute, networking, storage, and AI services
Chapter Structure
| Section | Description | Audience |
|---|---|---|
| Build System | Unified build script, Java/Python/TypeScript builds, Docker images | All developers |
| CD Pipeline | 25-stage deployment pipeline with orchestration and rollback | DevOps engineers, SREs |
| Scripts Library | Modular Bash libraries for config, logging, Helm, K8s, Azure | Platform engineers |
| Tooling | Single-service deploy, platform status, database tools | All developers |
| GitHub Actions | Automated CI/CD workflows and release automation | DevOps engineers |
| GitOps with ArgoCD | ArgoCD setup, tenant application sets, version management | Platform engineers |
| Terraform Provisioning | Multi-cloud IaC across Azure, AWS, and GCP | Platform engineers |
Pipeline Architecture
The MATIH CI/CD system is a multi-layered architecture that separates build, deploy, and validation concerns:
Developer Workflow
==================
git push --> GitHub Actions CI
|
+--> build.sh --test-only (unit tests)
+--> helm lint (chart validation)
+--> terraform validate (IaC checks)
+--> pre-deploy validation
Merge to main --> CD Pipeline (cd-new.sh all dev)
|
+--> Stage 00: Terraform (Azure/AWS/GCP infra)
+--> Stage 01: Build Setup (Docker buildx, schema validation)
+--> Stage 02: Base Images (Java, Python, Node.js base images)
+--> Stage 03: Commons (shared libraries)
+--> Stage 04: Service Images (all service Docker images)
+--> Stage 05a: Control Plane Infrastructure (PostgreSQL, Redis, Kafka)
+--> Stage 05b: Data Plane Infrastructure (PostgreSQL, Redis, Kafka)
+--> Stage 06: Ingress Controller (NGINX)
+--> Stage 07: Control Plane Monitoring (Prometheus, Grafana)
+--> Stage 08: Control Plane Services (IAM, tenant, config, etc.)
+--> Stage 09: Control Plane Frontend (control-plane-ui)
+--> Stage 10: Data Plane Monitoring
+--> Stage 11: Compute Engines (Spark, Flink, Ray, Trino)
+--> Stage 12: Workflow Orchestration (Airflow)
+--> Stage 13: Data Catalogs (OpenMetadata)
+--> Stage 14: ML Infrastructure (KubeRay, MLflow)
+--> Stage 15: AI Infrastructure (vLLM, Ollama)
+--> Stage 16: Data Plane Services (ai-service, ml-service, etc.)
+--> Stage 17: Data Plane Frontend (workbenches)
+--> Stage 18: Validation (health checks, smoke tests)
|
+--> Auto-rollback on failure
+--> Build nodepool cleanup
Key Metrics
| Metric | Value |
|---|---|
| Total scripts | 142 |
| Total lines of Bash | 61,185 |
| CD pipeline stages | 25 (00 through 18, with sub-stages) |
| Library modules | 28 (across core, helm, k8s, azure, validate) |
| Terraform environments | 6 (dev, aws-dev, aws-prod, gcp-dev, gcp-prod, azure-matihlabs) |
| Terraform modules | 25+ (across Azure, AWS, GCP) |
| Service definitions | 30+ (Java, Python, Node.js) |
| Connector modules | 8 (PostgreSQL, MySQL, BigQuery, Snowflake, Salesforce, S3, GCS, Azure Blob) |
| Frontend applications | 7 (workbenches and UIs) |
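The source does not say how the script and line counts in this table were measured. As a rough way to reproduce comparable numbers, the sketch below counts files ending in .sh under a directory and sums their lines. The function name count_bash_metrics and the choice to match only the .sh extension are assumptions made for illustration, so its output may not match the table exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the MATIH scripts): counts *.sh files under a
# directory and the total number of lines they contain.
count_bash_metrics() {
  local root="${1:-scripts}"   # directory to scan; "scripts" is an assumed default
  local files lines

  # Number of regular files whose name ends in .sh (tr strips any padding from wc).
  files=$(find "$root" -type f -name '*.sh' | wc -l | tr -d ' ')

  # Total newline count across those files; a file that lacks a trailing newline
  # is undercounted by one line.
  lines=$(find "$root" -type f -name '*.sh' -exec cat {} + | wc -l | tr -d ' ')

  echo "scripts=${files} lines=${lines}"
}
```

Calling count_bash_metrics ./scripts from the repository root prints a single line of the form scripts=N lines=M. Scripts without a .sh extension are not counted by this sketch.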
Quick Reference
| Task | Command |
|---|---|
| Build everything | ./scripts/build.sh |
| Build (skip tests) | ./scripts/build.sh --skip-tests |
| Build Java only | ./scripts/build.sh --java |
| Build Python only | ./scripts/build.sh --python |
| Run tests only | ./scripts/build.sh --test-only --with-deps |
| Full CD pipeline | ./scripts/cd-new.sh all dev |
| CD infrastructure only | ./scripts/cd-new.sh infra dev |
| CD services only | ./scripts/cd-new.sh services dev |
| CD single stage | ./scripts/cd-new.sh 04 dev |
| Pipeline status | ./scripts/cd-new.sh status |
| Pipeline history | ./scripts/cd-new.sh history |
| Pipeline dependencies | ./scripts/cd-new.sh deps |
| Build single service | ./scripts/tools/service-build-deploy.sh ai-service |
| Platform status | ./scripts/tools/platform-status.sh |
| Health check | ./scripts/disaster-recovery/health-check.sh |
| Validate ports | ./scripts/tools/validate-ports.sh |
| Rollback release | ./scripts/cd-new.sh rollback ai-service matih-data-plane dev |
| Dry run | DRY_RUN=true ./scripts/cd-new.sh all dev |
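The dry-run and rollback rows above suggest a common pattern: run stages in order, stop at the first failure, and optionally roll back. The sketch below illustrates that pattern only. The stage names are a subset taken from the Pipeline Architecture diagram, but run_stage, rollback_stage, run_pipeline, and the FAIL_STAGE hook are hypothetical names invented for this example and are not the internals of cd-new.sh.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: these helpers are assumptions, not cd-new.sh code.

# A subset of the stages from the diagram, in execution order.
STAGES="00-terraform 01-build-setup 02-base-images"

# Runs one stage. With DRY_RUN=true it only reports what would happen.
run_stage() {
  local stage="$1"
  if [ "${DRY_RUN:-false}" = "true" ]; then
    echo "[dry-run] would run stage ${stage}"
    return 0
  fi
  echo "running stage ${stage}"
  # FAIL_STAGE is a hook for this sketch only: the named stage reports failure.
  [ "${stage}" != "${FAIL_STAGE:-}" ]
}

# Placeholder rollback; a real pipeline would undo the stage's changes here.
rollback_stage() {
  echo "rolling back stage $1"
}

# Runs stages in order and stops at the first failure. The failed stage is
# rolled back unless ROLLBACK_ON_FAILURE is set to something other than "true".
run_pipeline() {
  local stage
  for stage in $STAGES; do
    if ! run_stage "${stage}"; then
      echo "stage ${stage} failed"
      if [ "${ROLLBACK_ON_FAILURE:-true}" = "true" ]; then
        rollback_stage "${stage}"
      fi
      return 1
    fi
  done
  echo "pipeline complete"
}
```

With DRY_RUN=true, run_pipeline prints one "would run" line per stage and changes nothing. The ${VAR:-default} expansions mirror the defaults documented for DRY_RUN and ROLLBACK_ON_FAILURE, so an unset variable behaves like its documented default.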
Environment Variables
The CD pipeline accepts configuration through environment variables:
# Version control
RELEASE_VERSION=1.2.3 # Semantic version for deployment
IMAGE_TAG=sha-abc123 # Docker image tag (default: latest)
# Pipeline behavior
DRY_RUN=true # Preview without executing
ROLLBACK_ON_FAILURE=true # Auto-rollback on stage failure (default: true)
SKIP_DEPENDENCY_CHECK=true # Skip dependency verification
SKIP_AI_INFRA=true # Mark AI infrastructure as optional
SKIP_SCHEMA_VALIDATION=true # Skip schema validation stage
# Build configuration
BUILD_CLEANUP_ENABLED=true # Enable build nodepool cleanup (default: true)
BUILD_CLEANUP_WAIT_TIMEOUT=180 # Timeout for nodepool scale-down
FULL_SCHEMA_VALIDATION=true # Run full Hibernate validation
KUBECTL_TIMEOUT=10 # Timeout for kubectl commands in seconds
# Registry
ACR_NAME=matihacr # Azure Container Registry name
REGISTRY=ghcr.io/matih # Container registry URL
Next Steps
Begin with the Build System section to understand how multi-language builds work, then proceed to the CD Pipeline for the full deployment pipeline walkthrough.