MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Chapter 18: CI/CD and Build System

The MATIH platform's CI/CD pipeline spans 142 scripts and over 61,000 lines of Bash. It covers the full lifecycle: source code compilation, container image builds, Terraform infrastructure provisioning, Helm chart deployment, and post-deployment validation. This chapter is a reference for each component of the build and deployment system.


What You Will Learn

By the end of this chapter, you will understand:

  • The build system, including the unified build.sh script with multi-language support for Java (Maven), Python (pip/Poetry), and TypeScript (npm/Vite)
  • The CD pipeline and its 25-stage architecture, from Stage 00 (Terraform) through Stage 18 (Validation), including dependency graphs, rollback capabilities, lock management, and state tracking
  • The script library, organized into modules for core utilities, Helm operations, Kubernetes management, Azure cloud operations, and validation
  • Developer tooling, including single-service build and deploy, platform status checking, database setup, and port validation
  • GitHub Actions workflows for automated CI/CD on pull requests and merges
  • GitOps patterns with ArgoCD for declarative infrastructure management, tenant application sets, and platform version promotion
  • Terraform provisioning across six environments spanning Azure, AWS, and GCP with cloud-specific modules for compute, networking, storage, and AI services

Chapter Structure

| Section | Description | Audience |
|---|---|---|
| Build System | Unified build script, Java/Python/TypeScript builds, Docker images | All developers |
| CD Pipeline | 25-stage deployment pipeline with orchestration and rollback | DevOps engineers, SREs |
| Scripts Library | Modular Bash libraries for config, logging, Helm, K8s, Azure | Platform engineers |
| Tooling | Single-service deploy, platform status, database tools | All developers |
| GitHub Actions | Automated CI/CD workflows and release automation | DevOps engineers |
| GitOps with ArgoCD | ArgoCD setup, tenant application sets, version management | Platform engineers |
| Terraform Provisioning | Multi-cloud IaC across Azure, AWS, and GCP | Platform engineers |

Pipeline Architecture

The MATIH CI/CD system is a multi-layered architecture that separates build, deploy, and validation concerns:

Developer Workflow
==================

  git push --> GitHub Actions CI
               |
               +--> build.sh --test-only (unit tests)
               +--> helm lint (chart validation)
               +--> terraform validate (IaC checks)
               +--> pre-deploy validation

  Merge to main --> CD Pipeline (cd-new.sh all dev)
                    |
                    +--> Stage 00: Terraform (Azure/AWS/GCP infra)
                    +--> Stage 01: Build Setup (Docker buildx, schema validation)
                    +--> Stage 02: Base Images (Java, Python, Node.js base images)
                    +--> Stage 03: Commons (shared libraries)
                    +--> Stage 04: Service Images (all service Docker images)
                    +--> Stage 05a: Control Plane Infrastructure (PostgreSQL, Redis, Kafka)
                    +--> Stage 05b: Data Plane Infrastructure (PostgreSQL, Redis, Kafka)
                    +--> Stage 06: Ingress Controller (NGINX)
                    +--> Stage 07: Control Plane Monitoring (Prometheus, Grafana)
                    +--> Stage 08: Control Plane Services (IAM, tenant, config, etc.)
                    +--> Stage 09: Control Plane Frontend (control-plane-ui)
                    +--> Stage 10: Data Plane Monitoring
                    +--> Stage 11: Compute Engines (Spark, Flink, Ray, Trino)
                    +--> Stage 12: Workflow Orchestration (Airflow)
                    +--> Stage 13: Data Catalogs (OpenMetadata)
                    +--> Stage 14: ML Infrastructure (KubeRay, MLflow)
                    +--> Stage 15: AI Infrastructure (vLLM, Ollama)
                    +--> Stage 16: Data Plane Services (ai-service, ml-service, etc.)
                    +--> Stage 17: Data Plane Frontend (workbenches)
                    +--> Stage 18: Validation (health checks, smoke tests)
                    |
                    +--> Auto-rollback on failure
                    +--> Build nodepool cleanup
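
The ordered-stages-with-rollback flow in the diagram can be illustrated with a minimal sketch. It is not the logic of cd-new.sh: the function names (run_stage, rollback_stage, run_pipeline), the FAIL_AT variable used to simulate a failure, and the policy of rolling back completed stages in reverse order are all assumptions made for illustration. Only the stage IDs and the ROLLBACK_ON_FAILURE variable come from this chapter.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names and rollback policy are assumptions,
# not taken from cd-new.sh.

# A subset of the stage IDs shown in the diagram.
STAGES="00 01 02 03 04 05a 05b 06"

# Placeholder stage body. A real stage would call Terraform, Docker, or Helm.
# FAIL_AT (hypothetical) names a stage that should fail, for demonstration.
run_stage() {
  [ "$1" != "${FAIL_AT:-}" ]
}

# Placeholder rollback action for one completed stage.
rollback_stage() {
  echo "rollback $1"
}

# Run stages in order. On the first failure, optionally roll back the
# stages that already completed, most recent first, then return non-zero.
run_pipeline() {
  completed=""   # completed stage IDs, most recent first
  for stage in $STAGES; do
    echo "run $stage"
    if run_stage "$stage"; then
      completed="$stage $completed"
    else
      echo "failed $stage"
      if [ "${ROLLBACK_ON_FAILURE:-true}" = "true" ]; then
        for done_stage in $completed; do
          rollback_stage "$done_stage"
        done
      fi
      return 1
    fi
  done
}

# Demonstration: simulate a failure in stage 02.
FAIL_AT=02 run_pipeline || echo "pipeline stopped at a failed stage"
```

In this sketch, a failure in stage 02 rolls back stages 01 and 00, in that order. Setting ROLLBACK_ON_FAILURE=false skips the rollback loop and only reports the failure.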

Key Metrics

| Metric | Value |
|---|---|
| Total scripts | 142 |
| Total lines of Bash | 61,185 |
| CD pipeline stages | 25 (00 through 18, with sub-stages) |
| Library modules | 28 (across core, helm, k8s, azure, validate) |
| Terraform environments | 6 (dev, aws-dev, aws-prod, gcp-dev, gcp-prod, azure-matihlabs) |
| Terraform modules | 25+ (across Azure, AWS, GCP) |
| Service definitions | 30+ (Java, Python, Node.js) |
| Connector modules | 8 (PostgreSQL, MySQL, BigQuery, Snowflake, Salesforce, S3, GCS, Azure Blob) |
| Frontend applications | 7 (workbenches and UIs) |

Quick Reference

| Task | Command |
|---|---|
| Build everything | ./scripts/build.sh |
| Build (skip tests) | ./scripts/build.sh --skip-tests |
| Build Java only | ./scripts/build.sh --java |
| Build Python only | ./scripts/build.sh --python |
| Run tests only | ./scripts/build.sh --test-only --with-deps |
| Full CD pipeline | ./scripts/cd-new.sh all dev |
| CD infrastructure only | ./scripts/cd-new.sh infra dev |
| CD services only | ./scripts/cd-new.sh services dev |
| CD single stage | ./scripts/cd-new.sh 04 dev |
| Pipeline status | ./scripts/cd-new.sh status |
| Pipeline history | ./scripts/cd-new.sh history |
| Pipeline dependencies | ./scripts/cd-new.sh deps |
| Build single service | ./scripts/tools/service-build-deploy.sh ai-service |
| Platform status | ./scripts/tools/platform-status.sh |
| Health check | ./scripts/disaster-recovery/health-check.sh |
| Validate ports | ./scripts/tools/validate-ports.sh |
| Rollback release | ./scripts/cd-new.sh rollback ai-service matih-data-plane dev |
| Dry run | DRY_RUN=true ./scripts/cd-new.sh all dev |
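
The dry-run and deploy commands above can be combined so that a preview always precedes the real run. The wrapper below is a hypothetical convenience function, not part of the MATIH scripts: preview_then_deploy and the CD_SCRIPT override are names introduced here for illustration. It assumes cd-new.sh exits non-zero when a dry run detects a problem, which this chapter does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; preview_then_deploy and CD_SCRIPT are illustrative names.

# Path to the CD entry point. Overridable so the wrapper can be pointed
# at a stub script when trying it out.
CD_SCRIPT="${CD_SCRIPT:-./scripts/cd-new.sh}"

# Usage: preview_then_deploy <target> <environment>
#   target:      all, infra, services, or a stage ID such as 04
#   environment: for example dev
preview_then_deploy() {
  target="$1"
  env_name="$2"

  # First pass: DRY_RUN=true applies to this single invocation only.
  DRY_RUN=true "$CD_SCRIPT" "$target" "$env_name" || return 1

  # Second pass: the real run, reached only if the preview succeeded.
  "$CD_SCRIPT" "$target" "$env_name"
}
```

For example, `preview_then_deploy services dev` would first run `DRY_RUN=true ./scripts/cd-new.sh services dev` and then, if that succeeds, `./scripts/cd-new.sh services dev`.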

Environment Variables

The CD pipeline accepts configuration through environment variables:

# Version control
RELEASE_VERSION=1.2.3          # Semantic version for deployment
IMAGE_TAG=sha-abc123           # Docker image tag (default: latest)
 
# Pipeline behavior
DRY_RUN=true                   # Preview without executing
ROLLBACK_ON_FAILURE=true       # Auto-rollback on stage failure (default: true)
SKIP_DEPENDENCY_CHECK=true     # Skip dependency verification
SKIP_AI_INFRA=true             # Mark AI infrastructure as optional
SKIP_SCHEMA_VALIDATION=true    # Skip schema validation stage
 
# Build configuration
BUILD_CLEANUP_ENABLED=true     # Enable build nodepool cleanup (default: true)
BUILD_CLEANUP_WAIT_TIMEOUT=180 # Timeout for nodepool scale-down
FULL_SCHEMA_VALIDATION=true    # Run full Hibernate validation
KUBECTL_TIMEOUT=10             # Timeout for kubectl commands in seconds
 
# Registry
ACR_NAME=matihacr              # Azure Container Registry name
REGISTRY=ghcr.io/matih         # Container registry URL
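
One common way for a Bash pipeline to consume variables like these is default-valued parameter expansion. The sketch below is an assumption about how such settings could be resolved, not a copy of the pipeline's configuration code: load_pipeline_config and run_cmd are hypothetical names, and defaults marked "assumed" are not stated in this chapter.

```shell
#!/usr/bin/env bash
# Illustrative sketch; function names and "assumed" defaults are not
# taken from the MATIH scripts.

# Resolve settings from the environment, falling back to defaults.
load_pipeline_config() {
  IMAGE_TAG="${IMAGE_TAG:-latest}"                        # default stated above
  ROLLBACK_ON_FAILURE="${ROLLBACK_ON_FAILURE:-true}"      # default stated above
  BUILD_CLEANUP_ENABLED="${BUILD_CLEANUP_ENABLED:-true}"  # default stated above
  DRY_RUN="${DRY_RUN:-false}"                             # assumed default
  KUBECTL_TIMEOUT="${KUBECTL_TIMEOUT:-10}"                # assumed default
}

# Execute a command, or only print it when DRY_RUN=true.
run_cmd() {
  if [ "${DRY_RUN:-false}" = "true" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

load_pipeline_config
run_cmd echo "deploying image tag ${IMAGE_TAG}"
```

Routing every side-effecting command through a helper such as run_cmd is one way to make a single DRY_RUN flag cover the whole pipeline. Whether cd-new.sh does this is not stated here.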

Next Steps

Begin with the Build System section to understand how multi-language builds work, then proceed to the CD Pipeline for the full deployment pipeline walkthrough.