MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Stage 11: Compute Engines

Stage 11 deploys the distributed compute engines used by the data plane: Spark Operator, Spark Thrift Server, Flink Operator, Trino, and ClickHouse. These engines provide the execution backend for pipeline processing, ad-hoc queries, and real-time analytics.

Source file: scripts/stages/11-compute-engines.sh


Components Deployed

| Component | Chart | Purpose |
|---|---|---|
| Spark Operator | spark-operator/spark-operator | Manages SparkApplication CRDs on Kubernetes |
| Spark Thrift Server | Custom manifest | JDBC/ODBC interface for Spark SQL |
| Flink Operator | flink-operator/flink-kubernetes-operator | Manages FlinkDeployment CRDs |
| Trino | trino/trino | Federated SQL query engine |
| ClickHouse | Custom chart | OLAP columnar database |

Spark Operator Deployment

The stage handles webhook conflicts during upgrades by cleaning up existing webhook configurations:

# Delete stale webhook configurations to avoid admission enforcer conflicts.
# --ignore-not-found keeps a first-time install from failing when none exist yet.
kubectl delete mutatingwebhookconfiguration spark-operator-webhook --ignore-not-found
kubectl delete validatingwebhookconfiguration spark-operator-webhook --ignore-not-found

# Deploy with webhook enabled
helm upgrade --install spark-operator spark-operator/spark-operator \
    --namespace matih-data-plane \
    --set webhook.enable=true \
    --set sparkJobNamespace=matih-data-plane

The stage also checks if the operator is already deployed and running before attempting an upgrade, avoiding unnecessary downtime.
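
The readiness check described above could look roughly like the following. This is a minimal sketch, not the contents of 11-compute-engines.sh: the function name `spark_operator_ready` and the Deployment name `spark-operator` are assumptions, and only the namespace is taken from the commands in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the operator Deployment reports at least
# one ready replica. The Deployment name "spark-operator" is an assumption.
spark_operator_ready() {
    local namespace="${1:-matih-data-plane}"
    local deployment="${2:-spark-operator}"
    local ready
    # readyReplicas is absent from the status when nothing is ready, and the
    # lookup fails outright when the Deployment does not exist; treat both as 0.
    ready="$(kubectl get deployment "$deployment" -n "$namespace" \
        -o jsonpath='{.status.readyReplicas}' 2>/dev/null || true)"
    [ "${ready:-0}" -ge 1 ]
}
```

A caller might then wrap the webhook cleanup and `helm upgrade --install` in `if ! spark_operator_ready; then ... fi`, so that a healthy operator is left untouched.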


Spark Thrift Server

The Spark Thrift Server provides a JDBC/ODBC interface that tools such as OpenMetadata use to query Spark SQL. It is deployed from a custom manifest rather than a Helm chart:

infrastructure/k8s/spark/spark-thrift-server.yaml
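
As a rough illustration of how this manifest might be applied and then reached over JDBC, consider the sketch below. Both function names are hypothetical, the namespace default is an assumption carried over from the other components, and the host passed to `thrift_jdbc_url` depends on whatever Service the manifest defines. Spark Thrift Server speaks the HiveServer2 protocol, which is why the URL uses the `jdbc:hive2://` scheme and defaults to port 10000.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the Thrift Server manifest into the data-plane
# namespace. The namespace default is an assumption, not confirmed by the stage.
deploy_spark_thrift_server() {
    local manifest="${1:-infrastructure/k8s/spark/spark-thrift-server.yaml}"
    local namespace="${2:-matih-data-plane}"
    if [ ! -f "$manifest" ]; then
        echo "manifest not found: $manifest" >&2
        return 1
    fi
    kubectl apply -n "$namespace" -f "$manifest"
}

# Hypothetical helper: build a HiveServer2-style JDBC URL for a given host.
# Port 10000 is the usual HiveServer2 default; adjust if the manifest differs.
thrift_jdbc_url() {
    local host="$1" port="${2:-10000}" database="${3:-default}"
    echo "jdbc:hive2://${host}:${port}/${database}"
}
```

The resulting URL is what a JDBC client such as `beeline -u "$(thrift_jdbc_url <service-host>)"` would be pointed at.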

Flink Operator

Manages Flink clusters and jobs as Kubernetes custom resources:

helm upgrade --install flink-operator flink-operator/flink-kubernetes-operator \
    --namespace matih-data-plane \
    --values infrastructure/helm/platform/flink/values-dev.yaml

Values Files

| Engine | Base Values | Dev Override |
|---|---|---|
| Spark | infrastructure/helm/platform/spark/values.yaml | values-dev.yaml |
| Flink | infrastructure/helm/platform/flink/values.yaml | values-dev.yaml |
| Trino | infrastructure/helm/trino/values.yaml | values-dev.yaml |
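
One way to layer a base file and its dev override is to pass both to Helm, base first. The sketch below is illustrative only: `values_flags` is a hypothetical helper, and it assumes each `values-dev.yaml` sits next to its base file (the Flink command above confirms this for Flink; for Spark and Trino it is an assumption). Helm merges `--values` files left to right, so the dev override wins on conflicting keys.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the --values flags for one engine, base file first.
# Assumes values-dev.yaml lives in the same directory as values.yaml.
values_flags() {
    local engine="$1" dir
    case "$engine" in
        spark|flink) dir="infrastructure/helm/platform/$engine" ;;
        trino)       dir="infrastructure/helm/trino" ;;
        *) echo "unknown engine: $engine" >&2; return 1 ;;
    esac
    echo "--values $dir/values.yaml --values $dir/values-dev.yaml"
}
```

The output can be spliced into a Helm invocation unquoted, for example `helm upgrade --install trino trino/trino --namespace matih-data-plane $(values_flags trino)`, where the release name `trino` is itself an assumption.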

Dependencies

  • Requires: 06-ingress-controller
  • Required by: 14-ml-infrastructure, 15-ai-infrastructure

Dependency Verification

kubectl get pods -n matih-data-plane -l app.kubernetes.io/name=spark-operator
kubectl get pods -n matih-data-plane -l app=flink-kubernetes-operator
kubectl get pods -n matih-data-plane -l app=trino
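
The `kubectl get pods` commands above only list pods. For an automated check, a sketch along the following lines could block until the pods are actually Ready. The function name `verify_compute_engines` and the 120s default timeout are assumptions; the label selectors are the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for pods behind each label selector to become Ready.
# Returns non-zero if any selector fails to become Ready within the timeout.
verify_compute_engines() {
    local namespace="${1:-matih-data-plane}"
    local timeout="${2:-120s}"
    local selector failed=0
    for selector in \
        "app.kubernetes.io/name=spark-operator" \
        "app=flink-kubernetes-operator" \
        "app=trino"; do
        # kubectl wait also fails when no pods match the selector at all.
        if ! kubectl wait pods -n "$namespace" -l "$selector" \
                --for=condition=Ready --timeout="$timeout"; then
            echo "not ready: $selector" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Checking every selector before returning, rather than stopping at the first failure, means a single run reports all engines that are not yet healthy.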