Stage 11: Compute Engines
Stage 11 deploys the distributed compute engines used by the data plane: Spark Operator, Spark Thrift Server, Flink Operator, Trino, and ClickHouse. These engines provide the execution backend for pipeline processing, ad-hoc queries, and real-time analytics.
Source file: scripts/stages/11-compute-engines.sh
Components Deployed
| Component | Chart | Purpose |
|---|---|---|
| Spark Operator | spark-operator/spark-operator | Manages SparkApplication CRDs on Kubernetes |
| Spark Thrift Server | Custom manifest | JDBC/ODBC interface for Spark SQL |
| Flink Operator | flink-operator/flink-kubernetes-operator | Manages FlinkDeployment CRDs |
| Trino | trino/trino | Federated SQL query engine |
| ClickHouse | Custom chart | OLAP columnar database |
Spark Operator Deployment
The stage handles webhook conflicts during upgrades by cleaning up existing webhook configurations:
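# Delete webhook configurations to avoid admission enforcer conflicts
# (--ignore-not-found keeps the command from failing when none exist yet)
kubectl delete mutatingwebhookconfiguration spark-operator-webhook --ignore-not-found
kubectl delete validatingwebhookconfiguration spark-operator-webhook --ignore-not-found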
# Deploy with webhook enabled
helm upgrade --install spark-operator spark-operator/spark-operator \
  --namespace matih-data-plane \
  --set webhook.enable=true \
  --set sparkJobNamespace=matih-data-plane
The stage also checks if the operator is already deployed and running before attempting an upgrade, avoiding unnecessary downtime.
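The already-running check described above might look roughly like the following. This is a minimal sketch, not the stage script itself: the function names `operator_is_ready` and `deploy_spark_operator` are hypothetical, and the Deployment name `spark-operator` is an assumption that depends on the chart version in use.

```shell
# Sketch only: function names and the Deployment name are assumptions,
# not taken from scripts/stages/11-compute-engines.sh.

# Returns 0 when the Deployment exists and reports at least one ready replica.
operator_is_ready() {
  local namespace="$1" deployment="$2" ready
  # jsonpath prints an empty string when readyReplicas is unset; a missing
  # Deployment makes kubectl exit non-zero, which is treated as "not ready".
  ready="$(kubectl get deployment "$deployment" -n "$namespace" \
    -o jsonpath='{.status.readyReplicas}' 2>/dev/null)" || return 1
  [ "${ready:-0}" -ge 1 ]
}

# Skips the Helm upgrade when the operator is already healthy.
deploy_spark_operator() {
  if operator_is_ready matih-data-plane spark-operator; then
    echo "spark-operator already running, skipping upgrade"
    return 0
  fi
  echo "spark-operator not ready, running helm upgrade"
  helm upgrade --install spark-operator spark-operator/spark-operator \
    --namespace matih-data-plane \
    --set webhook.enable=true \
    --set sparkJobNamespace=matih-data-plane
}
```

Calling `deploy_spark_operator` is then safe to repeat: a healthy operator is left untouched, and only a missing or unready one triggers the upgrade.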
Spark Thrift Server
Provides a JDBC/ODBC interface for tools like OpenMetadata to query Spark SQL:
infrastructure/k8s/spark/spark-thrift-server.yaml
Flink Operator
Manages Flink clusters and jobs as Kubernetes custom resources:
helm upgrade --install flink-operator flink-operator/flink-kubernetes-operator \
  --namespace matih-data-plane \
  --values infrastructure/helm/platform/flink/values-dev.yaml
Values Files
| Engine | Base Values | Dev Override |
|---|---|---|
| Spark | infrastructure/helm/platform/spark/values.yaml | values-dev.yaml |
| Flink | infrastructure/helm/platform/flink/values.yaml | values-dev.yaml |
| Trino | infrastructure/helm/trino/values.yaml | values-dev.yaml |
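One common way to layer a base file and an environment override is to pass both to Helm, base first, because values from later `--values` files take precedence over earlier ones. The helper below is a sketch under that assumption: `values_flags` is a hypothetical name, and it assumes the override file sits in the same directory as the base file. Whether the stage script passes both files or only the override is not stated here.

```shell
# Hypothetical helper: prints the --values flags for a chart directory and
# environment, base file first so the environment override wins.
values_flags() {
  local chart_dir="$1" env="$2"
  printf -- '--values %s/values.yaml --values %s/values-%s.yaml\n' \
    "$chart_dir" "$chart_dir" "$env"
}

# Example use (left unquoted on purpose so the flags split into words;
# this assumes the paths contain no spaces):
#   helm upgrade --install flink-operator flink-operator/flink-kubernetes-operator \
#     --namespace matih-data-plane \
#     $(values_flags infrastructure/helm/platform/flink dev)
```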
Dependencies
- Requires: 06-ingress-controller
- Required by: 14-ml-infrastructure, 15-ai-infrastructure
Dependency Verification
kubectl get pods -n matih-data-plane -l app.kubernetes.io/name=spark-operator
kubectl get pods -n matih-data-plane -l app=flink-kubernetes-operator
kubectl get pods -n matih-data-plane -l app=trino
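For a scripted check, the same label selectors can be fed to `kubectl wait`, which blocks until the matching pods report Ready and exits non-zero on timeout or when nothing matches. This is a sketch only: `verify_compute_engines` is a hypothetical function name and the 120-second timeout is an assumption.

```shell
# Sketch: selectors come from the commands above; the function name and
# the 120s timeout are assumptions.
verify_compute_engines() {
  local namespace="${1:-matih-data-plane}" selector failed=0
  for selector in \
    "app.kubernetes.io/name=spark-operator" \
    "app=flink-kubernetes-operator" \
    "app=trino"; do
    # kubectl wait fails on timeout or when no pod matches the selector.
    if kubectl wait pod -n "$namespace" -l "$selector" \
        --for=condition=Ready --timeout=120s >/dev/null 2>&1; then
      echo "OK   $selector"
    else
      echo "FAIL $selector"
      failed=1
    fi
  done
  return "$failed"
}
```

Running `verify_compute_engines` prints one line per engine and returns a non-zero status if any of them is not ready, which makes it usable as a gate before later stages.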