Google Kubernetes Engine (GKE)
GKE is a fully supported deployment target for MATIH. The cluster uses VPC-native networking, Workload Identity Federation for GKE to give pods access to GCP APIs without exported service account keys, and GCP Secret Manager, read through External Secrets, for application secrets.
Cluster Configuration
GKE clusters are provisioned through the Terraform module at infrastructure/terraform/modules/gcp/gke/:
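```hcl
resource "google_container_cluster" "primary" {
  name     = "matih-${var.environment}"
  location = var.region

  # Workload node pools are managed separately, so the default pool is
  # created with a single node and removed once the cluster exists.
  remove_default_node_pool = true
  initial_node_count       = 1

  release_channel {
    channel = "REGULAR"
  }

  network    = var.network_id
  subnetwork = var.subnetwork_id

  # VPC-native (alias IP) networking using named secondary ranges
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Workload Identity Federation for GKE
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Private nodes; the control-plane endpoint stays publicly reachable
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # Enforces NetworkPolicy on the nodes; network_policy_config below only
  # enables the add-on on the control plane and has no effect without this.
  network_policy {
    enabled  = true
    provider = "CALICO"
  }

  addons_config {
    gce_persistent_disk_csi_driver_config {
      enabled = true
    }
    network_policy_config {
      disabled = false
    }
  }
}
```

Node Pools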
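Workloads run on dedicated node pools. The system pool is fixed at three nodes, the other pools autoscale within the listed bounds, and each tainted pool accepts only pods that carry a matching toleration: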
| Node Pool | Machine Type | Min/Max Nodes | Purpose | Taint |
|---|---|---|---|---|
| system | e2-standard-4 | 3/3 | System components | None |
| ctrlplane | e2-standard-4 | 2/5 | Control plane services | matih.ai/control-plane=true:NoSchedule |
| dataplane | e2-standard-8 | 2/10 | Data plane services | matih.ai/data-plane=true:NoSchedule |
| compute | e2-highmem-16 | 2/10 | Trino, Spark workers | matih.ai/compute=true:NoSchedule |
| aicompute | e2-standard-8 | 1/8 | AI/ML workloads | matih.ai/ai-compute=true:NoSchedule |
| gpu | a2-highgpu-1g | 0/4 | GPU inference (A100) | nvidia.com/gpu=true:NoSchedule |
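As a sketch of how a workload lands on one of the tainted pools, the Deployment below tolerates the dataplane taint and selects the pool through GKE's built-in `cloud.google.com/gke-nodepool` node label. It assumes the GKE node pool is literally named `dataplane`; the Deployment name, namespace, and image are hypothetical placeholders.

```yaml
# Hypothetical Deployment; name, namespace, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-data-service
  namespace: matih-data-plane
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-data-service
  template:
    metadata:
      labels:
        app: example-data-service
    spec:
      # Pin pods to the dataplane pool (assumes the GKE pool name is "dataplane")
      nodeSelector:
        cloud.google.com/gke-nodepool: dataplane
      # Tolerate the pool's taint: matih.ai/data-plane=true:NoSchedule
      tolerations:
        - key: matih.ai/data-plane
          operator: Equal
          value: "true"
          effect: NoSchedule
      containers:
        - name: app
          image: us-central1-docker.pkg.dev/matih-project/matih/example-data-service:latest
```

The toleration only permits scheduling onto the tainted nodes; it is the nodeSelector that keeps the pods off other pools. With the toleration alone, the pods could still land on any untainted node, such as the system pool.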
Workload Identity Federation
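Workload Identity Federation for GKE lets a Kubernetes service account (KSA) act as a GCP service account (GSA). The KSA carries the iam.gke.io/gcp-service-account annotation, and the GSA must grant roles/iam.workloadIdentityUser to the member serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]: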
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-secrets
  namespace: external-secrets
  annotations:
    iam.gke.io/gcp-service-account: "external-secrets@matih-project.iam.gserviceaccount.com"
```

| Service | GCP Service Account | Purpose |
|---|---|---|
| external-secrets | external-secrets@project | Secret Manager access |
| cert-manager | cert-manager@project | Cloud DNS validation |
| ai-service | ai-service@project | Vertex AI inference |
| data-plane-agent | data-agent@project | GCS data lake access |
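A pod picks up the mapped GCP identity simply by running as the annotated KSA. The sketch below does this for ai-service and runs a one-off check; the namespace `matih-ai`, the project ID in the GSA email, and the pod name are assumptions, and it presumes the IAM binding for this KSA is already in place.

```yaml
# Hypothetical namespace and project ID; adjust to the real deployment.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ai-service
  namespace: matih-ai
  annotations:
    # GSA that this KSA acts as through Workload Identity Federation
    iam.gke.io/gcp-service-account: "ai-service@matih-project.iam.gserviceaccount.com"
---
apiVersion: v1
kind: Pod
metadata:
  name: ai-service-wi-check
  namespace: matih-ai
spec:
  # Running as the annotated KSA is what gives the pod the GSA's identity
  serviceAccountName: ai-service
  restartPolicy: Never
  containers:
    - name: gcloud
      image: gcr.io/google.com/cloudsdktool/google-cloud-cli:slim
      # Lists the active credential; it should be the ai-service GSA
      command: ["gcloud", "auth", "list"]
```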
Networking
GKE uses VPC-native networking with alias IP ranges:
| Setting | Value |
|---|---|
| Network mode | VPC-native |
| Pod CIDR | Secondary range "pods" |
| Service CIDR | Secondary range "services" |
| Network policy | Calico (GKE add-on) |
| Private cluster | Enabled: private nodes, public control-plane endpoint, control-plane range 172.16.0.0/28 |
| Master authorized networks | Configured per environment |
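Because the Calico add-on enforces standard Kubernetes NetworkPolicy objects, namespace isolation can be written as ordinary manifests. A minimal sketch, assuming a hypothetical `matih-data-plane` namespace: deny all ingress by default, then allow traffic only from pods in the same namespace.

```yaml
# Deny all ingress to every pod in the (hypothetical) namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: matih-data-plane
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Re-allow ingress from pods in the same namespace only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: matih-data-plane
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
```

NetworkPolicies are additive: a pod's allowed ingress is the union of every policy that selects it, so further allow rules (for example, from an ingress controller's namespace) can be layered on without touching the deny policy.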
Storage Classes
MATIH defines StorageClasses backed by Compute Engine Persistent Disk through the GKE PD CSI driver (pd.csi.storage.gke.io):
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

| Storage Class | Disk Type | Use Case |
|---|---|---|
| ssd | pd-ssd | Default for databases, stateful workloads |
| balanced | pd-balanced | General purpose |
| standard | pd-standard | Non-critical, archival storage |
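Workloads request a class by name in a PersistentVolumeClaim. The sketch below asks for an SSD-backed volume; the claim name, namespace, and 100Gi size are illustrative. Because the ssd class uses WaitForFirstConsumer, the claim stays Pending until a pod that mounts it is scheduled, so the disk is created in that pod's zone.

```yaml
# Hypothetical claim for a database volume on the "ssd" class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-db-data
  namespace: matih-data-plane
spec:
  storageClassName: ssd
  accessModes:
    - ReadWriteOnce   # a PD volume attaches read-write to a single node
  resources:
    requests:
      storage: 100Gi
```

Since the class sets allowVolumeExpansion: true, the volume can later be grown by raising spec.resources.requests.storage on the claim; shrinking a volume is not supported.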
Artifact Registry
For GKE deployments, container images are stored in Google Artifact Registry. The global values set the registry path and the pull secret:
```yaml
global:
  imageRegistry: us-central1-docker.pkg.dev/matih-project/matih
  imagePullSecrets:
    - name: gar-secret
```
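Artifact Registry image references follow the pattern LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG, so the imageRegistry value supplies everything except the image name and tag. A rendered pod would look roughly like the sketch below; the image name, tag, pod name, and namespace are placeholders, and how the charts actually join the registry and image name is assumed rather than taken from them. imagePullSecrets are resolved in the pod's own namespace, so gar-secret has to exist in every namespace that pulls from the registry.

```yaml
# Sketch of a pod pulling from Artifact Registry (name, namespace, tag are placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: ai-service-example
  namespace: matih-ai
spec:
  imagePullSecrets:
    - name: gar-secret   # looked up in the pod's namespace
  containers:
    - name: ai-service
      # imageRegistry value + "/" + image name + ":" + tag
      image: us-central1-docker.pkg.dev/matih-project/matih/ai-service:1.0.0
```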