Google Kubernetes Engine (GKE)

GKE is a fully supported deployment target for MATIH. The cluster uses VPC-native networking, Workload Identity Federation to give pods access to GCP APIs, and GCP Secret Manager for secrets.

Cluster Configuration

GKE clusters are provisioned through the Terraform module at infrastructure/terraform/modules/gcp/gke/:

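resource "google_container_cluster" "primary" {
  name     = "matih-${var.environment}"
  location = var.region

  # Follow the REGULAR release channel for automatic upgrades.
  release_channel {
    channel = "REGULAR"
  }

  network    = var.network_id
  subnetwork = var.subnetwork_id

  # VPC-native (alias IP) mode: pods and services draw their addresses
  # from named secondary ranges on the subnetwork.
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Enable Workload Identity Federation for the cluster.
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Nodes get no public IPs; the control plane keeps its public endpoint.
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  addons_config {
    gce_persistent_disk_csi_driver_config {
      enabled = true
    }
    # Turns on the network policy add-on. Enforcement on the nodes also
    # needs a top-level network_policy block with enabled = true, which
    # is not shown in this excerpt.
    network_policy_config {
      disabled = false
    }
  }
}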

Node Pools

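Workloads run on dedicated node pools. Every pool autoscales between the listed minimum and maximum node counts except system, which is fixed at three nodes. Tainted pools accept only pods that carry a matching toleration: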

Node Pool	Machine Type	Min/Max	Purpose	Taint
system	e2-standard-4	3/3	System components	None
ctrlplane	e2-standard-4	2/5	Control plane services	`matih.ai/control-plane=true:NoSchedule`
dataplane	e2-standard-8	2/10	Data plane services	`matih.ai/data-plane=true:NoSchedule`
compute	e2-highmem-16	2/10	Trino, Spark workers	`matih.ai/compute=true:NoSchedule`
aicompute	e2-standard-8	1/8	AI/ML workloads	`matih.ai/ai-compute=true:NoSchedule`
gpu	a2-highgpu-1g	0/4	GPU inference (A100)	`nvidia.com/gpu=true:NoSchedule`
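
Because every pool except system is tainted, a workload lands on its pool only if it both tolerates the pool's taint and selects the pool's nodes. The following is a minimal sketch for the compute pool, not the manifest MATIH ships: the Deployment name, labels, and image are hypothetical. The cloud.google.com/gke-nodepool label is set by GKE on every node, and the value below assumes the pool is named compute as in the table.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trino-worker                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: trino-worker
  template:
    metadata:
      labels:
        app: trino-worker
    spec:
      # Pin the pods to nodes of the compute pool (label set by GKE).
      nodeSelector:
        cloud.google.com/gke-nodepool: compute
      # Allow scheduling onto nodes tainted matih.ai/compute=true:NoSchedule.
      tolerations:
        - key: matih.ai/compute
          operator: Equal
          value: "true"
          effect: NoSchedule
      containers:
        - name: worker
          image: example/worker:1.0  # hypothetical image
```

The toleration only permits scheduling on the tainted nodes; it is the nodeSelector that keeps the pods off the untainted system pool.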

Workload Identity Federation

GKE Workload Identity Federation maps Kubernetes service accounts to GCP service accounts through the iam.gke.io/gcp-service-account annotation:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-secrets
  namespace: external-secrets
  annotations:
    iam.gke.io/gcp-service-account: "external-secrets@matih-project.iam.gserviceaccount.com"

Service	GCP Service Account	Purpose
external-secrets	external-secrets@project	Secret Manager access
cert-manager	cert-manager@project	Cloud DNS validation
ai-service	ai-service@project	Vertex AI inference
data-plane-agent	data-agent@project	GCS data lake access
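
The annotation is only half of the mapping. The GCP service account must also grant roles/iam.workloadIdentityUser to the Kubernetes service account, whose member string has the form serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]. Once both sides are in place, one way to check the mapping is a throwaway pod that runs under the Kubernetes service account and prints the identity it resolves to. This is a sketch under assumptions: the pod name is hypothetical, and google/cloud-sdk:slim is simply a public image that includes gcloud.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: workload-identity-test       # hypothetical name
  namespace: external-secrets
spec:
  # Run under the annotated Kubernetes service account.
  serviceAccountName: external-secrets
  restartPolicy: Never
  containers:
    - name: check
      image: google/cloud-sdk:slim
      # Lists the credentialed accounts visible to the pod.
      command: ["gcloud", "auth", "list"]
```

If the binding works, the pod log should show the GCP service account from the annotation as the active account. Delete the pod after checking.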

Networking

GKE uses VPC-native networking with alias IP ranges:

Setting	Value
Network mode	VPC-native
Pod CIDR	Secondary range "pods"
Service CIDR	Secondary range "services"
Network policy	Calico (GKE add-on)
Private cluster	Enabled
Master authorized networks	Configured per environment
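
With network policy enforcement enabled, traffic between pods can be restricted per namespace. A common starting point, sketched here with a hypothetical namespace name, is a default-deny policy for ingress that more specific policies then open up selectively:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: matih-data-plane        # hypothetical namespace
spec:
  podSelector: {}                    # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress                        # no ingress rules are listed, so all ingress is denied
```

Policies are additive, so any later NetworkPolicy that allows specific traffic to a pod takes effect alongside this one.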

Storage Classes

Persistent volumes use storage classes backed by Compute Engine Persistent Disk (PD) through the PD CSI driver:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Storage Class	Disk Type	Use Case
ssd	pd-ssd	Default for databases, stateful workloads
balanced	pd-balanced	General purpose
standard	pd-standard	Non-critical, archival storage
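
A workload requests a disk from one of these classes through a PersistentVolumeClaim. The claim below is a minimal sketch against the ssd class; the claim name and size are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data                # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce                  # a Persistent Disk attaches read-write to one node
  storageClassName: ssd
  resources:
    requests:
      storage: 100Gi                 # hypothetical size
```

Because the class uses volumeBindingMode: WaitForFirstConsumer, the claim stays Pending until a pod that mounts it is scheduled, and the disk is then created in that pod's zone. With allowVolumeExpansion: true, the claim can later be grown by raising requests.storage.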

Artifact Registry

For GKE deployments, images are stored in Google Artifact Registry:

global:
  imageRegistry: us-central1-docker.pkg.dev/matih-project/matih
  imagePullSecrets:
    - name: gar-secret
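
The values above reference a pull secret named gar-secret but do not show how it is created. One possible shape, assuming authentication with a base64-encoded service account key, is sketched below; the namespace and the placeholder password are hypothetical and must be replaced. Artifact Registry accepts the fixed username _json_key_base64 for this kind of credential.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: gar-secret
  namespace: matih-system            # hypothetical namespace
type: kubernetes.io/dockerconfigjson
stringData:
  # Registry host must match the imageRegistry host in the values file.
  .dockerconfigjson: |
    {
      "auths": {
        "us-central1-docker.pkg.dev": {
          "username": "_json_key_base64",
          "password": "<base64-encoded service account key>"
        }
      }
    }
```

The secret has to exist in every namespace whose pods pull from the registry, since imagePullSecrets are resolved within the pod's own namespace.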
