MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Kafka / Strimzi

MATIH uses Strimzi to deploy and manage Apache Kafka 4.1.1 in KRaft mode (no ZooKeeper). Kafka provides the event streaming backbone for domain events, audit logs, and inter-service communication.


Cluster Configuration

# From infrastructure/k8s/kafka/kafka-cluster.yaml
apiVersion: kafka.strimzi.io/v1
kind: Kafka
metadata:
  name: strimzi-kafka
  namespace: matih-data-plane
  annotations:
    strimzi.io/kraft: "enabled"
    strimzi.io/node-pools: "enabled"
spec:
  kafka:
    version: 4.1.1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      # Replication factor 1 matches the single-node pool (replicas: 1)
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      log.retention.hours: 168          # 7 days
      log.segment.bytes: 1073741824     # 1 GiB

KafkaNodePool

apiVersion: kafka.strimzi.io/v1
kind: KafkaNodePool
metadata:
  name: combined
  namespace: matih-data-plane
spec:
  replicas: 1
  roles:
    - controller
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 10Gi
        deleteClaim: false
        kraftMetadata: shared
  resources:
    requests:
      memory: 512Mi
      cpu: 250m
    limits:
      memory: 1Gi
      cpu: 500m

TLS Enforcement

All MATIH services connect to Kafka through the TLS listener on port 9093, and network policies block the plaintext listener on port 9092. Services use the following client configuration:

# Service configuration
kafka:
  bootstrapServers: "strimzi-kafka-kafka-bootstrap.matih-data-plane.svc.cluster.local:9093"
  securityProtocol: SSL
  ssl:
    truststoreSecret: "kafka-cluster-ca-cert"
    truststoreKey: "ca.p12"
    truststorePasswordKey: "ca.password"
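
The source does not show how the plaintext port is blocked. As one possible approach, Strimzi listeners accept a networkPolicyPeers field that limits which peers the generated NetworkPolicy admits to a listener port. The fragment below is a sketch under that assumption, not the MATIH configuration. The app: kafka-admin-tools label is hypothetical, and enforcement depends on a CNI plugin that implements NetworkPolicy.

```yaml
# Sketch only: fragment of spec.kafka.listeners in the Kafka resource.
listeners:
  - name: plain
    port: 9092
    type: internal
    tls: false
    networkPolicyPeers:              # only matching peers may reach port 9092
      - podSelector:
          matchLabels:
            app: kafka-admin-tools   # hypothetical label, not from the MATIH repo
  - name: tls
    port: 9093
    type: internal
    tls: true                        # no networkPolicyPeers: reachable by all peers
```

With a restriction like this in place, ordinary service pods can reach only the TLS listener, which is consistent with the bootstrap address on port 9093 shown above.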

AI Service Topics

topics:
  - name: "matih.ai.state-changes"     # Agent state transitions
    partitions: 12
    retentionMs: 2592000000              # 30 days
  - name: "matih.ai.agent-traces"       # Agent execution traces
    partitions: 12
  - name: "matih.ai.evaluations"        # Quality evaluations
    partitions: 6
    retentionMs: 7776000000              # 90 days
  - name: "matih.ai.llm-ops"            # LLM operation metrics
    partitions: 12
    retentionMs: 604800000               # 7 days
  - name: "matih.ai.feedback"           # User feedback
    partitions: 6

Entity Operator

Strimzi deploys the Topic Operator and User Operator for automated topic and user management. Both are configured under spec.entityOperator in the Kafka resource:

entityOperator:
  topicOperator:
    resources:
      requests: { memory: 256Mi, cpu: 100m }
      limits: { memory: 512Mi, cpu: 250m }
  userOperator:
    resources:
      requests: { memory: 256Mi, cpu: 100m }
      limits: { memory: 512Mi, cpu: 250m }
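
The Topic Operator reconciles KafkaTopic resources into topics on the cluster. The source does not say whether the AI service topics are created this way, so the following is only a sketch of how matih.ai.state-changes could be declared. The resource name matih-ai-state-changes is hypothetical, and replicas: 1 is an assumption based on the single-node pool.

```yaml
# Hypothetical KafkaTopic for one of the AI service topics.
apiVersion: kafka.strimzi.io/v1        # same apiVersion as the Kafka resource above
kind: KafkaTopic
metadata:
  name: matih-ai-state-changes         # hypothetical resource name
  namespace: matih-data-plane
  labels:
    strimzi.io/cluster: strimzi-kafka  # binds the topic to the Kafka cluster
spec:
  topicName: matih.ai.state-changes    # actual topic name in Kafka
  partitions: 12
  replicas: 1                          # assumed: one broker, so one replica
  config:
    retention.ms: 2592000000           # 30 days
```

Declaring topics as resources keeps partition counts and retention under version control alongside the cluster definition.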