MATIH Platform is in active MVP development. Documentation reflects current implementation status.
Kafka Topology

Apache Kafka is deployed via the Strimzi Kafka Operator and serves as the primary durable event streaming infrastructure. This section documents the cluster configuration, topic naming conventions, partitioning strategy, and producer/consumer patterns.


Cluster Configuration

| Parameter | Development | Production |
|---|---|---|
| Brokers | 1 | 3 |
| Replication factor | 1 | 3 |
| Min in-sync replicas | 1 | 2 |
| Default partitions | 3 | 6 |
| Retention period | 7 days | 30 days |
| Compression | Snappy | Snappy |
| Max message size | 1 MB | 1 MB |
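
The relationship between broker count, replication factor, and min in-sync replicas determines how many replicas a partition can lose while `acks=all` writes still succeed. The following is a minimal sketch of that arithmetic; the `ClusterProfile` class and its method names are hypothetical and not part of the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterProfile:
    """Hypothetical container for the durability settings in the table above."""
    brokers: int
    replication_factor: int
    min_insync_replicas: int

    def validate(self) -> None:
        # A partition cannot have more replicas than there are brokers.
        if self.replication_factor > self.brokers:
            raise ValueError("replication factor cannot exceed broker count")
        # acks=all writes could never succeed if min ISR exceeded the replica count.
        if self.min_insync_replicas > self.replication_factor:
            raise ValueError("min in-sync replicas cannot exceed replication factor")

    def tolerated_replica_losses(self) -> int:
        """Replicas a partition can lose while acks=all writes are still accepted."""
        return self.replication_factor - self.min_insync_replicas


DEVELOPMENT = ClusterProfile(brokers=1, replication_factor=1, min_insync_replicas=1)
PRODUCTION = ClusterProfile(brokers=3, replication_factor=3, min_insync_replicas=2)

for profile in (DEVELOPMENT, PRODUCTION):
    profile.validate()
```

Under these values the production profile keeps accepting writes with one replica of a partition offline, while the development profile has no headroom.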

Topic Naming Convention

Topics follow the pattern {domain}.{entity}.{action}:

| Topic | Partitions | Publisher | Key Consumers |
|---|---|---|---|
| tenant.lifecycle.events | 6 | Tenant Service | Audit, Billing, Notification |
| query.execution.events | 6 | Query Engine | Audit, Billing, Data Quality |
| ai.agent.events | 6 | AI Service | Audit, Billing |
| ml.model.events | 3 | ML Service | Audit, Catalog |
| pipeline.job.events | 3 | Pipeline Service | Audit, Notification |
| data.quality.events | 3 | Data Quality Service | Notification, Governance |
| config.change.events | 3 | Config Service | All services |
| security.audit.events | 6 | IAM Service | Audit |
| governance.policy.events | 3 | Governance Service | Catalog, Query Engine |
| billing.usage.events | 3 | Billing Service | Notification |
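
A naming convention like this can be checked mechanically, for example before a topic is created. The sketch below assumes each segment is lowercase alphanumeric and starts with a letter; the source does not state the allowed characters, and `parse_topic` is a hypothetical helper.

```python
import re

# Assumption: three dot-separated segments, each lowercase alphanumeric
# and starting with a letter. The real character rules may differ.
TOPIC_PATTERN = re.compile(r"[a-z][a-z0-9]*(\.[a-z][a-z0-9]*){2}")


def parse_topic(name: str) -> tuple[str, str, str]:
    """Split a topic name into (domain, entity, action) or raise ValueError."""
    if not TOPIC_PATTERN.fullmatch(name):
        raise ValueError(f"topic {name!r} does not match domain.entity.action")
    domain, entity, action = name.split(".")
    return domain, entity, action
```

For instance, `parse_topic("tenant.lifecycle.events")` returns `("tenant", "lifecycle", "events")`, while a two-segment name raises `ValueError`.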

Partitioning Strategy

All topics use tenant_id as the Kafka message key:

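| Property | Benefit |
|---|---|
| Ordering guarantee | Events for a single tenant are ordered within a partition |
| Partition affinity | The same tenant always maps to the same partition of a topic, as long as the topic's partition count does not change |
| Consumer locality | Within a consumer group, one consumer processes all events for the tenants on its assigned partitions |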
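
The key-to-partition mapping can be illustrated with a small standalone sketch. It is modelled on the murmur2 hash used by the default partitioner in Kafka's Java client, but it is an illustration only and has not been verified bit-for-bit against a broker; other client libraries may hash keys differently. The function names and the sample tenant IDs are hypothetical.

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 hash, modelled on the one in Kafka's Java client."""
    m = 0x5BD1E995
    mask = 0xFFFFFFFF
    length = len(data)
    h = (0x9747B28C ^ length) & mask
    # Mix the input four bytes at a time (little-endian words).
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> 24
        k = (k * m) & mask
        h = ((h * m) & mask) ^ k
    # Fold in the remaining one to three bytes, if any.
    tail = data[length - length % 4:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & mask
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for(tenant_id: str, num_partitions: int) -> int:
    """Map a tenant_id key to a partition: positive hash modulo partition count."""
    return (murmur2(tenant_id.encode("utf-8")) & 0x7FFFFFFF) % num_partitions


# The same key always lands on the same partition for a fixed partition count.
assert partition_for("tenant-a", 6) == partition_for("tenant-a", 6)
```

Because the partition is the hash modulo the partition count, adding partitions to a topic can move a tenant to a different partition, which breaks ordering across the change.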

Producer Configuration

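The KafkaEventStreamingService configures producers for idempotent, ordered delivery. These settings prevent duplicates caused by producer retries; end-to-end exactly-once processing would additionally require Kafka transactions.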

| Setting | Value | Purpose |
|---|---|---|
| acks | all | Wait for acknowledgment from all in-sync replicas |
| retries | 3 | Retry on transient failures |
| enable.idempotence | true | Prevent duplicate messages caused by retries |
| max.in.flight.requests.per.connection | 1 | Preserve ordering during retries |
| compression.type | snappy | Reduce network and storage overhead |
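
As a rough illustration, the settings above can be collected into a configuration dictionary using the full Kafka client property names. This sketch does not show how KafkaEventStreamingService is actually implemented; `producer_config`, `idempotence_problems`, and the `localhost:9092` address are hypothetical. The check encodes the documented Kafka requirement that an idempotent producer needs `acks=all`, at least one retry, and at most five in-flight requests per connection.

```python
def producer_config(bootstrap_servers: str) -> dict:
    """Build a producer config mirroring the settings table (hypothetical helper)."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "acks": "all",
        "retries": 3,
        "enable.idempotence": True,
        "max.in.flight.requests.per.connection": 1,
        "compression.type": "snappy",
    }


def idempotence_problems(config: dict) -> list[str]:
    """Return settings that conflict with enable.idempotence=true."""
    problems = []
    if not config.get("enable.idempotence"):
        return problems
    if str(config.get("acks")) not in ("all", "-1"):
        problems.append("acks must be 'all' when idempotence is enabled")
    if int(config.get("retries", 0)) < 1:
        problems.append("retries must be greater than 0")
    if int(config.get("max.in.flight.requests.per.connection", 5)) > 5:
        problems.append("max.in.flight.requests.per.connection must be <= 5")
    return problems


# Hypothetical local broker address, for illustration only.
config = producer_config("localhost:9092")
assert idempotence_problems(config) == []
```

A dictionary of this shape is the kind of input that Python Kafka clients typically accept, though the exact constructor depends on the client library in use.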

Consumer Groups

| Consumer Group | Services | Topics |
|---|---|---|
| audit-consumers | Audit Service | All event topics |
| billing-consumers | Billing Service | query, ai, ml, pipeline, tenant topics |
| notification-consumers | Notification Service | tenant, pipeline, quality, billing topics |
| data-quality-consumers | Data Quality Service | query topics |
| governance-consumers | Governance Service | quality, catalog topics |
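
Each consumer group keeps its own offsets, so every group subscribed to a topic receives its own copy of each event. The sketch below models the table as a lookup; it assumes the short names in the Topics column (for example `quality`) match a dot-separated segment of the full topic name, which is an interpretation rather than something the table states. `SUBSCRIPTIONS` and `groups_for` are hypothetical names.

```python
# None means "all event topics"; otherwise a set of topic-name keywords.
SUBSCRIPTIONS = {
    "audit-consumers": None,
    "billing-consumers": {"query", "ai", "ml", "pipeline", "tenant"},
    "notification-consumers": {"tenant", "pipeline", "quality", "billing"},
    "data-quality-consumers": {"query"},
    "governance-consumers": {"quality", "catalog"},
}


def groups_for(topic: str) -> list[str]:
    """Return the consumer groups that would receive events from `topic`."""
    segments = set(topic.split("."))
    return sorted(
        group
        for group, keywords in SUBSCRIPTIONS.items()
        if keywords is None or segments & keywords
    )
```

With this mapping, an event on `query.execution.events` is delivered independently to `audit-consumers`, `billing-consumers`, and `data-quality-consumers`.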

Monitoring

| Metric | Alert Threshold |
|---|---|
| Consumer lag | More than 1000 messages |
| Failed events | More than 1% of consumed events |
| Publish latency (p95) | More than 500 ms |
| Under-replicated partitions | Any |
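
The thresholds above can be expressed as a simple evaluation over a metrics snapshot. This is a minimal sketch with assumptions: consumer lag is taken as the sum over partitions of log-end offset minus committed offset, and the threshold is applied to that total (the source does not say whether it is per partition or per group). The `MetricsSnapshot` type and the alert names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsSnapshot:
    """Hypothetical point-in-time metrics for one consumer group and topic."""
    end_offsets: dict[int, int]        # partition -> log-end offset
    committed_offsets: dict[int, int]  # partition -> last committed offset
    consumed_events: int
    failed_events: int
    publish_latency_p95_ms: float
    under_replicated_partitions: int


def consumer_lag(snapshot: MetricsSnapshot) -> int:
    """Total lag: messages written but not yet committed, summed over partitions."""
    return sum(
        max(end - snapshot.committed_offsets.get(partition, 0), 0)
        for partition, end in snapshot.end_offsets.items()
    )


def firing_alerts(snapshot: MetricsSnapshot) -> list[str]:
    """Return the names of the alerts whose thresholds are exceeded."""
    alerts = []
    if consumer_lag(snapshot) > 1000:
        alerts.append("consumer_lag")
    # Guard against division by zero when nothing has been consumed yet.
    if snapshot.consumed_events > 0:
        if snapshot.failed_events / snapshot.consumed_events > 0.01:
            alerts.append("failed_events")
    if snapshot.publish_latency_p95_ms > 500:
        alerts.append("publish_latency_p95")
    if snapshot.under_replicated_partitions > 0:
        alerts.append("under_replicated_partitions")
    return alerts
```

In practice these values would come from the cluster's metrics pipeline; the function only shows how the four thresholds combine.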
