MATIH Platform is in active MVP development. Documentation reflects current implementation status.
11. Pipelines & Data Engineering
Schema Registry

The Schema Registry manages schema definitions, evolution rules, and compatibility validation for data flowing through MATIH pipelines. It ensures that producers and consumers agree on data formats and that schema changes do not break downstream consumers.


Architecture

| Property | Value |
|---|---|
| Backend | PostgreSQL (schema storage) |
| Format support | Avro, JSON Schema, Protobuf |
| Integration | Kafka (serialization/deserialization), Pipeline Service (validation) |
| Compatibility modes | BACKWARD, FORWARD, FULL, NONE |

Compatibility Modes

| Mode | Description | Allowed Changes |
|---|---|---|
| BACKWARD | New schema can read old data | Add optional fields, widen types |
| FORWARD | Old schema can read new data | Remove optional fields, narrow types |
| FULL | Both backward and forward compatible | Add/remove optional fields only |
| NONE | No compatibility checking | Any change allowed |
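The field-level rules behind these modes can be sketched as a simplified checker. This is an illustration only, not the registry's actual algorithm: a "schema" here is a flat `{field: {"type": ..., "default": ...}}` mapping, and type widening/narrowing is omitted entirely.

```python
# Toy compatibility check over flat record schemas.
# Real Avro schema resolution is considerably more involved.

def backward_compatible(old, new):
    """New schema can read old data: every field added in `new`
    must carry a default so old records (which lack it) still decode."""
    added = set(new) - set(old)
    return all("default" in new[f] for f in added)

def forward_compatible(old, new):
    """Old schema can read new data: every field removed from `old`
    must have had a default the old reader can fall back on."""
    removed = set(old) - set(new)
    return all("default" in old[f] for f in removed)

def compatible(mode, old, new):
    if mode == "BACKWARD":
        return backward_compatible(old, new)
    if mode == "FORWARD":
        return forward_compatible(old, new)
    if mode == "FULL":
        return backward_compatible(old, new) and forward_compatible(old, new)
    return True  # NONE: any change allowed
```

Under BACKWARD, adding `currency` with a default of `"USD"` passes, while adding a defaultless field fails.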

Schema Registration

`POST /v1/schemas/:subject/versions`

Request:
```json
{
  "schema": "{\"type\":\"record\",\"name\":\"Transaction\",...}",
  "schemaType": "AVRO",
  "compatibility": "BACKWARD"
}
```

Response:
```json
{
  "id": 42,
  "subject": "transactions-value",
  "version": 3,
  "schemaType": "AVRO",
  "compatible": true
}
```
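Note that the `schema` field travels as an escaped JSON string inside the request body, not as nested JSON. A client-side sketch of building this request (the helper name is illustrative; only the path and payload shape come from the API above):

```python
import json

def build_registration_request(subject, schema_dict,
                               schema_type="AVRO", compatibility="BACKWARD"):
    """Build (path, body) for POST /v1/schemas/:subject/versions.
    The schema dict is serialized twice: once to a JSON string,
    then embedded in the outer JSON payload."""
    path = f"/v1/schemas/{subject}/versions"
    body = json.dumps({
        "schema": json.dumps(schema_dict),
        "schemaType": schema_type,
        "compatibility": compatibility,
    })
    return path, body
```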

Schema Validation in Pipelines

Pipeline definitions reference schemas for validation at extraction and load time:

```yaml
sources:
  events:
    type: kafka
    topic: matih.events
    schema_registry: ${SCHEMA_REGISTRY_URL}
    schema_subject: events-value
    schema_version: latest

quality_checks:
  - name: schema_conformance
    type: schema_check
    schema_subject: events-value
    schema_version: 3
    severity: critical
```
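The core of a `schema_check` quality check could look roughly like the following. This is a sketch against a flat field-to-type mapping, not the Pipeline Service's actual validator, and it covers only a few primitive types:

```python
# Minimal record-level schema conformance check: every expected
# field must be present with a value of the expected type.
PY_TYPES = {"string": str, "double": float, "long": int, "boolean": bool}

def check_record(record, fields):
    """Return a list of violations for one record.
    `fields` maps field name -> primitive type name."""
    errors = []
    for name, type_name in fields.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], PY_TYPES[type_name]):
            errors.append(f"wrong type for {name}: expected {type_name}")
    return errors
```

With `severity: critical`, a non-empty violation list would fail the pipeline run rather than merely log a warning.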

Schema Evolution Rules

Adding a Field (BACKWARD Compatible)

```json
{
  "type": "record",
  "name": "Transaction",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string", "default": "USD"}
  ]
}
```

The `currency` field has a default value, so old data without this field can still be read.
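The read-side behavior can be shown with a toy decoder (Avro's actual schema resolution does this as part of deserialization; this sketch only mimics the default-fill step):

```python
def read_with_schema(record, fields):
    """Decode a record using a reader schema's field list:
    absent fields fall back to their declared defaults."""
    out = {}
    for f in fields:
        if f["name"] in record:
            out[f["name"]] = record[f["name"]]
        elif "default" in f:
            out[f["name"]] = f["default"]
        else:
            raise ValueError(f"field {f['name']} missing and has no default")
    return out
```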

Removing a Field (FORWARD Compatible)

Removing a field with a default value is forward-compatible because old consumers can use the default when the field is missing.


Kafka Integration

The Schema Registry integrates with Kafka via serializers and deserializers:

| Component | Purpose |
|---|---|
| AvroSerializer | Encodes Kafka messages using registered Avro schemas |
| AvroDeserializer | Decodes Kafka messages using the schema ID embedded in the message |
| JsonSchemaValidator | Validates JSON messages against a registered JSON Schema |
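Embedding the schema ID in each message typically follows the Confluent wire format: one magic byte (0), a 4-byte big-endian schema ID, then the encoded payload. Assuming MATIH uses this framing (the lookup against the registry is elided), the envelope looks like:

```python
import struct

MAGIC_BYTE = 0

def frame(schema_id, payload):
    """Prefix an encoded payload with magic byte + 4-byte schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload

def unframe(message):
    """Split a framed Kafka message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown magic byte")
    return schema_id, message[5:]
```

The deserializer reads the ID, fetches (and caches) the matching schema from the registry, then decodes the remaining bytes.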

Schema Subjects

Subjects follow the naming convention `{topic}-{key|value}`:

| Subject | Schema Type | Description |
|---|---|---|
| matih.ai.state-changes-value | JSON | FSM state transition events |
| matih.ai.agent-traces-value | JSON | Agent execution traces |
| matih.ai.llm-ops-value | JSON | LLM operation metrics |
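The convention maps mechanically from topic to subject, so a trivial helper (illustrative only) is enough to derive subjects in client code:

```python
def subject_for(topic, part):
    """Derive the registry subject for a topic's key or value schema."""
    if part not in ("key", "value"):
        raise ValueError("part must be 'key' or 'value'")
    return f"{topic}-{part}"
```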
