MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Pipeline Service Architecture

The Pipeline Service sub-pages provide detailed documentation for each ingestion pattern, connector type, and data integration capability supported by the MATIH pipeline framework. Each pattern is backed by a dedicated operator in the Pipeline Service codebase.


Sub-Pages

| Page | Description |
| --- | --- |
| Batch Ingestion | Scheduled bulk data extraction from databases and file systems |
| Stream Ingestion | Real-time data ingestion from Kafka and event streams |
| File Ingestion | Cloud storage file processing from S3, GCS, and Azure Blob |
| Change Data Capture | CDC pipelines using Flink CDC and Debezium connectors |
| Database Replication | Full and incremental database replication patterns |
| API Connectors | REST API data extraction with pagination and auth |
| Schema Registry | Schema evolution, compatibility, and validation |
| Data Virtualization | Federated queries across multiple data sources via Trino |
| Event Sourcing | Event-driven pipelines using Kafka and state stores |

Source Code

All pipeline operators reside under data-plane/pipeline-service/src/matih_pipeline/operators/.

| Operator | File |
| --- | --- |
| DatabaseExtractOperator | operators/database_extract.py |
| ApiExtractOperator | operators/api_extract.py |
| CloudStorageExtractOperator | operators/cloud_storage_extract.py |
| KafkaExtractOperator | operators/kafka_extract.py |
| SqlTransformOperator | operators/sql_transform.py |
| DbtTransformOperator | operators/dbt_transform.py |
| DeltaLoadOperator | operators/delta_load.py |
| ClickHouseLoadOperator | operators/clickhouse_load.py |
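
The operator listing above can be turned into a small lookup for navigating the codebase. The following is a minimal sketch, not part of the Pipeline Service itself: the helper names `operator_path` and `operators_by_stage` are hypothetical, and the grouping into extract, transform, and load stages is an assumption inferred only from the class-name suffixes. The directory and file names are copied from this page.

```python
from pathlib import PurePosixPath

# Base directory as stated in the "Source Code" section.
OPERATORS_DIR = PurePosixPath(
    "data-plane/pipeline-service/src/matih_pipeline/operators"
)

# Operator class name -> file name inside OPERATORS_DIR (from the listing above).
OPERATOR_FILES = {
    "DatabaseExtractOperator": "database_extract.py",
    "ApiExtractOperator": "api_extract.py",
    "CloudStorageExtractOperator": "cloud_storage_extract.py",
    "KafkaExtractOperator": "kafka_extract.py",
    "SqlTransformOperator": "sql_transform.py",
    "DbtTransformOperator": "dbt_transform.py",
    "DeltaLoadOperator": "delta_load.py",
    "ClickHouseLoadOperator": "clickhouse_load.py",
}


def operator_path(class_name: str) -> PurePosixPath:
    """Return the repository-relative path of the file for an operator class.

    Hypothetical helper for illustration only.
    """
    try:
        return OPERATORS_DIR / OPERATOR_FILES[class_name]
    except KeyError:
        raise KeyError(f"unknown operator: {class_name}") from None


def operators_by_stage() -> dict[str, list[str]]:
    """Group operators by the Extract/Transform/Load suffix of the class name.

    The stage grouping is an assumption based on naming, not a documented API.
    """
    stages: dict[str, list[str]] = {"Extract": [], "Transform": [], "Load": []}
    for name in OPERATOR_FILES:
        for stage in stages:
            if name.endswith(stage + "Operator"):
                stages[stage].append(name)
    return stages
```

For example, `operator_path("KafkaExtractOperator")` resolves to the `kafka_extract.py` file under the operators directory, and `operators_by_stage()["Load"]` lists the two load operators in table order.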