# Pipeline Service Architecture
The Pipeline Service sub-pages provide detailed documentation for each ingestion pattern, connector type, and data integration capability supported by the MATIH pipeline framework. Each pattern is backed by a dedicated operator in the Pipeline Service codebase.
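The one-pattern-one-operator contract can be sketched as a small interface. This is an illustrative sketch only, not the actual MATIH API: the `PipelineOperator` base class and `StaticExtractOperator` below are hypothetical names invented for the example.

```python
from abc import ABC, abstractmethod

class PipelineOperator(ABC):
    """Hypothetical base contract that each ingestion pattern would implement."""

    @abstractmethod
    def execute(self, context: dict) -> dict:
        """Run the operator and return a payload for the next pipeline step."""

class StaticExtractOperator(PipelineOperator):
    """Toy extractor standing in for a real one such as DatabaseExtractOperator."""

    def __init__(self, rows: list[dict]):
        self.rows = rows

    def execute(self, context: dict) -> dict:
        # A real extractor would query a source system; this one returns fixed rows.
        return {"rows": self.rows, "count": len(self.rows)}

op = StaticExtractOperator([{"id": 1}, {"id": 2}])
result = op.execute({})
print(result["count"])  # → 2
```

The point of the contract is that schedulers and DAG builders can treat every pattern uniformly through `execute`, regardless of whether the source is a database, an API, or a Kafka topic.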
## Sub-Pages
| Page | Description |
|---|---|
| Batch Ingestion | Scheduled bulk data extraction from databases and file systems |
| Stream Ingestion | Real-time data ingestion from Kafka and event streams |
| File Ingestion | Cloud storage file processing from S3, GCS, and Azure Blob |
| Change Data Capture | CDC pipelines using Flink CDC and Debezium connectors |
| Database Replication | Full and incremental database replication patterns |
| API Connectors | REST API data extraction with pagination and auth |
| Schema Registry | Schema evolution, compatibility, and validation |
| Data Virtualization | Federated queries across multiple data sources via Trino |
| Event Sourcing | Event-driven pipelines using Kafka and state stores |
## Source Code
All pipeline operators reside under `data-plane/pipeline-service/src/matih_pipeline/operators/`.
| Operator | File |
|---|---|
| DatabaseExtractOperator | operators/database_extract.py |
| ApiExtractOperator | operators/api_extract.py |
| CloudStorageExtractOperator | operators/cloud_storage_extract.py |
| KafkaExtractOperator | operators/kafka_extract.py |
| SqlTransformOperator | operators/sql_transform.py |
| DbtTransformOperator | operators/dbt_transform.py |
| DeltaLoadOperator | operators/delta_load.py |
| ClickHouseLoadOperator | operators/clickhouse_load.py |
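The operator table above splits naturally into extract, transform, and load stages. The following toy chain illustrates that composition; the class names, `execute` signature, and `run_pipeline` helper are assumptions made for this sketch, and the real operators likely carry richer context (connections, scheduling metadata) than a bare payload.

```python
class ExtractOperator:
    """Stand-in for an extractor such as DatabaseExtractOperator."""

    def execute(self, payload):
        # A real extractor would read from a source system.
        return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

class TransformOperator:
    """Stand-in for SqlTransformOperator/DbtTransformOperator."""

    def execute(self, rows):
        # Double each value as a trivial transformation.
        return [{**row, "value": row["value"] * 2} for row in rows]

class LoadOperator:
    """Stand-in for a loader such as DeltaLoadOperator."""

    def __init__(self):
        self.sink = []  # in-memory sink in place of a real table

    def execute(self, rows):
        self.sink.extend(rows)
        return len(rows)

def run_pipeline(operators, payload=None):
    """Feed each operator's output into the next one (hypothetical helper)."""
    for op in operators:
        payload = op.execute(payload)
    return payload

loader = LoadOperator()
loaded = run_pipeline([ExtractOperator(), TransformOperator(), loader])
print(loaded)  # → 2
```

Keeping each stage behind the same `execute` interface is what lets the framework mix and match any extractor in the table with any transformer and loader.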