MATIH Platform is in active MVP development. Documentation reflects current implementation status.
2. Architecture
Pipeline Flow

The pipeline flow traces the execution of data pipelines from definition through orchestration, execution, and monitoring. Pipelines are managed by the Pipeline Service and orchestrated through Temporal for durable, fault-tolerant workflow execution.


Pipeline Execution Path

Data Workbench / API
  |
  v
Pipeline Service (Port 8092)
  | 1. Validate pipeline definition
  | 2. Apply tenant context
  | 3. Submit workflow to Temporal
  |
  v
Temporal (Workflow Orchestration)
  | 4. Create workflow execution
  | 5. Execute steps sequentially/in parallel
  |
  v
Step Execution
  +-- SQL Transform --> Query Engine --> Trino
  +-- Spark Job --> Spark on Kubernetes
  +-- Flink Job --> Flink Operator
  +-- Python Script --> Job container
  +-- Data Quality Check --> Data Quality Service
  |
  v
Pipeline Service
  | 6. Update pipeline status
  | 7. Publish completion event (Kafka)
  | 8. Trigger notifications
  |
  v
Result (pipeline output)
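Steps 1 and 2 of the execution path above can be sketched in Python. This is an illustrative sketch only: the `Step`, `PipelineDefinition`, and `validate` names, and the exact validation rules, are assumptions, not the Pipeline Service's actual API. The set of step types comes from the Step Types table later in this page.

```python
from dataclasses import dataclass, field

# Step types supported by the platform (see the Step Types table).
KNOWN_STEP_TYPES = {
    "sql_transform", "spark_job", "flink_job",
    "python_script", "data_quality_check", "notification",
}

@dataclass
class Step:
    name: str
    type: str

@dataclass
class PipelineDefinition:
    name: str
    tenant_id: str  # tenant context applied before submission to Temporal
    steps: list[Step] = field(default_factory=list)

def validate(pipeline: PipelineDefinition) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if not pipeline.steps:
        errors.append("pipeline has no steps")
    seen = set()
    for step in pipeline.steps:
        if step.type not in KNOWN_STEP_TYPES:
            errors.append(f"unknown step type: {step.type}")
        if step.name in seen:
            errors.append(f"duplicate step name: {step.name}")
        seen.add(step.name)
    return errors
```

In the real flow, a definition that passes validation would then be submitted to Temporal as a workflow (step 3).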

Pipeline Lifecycle

Phase             State            Description
Definition        DRAFT            Pipeline created in the editor
Validation        VALIDATED        Schema and dependency checks passed
Scheduling        SCHEDULED        Cron schedule configured
Execution         RUNNING          Temporal workflow executing steps
Step completion   STEP_COMPLETED   Individual step finished
Success           SUCCEEDED        All steps completed successfully
Failure           FAILED           A step failed; retry policy exhausted
Retry             RETRYING         Automatic retry of a failed step
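The lifecycle can be viewed as a small state machine. The state names below are the documented ones; the transition map itself is an illustrative assumption about which moves are legal, not a spec of the Pipeline Service.

```python
# Assumed legal transitions between the documented lifecycle states.
TRANSITIONS = {
    "DRAFT": {"VALIDATED"},
    "VALIDATED": {"SCHEDULED", "RUNNING"},
    "SCHEDULED": {"RUNNING"},
    "RUNNING": {"STEP_COMPLETED", "SUCCEEDED", "FAILED", "RETRYING"},
    "STEP_COMPLETED": {"RUNNING", "SUCCEEDED"},
    "RETRYING": {"RUNNING", "FAILED"},
    "SUCCEEDED": set(),  # terminal
    "FAILED": set(),     # terminal
}

def can_transition(current: str, target: str) -> bool:
    """True if the lifecycle allows moving from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())
```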

Step Types

Step Type            Execution Engine       Use Case
SQL transform        Query Engine / Trino   Data transformation via SQL
Spark job            Spark on Kubernetes    Large-scale batch processing
Flink job            Flink Operator         Real-time stream processing
Python script        Job container          Custom Python logic
Data quality check   Data Quality Service   Automated quality validation
Notification         Notification Service   Alert on step completion/failure
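Routing a step to its execution engine is naturally expressed as a dispatch table. The sketch below is hypothetical: the `EXECUTORS` registry and handler behavior are illustrations of the pattern, not the service's real implementation, which would call out to Trino, Kubernetes, and the other engines listed above.

```python
from typing import Callable

# Hypothetical dispatch table mapping step type to a handler. Real handlers
# would submit work to the engine in the Step Types table.
EXECUTORS: dict[str, Callable[[dict], str]] = {
    "sql_transform": lambda step: f"submitted to Trino: {step['sql']}",
    "python_script": lambda step: f"launched job container: {step['script']}",
}

def execute_step(step: dict) -> str:
    """Look up and invoke the executor for a step's type."""
    try:
        executor = EXECUTORS[step["type"]]
    except KeyError:
        raise ValueError(f"no executor for step type {step['type']!r}")
    return executor(step)
```

New step types can then be supported by registering a handler, without touching the dispatch logic.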

Temporal Integration

Temporal provides durable workflow execution:

Feature              Benefit
Durable execution    Workflow survives service restarts
Automatic retries    Configurable retry policies per step
Timeout handling     Per-step and per-workflow timeout limits
Parallel execution   Independent steps run concurrently
Saga pattern         Compensating actions on failure
Visibility           Query workflow state and history

Event Publishing

Pipeline state changes are published as Kafka events:

Event                     Published When             Consumers
PIPELINE_STARTED          Workflow begins            Audit, notification
PIPELINE_STEP_COMPLETED   Each step finishes         Monitoring
PIPELINE_SUCCEEDED        All steps complete         Audit, notification, billing
PIPELINE_FAILED           A step fails permanently   Audit, notification, alerting
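A state-change event might be serialized like the sketch below before being handed to a Kafka producer. The payload shape (`pipeline_id`, `tenant_id`, `emitted_at`, `details`) is an assumption for illustration; only the event type names come from the table above.

```python
import json
from datetime import datetime, timezone

def pipeline_event(event_type: str, pipeline_id: str,
                   tenant_id: str, **details) -> str:
    """Serialize a pipeline state-change event as a JSON string."""
    payload = {
        "event_type": event_type,
        "pipeline_id": pipeline_id,
        "tenant_id": tenant_id,  # consumers filter by tenant
        "emitted_at": datetime.now(timezone.utc).isoformat(),
        "details": details,
    }
    return json.dumps(payload)

# The service would then publish the string to a Kafka topic, roughly:
#   producer.send("pipeline-events", pipeline_event(...).encode("utf-8"))
```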

Scheduling

Pipelines support cron-based scheduling:

Feature            Description
Cron expressions   Standard cron syntax with timezone support
Dependencies       Trigger on completion of upstream pipelines
Manual trigger     On-demand execution via API or UI
Backfill           Re-run pipelines for historical date ranges
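Backfill expands a historical date range into one run per day. The helper below is a minimal sketch of that expansion; the function name and daily granularity are assumptions, since the actual backfill API is not specified here.

```python
from datetime import date, timedelta

def backfill_dates(start: date, end: date) -> list[date]:
    """Expand an inclusive historical date range into one run date per day."""
    if start > end:
        raise ValueError("start must not be after end")
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]
```

Each returned date would then be submitted as a separate pipeline run, oldest first.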

Related Pages