> MATIH Platform is in active MVP development. Documentation reflects current implementation status.
# 10a. Data Ingestion
## Stream Ingestion

Stream ingestion enables real-time data flow from Kafka topics to Iceberg tables via Flink SQL jobs managed by the pipeline-service.

### Architecture

```
Kafka Topics → Flink SQL Job → Iceberg Table → Trino Query
```

### Creating a Stream Configuration

```http
POST /api/v1/streams
X-Tenant-Id: {tenantId}

{
  "name": "orders-stream",
  "topics": ["orders.created", "orders.updated"],
  "consumerGroup": "matih-orders-consumer",
  "startingPosition": "LATEST",
  "batchSize": 1000,
  "targetTable": "orders_realtime",
  "targetSchema": "default",
  "valueFormat": "JSON"
}
```

### Stream Lifecycle

| State | Description |
|-------|-------------|
| STOPPED | Configuration saved, job not running |
| STARTING | Flink job submission in progress |
| RUNNING | Flink job active, consuming from Kafka |
| STOPPING | Graceful shutdown in progress |
| ERROR | Job failed; see `errorMessage` for details |
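The lifecycle can be read as a small state machine. The sketch below encodes one plausible set of transitions inferred from the state descriptions above; the authoritative rules live in the pipeline-service, and the `ERROR → STARTING` edge in particular is an assumption (a restart after the cause is fixed).

```python
# Transition map inferred from the state table; illustrative only.
VALID_TRANSITIONS = {
    "STOPPED":  {"STARTING"},
    "STARTING": {"RUNNING", "ERROR"},
    "RUNNING":  {"STOPPING", "ERROR"},
    "STOPPING": {"STOPPED", "ERROR"},
    "ERROR":    {"STARTING"},  # assumed: restart after resolving the failure
}

def can_transition(current: str, target: str) -> bool:
    """Return True if the lifecycle allows moving from current to target."""
    return target in VALID_TRANSITIONS.get(current, set())
```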

### Operations

| Action | Endpoint | Permission |
|--------|----------|------------|
| Create | `POST /api/v1/streams` | `streams:write` |
| List | `GET /api/v1/streams` | `streams:read` |
| Get | `GET /api/v1/streams/{id}` | `streams:read` |
| Start | `POST /api/v1/streams/{id}/start` | `ingestion:execute` |
| Stop | `POST /api/v1/streams/{id}/stop` | `ingestion:execute` |
| Delete | `DELETE /api/v1/streams/{id}` | `streams:write` |

### Value Formats

| Format | Schema Registry | Use Case |
|--------|-----------------|----------|
| JSON | Optional | Most common, flexible schema |
| AVRO | Required | Schema evolution, compact binary |
| PROTOBUF | Required | High performance, strong typing |
| CSV | No | Legacy systems, simple data |

### Topic Security

Topic names are validated to prevent cross-tenant access:

  • Empty topic names are rejected
  • Control characters are rejected
  • Topic names containing another tenant's ID prefix are blocked
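The three rules above can be sketched as a single validation function. This is an illustration of the described checks, not the service's implementation; in particular, passing in the set of known tenant IDs is an assumption about how the prefix check might be performed.

```python
def is_topic_allowed(topic: str, tenant_id: str, known_tenant_ids: set[str]) -> bool:
    """Sketch of the topic-name validation rules described above."""
    if not topic:
        return False  # empty topic names are rejected
    if any(ord(c) < 0x20 or ord(c) == 0x7F for c in topic):
        return False  # control characters are rejected
    for other in known_tenant_ids:
        if other != tenant_id and topic.startswith(other + "."):
            return False  # another tenant's ID prefix is blocked
    return True
```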

### RBAC

| Role | `streams:read` | `streams:write` | `ingestion:execute` |
|------|----------------|-----------------|---------------------|
| DATA_ENGINEER | Yes | Yes | Yes |
| DATA_ANALYST | Yes | No | No |
| DATA_SCIENTIST | No | No | No |
| VIEWER | No | No | No |
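The matrix above reduces to a role-to-permission lookup. The sketch below transcribes the table directly; how the service actually stores and evaluates roles is not shown here.

```python
# Role-to-permission map transcribed from the RBAC table above.
ROLE_PERMISSIONS = {
    "DATA_ENGINEER":  {"streams:read", "streams:write", "ingestion:execute"},
    "DATA_ANALYST":   {"streams:read"},
    "DATA_SCIENTIST": set(),
    "VIEWER":         set(),
}

def has_permission(role: str, permission: str) -> bool:
    """Return True if the given role grants the given permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```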