MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Datasource Mapping

Datasource mapping connects logical object types to their physical data sources. The mapping engine analyzes source schemas, suggests property mappings, and maintains the link between the semantic ontology and underlying tables, APIs, or files.

Source: data-plane/ontology-service/src/schema_mapping/


Mapping Architecture

Physical Schema (table/columns) ──> Schema Analyzer ──> Mapping Suggestions
                                                              |
Logical Object Type (properties) <── Schema Mapper  <─── Approved Mappings

Datasource Types

| Type | Description | Connection |
| --- | --- | --- |
| jdbc | Relational database tables | PostgreSQL, MySQL, SQL Server |
| iceberg | Iceberg tables via Polaris catalog | Spark, Trino |
| api | REST API endpoints | HTTP connections |
| file | File-based sources | S3, GCS, Azure Blob |
| kafka | Kafka topics | Strimzi Kafka |

Mapping Model

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Mapping identifier |
| object_type_id | UUID | Target object type |
| datasource_type | DatasourceType | Source type (jdbc, iceberg, api, file, kafka) |
| connection | string | Connection reference |
| source_table | string | Physical table or endpoint |
| column_mappings | list | Column-to-property mappings |
| sync_mode | SyncMode | FULL, INCREMENTAL, CDC |
| update_frequency | UpdateFrequency | REALTIME, HOURLY, DAILY, MANUAL |
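
To make the field list concrete, the model could be sketched as Python dataclasses. This is an illustration only, not the service's actual code: the class names `ColumnMapping` and `DatasourceMapping`, the snake_case attribute names on `ColumnMapping`, and the default values for `sync_mode` and `update_frequency` are assumptions. Only the field names and enum values come from the tables on this page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional
from uuid import UUID, uuid4


class DatasourceType(str, Enum):
    JDBC = "jdbc"
    ICEBERG = "iceberg"
    API = "api"
    FILE = "file"
    KAFKA = "kafka"


class SyncMode(str, Enum):
    FULL = "FULL"
    INCREMENTAL = "INCREMENTAL"
    CDC = "CDC"


class UpdateFrequency(str, Enum):
    REALTIME = "REALTIME"
    HOURLY = "HOURLY"
    DAILY = "DAILY"
    MANUAL = "MANUAL"


@dataclass
class ColumnMapping:
    # Hypothetical Python-side shape of one column mapping entry.
    source_column: str
    target_property: str
    transformation: Optional[str] = None
    data_type_conversion: Optional[str] = None


@dataclass
class DatasourceMapping:
    object_type_id: UUID
    datasource_type: DatasourceType
    connection: str
    source_table: str
    column_mappings: List[ColumnMapping] = field(default_factory=list)
    # The two defaults below are assumptions made for this sketch.
    sync_mode: SyncMode = SyncMode.FULL
    update_frequency: UpdateFrequency = UpdateFrequency.MANUAL
    id: UUID = field(default_factory=uuid4)


mapping = DatasourceMapping(
    object_type_id=uuid4(),
    datasource_type=DatasourceType("jdbc"),  # parse from the string form
    connection="crm-postgres",
    source_table="customers",
    column_mappings=[
        ColumnMapping("cust_email", "email", None, "VARCHAR -> string"),
    ],
    sync_mode=SyncMode.INCREMENTAL,
    update_frequency=UpdateFrequency.HOURLY,
)
```

Using string-valued enums keeps the in-memory values identical to the strings shown in the tables, so an invalid value such as `DatasourceType("ftp")` raises `ValueError` at construction time.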

Column Mapping

{
  "sourceColumn": "cust_email",
  "targetProperty": "email",
  "transformation": null,
  "dataTypeConversion": "VARCHAR -> string"
}
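
As a rough illustration of how a list of column mappings might be applied to one source row, consider the sketch below. The function `apply_column_mappings` and the `TRANSFORMS` registry are hypothetical. The source only shows `"transformation": null`, so treating a non-null transformation as the name of a registered function is an assumption.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry of named transformations (illustrative names only).
TRANSFORMS: Dict[str, Callable[[Any], Any]] = {
    "lowercase": lambda v: v.lower() if isinstance(v, str) else v,
    "trim": lambda v: v.strip() if isinstance(v, str) else v,
}


def apply_column_mappings(
    row: Dict[str, Any], column_mappings: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Project a physical row onto logical property names."""
    result: Dict[str, Any] = {}
    for m in column_mappings:
        source = m["sourceColumn"]
        if source not in row:
            continue  # column absent from this row; leave the property unset
        value = row[source]
        name = m.get("transformation")
        if name is not None:
            value = TRANSFORMS[name](value)
        result[m["targetProperty"]] = value
    return result


mappings = [
    {"sourceColumn": "cust_id", "targetProperty": "id", "transformation": None},
    {"sourceColumn": "cust_email", "targetProperty": "email",
     "transformation": "lowercase"},
]
row = {"cust_id": 42, "cust_email": "Ada@Example.com", "internal_flag": True}
obj = apply_column_mappings(row, mappings)
# obj == {"id": 42, "email": "ada@example.com"}
```

Columns without a mapping (here `internal_flag`) are simply dropped from the logical object.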

Schema Analysis

The analyzer inspects source schemas and suggests mappings to existing object types:

POST /v1/ontology/mappings/analyze

Request:
{
  "datasourceType": "jdbc",
  "connection": "crm-postgres",
  "table": "customers",
  "targetObjectType": "Customer"
}

Response:
{
  "suggestions": [
    {
      "sourceColumn": "cust_id",
      "targetProperty": "id",
      "confidence": 0.95,
      "matchType": "name_similarity"
    },
    {
      "sourceColumn": "cust_email",
      "targetProperty": "email",
      "confidence": 0.90,
      "matchType": "name_similarity"
    }
  ],
  "unmappedColumns": ["internal_flag", "legacy_code"],
  "unmappedProperties": ["phone_number"]
}
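
The `name_similarity` match type is not specified further on this page. One plausible way to produce such suggestions is plain string similarity between column names and property names, sketched here with the standard library's `difflib.SequenceMatcher`. The function `suggest_mappings`, the 0.4 threshold, and the scoring itself are assumptions, so the scores it yields will not match the `confidence` values in the response above.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def suggest_mappings(
    columns: List[str], properties: List[str], threshold: float = 0.4
) -> Dict[str, list]:
    """Pair each column with its most similar property, if similar enough."""
    suggestions = []
    matched_properties = set()
    unmapped_columns = []
    for column in columns:
        best_prop, best_score = None, 0.0
        for prop in properties:
            score = SequenceMatcher(None, column.lower(), prop.lower()).ratio()
            if score > best_score:
                best_prop, best_score = prop, score
        if best_prop is not None and best_score >= threshold:
            suggestions.append({
                "sourceColumn": column,
                "targetProperty": best_prop,
                "confidence": round(best_score, 2),
                "matchType": "name_similarity",
            })
            matched_properties.add(best_prop)
        else:
            unmapped_columns.append(column)
    return {
        "suggestions": suggestions,
        "unmappedColumns": unmapped_columns,
        "unmappedProperties": [p for p in properties if p not in matched_properties],
    }


result = suggest_mappings(
    ["cust_id", "cust_email", "internal_flag", "legacy_code"],
    ["id", "email", "phone_number"],
)
pairs = [(s["sourceColumn"], s["targetProperty"]) for s in result["suggestions"]]
# pairs == [("cust_id", "id"), ("cust_email", "email")]
```

This sketch does not stop two columns from being suggested for the same property; a real analyzer would likely also weigh data types and sample values.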

Sync Modes

| Mode | Description | Implementation |
| --- | --- | --- |
| FULL | Complete data refresh on each sync | Truncate and reload |
| INCREMENTAL | Sync only changed records | Watermark-based extraction |
| CDC | Real-time change capture | Flink CDC connector |
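
Watermark-based extraction, as used by INCREMENTAL mode, can be sketched as follows: remember the highest value of a monotonically increasing column from the last sync and fetch only rows beyond it. The example uses an in-memory SQLite database so it runs standalone. The function `extract_incremental` and the `updated_at` column are hypothetical and do not describe the service's actual implementation.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


def extract_incremental(
    conn: sqlite3.Connection, table: str, watermark_column: str, last_watermark: Any
) -> Tuple[List[Dict[str, Any]], Any]:
    """Return rows newer than last_watermark and the advanced watermark."""
    # table and watermark_column are assumed to come from trusted mapping
    # configuration, not user input; only the watermark value is parameterized.
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > ? ORDER BY {watermark_column}"
    )
    cursor = conn.execute(query, (last_watermark,))
    names = [d[0] for d in cursor.description]
    rows = [dict(zip(names, r)) for r in cursor.fetchall()]
    # If nothing changed, keep the old watermark.
    new_watermark = rows[-1][watermark_column] if rows else last_watermark
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (cust_id INTEGER, cust_email TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "2024-01-01T00:00:00"),
        (2, "b@example.com", "2024-01-02T00:00:00"),
        (3, "c@example.com", "2024-01-03T00:00:00"),
    ],
)

rows, watermark = extract_incremental(
    conn, "customers", "updated_at", "2024-01-01T00:00:00"
)
# rows holds cust_id 2 and 3; watermark == "2024-01-03T00:00:00"
rows_again, watermark_again = extract_incremental(
    conn, "customers", "updated_at", watermark
)
# rows_again == [] and the watermark is unchanged
```

The strict `>` comparison avoids re-reading the last synced row, but it can miss rows that share the watermark value and arrive later, which is one reason a CDC mode exists alongside this one.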

API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v1/ontology/mappings | Create a datasource mapping |
| GET | /v1/ontology/mappings | List mappings for an object type |
| PUT | /v1/ontology/mappings/:id | Update mapping configuration |
| DELETE | /v1/ontology/mappings/:id | Remove a mapping |
| POST | /v1/ontology/mappings/analyze | Analyze source and suggest mappings |
| POST | /v1/ontology/mappings/:id/sync | Trigger manual data sync |
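
A client call against these endpoints might be built as in the sketch below, which constructs the requests with the standard library but does not send them. The base URL, the helper names, and the create-mapping payload shape are assumptions: this page does not document the create request body, so the camelCase field names are borrowed from the analyze request and the Column Mapping example.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # hypothetical host; not given in the source


def build_create_mapping_request(base_url: str, payload: dict) -> Request:
    """Build (but do not send) a POST to create a datasource mapping."""
    return Request(
        url=f"{base_url}/v1/ontology/mappings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_sync_request(base_url: str, mapping_id: str) -> Request:
    """Build (but do not send) a POST that triggers a manual sync."""
    return Request(
        url=f"{base_url}/v1/ontology/mappings/{mapping_id}/sync",
        method="POST",
    )


# Assumed payload shape, for illustration only.
payload = {
    "datasourceType": "jdbc",
    "connection": "crm-postgres",
    "table": "customers",
    "targetObjectType": "Customer",
    "columnMappings": [
        {"sourceColumn": "cust_email", "targetProperty": "email"},
    ],
    "syncMode": "INCREMENTAL",
}

create_req = build_create_mapping_request(BASE_URL, payload)
sync_req = build_sync_request(BASE_URL, "example-mapping-id")
```

Passing either request to `urllib.request.urlopen` would send it; authentication and tenant headers, if the service requires them, are out of scope for this sketch.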
