MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Database Connectors

Database connectors extract data from relational and NoSQL databases. They support schema discovery, multiple sync modes, and type-safe extraction with automatic Iceberg schema mapping.


PostgreSQL

The PostgreSQL connector supports Full Refresh, Incremental, and CDC (via logical replication / WAL) sync modes. It is one of the most widely used connectors on the platform.

Configuration

{
  "name": "production-postgres",
  "connectorType": "postgres",
  "connectionConfig": {
    "host": "pg.example.com",
    "port": 5432,
    "database": "analytics",
    "username": "matih_readonly",
    "password": "********",
    "ssl_mode": "require",
    "schemas": ["public", "sales"],
    "replication_method": {
      "method": "CDC",
      "replication_slot": "matih_slot",
      "publication": "matih_pub"
    }
  }
}

Configuration Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| host | string | Yes | -- | PostgreSQL server hostname |
| port | integer | No | 5432 | Server port |
| database | string | Yes | -- | Database name |
| username | string | Yes | -- | Username with SELECT privileges |
| password | string | Yes | -- | Password |
| ssl_mode | string | No | prefer | SSL mode: disable, allow, prefer, require, verify-ca, verify-full |
| schemas | string[] | No | All schemas | Schemas to include in discovery |
| replication_method.method | string | No | Standard | Standard (cursor-based) or CDC (WAL logical replication) |
| replication_method.replication_slot | string | CDC only | -- | Logical replication slot name |
| replication_method.publication | string | CDC only | -- | PostgreSQL publication name |
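
As an illustration of how the field rules above fit together, the following Python sketch applies the documented defaults and checks the required and CDC-only fields before a configuration is submitted. The function name normalize_postgres_config is hypothetical and not part of the platform; the platform's own validation may behave differently.

```python
from typing import Any

# Required fields, defaults, and allowed values copied from the table above.
REQUIRED = ("host", "database", "username", "password")
DEFAULTS = {"port": 5432, "ssl_mode": "prefer"}
SSL_MODES = {"disable", "allow", "prefer", "require", "verify-ca", "verify-full"}
CDC_ONLY = ("replication_slot", "publication")


def normalize_postgres_config(cfg: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: apply defaults, raise ValueError on invalid input."""
    missing = [f for f in REQUIRED if not cfg.get(f)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    out = {**DEFAULTS, **cfg}
    if out["ssl_mode"] not in SSL_MODES:
        raise ValueError(f"unsupported ssl_mode: {out['ssl_mode']}")

    # replication_method defaults to Standard; CDC needs a slot and a publication.
    repl = {"method": "Standard", **cfg.get("replication_method", {})}
    if repl["method"] == "CDC":
        absent = [f for f in CDC_ONLY if not repl.get(f)]
        if absent:
            raise ValueError(f"CDC requires: {', '.join(absent)}")
    elif repl["method"] != "Standard":
        raise ValueError(f"unknown replication method: {repl['method']}")
    out["replication_method"] = repl
    return out
```

Passing only the four required fields returns a configuration with port 5432, ssl_mode prefer, and the Standard replication method filled in.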

CDC Prerequisites

To use CDC mode with PostgreSQL:

  1. Set wal_level = logical in postgresql.conf (changing this setting requires a server restart)
  2. Create a publication: CREATE PUBLICATION matih_pub FOR ALL TABLES;
  3. Create a replication slot: SELECT pg_create_logical_replication_slot('matih_slot', 'pgoutput');
  4. Grant the connector user replication privileges: ALTER USER matih_readonly REPLICATION;
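
Steps 2 to 4 above can be scripted when the same setup is repeated across databases. The sketch below only builds the SQL strings; it does not connect to a server. The function name postgres_cdc_setup_sql is hypothetical, and restricting names to plain lower-case identifiers is an assumption made here so the values can be inlined safely.

```python
import re

# Plain lower-case identifiers only (an assumption of this sketch).
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def postgres_cdc_setup_sql(user: str, slot: str, publication: str) -> list[str]:
    """Return the SQL for steps 2-4; step 1 is a postgresql.conf change."""
    for name in (user, slot, publication):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        f"CREATE PUBLICATION {publication} FOR ALL TABLES;",
        f"SELECT pg_create_logical_replication_slot('{slot}', 'pgoutput');",
        f"ALTER USER {user} REPLICATION;",
    ]


if __name__ == "__main__":
    for stmt in postgres_cdc_setup_sql("matih_readonly", "matih_slot", "matih_pub"):
        print(stmt)
```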

Type Mapping

| PostgreSQL Type | Iceberg Type |
| --- | --- |
| integer, serial | INT |
| bigint, bigserial | LONG |
| numeric, decimal | DECIMAL(p,s) |
| real | FLOAT |
| double precision | DOUBLE |
| boolean | BOOLEAN |
| varchar, text, char | STRING |
| date | DATE |
| timestamp, timestamptz | TIMESTAMP |
| jsonb, json | STRING |
| uuid | STRING |
| bytea | BINARY |
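
The mapping above can be expressed as a lookup table, which is handy when estimating the target schema before a sync. This is a sketch, not the connector's implementation: the function name iceberg_type is hypothetical, and rejecting numeric columns declared without a precision is an assumption, since the table does not say how they are handled.

```python
import re

# Direct transcription of the type-mapping table.
PG_TO_ICEBERG = {
    "integer": "INT", "serial": "INT",
    "bigint": "LONG", "bigserial": "LONG",
    "real": "FLOAT",
    "double precision": "DOUBLE",
    "boolean": "BOOLEAN",
    "varchar": "STRING", "text": "STRING", "char": "STRING",
    "date": "DATE",
    "timestamp": "TIMESTAMP", "timestamptz": "TIMESTAMP",
    "jsonb": "STRING", "json": "STRING",
    "uuid": "STRING",
    "bytea": "BINARY",
}

# Splits "numeric(10,2)" into base type, precision, and optional scale.
_TYPE = re.compile(r"^\s*([a-z ]+?)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*$")


def iceberg_type(pg_type: str) -> str:
    """Map a PostgreSQL type name to the Iceberg type listed in the table."""
    match = _TYPE.match(pg_type.lower())
    if match is None:
        raise ValueError(f"cannot parse type: {pg_type!r}")
    base, precision, scale = match.groups()
    if base in ("numeric", "decimal"):
        if precision is None:
            # Assumption: unconstrained numeric has no fixed DECIMAL(p,s).
            raise ValueError("numeric without precision is not mapped here")
        return f"DECIMAL({precision},{scale or 0})"
    if base not in PG_TO_ICEBERG:
        raise ValueError(f"no mapping listed for: {pg_type!r}")
    return PG_TO_ICEBERG[base]
```

Length modifiers on character types are ignored, so varchar(255) and text both resolve to STRING.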

MySQL

The MySQL connector supports Full Refresh, Incremental, and CDC (via binlog) sync modes.

Configuration

{
  "name": "mysql-orders",
  "connectorType": "mysql",
  "connectionConfig": {
    "host": "mysql.example.com",
    "port": 3306,
    "database": "orders_db",
    "username": "matih_reader",
    "password": "********",
    "ssl_mode": "required",
    "replication_method": "CDC"
  }
}

Configuration Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| host | string | Yes | -- | MySQL server hostname |
| port | integer | No | 3306 | Server port |
| database | string | Yes | -- | Database name |
| username | string | Yes | -- | Username with SELECT privileges |
| password | string | Yes | -- | Password |
| ssl_mode | string | No | preferred | disabled, preferred, required, verify_ca, verify_identity |
| replication_method | string | No | Standard | Standard (cursor-based) or CDC (binlog) |

CDC Prerequisites

To use CDC mode with MySQL:

  1. Enable binary logging: log_bin = ON in my.cnf
  2. Set binlog format: binlog_format = ROW
  3. Set binlog row image: binlog_row_image = FULL
  4. Grant replication privileges: GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'matih_reader';
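
The three server settings in steps 1 to 3 can be checked against the values MySQL reports through SHOW VARIABLES. The sketch below assumes those values have already been collected into a dictionary; it does not open a connection, and the function name mysql_cdc_problems is hypothetical.

```python
# Expected values for the settings named in steps 1-3.
REQUIRED_VARS = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
}


def mysql_cdc_problems(variables: dict[str, str]) -> list[str]:
    """Return one message per setting that does not match; empty means OK."""
    problems = []
    for name, expected in REQUIRED_VARS.items():
        actual = variables.get(name, "<unset>")
        if actual.upper() != expected:
            problems.append(f"{name} is {actual}, expected {expected}")
    return problems


if __name__ == "__main__":
    reported = {"log_bin": "ON", "binlog_format": "MIXED", "binlog_row_image": "FULL"}
    for line in mysql_cdc_problems(reported):
        print(line)
```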

MongoDB

The MongoDB connector supports Full Refresh and CDC (via change streams) sync modes.

Configuration

{
  "name": "mongo-users",
  "connectorType": "mongodb",
  "connectionConfig": {
    "connection_string": "mongodb+srv://matih_reader:********@cluster0.example.net/",
    "database": "user_data",
    "auth_source": "admin"
  }
}

Configuration Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| connection_string | string | Yes | -- | MongoDB connection URI |
| database | string | Yes | -- | Database name |
| auth_source | string | No | admin | Authentication database |

CDC Prerequisites

MongoDB CDC requires a replica set or sharded cluster. Standalone MongoDB instances do not support change streams. The connector user needs the readAnyDatabase and read roles.
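
One way to check the topology requirement ahead of time is to inspect the server's reply to the hello command: replica set members report a setName field, and mongos routers in a sharded cluster report msg set to isdbgrid. The sketch below assumes the reply has already been fetched as a dictionary (for example with PyMongo's client.admin.command("hello")); the function name supports_change_streams is hypothetical.

```python
from typing import Any


def supports_change_streams(hello_reply: dict[str, Any]) -> bool:
    """Return True if the reply comes from a replica set member or a mongos."""
    is_replica_set_member = "setName" in hello_reply
    is_mongos = hello_reply.get("msg") == "isdbgrid"
    return is_replica_set_member or is_mongos


if __name__ == "__main__":
    standalone = {"isWritablePrimary": True, "ok": 1.0}
    replica = {"isWritablePrimary": True, "setName": "rs0", "ok": 1.0}
    print(supports_change_streams(standalone), supports_change_streams(replica))
```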


SQL Server

The SQL Server connector supports Full Refresh, Incremental, and CDC sync modes.

Configuration

{
  "name": "sqlserver-erp",
  "connectorType": "mssql",
  "connectionConfig": {
    "host": "sqlserver.example.com",
    "port": 1433,
    "database": "ERP",
    "username": "matih_reader",
    "password": "********",
    "ssl_method": "encrypted_trust_server_certificate",
    "schemas": ["dbo"],
    "replication_method": "CDC"
  }
}

Configuration Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| host | string | Yes | -- | SQL Server hostname |
| port | integer | No | 1433 | Server port |
| database | string | Yes | -- | Database name |
| username | string | Yes | -- | Username |
| password | string | Yes | -- | Password |
| ssl_method | string | No | unencrypted | unencrypted, encrypted_trust_server_certificate, encrypted_verify_certificate |
| schemas | string[] | No | All | Schemas to include |
| replication_method | string | No | Standard | Standard or CDC |

CDC Prerequisites

To use CDC mode with SQL Server:

  1. Enable CDC on the database: EXEC sys.sp_cdc_enable_db;
  2. Enable CDC on each table: EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'orders', @role_name = NULL;
  3. Ensure SQL Server Agent is running (required for the CDC capture and cleanup jobs)
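
Because step 2 has to be repeated for every table, it can help to generate the statements from a list. The following sketch only builds T-SQL strings matching steps 1 and 2; it does not execute them. The function name sqlserver_cdc_enable_sql is hypothetical.

```python
def _quote(value: str) -> str:
    """Render a T-SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def sqlserver_cdc_enable_sql(tables: list[tuple[str, str]]) -> list[str]:
    """Build the database-level statement plus one statement per (schema, table)."""
    statements = ["EXEC sys.sp_cdc_enable_db;"]
    for schema, table in tables:
        statements.append(
            "EXEC sys.sp_cdc_enable_table "
            f"@source_schema = {_quote(schema)}, "
            f"@source_name = {_quote(table)}, "
            "@role_name = NULL;"
        )
    return statements


if __name__ == "__main__":
    for stmt in sqlserver_cdc_enable_sql([("dbo", "orders"), ("dbo", "customers")]):
        print(stmt)
```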

Oracle

The Oracle connector supports Full Refresh, Incremental, and CDC (via LogMiner) sync modes.

Configuration

{
  "name": "oracle-finance",
  "connectorType": "oracle",
  "connectionConfig": {
    "host": "oracle.example.com",
    "port": 1521,
    "sid": "ORCL",
    "username": "matih_reader",
    "password": "********",
    "encryption": {
      "encryption_method": "client_nne",
      "encryption_algorithm": "AES256"
    },
    "schemas": ["FINANCE"]
  }
}

Configuration Fields

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| host | string | Yes | -- | Oracle server hostname |
| port | integer | No | 1521 | Server port |
| sid | string | Yes* | -- | Oracle SID (mutually exclusive with service_name) |
| service_name | string | Yes* | -- | Oracle service name (mutually exclusive with sid) |
| username | string | Yes | -- | Username |
| password | string | Yes | -- | Password |
| schemas | string[] | No | All | Schemas to include |
| encryption | object | No | -- | Native Network Encryption configuration |
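
The sid and service_name rule can be checked before a configuration is submitted. The sketch below reads "Yes*" as "exactly one of the two must be set", which is an interpretation of the table rather than documented behavior; the function name normalize_oracle_config is hypothetical.

```python
from typing import Any


def normalize_oracle_config(cfg: dict[str, Any]) -> dict[str, Any]:
    """Enforce sid/service_name exclusivity and apply the default port."""
    has_sid = bool(cfg.get("sid"))
    has_service = bool(cfg.get("service_name"))
    if has_sid and has_service:
        raise ValueError("sid and service_name are mutually exclusive")
    if not (has_sid or has_service):
        # Assumption: one of the two identifiers is always required.
        raise ValueError("one of sid or service_name is required")
    return {"port": 1521, **cfg}
```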

Other Database Connectors

The platform supports additional database connectors through Airbyte.

| Connector | Type | Sync Modes | Notes |
| --- | --- | --- | --- |
| MariaDB | Relational | Full Refresh, Incremental, CDC | Uses MySQL-compatible protocol |
| CockroachDB | Relational | Full Refresh, Incremental, CDC | PostgreSQL wire protocol compatible |
| Amazon Redshift | Data Warehouse | Full Refresh, Incremental | Requires UNLOAD permissions for large extracts |
| Google BigQuery | Data Warehouse | Full Refresh, Incremental | Uses service account authentication |
| Snowflake | Data Warehouse | Full Refresh, Incremental | Uses key pair or password authentication |
| Elasticsearch | Search/NoSQL | Full Refresh | Extracts documents from indices |
| DynamoDB | NoSQL | Full Refresh, CDC | Uses DynamoDB Streams for CDC |
| Cassandra | NoSQL | Full Refresh | Wide-column store extraction |
| Redis | Key-Value | Full Refresh | Extracts keys matching specified patterns |
| ClickHouse | OLAP | Full Refresh, Incremental | Useful for migrating from external ClickHouse |

For detailed configuration of any connector, consult the Airbyte connector documentation for the specific connector version deployed in your tenant.