MATIH Platform is in active MVP development. Documentation reflects current implementation status.

Conversational Setup

The Ingestion Agent provides a chat-based interface for configuring data source connections without navigating complex forms.

How It Works

The agent uses a Finite State Machine (FSM) to guide users through source setup:

WELCOME → SELECT_SOURCE_TYPE → GATHER_CREDENTIALS → TEST_CONNECTION
    → DISCOVER_SCHEMA → SELECT_STREAMS → CONFIGURE_SYNC → REVIEW → EXECUTE
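The happy-path ordering above can be sketched as a small state enum with a linear transition function. This is a minimal illustration, not the agent's actual implementation: the names `SetupState` and `next_state` are hypothetical, the lowercase values follow the `state` strings shown in the API responses (only `select_source_type` and `gather_credentials` appear in this document; the rest are assumed by analogy), and any backward transitions the real agent may have (for example, returning to credential gathering after a failed connection test) are not modeled.

```python
from enum import Enum


class SetupState(Enum):
    """Setup states in the order listed in the flow above (names are illustrative)."""
    WELCOME = "welcome"
    SELECT_SOURCE_TYPE = "select_source_type"
    GATHER_CREDENTIALS = "gather_credentials"
    TEST_CONNECTION = "test_connection"
    DISCOVER_SCHEMA = "discover_schema"
    SELECT_STREAMS = "select_streams"
    CONFIGURE_SYNC = "configure_sync"
    REVIEW = "review"
    EXECUTE = "execute"


# Enum iteration preserves definition order, so this list is the happy path.
_ORDER = list(SetupState)


def next_state(current: SetupState) -> SetupState:
    """Return the state that follows `current`.

    EXECUTE is treated as terminal and maps to itself (an assumption).
    """
    idx = _ORDER.index(current)
    return _ORDER[min(idx + 1, len(_ORDER) - 1)]
```

For example, `next_state(SetupState.TEST_CONNECTION)` yields `SetupState.DISCOVER_SCHEMA`.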

Starting a Session

POST /api/v1/agents/ingestion/sessions
X-Tenant-Id: {tenantId}
X-User-ID: {userId}
X-User-Roles: ROLE_DATA_ENGINEER

Response:

{
  "session_id": "abc-123",
  "state": "select_source_type",
  "message": "Welcome! What type of data source would you like to connect?"
}
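As a rough client-side sketch, the request above could be assembled with Python's standard library as follows. The helper name `build_start_session_request` and the base URL in the usage note are hypothetical; the path and header names are taken from the example above. The function only builds the request object and does not send it.

```python
import urllib.request


def build_start_session_request(
    base_url: str,
    tenant_id: str,
    user_id: str,
    roles: str = "ROLE_DATA_ENGINEER",
) -> urllib.request.Request:
    """Build (but do not send) the session-start request shown above."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/agents/ingestion/sessions",
        method="POST",
        headers={
            "X-Tenant-Id": tenant_id,
            "X-User-ID": user_id,
            "X-User-Roles": roles,
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; for instance, `build_start_session_request("https://matih.example.com", "tenant-1", "user-1")` targets a placeholder host, not a real deployment.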

Chat Interaction

POST /api/v1/agents/ingestion/sessions/{sessionId}/chat
{ "message": "postgres" }

The agent responds with connector-specific credential prompts:

{
  "state": "gather_credentials",
  "message": "To connect to PostgreSQL, I'll need: Host, Port, Database, Username, Password",
  "options": [],
  "data": { "required_fields": ["host", "port", "database", "username", "password"] }
}
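A client might build the chat request and read the required fields out of the reply along these lines. This is a sketch under assumptions: the helper names are hypothetical, the `Content-Type: application/json` header is assumed rather than documented, and the tenant and user headers from the session-start call are omitted here because this page does not show whether the chat endpoint requires them.

```python
import json
import urllib.request


def build_chat_request(base_url: str, session_id: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a chat message request for an existing session."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/agents/ingestion/sessions/{session_id}/chat",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed, not documented
    )


def required_fields(reply: dict) -> list[str]:
    """Extract data.required_fields from an agent reply; empty list if absent."""
    return list(reply.get("data", {}).get("required_fields", []))
```

Given the sample reply above, `required_fields(reply)` returns the five PostgreSQL field names in the order the agent listed them.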

Supported Connectors

The agent has built-in templates for:

  • PostgreSQL — host, port, database, username, password
  • MySQL — host, port, database, username, password
  • Snowflake — account, warehouse, database, schema, role, username, password
  • Amazon S3 — bucket, region, access_key_id, secret_access_key

Other Airbyte connectors can be configured via the standard UI.
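The templates listed above can be pictured as a mapping from source type to required fields, which also makes it easy to work out what is still missing. This is an illustrative sketch only: the dictionary name, the helper `missing_fields`, and the keys `mysql`, `snowflake`, and `s3` are assumptions (only `postgres` appears as a chat input on this page); the field names come from the list above.

```python
# Required fields per built-in template, copied from the list above.
# The dictionary keys other than "postgres" are assumed identifiers.
CONNECTOR_FIELDS: dict[str, tuple[str, ...]] = {
    "postgres": ("host", "port", "database", "username", "password"),
    "mysql": ("host", "port", "database", "username", "password"),
    "snowflake": ("account", "warehouse", "database", "schema", "role",
                  "username", "password"),
    "s3": ("bucket", "region", "access_key_id", "secret_access_key"),
}


def missing_fields(source_type: str, provided: dict[str, str]) -> list[str]:
    """Return required fields that are absent or empty, in template order."""
    try:
        required = CONNECTOR_FIELDS[source_type]
    except KeyError:
        # Anything without a built-in template goes through the standard UI.
        raise ValueError(f"no built-in template for {source_type!r}") from None
    return [name for name in required if not provided.get(name)]
```

For example, supplying only `host` and `port` for `postgres` leaves `database`, `username`, and `password` to be requested.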

Security

  • Credentials are never stored in the agent session — they are sent to ingestion-service once and stored server-side
  • Message history is redacted — passwords and secrets are replaced with ***REDACTED*** before storage
  • User roles are propagated — the agent passes the real user's JWT roles to downstream services (no privilege escalation)
  • Per-session locking — concurrent messages to the same session are serialized via asyncio.Lock
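Two of the measures above, history redaction and per-session locking, can be sketched together. This is a simplified illustration, not the agent's code: `redact`, `handle_message`, and the `SENSITIVE_MARKERS` tuple are hypothetical, and the rule for deciding which keys count as secrets is an assumption. Only the `***REDACTED***` placeholder and the use of `asyncio.Lock` come from the list above.

```python
import asyncio
from collections import defaultdict

REDACTED = "***REDACTED***"
# Assumed heuristic: any key containing one of these substrings is a secret.
SENSITIVE_MARKERS = ("password", "secret", "token")


def redact(value):
    """Return a copy of `value` with secret-looking dict entries replaced."""
    if isinstance(value, dict):
        return {
            key: REDACTED
            if any(marker in key.lower() for marker in SENSITIVE_MARKERS)
            else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


# One lock per session id, created lazily on first use.
_session_locks: defaultdict = defaultdict(asyncio.Lock)


async def handle_message(session_id: str, payload: dict, history: list) -> None:
    """Process one message; messages for the same session run one at a time."""
    async with _session_locks[session_id]:
        await asyncio.sleep(0)  # stand-in for awaiting downstream services
        history.append(redact(payload))  # only the redacted form is stored
```

Messages for different sessions use different locks and can still interleave; only messages sharing a `session_id` are serialized.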

Error Handling

  • Missing fields: The agent prompts for each missing field before submitting
  • Permission denied (403): The agent returns a clear message: "You don't have permission. Contact your administrator."
  • Partial failure: If scheduling or the initial sync fails, the connection is still created; the agent reports which steps succeeded and which failed
  • Zero streams: If schema discovery returns no tables, the agent explains possible causes
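The partial-failure behavior can be illustrated with a small runner that treats connection creation as mandatory and the follow-up steps as best effort. This is a sketch under assumptions: `run_execute_steps`, its parameters, and the shape of the returned report are all hypothetical and are not taken from the agent's code.

```python
from typing import Callable


def run_execute_steps(
    create_connection: Callable[[], str],
    optional_steps: dict[str, Callable[[str], None]],
) -> dict:
    """Create the connection, then attempt each follow-up step independently.

    A failure in create_connection propagates (nothing was created). A failure
    in a follow-up step is recorded, and the connection is kept.
    """
    connection_id = create_connection()
    succeeded = ["create_connection"]
    failed: dict[str, str] = {}
    for name, step in optional_steps.items():
        try:
            step(connection_id)
            succeeded.append(name)
        except Exception as exc:  # report the failure instead of aborting
            failed[name] = str(exc)
    return {"connection_id": connection_id, "succeeded": succeeded, "failed": failed}
```

A caller could pass steps named, say, `schedule` and `initial_sync`, then turn the `succeeded` and `failed` entries into the summary message shown to the user.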

Session Management

  • Sessions expire after 24 hours of inactivity
  • Maximum 10,000 concurrent sessions per pod
  • Background eviction runs every 5 minutes
  • Sessions are held in memory and are lost on pod restart; Redis persistence is planned
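The limits above can be sketched as an in-memory store with an inactivity timeout and a size cap. This is an illustrative model only: the class `SessionStore` and its methods are hypothetical, and rejecting new sessions once the cap is reached is an assumption, since this page does not say what happens at the limit. The three constants mirror the numbers in the list.

```python
import time
from typing import Callable

SESSION_TTL_SECONDS = 24 * 60 * 60   # 24 hours of inactivity
MAX_SESSIONS = 10_000                # per pod
EVICTION_INTERVAL_SECONDS = 5 * 60   # how often a background task would evict


class SessionStore:
    """Tracks last-activity timestamps for in-memory sessions."""

    def __init__(
        self,
        ttl_seconds: float = SESSION_TTL_SECONDS,
        max_sessions: int = MAX_SESSIONS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._max = max_sessions
        self._clock = clock
        self._last_seen: dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Record activity; creating a session beyond the cap is rejected (assumed)."""
        if session_id not in self._last_seen and len(self._last_seen) >= self._max:
            raise RuntimeError("session limit reached")
        self._last_seen[session_id] = self._clock()

    def evict_expired(self) -> int:
        """Drop sessions idle for at least the TTL; return how many were removed."""
        now = self._clock()
        expired = [sid for sid, seen in self._last_seen.items() if now - seen >= self._ttl]
        for sid in expired:
            del self._last_seen[sid]
        return len(expired)

    def active_count(self) -> int:
        return len(self._last_seen)
```

A background task could call `evict_expired()` and then sleep for `EVICTION_INTERVAL_SECONDS` in a loop. Because the state lives in a plain dictionary, it disappears when the process restarts, which matches the last bullet above.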