Conversational Setup
The Ingestion Agent provides a chat-based interface for configuring data source connections without navigating complex forms.
How It Works
The agent uses a Finite State Machine (FSM) to guide users through source setup:
WELCOME → SELECT_SOURCE_TYPE → GATHER_CREDENTIALS → TEST_CONNECTION
→ DISCOVER_SCHEMA → SELECT_STREAMS → CONFIGURE_SYNC → REVIEW → EXECUTE
Starting a Session
POST /api/v1/agents/ingestion/sessions
X-Tenant-Id: {tenantId}
X-User-ID: {userId}
X-User-Roles: ROLE_DATA_ENGINEER
Response:
{
"session_id": "abc-123",
"state": "select_source_type",
"message": "Welcome! What type of data source would you like to connect?"
}
Chat Interaction
POST /api/v1/agents/ingestion/sessions/{sessionId}/chat
{ "message": "postgres" }
The agent responds with connector-specific credential prompts:
{
"state": "gather_credentials",
"message": "To connect to PostgreSQL, I'll need: Host, Port, Database, Username, Password",
"options": [],
"data": { "required_fields": ["host", "port", "database", "username", "password"] }
}
Supported Connectors
The agent has built-in templates for:
- PostgreSQL — host, port, database, username, password
- MySQL — host, port, database, username, password
- Snowflake — account, warehouse, database, schema, role, username, password
- Amazon S3 — bucket, region, access_key_id, secret_access_key
Other Airbyte connectors can be configured via the standard UI.
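The built-in templates amount to a mapping from connector type to required fields. The Python sketch below shows one way such a mapping could drive the "which fields are still missing?" check. The dictionary keys (`postgres`, `mysql`, `snowflake`, `s3`) and the `missing_fields` helper are assumptions made for illustration; only the field names come from the list above.

```python
# Required fields per connector, copied from the template list above.
# The keys are assumed identifiers, not confirmed API values.
REQUIRED_FIELDS = {
    "postgres": ["host", "port", "database", "username", "password"],
    "mysql": ["host", "port", "database", "username", "password"],
    "snowflake": ["account", "warehouse", "database", "schema",
                  "role", "username", "password"],
    "s3": ["bucket", "region", "access_key_id", "secret_access_key"],
}


def missing_fields(source_type, provided):
    """Return the required fields that are absent or empty, in template order."""
    required = REQUIRED_FIELDS[source_type]
    return [name for name in required if provided.get(name) in (None, "")]
```

For example, `missing_fields("postgres", {"host": "db.local", "port": 5432})` leaves `database`, `username`, and `password` to be prompted for.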
Security
- Credentials are never stored in the agent session: they are sent to ingestion-service once and stored server-side
- Message history is redacted: passwords and secrets are replaced with ***REDACTED*** before storage
- User roles are propagated: the agent passes the real user's JWT roles to downstream services (no privilege escalation)
- Per-session locking: concurrent messages to the same session are serialized via asyncio.Lock
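Two of the measures above, redaction before storage and per-session serialization, can be sketched together in Python. This is an illustration under stated assumptions, not the agent's actual code: `SENSITIVE_KEYS`, `redact`, and `handle_message` are hypothetical names, the set of sensitive keys is a guess, and the sketch redacts structured fields while the real agent may redact free-text messages instead.

```python
import asyncio
from collections import defaultdict

REDACTED = "***REDACTED***"
# Assumed list of sensitive keys; the real list is not given in this document.
SENSITIVE_KEYS = {"password", "secret_access_key"}


def redact(fields):
    """Return a copy of `fields` that is safe to keep in message history."""
    return {key: (REDACTED if key in SENSITIVE_KEYS else value)
            for key, value in fields.items()}


# One asyncio.Lock per session ID, created lazily on first use.
_session_locks = defaultdict(asyncio.Lock)


async def handle_message(session_id, fields, history):
    """Process one message; messages for the same session run one at a time."""
    async with _session_locks[session_id]:
        # Only the redacted copy is ever appended to the stored history.
        history.append(redact(fields))
```

The lock is keyed by session ID, so messages for different sessions still run concurrently.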
Error Handling
- Missing fields: Agent prompts for specific missing fields before submitting
- Permission denied (403): Agent replies with a clear message: "You don't have permission. Contact your administrator."
- Partial failure: If schedule or initial sync fails, the connection is still created; the agent reports what succeeded and what didn't
- Zero streams: If schema discovery returns no tables, agent explains possible causes
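The partial-failure and 403 cases above can be illustrated with a small Python sketch that turns per-step outcomes into one user-facing message. The `StepResult` type, the step names, and the wording of the success and failure summary are all hypothetical; only the permission-denied text is taken from this document.

```python
from dataclasses import dataclass
from typing import List, Optional

PERMISSION_DENIED = "You don't have permission. Contact your administrator."


@dataclass
class StepResult:
    name: str                          # assumed step names, e.g. "connection", "schedule"
    ok: bool
    status_code: Optional[int] = None  # HTTP status of the downstream call, if any


def summarize(results: List[StepResult]) -> str:
    """Build one message that reports what succeeded and what did not."""
    if any(r.status_code == 403 for r in results):
        return PERMISSION_DENIED
    succeeded = [r.name for r in results if r.ok]
    failed = [r.name for r in results if not r.ok]
    if not failed:
        return "All steps succeeded: " + ", ".join(succeeded) + "."
    return ("Succeeded: " + (", ".join(succeeded) or "none")
            + ". Failed: " + ", ".join(failed) + ".")
```

Reporting each step separately tells the user that the connection exists even when the schedule or the initial sync needs to be retried.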
Session Management
- Sessions expire after 24 hours of inactivity
- Maximum 10,000 concurrent sessions per pod
- Background eviction runs every 5 minutes
- Sessions are in-memory (lost on pod restart — Redis persistence planned)
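The limits above can be pictured with a minimal in-memory store in Python. This is a sketch only: `SessionStore`, `touch`, and `evict_expired` are hypothetical names, and raising an error at the session cap is an assumption, since this document does not say what happens when the cap is reached.

```python
import time

SESSION_TTL_SECONDS = 24 * 60 * 60    # sessions expire after 24 hours of inactivity
MAX_SESSIONS = 10_000                 # per-pod cap on concurrent sessions
EVICTION_INTERVAL_SECONDS = 5 * 60    # how often a background task would call evict_expired()


class SessionStore:
    """Tracks the last activity time of each session, in memory only."""

    def __init__(self):
        self._last_seen = {}

    def touch(self, session_id, now=None):
        """Record activity for a session, creating it if there is room."""
        now = time.monotonic() if now is None else now
        if session_id not in self._last_seen and len(self._last_seen) >= MAX_SESSIONS:
            # Assumed behaviour at the cap; the real agent may respond differently.
            raise RuntimeError("session limit reached")
        self._last_seen[session_id] = now

    def evict_expired(self, now=None):
        """Drop sessions idle for longer than the TTL; return how many were removed."""
        now = time.monotonic() if now is None else now
        expired = [sid for sid, seen in self._last_seen.items()
                   if now - seen > SESSION_TTL_SECONDS]
        for sid in expired:
            del self._last_seen[sid]
        return len(expired)

    def __len__(self):
        return len(self._last_seen)
```

Because the dictionary lives in process memory, everything in it disappears on restart, which is the limitation noted in the last bullet.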