# Cloud Storage Connectors
Cloud storage connectors extract data from files stored in object storage systems and file servers. They support reading structured files (CSV, Parquet, JSON, Avro) from cloud buckets and SFTP servers, making it possible to ingest data from data lakes, file drops, and partner data exchanges.
## Amazon S3
The S3 connector reads files from Amazon S3 buckets. It supports path pattern matching, file format detection, and incremental sync based on file modification time.
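The incremental behavior can be sketched as follows. This is a simplified illustration of modification-time cursoring, not the connector's actual implementation; in practice the object listing would come from the S3 API.

```python
from datetime import datetime, timezone

def incremental_files(objects, cursor):
    """Return objects modified after the stored cursor, plus the new cursor.

    `objects` is a list of dicts with "key" and "last_modified" fields,
    as an S3 listing would provide; `cursor` is the high-water mark from
    the previous sync (None on the first run).
    """
    new = [o for o in objects if cursor is None or o["last_modified"] > cursor]
    new_cursor = max((o["last_modified"] for o in new), default=cursor)
    return new, new_cursor

# Example: one file already synced, one new file since the last run.
listing = [
    {"key": "raw/sales/2024/01/a.parquet",
     "last_modified": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"key": "raw/sales/2024/02/b.parquet",
     "last_modified": datetime(2024, 2, 7, tzinfo=timezone.utc)},
]
to_sync, cursor = incremental_files(
    listing, datetime(2024, 1, 31, tzinfo=timezone.utc)
)
```

Only the file modified after the stored cursor is selected, and the cursor advances to that file's modification time for the next run.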
### Configuration
```json
{
  "name": "s3-data-lake",
  "connectorType": "s3",
  "connectionConfig": {
    "bucket": "company-data-lake",
    "aws_access_key_id": "AKIA...",
    "aws_secret_access_key": "********",
    "region_name": "us-east-1",
    "path_prefix": "raw/sales/",
    "streams": [
      {
        "name": "sales_data",
        "format": {
          "filetype": "parquet"
        },
        "globs": ["raw/sales/**/*.parquet"]
      }
    ]
  }
}
```

### Configuration Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| bucket | string | Yes | -- | S3 bucket name |
| aws_access_key_id | string | Yes* | -- | AWS access key ID. Not required if using an IAM role. |
| aws_secret_access_key | string | Yes* | -- | AWS secret access key. Not required if using an IAM role. |
| region_name | string | No | us-east-1 | AWS region |
| path_prefix | string | No | -- | Prefix to filter S3 objects |
| streams[].name | string | Yes | -- | Logical stream name for the extracted data |
| streams[].format.filetype | string | Yes | -- | File format: csv, parquet, jsonl, avro |
| streams[].globs | string[] | No | -- | Glob patterns to match files |
### Supported File Formats
| Format | Extensions | Features |
|---|---|---|
| CSV | .csv, .tsv, .txt | Configurable delimiter, quoting, encoding, header detection |
| Parquet | .parquet | Full schema preservation, predicate pushdown |
| JSON Lines | .jsonl, .ndjson | One JSON object per line |
| Avro | .avro | Schema registry integration |
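The JSON Lines format, for instance, is strictly line-oriented: each line is an independent JSON document. A minimal stdlib sketch of how such a file is parsed (an illustration, not the connector's internals):

```python
import json

def read_jsonl(text):
    """Parse JSON Lines: one JSON object per line, blank lines ignored."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

records = read_jsonl('{"event": "click", "user": 1}\n{"event": "view", "user": 2}\n')
```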
### IAM Role Authentication
For deployments running on AWS, you can use IAM role-based authentication instead of access keys. Attach an IAM role with s3:GetObject and s3:ListBucket permissions to the Airbyte worker pods.
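A policy granting those permissions might look like the following sketch. The bucket name is a placeholder; scope the `Resource` ARNs to your own bucket. Note that `s3:ListBucket` applies to the bucket itself while `s3:GetObject` applies to the objects within it.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::company-data-lake"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::company-data-lake/*"
    }
  ]
}
```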
## Google Cloud Storage
The GCS connector reads files from Google Cloud Storage buckets.
### Configuration
```json
{
  "name": "gcs-analytics",
  "connectorType": "gcs",
  "connectionConfig": {
    "service_account": "{\"type\":\"service_account\",\"project_id\":\"my-project\",...}",
    "bucket": "analytics-data",
    "streams": [
      {
        "name": "web_events",
        "format": {
          "filetype": "jsonl"
        },
        "globs": ["events/**/*.jsonl"]
      }
    ]
  }
}
```

### Configuration Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| service_account | string | Yes | -- | GCP service account JSON key (stringified) |
| bucket | string | Yes | -- | GCS bucket name |
| streams[].name | string | Yes | -- | Logical stream name |
| streams[].format.filetype | string | Yes | -- | File format: csv, parquet, jsonl, avro |
| streams[].globs | string[] | No | -- | Glob patterns to match objects |
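Because service_account is a *stringified* JSON key rather than a nested object, the key must be serialized before being embedded in the config. A small sketch (the key fields below are fabricated placeholders, not a real credential):

```python
import json

# A service-account key as downloaded from GCP (placeholder values only).
key = {
    "type": "service_account",
    "project_id": "my-project",
    "client_email": "reader@my-project.iam.gserviceaccount.com",
}

# The connector expects the key as a single JSON string, not a nested object.
connection_config = {
    "service_account": json.dumps(key),
    "bucket": "analytics-data",
}

# Round-tripping the string recovers the original key structure.
parsed = json.loads(connection_config["service_account"])
```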
### Required Permissions
The service account must have the following IAM roles:
- `roles/storage.objectViewer` -- read access to objects
- `roles/storage.legacyBucketReader` -- list objects in the bucket
## Azure Blob Storage
The Azure Blob Storage connector reads files from Azure Storage containers.
### Configuration
```json
{
  "name": "azure-blob-reports",
  "connectorType": "azure-blob-storage",
  "connectionConfig": {
    "azure_blob_storage_account_name": "mystorageaccount",
    "azure_blob_storage_account_key": "********",
    "azure_blob_storage_container_name": "reports",
    "streams": [
      {
        "name": "monthly_reports",
        "format": {
          "filetype": "csv"
        },
        "globs": ["reports/2024/**/*.csv"]
      }
    ]
  }
}
```

### Configuration Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| azure_blob_storage_account_name | string | Yes | -- | Storage account name |
| azure_blob_storage_account_key | string | Yes* | -- | Storage account access key |
| azure_blob_storage_sas_token | string | Yes* | -- | SAS token (alternative to account key) |
| azure_blob_storage_container_name | string | Yes | -- | Container name |
| streams[].name | string | Yes | -- | Logical stream name |
| streams[].format.filetype | string | Yes | -- | File format: csv, parquet, jsonl, avro |
| streams[].globs | string[] | No | -- | Glob patterns to match blobs |

\* Provide either the account key or a SAS token.
## SFTP
The SFTP connector reads files from SFTP servers. This is commonly used for partner data exchanges and legacy system integrations.
### Configuration
```json
{
  "name": "sftp-partner-data",
  "connectorType": "sftp-bulk",
  "connectionConfig": {
    "host": "sftp.partner.com",
    "port": 22,
    "username": "matih_user",
    "credentials": {
      "auth_type": "password",
      "password": "********"
    },
    "streams": [
      {
        "name": "daily_feed",
        "format": {
          "filetype": "csv",
          "delimiter": "|",
          "encoding": "utf-8"
        },
        "globs": ["/data/feeds/daily_*.csv"]
      }
    ]
  }
}
```

### Configuration Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| host | string | Yes | -- | SFTP server hostname |
| port | integer | No | 22 | SFTP server port |
| username | string | Yes | -- | SFTP username |
| credentials.auth_type | string | Yes | -- | password or ssh_key |
| credentials.password | string | Conditional | -- | Password (if auth_type is password) |
| credentials.private_key | string | Conditional | -- | SSH private key (if auth_type is ssh_key) |
| streams[].name | string | Yes | -- | Logical stream name |
| streams[].format.filetype | string | Yes | -- | File format: csv, parquet, jsonl, avro |
| streams[].format.delimiter | string | No | , | Column delimiter for CSV files |
| streams[].format.encoding | string | No | utf-8 | File encoding |
| streams[].globs | string[] | No | -- | Glob patterns to match files |
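The delimiter and encoding options map onto ordinary CSV parsing. A sketch of how a pipe-delimited feed like the one configured above would be read, using Python's stdlib csv module as an illustration rather than the connector's internals (the encoding option would apply when decoding the file's bytes; here the sample feed is already a string):

```python
import csv
import io

# A small pipe-delimited feed, as daily_feed above might deliver it.
raw = "id|name|amount\n1|Alice|10.50\n2|Bob|7.25\n"

# DictReader uses the first row as the header and yields one dict per row.
reader = csv.DictReader(io.StringIO(raw), delimiter="|")
rows = list(reader)
```

Each row comes back as a dict keyed by the header fields, with all values as strings; type coercion happens downstream.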
## Cloud Storage vs. File Import
The platform provides two ways to ingest file-based data. Choose the appropriate method based on your use case.
| Feature | Cloud Storage Connector | File Import (Direct Upload) |
|---|---|---|
| Source | Files in S3, GCS, Azure, SFTP | Files on your local machine |
| Schedule | Automated on cron schedule | Manual, one-time upload |
| Volume | Unlimited (streams entire buckets) | Single file per upload |
| Format support | CSV, Parquet, JSON Lines, Avro | CSV, Excel, Parquet, JSON, Avro |
| Schema | Auto-detected from file structure | Auto-detected with preview and manual override |
| Use case | Recurring data feeds, data lake ingestion | Ad-hoc data loading, one-time imports |
| Documentation | This page | File Import |