MATIH Platform is in active MVP development. Documentation reflects current implementation status.
10a. Data Ingestion
Connectors
Cloud Storage Connectors

Cloud storage connectors extract data from files stored in object storage systems and file servers. They support reading structured files (CSV, Parquet, JSON, Avro) from cloud buckets and SFTP servers, making it possible to ingest data from data lakes, file drops, and partner data exchanges.


Amazon S3

The S3 connector reads files from Amazon S3 buckets. It supports path pattern matching, file format detection, and incremental sync based on file modification time.
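Incremental sync works by comparing each object's last-modified timestamp against a stored cursor and ingesting only newer files. The selection logic can be sketched as below; the `FileRecord` type and `select_new_files` helper are illustrative, not part of the connector API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class FileRecord:
    key: str            # object key, e.g. "raw/sales/2024/01.parquet"
    last_modified: datetime

def select_new_files(files, cursor):
    """Return files modified after the cursor, oldest first, plus the
    new cursor value to persist for the next sync run."""
    fresh = sorted(
        (f for f in files if f.last_modified > cursor),
        key=lambda f: f.last_modified,
    )
    new_cursor = fresh[-1].last_modified if fresh else cursor
    return fresh, new_cursor

# Example: only the file written after the last sync is picked up.
cursor = datetime(2024, 6, 1, tzinfo=timezone.utc)
files = [
    FileRecord("raw/sales/may.parquet", datetime(2024, 5, 20, tzinfo=timezone.utc)),
    FileRecord("raw/sales/june.parquet", datetime(2024, 6, 15, tzinfo=timezone.utc)),
]
fresh, cursor = select_new_files(files, cursor)
print([f.key for f in fresh])  # ['raw/sales/june.parquet']
```

Persisting the returned cursor between runs is what makes the sync incremental rather than a full re-read of the bucket.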

Configuration

{
  "name": "s3-data-lake",
  "connectorType": "s3",
  "connectionConfig": {
    "bucket": "company-data-lake",
    "aws_access_key_id": "AKIA...",
    "aws_secret_access_key": "********",
    "region_name": "us-east-1",
    "path_prefix": "raw/sales/",
    "streams": [
      {
        "name": "sales_data",
        "format": {
          "filetype": "parquet"
        },
        "globs": ["raw/sales/**/*.parquet"]
      }
    ]
  }
}

Configuration Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| bucket | string | Yes | -- | S3 bucket name |
| aws_access_key_id | string | Yes* | -- | AWS access key ID |
| aws_secret_access_key | string | Yes* | -- | AWS secret access key |
| region_name | string | No | us-east-1 | AWS region |
| path_prefix | string | No | -- | Prefix to filter S3 objects |
| streams[].name | string | Yes | -- | Logical stream name for the extracted data |
| streams[].format.filetype | string | Yes | -- | File format: csv, parquet, jsonl, avro |
| streams[].globs | string[] | No | -- | Glob patterns to match files |

\* Not required when using IAM role authentication (see below).

Supported File Formats

| Format | Extensions | Features |
|---|---|---|
| CSV | .csv, .tsv, .txt | Configurable delimiter, quoting, encoding, header detection |
| Parquet | .parquet | Full schema preservation, predicate pushdown |
| JSON Lines | .jsonl, .ndjson | One JSON object per line |
| Avro | .avro | Schema registry integration |

IAM Role Authentication

For deployments running on AWS, you can use IAM role-based authentication instead of access keys. Attach an IAM role with s3:GetObject and s3:ListBucket permissions to the Airbyte worker pods.
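With IAM role authentication, the access-key fields are simply omitted from the connection config. A sketch of the resulting configuration (bucket, prefix, and stream names reuse the illustrative values from the example above):

```json
{
  "name": "s3-data-lake",
  "connectorType": "s3",
  "connectionConfig": {
    "bucket": "company-data-lake",
    "region_name": "us-east-1",
    "path_prefix": "raw/sales/",
    "streams": [
      {
        "name": "sales_data",
        "format": { "filetype": "parquet" },
        "globs": ["raw/sales/**/*.parquet"]
      }
    ]
  }
}
```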


Google Cloud Storage

The GCS connector reads files from Google Cloud Storage buckets.

Configuration

{
  "name": "gcs-analytics",
  "connectorType": "gcs",
  "connectionConfig": {
    "service_account": "{\"type\":\"service_account\",\"project_id\":\"my-project\",...}",
    "bucket": "analytics-data",
    "streams": [
      {
        "name": "web_events",
        "format": {
          "filetype": "jsonl"
        },
        "globs": ["events/**/*.jsonl"]
      }
    ]
  }
}

Configuration Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| service_account | string | Yes | -- | GCP service account JSON key (stringified) |
| bucket | string | Yes | -- | GCS bucket name |
| streams[].name | string | Yes | -- | Logical stream name |
| streams[].format.filetype | string | Yes | -- | File format: csv, parquet, jsonl, avro |
| streams[].globs | string[] | No | -- | Glob patterns to match objects |
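Because service_account must be a stringified JSON key rather than a nested object, it is easiest to produce with json.dumps. A minimal sketch (the key below is a truncated placeholder, not a real credential):

```python
import json

# Hypothetical, truncated service-account key; real keys contain more fields.
sa_key = {
    "type": "service_account",
    "project_id": "my-project",
    "client_email": "reader@my-project.iam.gserviceaccount.com",
}

config = {
    "name": "gcs-analytics",
    "connectorType": "gcs",
    "connectionConfig": {
        "service_account": json.dumps(sa_key),  # stringified, not nested JSON
        "bucket": "analytics-data",
    },
}

# The stringified key round-trips back to the original structure.
print(json.loads(config["connectionConfig"]["service_account"])["project_id"])  # my-project
```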

Required Permissions

The service account must have the following IAM roles:

  • roles/storage.objectViewer -- read access to objects
  • roles/storage.legacyBucketReader -- list objects in the bucket

Azure Blob Storage

The Azure Blob Storage connector reads files from Azure Storage containers.

Configuration

{
  "name": "azure-blob-reports",
  "connectorType": "azure-blob-storage",
  "connectionConfig": {
    "azure_blob_storage_account_name": "mystorageaccount",
    "azure_blob_storage_account_key": "********",
    "azure_blob_storage_container_name": "reports",
    "streams": [
      {
        "name": "monthly_reports",
        "format": {
          "filetype": "csv"
        },
        "globs": ["reports/2024/**/*.csv"]
      }
    ]
  }
}

Configuration Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| azure_blob_storage_account_name | string | Yes | -- | Storage account name |
| azure_blob_storage_account_key | string | Yes* | -- | Storage account access key |
| azure_blob_storage_sas_token | string | Yes* | -- | SAS token (alternative to account key) |
| azure_blob_storage_container_name | string | Yes | -- | Container name |
| streams[].name | string | Yes | -- | Logical stream name |
| streams[].format.filetype | string | Yes | -- | File format: csv, parquet, jsonl, avro |
| streams[].globs | string[] | No | -- | Glob patterns to match blobs |

\* Provide either the account key or a SAS token, not both.
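As the field table notes, a SAS token can be supplied in place of the account key. A sketch of that variant (account, container, and token values are placeholders):

```json
{
  "name": "azure-blob-reports",
  "connectorType": "azure-blob-storage",
  "connectionConfig": {
    "azure_blob_storage_account_name": "mystorageaccount",
    "azure_blob_storage_sas_token": "********",
    "azure_blob_storage_container_name": "reports",
    "streams": [
      {
        "name": "monthly_reports",
        "format": { "filetype": "csv" },
        "globs": ["reports/2024/**/*.csv"]
      }
    ]
  }
}
```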

SFTP

The SFTP connector reads files from SFTP servers. This is commonly used for partner data exchanges and legacy system integrations.

Configuration

{
  "name": "sftp-partner-data",
  "connectorType": "sftp-bulk",
  "connectionConfig": {
    "host": "sftp.partner.com",
    "port": 22,
    "username": "matih_user",
    "credentials": {
      "auth_type": "password",
      "password": "********"
    },
    "streams": [
      {
        "name": "daily_feed",
        "format": {
          "filetype": "csv",
          "delimiter": "|",
          "encoding": "utf-8"
        },
        "globs": ["/data/feeds/daily_*.csv"]
      }
    ]
  }
}
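For key-based authentication, the credentials block uses auth_type ssh_key with a private_key instead of a password. A sketch of that variant (host, username, and key are placeholders):

```json
{
  "name": "sftp-partner-data",
  "connectorType": "sftp-bulk",
  "connectionConfig": {
    "host": "sftp.partner.com",
    "port": 22,
    "username": "matih_user",
    "credentials": {
      "auth_type": "ssh_key",
      "private_key": "-----BEGIN OPENSSH PRIVATE KEY-----\n********"
    },
    "streams": [
      {
        "name": "daily_feed",
        "format": { "filetype": "csv", "delimiter": "|", "encoding": "utf-8" },
        "globs": ["/data/feeds/daily_*.csv"]
      }
    ]
  }
}
```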

Configuration Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| host | string | Yes | -- | SFTP server hostname |
| port | integer | No | 22 | SFTP server port |
| username | string | Yes | -- | SFTP username |
| credentials.auth_type | string | Yes | -- | password or ssh_key |
| credentials.password | string | Conditional | -- | Password (if auth_type is password) |
| credentials.private_key | string | Conditional | -- | SSH private key (if auth_type is ssh_key) |
| streams[].name | string | Yes | -- | Logical stream name |
| streams[].format.filetype | string | Yes | -- | File format: csv, parquet, jsonl, avro |
| streams[].format.delimiter | string | No | , | Column delimiter for CSV files |
| streams[].format.encoding | string | No | utf-8 | File encoding |
| streams[].globs | string[] | No | -- | Glob patterns to match files |
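The delimiter and encoding options correspond to standard CSV parsing behavior, as Python's csv module illustrates. A minimal sketch of reading a pipe-delimited feed in the shape the example stream describes (the sample row is invented):

```python
import csv
import io

# A pipe-delimited sample matching the delimiter configured above.
raw = "order_id|amount|currency\n1001|49.90|EUR\n"

reader = csv.reader(io.StringIO(raw), delimiter="|")
header = next(reader)   # first line is treated as the header row
rows = list(reader)     # remaining lines become data rows

print(header)  # ['order_id', 'amount', 'currency']
print(rows)    # [['1001', '49.90', 'EUR']]
```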

Cloud Storage vs. File Import

The platform provides two ways to ingest file-based data. Choose the appropriate method based on your use case.

| Feature | Cloud Storage Connector | File Import (Direct Upload) |
|---|---|---|
| Source | Files in S3, GCS, Azure, SFTP | Files on your local machine |
| Schedule | Automated on cron schedule | Manual, one-time upload |
| Volume | Unlimited (streams entire buckets) | Single file per upload |
| Format support | CSV, Parquet, JSON Lines, Avro | CSV, Excel, Parquet, JSON, Avro |
| Schema | Auto-detected from file structure | Auto-detected with preview and manual override |
| Use case | Recurring data feeds, data lake ingestion | Ad-hoc data loading, one-time imports |
| Documentation | This page | File Import |