dbt Integration
The AI Service integrates with dbt (data build tool) to import semantic model definitions, metric specifications, and materialization metadata. This integration enriches the Text-to-SQL pipeline with business context from dbt projects, enabling the AI to understand metric definitions, relationships between models, and data freshness.
Integration Architecture
The dbt integration operates through a synchronization pipeline that reads dbt artifacts and loads them into the AI Service schema context:
dbt Project --> manifest.json + catalog.json --> dbt Sync Service --> Schema Context --> RAG Pipeline
Imported Artifacts
| Artifact | Source File | Usage in AI Service |
|---|---|---|
| Models | manifest.json | Table descriptions, column metadata |
| Metrics | manifest.json (semantic layer) | Business metric definitions |
| Sources | manifest.json | Raw data source mappings |
| Tests | manifest.json | Data quality expectations |
| Documentation | catalog.json | Column descriptions, business glossary |
| Freshness | sources.yml | Data staleness tracking |
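As an illustration of how the artifacts in the table above can be located inside manifest.json, the following minimal sketch groups entries by kind. It assumes the standard dbt manifest layout, where models and tests sit under the top-level `nodes` key (distinguished by `resource_type`) and sources and metrics have their own top-level keys. The function names and the sample data are hypothetical and do not represent the dbt Sync Service's actual code.

```python
import json
from collections import defaultdict


def load_manifest(path: str) -> dict:
    """Read a manifest.json file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_manifest(manifest: dict) -> dict:
    """Group manifest unique IDs by artifact kind (hypothetical helper)."""
    summary = defaultdict(list)
    # Models and tests both live under "nodes"; other node types are skipped.
    for unique_id, node in manifest.get("nodes", {}).items():
        kind = node.get("resource_type")
        if kind in ("model", "test"):
            summary[kind + "s"].append(unique_id)
    # Sources and metrics are stored under their own top-level keys.
    for unique_id in manifest.get("sources", {}):
        summary["sources"].append(unique_id)
    for unique_id in manifest.get("metrics", {}):
        summary["metrics"].append(unique_id)
    return dict(summary)


# Invented sample data, shaped like a heavily trimmed manifest.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.shop.orders": {"resource_type": "model"},
        "test.shop.not_null_orders_id": {"resource_type": "test"},
        "seed.shop.countries": {"resource_type": "seed"},
    },
    "sources": {"source.shop.raw.orders": {"resource_type": "source"}},
    "metrics": {"metric.shop.monthly_recurring_revenue": {"resource_type": "metric"}},
}

summary = summarize_manifest(SAMPLE_MANIFEST)
```

In practice the input would come from `load_manifest` pointed at the configured manifest path rather than from an inline dictionary.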
Semantic Model Import
dbt semantic layer metrics are imported as first-class objects in the AI Service:
{
  "metric_name": "monthly_recurring_revenue",
  "description": "Sum of recurring revenue for active subscriptions",
  "type": "derived",
  "sql": "SUM(CASE WHEN status = 'active' THEN mrr ELSE 0 END)",
  "dimensions": ["plan_type", "region", "customer_segment"],
  "time_grains": ["day", "week", "month", "quarter"],
  "filters": [
    {"field": "status", "operator": "=", "value": "active"}
  ]
}
Synchronization
The sync process runs on a configurable schedule or can be triggered manually:
Automatic Sync
# Periodic sync every 15 minutes
MATIH_DBT_SYNC_INTERVAL=900
MATIH_DBT_MANIFEST_PATH=/data/dbt/target/manifest.json
MATIH_DBT_CATALOG_PATH=/data/dbt/target/catalog.json
Manual Sync
POST /api/v1/integrations/dbt/sync
{
  "manifest_url": "https://storage.example.com/dbt/manifest.json",
  "catalog_url": "https://storage.example.com/dbt/catalog.json",
  "tenant_id": "acme-corp"
}
Schema Enrichment
Imported dbt metadata enriches the schema context used by the SQL generator:
| Enrichment | Source | Impact |
|---|---|---|
| Table descriptions | dbt model descriptions | Improves semantic matching in RAG |
| Column descriptions | dbt column docs | More accurate column selection |
| Metric definitions | dbt metrics | Enables metric-aware SQL generation |
| Relationships | dbt refs and relationships | Better JOIN inference |
| Data types | dbt catalog | Type-aware aggregation functions |
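To make "metric-aware SQL generation" concrete, the sketch below turns an imported metric object into a grouped query. It is only an illustration under stated assumptions: the metric dictionary reuses the field names from the semantic model example in this document, while `render_metric_query` and the `subscriptions` table name are hypothetical. The `filters` and `time_grains` fields are left out for brevity, and this is not the SQL generator's actual implementation.

```python
def render_metric_query(metric: dict, table: str, dimension: str) -> str:
    """Build a GROUP BY query for one metric and one dimension (hypothetical helper)."""
    # Only dimensions declared on the metric may be used for grouping.
    if dimension not in metric["dimensions"]:
        raise ValueError(f"{dimension!r} is not a dimension of {metric['metric_name']}")
    return (
        f"SELECT {dimension}, {metric['sql']} AS {metric['metric_name']}\n"
        f"FROM {table}\n"
        f"GROUP BY {dimension}"
    )


# Metric object using the field names from the semantic model example.
MRR_METRIC = {
    "metric_name": "monthly_recurring_revenue",
    "sql": "SUM(CASE WHEN status = 'active' THEN mrr ELSE 0 END)",
    "dimensions": ["plan_type", "region", "customer_segment"],
}

# "subscriptions" is an assumed table name used only for illustration.
query = render_metric_query(MRR_METRIC, "subscriptions", "region")
```

Validating the dimension against the metric definition is the point of the example: a question such as "MRR by region" can be answered from the stored expression instead of having the model guess the aggregation.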
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| DBT_INTEGRATION_ENABLED | true | Enable dbt integration |
| DBT_MANIFEST_PATH | /data/dbt/target/manifest.json | Path to manifest |
| DBT_CATALOG_PATH | /data/dbt/target/catalog.json | Path to catalog |
| DBT_SYNC_INTERVAL | 900 | Sync interval in seconds |
| DBT_CLOUD_API_TOKEN | none | dbt Cloud API token (optional) |
| DBT_CLOUD_ACCOUNT_ID | none | dbt Cloud account ID (optional) |
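The variables and defaults above could be read into a typed settings object roughly as follows. This is a sketch, not the service's own loader: `DbtConfig` and `load_dbt_config` are hypothetical names, and treating only the string `true` (case-insensitive) as enabled is an assumption. The variable names follow the table; the Automatic Sync example shows them with a `MATIH_` prefix, so adjust the keys to whichever form your deployment reads.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DbtConfig:
    enabled: bool
    manifest_path: str
    catalog_path: str
    sync_interval: int  # seconds
    cloud_api_token: Optional[str]
    cloud_account_id: Optional[str]


def load_dbt_config(env: Optional[Mapping[str, str]] = None) -> DbtConfig:
    """Read dbt settings from the environment, using the documented defaults."""
    env = os.environ if env is None else env
    return DbtConfig(
        # Assumption: any value other than "true" disables the integration.
        enabled=env.get("DBT_INTEGRATION_ENABLED", "true").strip().lower() == "true",
        manifest_path=env.get("DBT_MANIFEST_PATH", "/data/dbt/target/manifest.json"),
        catalog_path=env.get("DBT_CATALOG_PATH", "/data/dbt/target/catalog.json"),
        sync_interval=int(env.get("DBT_SYNC_INTERVAL", "900")),
        cloud_api_token=env.get("DBT_CLOUD_API_TOKEN"),
        cloud_account_id=env.get("DBT_CLOUD_ACCOUNT_ID"),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
defaults = load_dbt_config({})
overridden = load_dbt_config({"DBT_INTEGRATION_ENABLED": "false", "DBT_SYNC_INTERVAL": "60"})
```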
dbt Cloud Integration
For teams using dbt Cloud, the AI Service can pull artifacts directly via the dbt Cloud API:
GET https://cloud.getdbt.com/api/v2/accounts/:account_id/runs/:run_id/artifacts/manifest.json
This eliminates the need for local file system access and lets the AI Service use the latest production manifest, provided :run_id refers to the most recent production run.
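A request against this endpoint could be assembled as in the sketch below, which only builds the request and does not send it. `build_artifact_request` and the sample IDs are hypothetical, and the `Authorization: Token ...` header scheme is an assumption based on dbt Cloud's API documentation; verify it against your account's authentication settings.

```python
import urllib.request

DBT_CLOUD_BASE = "https://cloud.getdbt.com/api/v2"


def build_artifact_request(
    account_id: str, run_id: str, token: str, artifact: str = "manifest.json"
) -> urllib.request.Request:
    """Prepare a GET request for one run artifact (hypothetical helper)."""
    url = f"{DBT_CLOUD_BASE}/accounts/{account_id}/runs/{run_id}/artifacts/{artifact}"
    # Assumed header scheme; the token would come from DBT_CLOUD_API_TOKEN.
    headers = {"Authorization": f"Token {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder IDs and token, for illustration only.
request = build_artifact_request("12345", "67890", "example-token")
```

Sending it with `urllib.request.urlopen(request)` and passing the response to `json.load` would yield the manifest as a dictionary; the same helper can fetch catalog.json by changing the `artifact` argument.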