
Natural Language Data Engineering: Speak, Don't Code

March 2026 · 11 min read


A Three-Week Project in Thirty Seconds

"Create a pipeline to forecast Q4 revenue using the last 3 years of sales data, broken down by region."

In 2024, this sentence starts a three-week project. A product manager writes a requirements document. A ticket is filed. It sits in a sprint backlog for five days. A data engineer picks it up, spends two days understanding the data model, writes a dbt pipeline to join and aggregate historical sales, builds an Airflow DAG to orchestrate it, creates a feature engineering step for the ML model, connects it to a training pipeline, validates the outputs, writes tests, gets a code review, deploys to staging, runs a smoke test, deploys to production, and sets up monitoring.

Three weeks. Four engineers involved. The product manager who asked the question has already moved on to a different priority.

In 2026, the same sentence is typed into a conversational interface. The platform identifies the intent, routes it to the right specialized agents, generates a pipeline specification with the correct source tables and transformation logic, previews the output, asks for approval, and deploys -- all within a single conversation that takes less than five minutes.

This is not science fiction. It is the natural consequence of everything this series has built: an ontology that defines business terms unambiguously, a context graph that maps data lineage and dependencies, specialized agents that know how to use data tools, governance that ensures every action is safe, and decision traces that capture what worked. Natural language data engineering is what happens when all those layers work together.


The Bottleneck Nobody Talks About

Every data-driven organization has the same structural bottleneck: the people who understand the data (business users) are not the people who can build with it (data engineers), and the communication channel between them is a ticket queue.

The business user knows exactly what question they need answered. They know which metrics matter, which time periods are relevant, which segments to compare. But they cannot write SQL, configure Airflow, or build a dbt model. So they write a ticket. And they wait.

The data engineer is skilled and well-intentioned. But they do not know the business context as deeply as the person who filed the ticket. They make reasonable assumptions that turn out to be wrong. "Revenue" could mean gross or net. "By region" could mean sales region or geographic region. "Last 3 years" could mean calendar years or fiscal years. Each assumption creates a round-trip: build, review, revise, rebuild.

The real cost is not engineering time. It is the questions that never get asked. For every request that enters the ticket queue, there are ten questions the business user did not bother asking because the turnaround time made it impractical. Those unasked questions represent decisions made on intuition instead of data -- and in a competitive market, that is a strategic liability.


24 Intent Categories

[Figure dp-09: Natural language data engineering -- from conversational intent to production pipeline in one interaction. (1) Natural language intent: the user says "Create a pipeline that refreshes our revenue dashboard every hour from the billing API." (2) Intent parser (24 intent categories recognized) produces a parsed output: source = Billing API, target = Revenue Dashboard, schedule = hourly, type = ETL pipeline, domain = Finance. (3) Agent execution: the Schema Agent maps source fields, the Pipeline Agent generates the DAG, the Quality Agent adds DQ checks. (4) Running pipeline: billing_revenue_hourly, status healthy, next run in 42 min, 4 active DQ checks, owner auto-assigned, production ready. Traditional: 2 weeks, YAML configs, 3 review cycles. NL engineering: 5 minutes, natural language, auto-monitored -- zero YAML required.]

Natural language data engineering starts with understanding what the user wants to do. The platform classifies every utterance into one of 24 intent categories, each of which maps to a specialized agent with the right tools for the job:

Intent Family | Example Utterance | Routing Target
QUERY | "What was revenue last quarter?" | SQL Intelligence Agent
PIPELINE_CREATE | "Build a daily pipeline to aggregate orders by region" | Pipeline Builder Agent
PIPELINE_MODIFY | "Add a data quality check to the customer pipeline" | Pipeline Builder Agent
SCHEMA_CHANGE | "Add a loyalty_tier column to the customers table" | Schema Migrator Agent
MODEL_TRAIN | "Train a churn prediction model on the last 6 months" | ML Training Orchestrator
MODEL_DEPLOY | "Deploy the latest revenue forecast to production" | ML Deployment Agent
DASHBOARD_CREATE | "Create a dashboard showing weekly active users by cohort" | Dashboard Creator Agent
INCIDENT_RESPONSE | "Why did the revenue pipeline fail last night?" | Incident Responder Agent
DATA_EXPLORE | "What tables contain customer data?" | Catalog Explorer Agent
EXPLAIN | "How is customer lifetime value calculated?" | Documentation Agent

The remaining 14 categories cover operations like access requests, scheduling changes, alert configuration, data profiling, compliance checks, and cost optimization queries. Each maps to an agent or agent combination with the specific MCP tools required.

Intent classification is not simple keyword matching. "Show me revenue" is a QUERY intent. "Build a pipeline that shows revenue daily" is a PIPELINE_CREATE intent. "Why is the revenue pipeline slow?" is an INCIDENT_RESPONSE intent. The classifier uses the full conversational context -- previous turns, user role, active session state -- to disambiguate.
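To make the disambiguation concrete, here is a deliberately simplified sketch of context-aware routing. The real classifier is a model, not rules; the names (classify_intent, Session) and the rule ordering are illustrative assumptions, not the platform's API:

```python
# Hypothetical sketch: intent classification that uses more than keywords.
# Diagnostic phrasing outranks pipeline verbs, which outrank plain queries,
# and elliptical follow-ups inherit the previous turn's intent.
from dataclasses import dataclass, field

@dataclass
class Session:
    user_role: str = "analyst"
    history: list = field(default_factory=list)  # intents from prior turns

def classify_intent(utterance: str, session: Session) -> str:
    text = utterance.lower()
    if "why" in text and ("fail" in text or "slow" in text):
        return "INCIDENT_RESPONSE"
    if "build a pipeline" in text or "create a pipeline" in text:
        return "PIPELINE_CREATE"
    if text.startswith(("show", "what was", "what were")):
        return "QUERY"
    # Fall back to conversational context for fragments like
    # "and by product category too".
    if session.history:
        return session.history[-1]
    return "QUERY"

session = Session()
for utterance in ("Show me revenue",
                  "Build a pipeline that shows revenue daily",
                  "Why is the revenue pipeline slow?"):
    session.history.append(classify_intent(utterance, session))

print(session.history)
# ['QUERY', 'PIPELINE_CREATE', 'INCIDENT_RESPONSE']
```

Note how the same word, "revenue", lands in three different intents depending on the surrounding phrasing -- that is the behavior keyword matching cannot reproduce.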


From Intent to Execution: A Walkthrough

Here is what actually happens when a user says "Create a pipeline to forecast Q4 revenue using the last 3 years of sales by region":

Turn 1: Intent classification and routing. The platform classifies this as PIPELINE_CREATE with sub-intents of ML_FORECAST and AGGREGATION. It routes to the Pipeline Builder Agent, which coordinates with the ML Training Orchestrator.

Turn 2: Ontology resolution. The agent consults the ontology to resolve "revenue" to its canonical definition: SUM(fact_sales.net_amount) - SUM(fact_returns.return_amount), filtered by transaction_type != 'intercompany'. "Region" resolves to dim_geography.sales_region. "Last 3 years" resolves to fiscal years based on the tenant's fiscal calendar configuration.
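A minimal sketch of what that resolution step might look like, assuming a term registry keyed by business vocabulary (the ONTOLOGY dict and resolve() helper are hypothetical; the canonical definitions match the walkthrough above):

```python
# Illustrative ontology lookup: business terms resolve to canonical
# SQL fragments, and relative time phrases resolve against the
# tenant's fiscal calendar rather than the calendar year.
ONTOLOGY = {
    "revenue": {
        "expression": "SUM(fact_sales.net_amount) - SUM(fact_returns.return_amount)",
        "filter": "transaction_type != 'intercompany'",
    },
    "region": {"expression": "dim_geography.sales_region", "filter": None},
}

def resolve(term: str, tenant_config: dict) -> dict:
    if term == "last 3 years":
        end = tenant_config["current_fiscal_year"]
        return {"expression": f"fiscal_year BETWEEN {end - 2} AND {end}",
                "filter": None}
    return ONTOLOGY[term]

print(resolve("last 3 years", {"current_fiscal_year": 2025})["expression"])
# fiscal_year BETWEEN 2023 AND 2025
```

The point of the registry is that "revenue" has exactly one answer: the agent never has to guess gross versus net, because the ambiguity was settled once, in the ontology, not per-ticket.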

Turn 3: Pipeline specification. The agent generates a pipeline specification:

  • Source: fact_sales joined with dim_geography and dim_date
  • Filter: fiscal years 2023-2025
  • Aggregation: monthly revenue by sales region
  • Feature engineering: lag features (1, 3, 6, 12 months), rolling averages, year-over-year growth
  • Model: time series forecast (Prophet + ensemble)
  • Output: forecast_q4_revenue_by_region table with confidence intervals
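The specification above can be sketched as a structured object -- the field names here are illustrative assumptions, but the values are the ones from the walkthrough:

```python
# Hypothetical shape for the generated pipeline specification.
# Keeping it structured (rather than free text) means the plain-language
# summary and the deployed artifacts are rendered from the same source.
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineSpec:
    sources: tuple          # source tables to join
    row_filter: str         # resolved from "last 3 years"
    grain: str              # aggregation level
    features: tuple         # engineered time series features
    model: str              # forecast model family
    output_table: str       # destination table

spec = PipelineSpec(
    sources=("fact_sales", "dim_geography", "dim_date"),
    row_filter="fiscal_year BETWEEN 2023 AND 2025",
    grain="month x sales_region",
    features=("lag_1", "lag_3", "lag_6", "lag_12", "rolling_avg", "yoy_growth"),
    model="prophet_ensemble",
    output_table="forecast_q4_revenue_by_region",
)

print(spec.output_table)
# forecast_q4_revenue_by_region
```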

The agent presents this specification to the user in plain language: "I will build a monthly pipeline that aggregates net revenue by sales region for fiscal years 2023-2025, engineers time series features, trains a Prophet forecast model, and outputs Q4 2026 projections with 80% and 95% confidence intervals. The pipeline will run daily to incorporate new actuals. Shall I proceed?"

Turn 4: User approval. The user reviews the specification, confirms it matches their intent, and approves. The governance layer validates that the user has permission to create pipelines, access the source tables, and consume compute resources for model training.

Turn 5: Deployment. The agent generates the actual artifacts -- dbt models for transformation, an Airflow DAG for orchestration, a training script for the forecast model -- deploys them to the pipeline runtime, triggers the first execution, and sets up monitoring. The user receives a link to the pipeline status page and a notification when the first run completes.

Total elapsed time: approximately 4 minutes. Total engineers involved: zero.


Multi-Turn Refinement

The initial pipeline is rarely the final pipeline. Business requirements evolve, and natural language makes iteration as fast as thinking of the next question.

"Actually, break it down by product category too." The Pipeline Builder Agent modifies the aggregation to include dim_product.category, regenerates the dbt models, and shows the updated specification. No ticket. No code review. No sprint planning.

"Can you add a data quality check that flags if any region has less than 100 transactions in a month?" The agent adds a Great Expectations assertion to the pipeline, configuring it to raise a warning if any region-month combination has fewer than 100 rows.
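The logic of that check is simple enough to show. This is a dependency-light sketch in pandas rather than Great Expectations (which the platform actually uses), with an invented helper name and toy data:

```python
# Flag any region-month combination with fewer than `threshold` rows --
# the same rule the Great Expectations assertion would enforce.
import pandas as pd

def low_volume_regions(df: pd.DataFrame, threshold: int = 100) -> pd.DataFrame:
    counts = (df.groupby(["sales_region", "month"])
                .size()
                .reset_index(name="txn_count"))
    return counts[counts["txn_count"] < threshold]

# Toy data: EMEA has 120 transactions this month, APAC only 40.
tx = pd.DataFrame({
    "sales_region": ["EMEA"] * 120 + ["APAC"] * 40,
    "month": ["2025-09"] * 160,
})

flagged = low_volume_regions(tx)
print(flagged.to_dict("records"))
# [{'sales_region': 'APAC', 'month': '2025-09', 'txn_count': 40}]
```

In the deployed pipeline this runs after the aggregation step and raises a warning, not a hard failure, so a thin month does not block the forecast from refreshing.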

"What if we used a different model -- something that handles seasonality better?" The ML Training Orchestrator suggests switching from Prophet to a SARIMAX model or an ensemble approach, explains the trade-offs, and offers to retrain with both and compare results.

Each turn builds on the previous context. The agents remember the full conversation history, the pipeline they built, the data they accessed, and the decisions the user made. Multi-turn refinement is not a new request -- it is a continuation of the same investigation, with full context preservation.


Session Checkpointing

Every conversation turn is checkpointed. This means the user can:

  • Replay from any point. Made a mistake three turns ago? Roll back to that checkpoint and branch in a different direction. The pipeline state, data preview, and model configuration all reset to that point.
  • Share context. Send a colleague a link to turn 3 of the conversation. They see the same pipeline specification, the same data preview, and the same agent reasoning -- and they can continue the conversation from that point.
  • Debug failures. If a deployed pipeline fails, the full conversation that created it is available. The agent can trace from the natural language request through every generated artifact to the failure point.

Session checkpoints are not chat history. They are full state snapshots: the pipeline specification, the generated code, the data profile at that point, the ontology resolutions, and the governance decisions. They are the audit trail that connects "what the user asked for" to "what the system built."
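A toy sketch of the snapshot-and-rollback mechanics, assuming in-memory state (Checkpoint, save, and rollback_to are invented names; a real implementation would persist snapshots, not keep them in a list):

```python
# Each checkpoint is a deep-copied state snapshot, not a chat transcript,
# so rolling back restores the pipeline spec exactly as it was.
import copy
from dataclasses import dataclass

@dataclass
class Checkpoint:
    turn: int
    pipeline_spec: dict
    ontology_resolutions: dict
    governance_decisions: list

checkpoints = []

def save(turn, pipeline_spec, ontology_resolutions, governance_decisions):
    # Deep-copy so later turns cannot mutate an earlier snapshot.
    checkpoints.append(Checkpoint(
        turn,
        copy.deepcopy(pipeline_spec),
        copy.deepcopy(ontology_resolutions),
        copy.deepcopy(governance_decisions),
    ))

def rollback_to(turn):
    # Discard every snapshot after the chosen turn; the conversation
    # branches in a new direction from the restored state.
    while checkpoints and checkpoints[-1].turn > turn:
        checkpoints.pop()
    return checkpoints[-1]

save(1, {"grain": "region"}, {"revenue": "net"}, [])
save(2, {"grain": "region x category"}, {"revenue": "net"}, [])

restored = rollback_to(1)
print(restored.pipeline_spec)
# {'grain': 'region'}
```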


Guardrails in the Loop

Natural language does not bypass governance. Every operation goes through the same AGENT governance framework as any programmatic request.

When a user says "Drop the old_customers table," the agent does not comply because the user used polite natural language. It checks: Does this user have schema-admin permissions? Is this table classified as RESTRICTED? Are there downstream dependencies? The governance layer evaluates the request identically to a programmatic DROP TABLE command.

This is critical because natural language lowers the barrier to entry. Users who could never write SQL can now modify schemas, create pipelines, and trigger model retraining. The governance layer ensures that lower barriers do not mean lower standards.
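The governance evaluation for the "drop the table" example can be sketched as a deny-by-default check. The policy shape and function name are illustrative assumptions; the three questions are the ones listed above:

```python
# Hypothetical authorization check: the natural language request is
# evaluated identically to a programmatic DROP TABLE.
def authorize_drop(user: dict, table: str, catalog: dict):
    reasons = []
    if "schema-admin" not in user["permissions"]:
        reasons.append("user lacks schema-admin permission")
    if catalog[table]["classification"] == "RESTRICTED":
        reasons.append("table is classified RESTRICTED")
    if catalog[table]["downstream"]:
        n = len(catalog[table]["downstream"])
        reasons.append(f"{n} downstream dependency(ies) would break")
    return len(reasons) == 0, reasons

catalog = {"old_customers": {"classification": "INTERNAL",
                             "downstream": ["churn_model_features"]}}
user = {"permissions": ["query", "pipeline-create"]}

allowed, reasons = authorize_drop(user, "old_customers", catalog)
print(allowed, reasons)
# False ['user lacks schema-admin permission', '1 downstream dependency(ies) would break']
```

A denied request is not a dead end: the agent reports the reasons back in plain language, so the user knows whether to request access, pick a different table, or escalate.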


The Traditional Workflow vs. The NL Workflow

Step | Traditional Workflow | NL Workflow
Request | Product manager writes Jira ticket with requirements | User describes what they need in conversation
Clarification | 2-3 email threads to resolve ambiguities over 3 days | Agent asks clarifying questions in real-time
Design | Data engineer designs pipeline, writes design doc | Agent generates pipeline specification from ontology
Implementation | 5-10 days of coding, testing, debugging | Agent generates artifacts in seconds
Review | Code review by senior engineer, 1-2 day turnaround | User reviews plain-language specification and data preview
Deployment | CI/CD pipeline, staging validation, production rollout | One-click approval, automated deployment
Iteration | New ticket, new sprint, new cycle | "Actually, add product category" -- done in 30 seconds
Total time | 2-4 weeks | 5-15 minutes
People involved | 3-5 (PM, DE, reviewer, ops, user) | 1 (the person who asked)

The traditional workflow is not wrong. It produces reliable, well-tested, well-documented pipelines. But for 80% of data engineering requests -- the routine queries, the standard aggregations, the incremental pipeline modifications -- the three-week cycle is wildly disproportionate to the complexity of the task.

Natural language data engineering handles the 80%. The data engineering team focuses on the 20% that genuinely requires human judgment: novel architectures, complex performance optimization, cross-system integrations, and strategic data modeling decisions. Their expertise becomes a force multiplier rather than a ticket queue.


Previously in this series, we explored Proactive Intelligence -- how the platform detects and resolves issues autonomously. Natural language data engineering is the user-facing expression of that same autonomous intelligence. Next, the capstone: The Autonomous Data Team -- what happens when all these capabilities converge into a single, coordinated system that solves problems humans cannot.


MATIH is building the unified data and AI platform where the barrier between "having a question" and "getting an answer" is a conversation, not a ticket queue. Learn more about our architecture or try the platform.