ML Engineer Journey: Demand Forecasting at Scale
Persona: Marcus, ML Engineer at NovaMart
Objective: Build and operate a production demand forecasting system that predicts product-level sales 14 days ahead
Timeline: 6 weeks from design to full production
Primary Workbenches: ML Workbench, Data Workbench, Pipeline Service
Stage 1: Ingestion
Marcus needs three data sources for demand forecasting: sales transactions, inventory levels, and web traffic. He configures each in the Data Workbench > Sources panel.
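The Shopify and Postgres connections below lean on incremental syncs: each run fetches only records whose cursor field (here `updated_at`) has advanced past the last stored high-water mark, then persists the new cursor. A minimal stdlib sketch of that contract (record shape and function name are illustrative, not Airbyte's API):

```python
def incremental_sync(records, cursor_field, last_cursor):
    """Return records newer than last_cursor, plus the advanced cursor."""
    new = [r for r in records if r[cursor_field] > last_cursor]
    next_cursor = max((r[cursor_field] for r in new), default=last_cursor)
    return new, next_cursor

orders = [
    {"order_id": 1, "updated_at": "2025-12-01T08:00:00Z"},
    {"order_id": 2, "updated_at": "2025-12-01T09:30:00Z"},
    {"order_id": 3, "updated_at": "2025-12-01T11:15:00Z"},
]

# First run: everything past the stored cursor is pulled.
batch, cursor = incremental_sync(orders, "updated_at", "2025-12-01T09:00:00Z")
print([r["order_id"] for r in batch])  # [2, 3]
print(cursor)                          # 2025-12-01T11:15:00Z

# Second run with no new rows: empty batch, cursor unchanged.
batch2, cursor2 = incremental_sync(orders, "updated_at", cursor)
```

ISO-8601 timestamps compare correctly as strings, which is why the cursor can be handled lexicographically here. The `full_refresh` streams (inventory levels, warehouses) skip this bookkeeping and re-pull everything each run.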
Shopify Orders API
NovaMart uses Shopify as its storefront. Marcus sets up an incremental sync using the Airbyte Shopify connector:
{
"connector": "source-shopify",
"config": {
"shop": "novamart.myshopify.com",
"credentials": {
"auth_method": "api_password",
"api_password": "${SHOPIFY_API_KEY}"
}
},
"streams": [
{ "name": "orders", "sync_mode": "incremental", "cursor_field": "updated_at" },
{ "name": "products", "sync_mode": "incremental", "cursor_field": "updated_at" },
{ "name": "inventory_levels", "sync_mode": "full_refresh" }
],
"schedule": { "type": "cron", "expression": "0 * * * *" }
}

PostgreSQL Inventory Database
The internal inventory system tracks warehouse stock levels and supplier shipments:
{
"connector": "source-postgres",
"config": {
"host": "inventory-db.novamart.internal",
"port": 5432,
"database": "inventory",
"schemas": ["public"],
"replication_method": { "method": "standard" }
},
"streams": [
{ "name": "inventory", "sync_mode": "full_refresh" },
{ "name": "supplier_shipments", "sync_mode": "incremental", "cursor_field": "updated_at" },
{ "name": "warehouses", "sync_mode": "full_refresh" }
],
"schedule": { "type": "cron", "expression": "0 */6 * * *" }
}

Google Analytics Sessions
Web traffic data provides demand signals like product page views and search trends:
{
"connector": "source-google-analytics-data-api",
"config": {
"property_id": "properties/349871234",
"credentials": { "credentials_json": "${GA_SERVICE_ACCOUNT_JSON}" },
"date_ranges_start_date": "2024-01-01"
},
"streams": [
{ "name": "daily_active_users", "sync_mode": "incremental" },
{ "name": "pages", "sync_mode": "incremental" },
{ "name": "events", "sync_mode": "incremental" }
],
"schedule": { "type": "cron", "expression": "0 6 * * *" },
"transformation": {
"schema_mapping": {
"pagePath": "page_url",
"activeUsers": "unique_visitors"
},
"deduplication": {
"key": ["date", "pagePath"],
"strategy": "last_write_wins"
}
}
}

Stage 2: Discovery
With all three sources syncing, Marcus explores the data catalog to understand the join graph and discover existing assets.
Finding Join Keys
In Data Workbench > Catalog, Marcus searches for tables containing product_id:
| Table | Schema | Has product_id | Row Count |
|---|---|---|---|
| products | shopify.public | PK | 45,291 |
| order_items | shopify.public | FK | 38.7M |
| inventory | inventory.public | FK | 45,291 |
| product_categories | analytics.derived | FK | 45,291 |
| supplier_shipments | inventory.public | FK | 156K |
He clicks into product_categories and sees it was created by the BI team, with
a last_modified_by: sofia@novamart.com tag. The table maps each product to its
category hierarchy (Electronics > Audio > Headphones) and includes a seasonal_flag
column -- useful for forecasting.
Checking Lineage for Circular Dependencies
Before building his pipeline, Marcus verifies there are no circular dependencies in the tables he plans to use:
shopify.orders ──▶ shopify.order_items ──┐
├──▶ analytics.daily_product_sales (Marcus creates)
shopify.products ────────────────────────┘ │
▼
analytics.product_categories ──────────────▶ demand_features (Marcus creates)
│
inventory.inventory ───────────────────────────────┘

No circular dependencies. The BI team's product_categories is an input to Marcus's
pipeline but does not depend on any of his outputs.
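The same check can be automated: treat each table as a node and each lineage edge as a dependency, then search for back edges with a depth-first traversal. A stdlib sketch over the lineage shown above (the edge list is hand-copied for illustration, not pulled from a Catalog API):

```python
def has_cycle(edges):
    """DFS with three-color marking; True if the dependency graph contains a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: we reached a node still on the stack
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

lineage = [
    ("shopify.orders", "analytics.daily_product_sales"),
    ("shopify.order_items", "analytics.daily_product_sales"),
    ("shopify.products", "analytics.daily_product_sales"),
    ("analytics.daily_product_sales", "demand_features"),
    ("analytics.product_categories", "demand_features"),
    ("inventory.inventory", "demand_features"),
]
print(has_cycle(lineage))  # False
```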
Stage 3: Query
Marcus writes the feature-engineering SQL in the Query Editor. The federated query engine uses DuckDB for the window-function-heavy work over S3-hosted Parquet files and Trino to read the transactional PostgreSQL sources directly.
Feature Engineering SQL
-- Demand forecasting feature set: one row per product per day
-- Uses rolling aggregates, seasonality encoding, and price signals
CREATE TABLE analytics.derived.demand_features AS
WITH daily_sales AS (
SELECT
oi.product_id,
DATE(o.order_date) AS sale_date,
SUM(oi.quantity) AS units_sold,
SUM(oi.quantity * oi.unit_price) AS revenue,
COUNT(DISTINCT o.order_id) AS num_orders,
AVG(oi.unit_price) AS avg_selling_price,
AVG(oi.discount) AS avg_discount_pct
FROM postgresql.shopify.order_items oi
JOIN postgresql.shopify.orders o ON oi.order_id = o.order_id
WHERE o.order_date >= DATE '2024-01-01'
GROUP BY oi.product_id, DATE(o.order_date)
),
rolling_features AS (
SELECT
product_id,
sale_date,
units_sold,
revenue,
avg_selling_price,
avg_discount_pct,
-- Rolling aggregates
AVG(units_sold) OVER (
PARTITION BY product_id ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS sales_rolling_7d,
AVG(units_sold) OVER (
PARTITION BY product_id ORDER BY sale_date
ROWS BETWEEN 13 PRECEDING AND CURRENT ROW
) AS sales_rolling_14d,
AVG(units_sold) OVER (
PARTITION BY product_id ORDER BY sale_date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS sales_rolling_30d,
STDDEV(units_sold) OVER (
PARTITION BY product_id ORDER BY sale_date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS sales_volatility_30d,
-- Lag features
LAG(units_sold, 7) OVER (PARTITION BY product_id ORDER BY sale_date) AS sales_lag_7d,
LAG(units_sold, 14) OVER (PARTITION BY product_id ORDER BY sale_date) AS sales_lag_14d,
LAG(units_sold, 365) OVER (PARTITION BY product_id ORDER BY sale_date) AS sales_same_day_last_year
FROM daily_sales
),
seasonality AS (
SELECT
product_id,
sale_date,
EXTRACT(DOW FROM sale_date) AS day_of_week,
EXTRACT(MONTH FROM sale_date) AS month,
CASE
WHEN EXTRACT(DOW FROM sale_date) IN (0, 6) THEN 1 ELSE 0
END AS is_weekend,
CASE
WHEN sale_date BETWEEN DATE '2025-11-28' AND DATE '2025-12-02' THEN 1 -- Black Friday
WHEN sale_date BETWEEN DATE '2025-12-20' AND DATE '2025-12-26' THEN 1 -- Holiday
WHEN sale_date = DATE '2025-02-14' THEN 1 -- Valentine's
ELSE 0
END AS is_holiday_period
FROM rolling_features
)
SELECT
rf.*,
s.day_of_week,
s.month,
s.is_weekend,
s.is_holiday_period,
pc.category,
pc.subcategory,
pc.seasonal_flag,
inv.quantity_on_hand,
inv.reorder_point,
CASE WHEN inv.quantity_on_hand <= inv.reorder_point THEN 1 ELSE 0 END AS is_low_stock
FROM rolling_features rf
JOIN seasonality s ON rf.product_id = s.product_id AND rf.sale_date = s.sale_date
LEFT JOIN postgresql.inventory.inventory inv ON rf.product_id = inv.product_id
LEFT JOIN postgresql.analytics.product_categories pc ON rf.product_id = pc.product_id
WHERE rf.sale_date >= DATE '2024-02-01'; -- Need 30 days of history for rolling features

Result: 14.2M rows (45K products x ~315 days with data). Execution time: 1m 47s.
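The `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW` frame used above averages the current day plus up to six prior rows per product, so early rows simply average over a shorter frame rather than returning NULL. A stdlib sketch of that exact frame semantics (illustrative, not how DuckDB executes it):

```python
def rolling_mean(values, window):
    """SQL-style frame: ROWS BETWEEN (window - 1) PRECEDING AND CURRENT ROW."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1): i + 1]  # shorter near the start
        out.append(sum(frame) / len(frame))
    return out

units_sold = [10, 20, 30, 40, 50, 60, 70, 80]
sales_rolling_7d = rolling_mean(units_sold, 7)
print(sales_rolling_7d[0])  # 10.0 (frame is just the first row)
print(sales_rolling_7d[6])  # 40.0 (mean of all seven rows 10..70)
print(sales_rolling_7d[7])  # 50.0 (frame slides: 20..80)
```

This is also why the final `WHERE rf.sale_date >= DATE '2024-02-01'` filter matters: it drops the warm-up month where the 30-day windows are still partially filled.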
Stage 4: Orchestration
Marcus builds a five-stage Temporal pipeline that runs the full forecasting workflow daily.
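The `retry_policy` attached to the feature-engineering task in the configuration below follows the usual Temporal exponential-backoff shape: retry *n* waits `initial_interval * backoff_coefficient^(n-1)`. A quick stdlib sketch of the resulting wait schedule, assuming no jitter or maximum-interval cap:

```python
def retry_schedule(max_attempts, initial_interval_s, backoff_coefficient):
    """Seconds to wait before each retry (the first attempt runs immediately)."""
    return [initial_interval_s * backoff_coefficient ** n
            for n in range(max_attempts - 1)]

# max_attempts=3, initial_interval=60s, backoff_coefficient=2
print(retry_schedule(3, 60, 2))  # [60, 120]
```

So a transient SQL failure costs at most 60s + 120s of waiting before the task is marked failed and the pipeline's `on_failure` Slack notification fires.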
Pipeline DAG
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ INGEST │──▶│ FEATURE │──▶│ TRAIN │──▶│ VALIDATE │──▶│ DEPLOY │
│ │ │ ENG │ │ │ │ │ │ │
│ Sync │ │ Run SQL, │ │ Ray │ │ MAPE < │ │ Register │
│ latest │ │ compute │ │ Train │ │ 15%? │ │ model, │
│ data │ │ rolling │ │ per- │ │ Bias │ │ update │
│ from all │ │ features │ │ category │ │ check │ │ Ray │
│ sources │ │ │ │ models │ │ │ │ Serve │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘

Pipeline Configuration
{
"pipeline_id": "demand-forecast-daily",
"name": "Daily Demand Forecast Pipeline",
"schedule": "0 2 * * *",
"timeout": "2h",
"tasks": [
{
"task_id": "sync_sources",
"type": "ingestion_trigger",
"config": {
"connections": [
"shopify-orders",
"inventory-db",
"google-analytics"
],
"wait_for_completion": true,
"timeout": "30m"
}
},
{
"task_id": "feature_engineering",
"type": "sql_execute",
"depends_on": ["sync_sources"],
"config": {
"query_ref": "analytics.queries.demand_features",
"target_table": "analytics.derived.demand_features",
"write_mode": "overwrite"
},
"retry_policy": {
"max_attempts": 3,
"backoff_coefficient": 2,
"initial_interval": "60s"
}
},
{
"task_id": "model_training",
"type": "ray_train",
"depends_on": ["feature_engineering"],
"config": {
"training_script": "demand_forecast/train.py",
"dataset": "analytics.derived.demand_features",
"ray_config": {
"num_workers": 4,
"use_gpu": false,
"resources_per_worker": { "CPU": 2, "memory": 4294967296 }
},
"hyperparameters": {
"model_type": "lightgbm",
"forecast_horizon": 14,
"n_estimators": 500,
"learning_rate": 0.05,
"category_column": "category",
"per_category_models": true
},
"mlflow_experiment": "demand-forecast-daily"
}
},
{
"task_id": "model_validation",
"type": "model_evaluate",
"depends_on": ["model_training"],
"config": {
"validation_window": "14d",
"metrics": ["mape", "rmse", "bias"],
"thresholds": {
"mape_max": 0.15,
"bias_range": [-0.05, 0.05]
},
"fail_on_threshold_breach": true,
"holdout_split": "last_14_days"
}
},
{
"task_id": "model_deployment",
"type": "ray_serve_update",
"depends_on": ["model_validation"],
"config": {
"deployment_name": "demand-forecaster",
"model_source": "latest_validated",
"rollout_strategy": "blue_green",
"health_check_timeout": "5m",
"rollback_on_failure": true
}
}
],
"notifications": {
"on_failure": {
"channel": "slack",
"target": "#ml-ops-alerts",
"include_logs": true
},
"on_success": {
"channel": "slack",
"target": "#data-pipelines",
"include_metrics": true
}
},
"failure_handling": {
"task_level": {
"sync_sources": "retry_then_fail",
"feature_engineering": "retry_then_fail",
"model_training": "retry_then_fail",
"model_validation": "skip_deployment_keep_current",
"model_deployment": "rollback_to_previous"
}
}
}

Stage 5: Analysis
Before the pipeline goes to production, Marcus validates the training data thoroughly.
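The Great Expectations suite below declares range, not-null, and compound-uniqueness checks. Their row-level logic reduces to a few predicates; a hedged stdlib sketch of that logic, not the GE execution engine:

```python
def validate_rows(rows):
    """Mirror the suite's core checks against a list of feature rows (dicts)."""
    failures = []
    seen_keys = set()
    for r in rows:
        if not (0 <= r["units_sold"] <= 10000):
            failures.append(("units_sold_range", r))
        if not (0.01 <= r["avg_selling_price"] <= 9999.99):
            failures.append(("price_range", r))
        if r["sales_rolling_7d"] is None:
            failures.append(("rolling_7d_null", r))
        key = (r["product_id"], r["sale_date"])  # compound uniqueness
        if key in seen_keys:
            failures.append(("duplicate_key", r))
        seen_keys.add(key)
    return failures

rows = [
    {"product_id": "PRD-1", "sale_date": "2025-12-01",
     "units_sold": 42, "avg_selling_price": 19.99, "sales_rolling_7d": 38.5},
    {"product_id": "PRD-1", "sale_date": "2025-12-01",  # duplicate (product, date) key
     "units_sold": 42, "avg_selling_price": 19.99, "sales_rolling_7d": 38.5},
]
print([name for name, _ in validate_rows(rows)])  # ['duplicate_key']
```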
Data Quality Rules
{
"suite_name": "demand_features_validation",
"expectations": [
{
"type": "expect_column_values_to_be_between",
"kwargs": { "column": "units_sold", "min_value": 0, "max_value": 10000 },
"notes": "Flag extreme outliers -- max single-day product sales historically is ~5000"
},
{
"type": "expect_column_values_to_be_between",
"kwargs": { "column": "avg_selling_price", "min_value": 0.01, "max_value": 9999.99 }
},
{
"type": "expect_column_values_to_not_be_null",
"kwargs": { "column": "sales_rolling_7d" }
},
{
"type": "expect_compound_columns_to_be_unique",
"kwargs": { "column_list": ["product_id", "sale_date"] }
}
]
}

Product History Filtering
Marcus checks how many products have sufficient history for model training:
SELECT
CASE
WHEN day_count >= 365 THEN '12+ months'
WHEN day_count >= 180 THEN '6-12 months'
WHEN day_count >= 90 THEN '3-6 months'
WHEN day_count >= 30 THEN '1-3 months'
ELSE 'Less than 1 month'
END AS history_bucket,
COUNT(*) AS product_count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM (
SELECT product_id, COUNT(DISTINCT sale_date) AS day_count
FROM analytics.derived.demand_features
GROUP BY product_id
) t
GROUP BY 1 ORDER BY MIN(day_count) DESC;

history_bucket | product_count | pct
────────────────────┼───────────────┼──────
12+ months | 28,412 | 62.7%
6-12 months | 8,934 | 19.7%
3-6 months | 4,213 | 9.3%
1-3 months | 2,847 | 6.3%
Less than 1 month | 885 | 2.0%

Products with less than 30 days of history (885 SKUs) are excluded from model training and handled by a category-level fallback model instead.
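The exclusion rule above (SKUs with under 30 days of history go to the category-level fallback) is a simple partition of the product set; a stdlib sketch:

```python
def split_by_history(history_days, min_days=30):
    """Partition {product_id: day_count} into trainable vs fallback-routed sets."""
    trainable = {p for p, d in history_days.items() if d >= min_days}
    fallback = set(history_days) - trainable
    return trainable, fallback

history = {"PRD-10482": 612, "PRD-20193": 145, "PRD-99001": 12}
trainable, fallback = split_by_history(history)
print(sorted(trainable))  # ['PRD-10482', 'PRD-20193']
print(sorted(fallback))   # ['PRD-99001']
```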
Feature Correlation Profiling
Marcus profiles the top correlations with units_sold to sanity-check his features:
Feature Correlations with units_sold (Pearson r):
─────────────────────────────────────────────────
sales_rolling_7d: 0.91 [expected -- autoregressive]
sales_rolling_14d: 0.87
sales_lag_7d: 0.84
sales_rolling_30d: 0.82
sales_same_day_last_year: 0.64 [good seasonal signal]
avg_discount_pct: 0.31 [promotions drive volume]
is_holiday_period: 0.28
is_weekend: 0.12
quantity_on_hand: -0.08 [low stock suppresses sales]
avg_selling_price: -0.14 [higher price, fewer units]

Stage 6: Productionization
Marcus trains category-specific models using Ray Train (distributed training) and deploys an ensemble to Ray Serve.
Distributed Training with Ray Train
The training runs across 4 workers, with each worker handling a subset of product categories:
Training Summary: demand-forecast-daily run #47
────────────────────────────────────────────────
Training Date: 2025-12-15 02:34:12 UTC
Duration: 18m 42s
Workers: 4 (CPU only)
Categories: 20 product categories
Models Trained: 20 (per-category LightGBM)
Total Samples: 14.2M rows
Per-Category Validation Results (14-day holdout):
─────────────────────────────────────────────────
Category MAPE RMSE Bias Status
Electronics 8.2% 12.4 -0.01 PASS
Clothing 11.4% 18.7 0.02 PASS
Home & Garden 9.1% 14.2 -0.03 PASS
Beauty 7.8% 9.3 0.01 PASS
Sports & Outdoors 10.3% 15.8 -0.02 PASS
Books & Media 6.4% 7.1 0.00 PASS
Food & Grocery 12.1% 22.4 0.04 PASS
Toys & Games 13.8% 19.6 -0.01 PASS
Automotive 9.7% 11.2 0.01 PASS
Pet Supplies 8.9% 10.8 -0.02 PASS
... (10 more categories, all PASS)
─────────────────────────────────────────────────
Overall Weighted MAPE: 9.6% [PASS: threshold 15%]
Overall Bias: -0.004 [PASS: threshold +/-5%]

Ray Serve Deployment
{
"deployment_name": "demand-forecaster",
"model_type": "ensemble",
"models": {
"per_category": {
"source": "mlflow://demand-forecast-daily/latest",
"routing_key": "category"
},
"fallback": {
"source": "mlflow://demand-forecast-global-fallback/latest",
"description": "Used for new products with < 30 days history"
}
},
"serving_config": {
"num_replicas": 2,
"max_concurrent_queries": 200,
"autoscaling_config": {
"min_replicas": 2,
"max_replicas": 8,
"target_num_ongoing_requests_per_replica": 25,
"upscale_delay_s": 30,
"downscale_delay_s": 300
}
},
"batch_inference": {
"enabled": true,
"schedule": "0 5 * * *",
"output_table": "analytics.forecasts.demand_14d",
"description": "Daily batch predictions for all 45K products, 14-day horizon"
}
}

Batch Inference Output
Every morning at 5 AM, the pipeline writes forecasts for all products:
-- Sample output from analytics.forecasts.demand_14d
SELECT product_id, product_name, category, forecast_date,
predicted_units, prediction_interval_lower, prediction_interval_upper,
model_version
FROM analytics.forecasts.demand_14d
WHERE forecast_date = CURRENT_DATE + INTERVAL '1' DAY
ORDER BY predicted_units DESC
LIMIT 5;

product_id | product_name | category | forecast_date | predicted | lower | upper | model
───────────┼──────────────────────────────┼─────────────┼───────────────┼───────────┼───────┼───────┼───────
PRD-10482 | Wireless Noise-Cancel Buds | Electronics | 2025-12-16 | 342 | 289 | 401 | v47
PRD-20193 | Organic Cotton T-Shirt | Clothing | 2025-12-16 | 287 | 241 | 339 | v47
PRD-30847 | Stainless Steel Water Bottle | Home | 2025-12-16 | 264 | 218 | 314 | v47
PRD-15293 | Vitamin C Serum 30ml | Beauty | 2025-12-16 | 251 | 212 | 297 | v47
PRD-40122 | Resistance Band Set | Sports | 2025-12-16 | 228 | 187 | 273 | v47

Stage 7: Feedback
Marcus configures comprehensive production monitoring in the ML Workbench.
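The accuracy-tracking monitor in the configuration below carries a 14-day `evaluation_lag_days` because each forecast can only be scored once the sales it predicted have actually materialized. A hedged stdlib sketch of that realized-MAPE check (the forecast/actual pairing is assumed, not the platform's storage layout):

```python
def realized_mape(pairs):
    """Mean absolute percentage error over (actual, predicted) pairs; zero actuals skipped."""
    errors = [abs(actual - predicted) / actual
              for actual, predicted in pairs if actual != 0]
    return sum(errors) / len(errors)

# Forecasts issued 14 days ago, now joined to realized daily sales.
pairs = [(100, 92), (250, 240), (40, 46)]
mape = realized_mape(pairs)
print(round(mape, 3))  # 0.09
alert = mape > 0.15    # alert_threshold from the monitor config
print(alert)           # False
```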
Monitoring Configuration
{
"monitor_name": "demand-forecast-production",
"monitors": [
{
"type": "accuracy_tracking",
"metric": "mape",
"granularity": "category",
"evaluation_lag_days": 14,
"alert_threshold": 0.15,
"schedule": "0 8 * * *"
},
{
"type": "feature_freshness",
"checks": [
{ "table": "shopify.order_items", "max_age_hours": 6 },
{ "table": "inventory.inventory", "max_age_hours": 12 },
{ "table": "analytics.derived.demand_features", "max_age_hours": 24 }
]
},
{
"type": "prediction_quality",
"checks": [
{ "name": "coverage", "description": "All active products have forecasts", "threshold": 0.99 },
{ "name": "non_negative", "description": "No negative predictions", "threshold": 1.0 },
{ "name": "range_check", "description": "No prediction > 10x historical max", "threshold": 1.0 }
]
},
{
"type": "automatic_retraining",
"trigger": {
"condition": "mape_overall > 0.12 for 3 consecutive days",
"action": "trigger_pipeline",
"pipeline_id": "demand-forecast-daily",
"notification": "slack:#ml-ops-alerts"
}
}
]
}

Production Dashboard (After 30 Days)
Demand Forecaster - Production Health
──────────────────────────────────────
Period: 2025-12-15 to 2026-01-14
Pipeline Runs: 31 (30 success, 1 retry-then-success)
Avg Training Time: 19m 12s
Avg Batch Inference: 7m 34s (45,291 products x 14 days)
Accuracy Trend (MAPE):
Week 1: 9.6% ██████████
Week 2: 9.8% ██████████
Week 3: 10.4% ██████████▌ (holiday period)
Week 4: 11.1% ███████████ (post-holiday normalization)
Feature Freshness SLA:
shopify.order_items: 99.8% within 6h [OK]
inventory.inventory: 100% within 12h [OK]
demand_features: 100% within 24h [OK]
Retraining Triggers: 0 (MAPE never exceeded 12% for 3 days)
Serving Metrics:
Batch inference p50: 7m 34s
API p50 latency: 45ms
API p99 latency: 128ms
Error rate: 0.01%

Stage 8: Experimentation
Marcus compares three modeling approaches to find the best architecture per product category.
Model Comparison Experiment
He logs all experiments to MLflow and evaluates on the same 14-day holdout period:
Experiment: demand-model-comparison
Models: LightGBM, LSTM, Prophet
Evaluation: 14-day holdout MAPE per category (20 categories)
Results Matrix (MAPE % -- lower is better):
──────────────────────────────────────────────────────────────
Category LightGBM LSTM Prophet Winner
──────────────────────────────────────────────────────────────
Electronics 8.2% 9.1% 11.3% LightGBM
Clothing 11.4% 9.8% 13.2% LSTM
Home & Garden 9.1% 10.2% 10.8% LightGBM
Beauty 7.8% 8.4% 9.6% LightGBM
Sports & Outdoors 10.3% 11.7% 12.1% LightGBM
Books & Media 6.4% 7.2% 6.8% LightGBM
Food & Grocery 12.1% 9.4% 14.7% LSTM
Toys & Games 13.8% 10.1% 11.4% LSTM
Automotive 9.7% 10.8% 10.2% LightGBM
Pet Supplies 8.9% 9.3% 9.1% LightGBM
Jewelry 10.5% 8.7% 12.3% LSTM
Office Supplies 7.1% 8.9% 8.2% LightGBM
Garden (Seasonal) 14.2% 11.3% 9.8% Prophet
Holiday Decor 16.1% 12.4% 10.2% Prophet
Swimwear 15.3% 11.8% 9.4% Prophet
Winter Apparel 14.7% 10.9% 10.1% LSTM
Baby Products 8.3% 9.1% 9.4% LightGBM
Health Supplements 7.5% 8.2% 8.8% LightGBM
Kitchen Appliances 9.8% 10.4% 11.2% LightGBM
Outdoor Furniture 13.4% 11.1% 8.9% Prophet
──────────────────────────────────────────────────────────────
Wins: 14 2 4
──────────────────────────────────────────────────────────────

Key Findings
| Observation | Impact |
|---|---|
| LightGBM wins 14/20 categories | Best default model for non-seasonal products |
| LSTM wins for high-variance categories (Food, Toys, Clothing) | Better at capturing complex temporal patterns |
| Prophet wins all 4 highly seasonal categories | Explicit seasonality decomposition matters for holiday-driven products |
| Training time: LightGBM 18m, LSTM 2h 14m, Prophet 45m | LightGBM is 7x faster than LSTM |
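Selecting the winner column in the matrix above is just an argmin per category; a stdlib sketch over a few of the rows (MAPE values copied from the table):

```python
def pick_winner(scores):
    """Return the model with the lowest MAPE for one category."""
    return min(scores, key=scores.get)

results = {
    "Electronics":   {"lightgbm": 8.2,  "lstm": 9.1,  "prophet": 11.3},
    "Clothing":      {"lightgbm": 11.4, "lstm": 9.8,  "prophet": 13.2},
    "Holiday Decor": {"lightgbm": 16.1, "lstm": 12.4, "prophet": 10.2},
}
winners = {cat: pick_winner(s) for cat, s in results.items()}
print(winners["Electronics"])    # lightgbm
print(winners["Clothing"])       # lstm
print(winners["Holiday Decor"])  # prophet
```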
Per-Category Model Routing
Based on these results, Marcus implements a routing strategy:
{
"routing_strategy": "per_category",
"default_model": "lightgbm",
"overrides": [
{ "categories": ["Clothing", "Food & Grocery", "Toys & Games", "Winter Apparel"], "model": "lstm" },
{ "categories": ["Garden (Seasonal)", "Holiday Decor", "Swimwear", "Outdoor Furniture"], "model": "prophet" }
],
"fallback_model": "lightgbm",
"evaluation_schedule": "monthly",
"auto_reassign": {
"enabled": true,
"condition": "alternative_model_mape < current_mape * 0.9 for 30 days",
"notification": "slack:#ml-ops-alerts"
}
}

This routing reduced the overall weighted MAPE from 9.6% (LightGBM only) to 8.1% (routed ensemble) -- a 15.6% relative improvement.
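At inference time the routing strategy above reduces to a dictionary lookup with a default; a stdlib sketch of the resolution order (category override first, then the LightGBM default, which is also the declared fallback):

```python
ROUTING_OVERRIDES = {
    "Clothing": "lstm", "Food & Grocery": "lstm",
    "Toys & Games": "lstm", "Winter Apparel": "lstm",
    "Garden (Seasonal)": "prophet", "Holiday Decor": "prophet",
    "Swimwear": "prophet", "Outdoor Furniture": "prophet",
}

def resolve_model(category, default="lightgbm"):
    """Pick the serving model for a category per the routing config."""
    return ROUTING_OVERRIDES.get(category, default)

print(resolve_model("Swimwear"))     # prophet
print(resolve_model("Electronics"))  # lightgbm
```

The `auto_reassign` block would sit on top of this: if a monthly evaluation shows another model beating the current one by 10%+ for 30 days, the override entry for that category is rewritten.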
Summary
| Stage | What Marcus Did | Platform Components Used |
|---|---|---|
| Ingestion | Connected Shopify API, PostgreSQL, Google Analytics | Ingestion Service (Airbyte), Data Workbench |
| Discovery | Mapped join keys, found BI team's category table | Catalog Service, Data Workbench |
| Query | Built rolling aggregate features with federated SQL | Query Engine (Trino + DuckDB), Data Workbench |
| Orchestration | Built 5-stage daily pipeline with retry/rollback | Pipeline Service (Temporal) |
| Analysis | Validated data quality, profiled feature correlations | Data Quality Service (Great Expectations) |
| Productionization | Trained per-category models, deployed to Ray Serve | ML Service (Ray Train, MLflow, Ray Serve) |
| Feedback | Configured accuracy tracking, freshness SLAs, auto-retrain | ML Workbench Monitoring |
| Experimentation | Compared LightGBM/LSTM/Prophet, built routed ensemble | ML Workbench Experiments (MLflow) |
Related Walkthroughs
- Data Scientist Journey -- Priya builds the customer churn model
- BI Lead Journey -- Sofia consumes Marcus's forecasts in revenue dashboards
- Executive Journey -- David uses forecasts for strategic inventory decisions
- Retail Overview -- NovaMart company profile and dataset descriptions