MATIH Platform is in active MVP development. Documentation reflects current implementation status.
21. Industry Examples & Walkthroughs
Retail & E-Commerce

ML Engineer Journey: Demand Forecasting at Scale

Persona: Marcus, ML Engineer at NovaMart
Objective: Build and operate a production demand forecasting system predicting product-level sales 14 days ahead
Timeline: 6 weeks from design to full production
Primary Workbenches: ML Workbench, Data Workbench, Pipeline Service


Stage 1: Ingestion

Marcus needs three data sources for demand forecasting: sales transactions, inventory levels, and web traffic. He configures each in the Data Workbench > Sources panel.

Shopify Orders API

NovaMart uses Shopify as its storefront. Marcus sets up an incremental sync using the Airbyte Shopify connector:

{
  "connector": "source-shopify",
  "config": {
    "shop": "novamart.myshopify.com",
    "credentials": {
      "auth_method": "api_password",
      "api_password": "${SHOPIFY_API_KEY}"
    }
  },
  "streams": [
    { "name": "orders", "sync_mode": "incremental", "cursor_field": "updated_at" },
    { "name": "products", "sync_mode": "incremental", "cursor_field": "updated_at" },
    { "name": "inventory_levels", "sync_mode": "full_refresh" }
  ],
  "schedule": { "type": "cron", "expression": "0 * * * *" }
}
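
The incremental sync_mode above works by persisting the largest updated_at value seen in a run and requesting only newer records on the next run. A minimal stdlib sketch of that cursor logic (the record shape and function name are illustrative, not the Airbyte connector internals):

```python
# Sketch of incremental cursor sync: remember the max cursor value seen,
# and on the next run keep only records updated after it.
# Illustrative only -- not the Airbyte connector API.

def incremental_sync(records, last_cursor, cursor_field="updated_at"):
    """Return records newer than last_cursor, plus the advanced cursor."""
    new_records = [r for r in records if r[cursor_field] > (last_cursor or "")]
    new_cursor = max((r[cursor_field] for r in new_records), default=last_cursor)
    return new_records, new_cursor

# First run: no cursor yet, so everything syncs.
batch = [
    {"id": 1, "updated_at": "2025-12-01T10:00:00Z"},
    {"id": 2, "updated_at": "2025-12-02T09:30:00Z"},
]
synced, cursor = incremental_sync(batch, None)

# Second run: only records updated after the saved cursor sync.
batch.append({"id": 3, "updated_at": "2025-12-03T08:00:00Z"})
synced2, cursor2 = incremental_sync(batch, cursor)
```

Because ISO-8601 timestamps compare lexicographically, string comparison suffices here; the source must maintain updated_at monotonically for this to be lossless.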

PostgreSQL Inventory Database

The internal inventory system tracks warehouse stock levels and supplier shipments:

{
  "connector": "source-postgres",
  "config": {
    "host": "inventory-db.novamart.internal",
    "port": 5432,
    "database": "inventory",
    "schemas": ["public"],
    "replication_method": { "method": "standard" }
  },
  "streams": [
    { "name": "inventory", "sync_mode": "full_refresh" },
    { "name": "supplier_shipments", "sync_mode": "incremental", "cursor_field": "updated_at" },
    { "name": "warehouses", "sync_mode": "full_refresh" }
  ],
  "schedule": { "type": "cron", "expression": "0 */6 * * *" }
}

Google Analytics Sessions

Web traffic data provides demand signals like product page views and search trends:

{
  "connector": "source-google-analytics-data-api",
  "config": {
    "property_id": "properties/349871234",
    "credentials": { "credentials_json": "${GA_SERVICE_ACCOUNT_JSON}" },
    "date_ranges_start_date": "2024-01-01"
  },
  "streams": [
    { "name": "daily_active_users", "sync_mode": "incremental" },
    { "name": "pages", "sync_mode": "incremental" },
    { "name": "events", "sync_mode": "incremental" }
  ],
  "schedule": { "type": "cron", "expression": "0 6 * * *" },
  "transformation": {
    "schema_mapping": {
      "pagePath": "page_url",
      "activeUsers": "unique_visitors"
    },
    "deduplication": {
      "key": ["date", "pagePath"],
      "strategy": "last_write_wins"
    }
  }
}
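
The deduplication block keeps, for each (date, pagePath) key, the record that arrived last. A quick sketch of the last_write_wins semantics (row fields are illustrative):

```python
# last_write_wins dedup: later records with the same key replace earlier ones.
def dedup_last_write_wins(records, key_fields):
    seen = {}
    for r in records:  # records assumed to be in arrival (sync) order
        seen[tuple(r[f] for f in key_fields)] = r
    return list(seen.values())

rows = [
    {"date": "2025-12-01", "pagePath": "/p/123", "activeUsers": 40},
    {"date": "2025-12-01", "pagePath": "/p/456", "activeUsers": 12},
    {"date": "2025-12-01", "pagePath": "/p/123", "activeUsers": 45},  # re-sync wins
]
deduped = dedup_last_write_wins(rows, ["date", "pagePath"])
```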

Stage 2: Discovery

With all three sources syncing, Marcus explores the data catalog to understand the join graph and discover existing assets.

Finding Join Keys

In Data Workbench > Catalog, Marcus searches for tables containing product_id:

Table              | Schema            | Has product_id | Row Count
───────────────────┼───────────────────┼────────────────┼──────────
products           | shopify.public    | PK             | 45,291
order_items        | shopify.public    | FK             | 38.7M
inventory          | inventory.public  | FK             | 45,291
product_categories | analytics.derived | FK             | 45,291
supplier_shipments | inventory.public  | FK             | 156K

He clicks into product_categories and sees it was created by the BI team, with a last_modified_by: sofia@novamart.com tag. The table maps each product to its category hierarchy (Electronics > Audio > Headphones) and includes a seasonal_flag column -- useful for forecasting.

Checking Lineage for Circular Dependencies

Before building his pipeline, Marcus verifies there are no circular dependencies in the tables he plans to use:

shopify.orders ──▶ shopify.order_items ──┐
                                         ├──▶ analytics.daily_product_sales (Marcus creates) ──┐
shopify.products ────────────────────────┘                                                     │
                                                                                               ├──▶ demand_features (Marcus creates)
analytics.product_categories ─────────────────────────────────────────────────────────────────┤
inventory.inventory ──────────────────────────────────────────────────────────────────────────┘

No circular dependencies. The BI team's product_categories is an input to Marcus's pipeline but does not depend on any of his outputs.
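
The catalog's cycle check amounts to a depth-first search over the lineage edges with a recursion stack. A self-contained sketch using the tables above (edge list hand-copied from the diagram):

```python
# Detect circular dependencies in a lineage graph via DFS with a recursion stack.
def has_cycle(edges):
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    visited, in_stack = set(), set()

    def dfs(node):
        visited.add(node)
        in_stack.add(node)
        for nxt in graph.get(node, []):
            # Back-edge to a node on the current path means a cycle.
            if nxt in in_stack or (nxt not in visited and dfs(nxt)):
                return True
        in_stack.discard(node)
        return False

    return any(node not in visited and dfs(node) for node in list(graph))

lineage = [
    ("shopify.orders", "analytics.daily_product_sales"),
    ("shopify.order_items", "analytics.daily_product_sales"),
    ("shopify.products", "analytics.daily_product_sales"),
    ("analytics.daily_product_sales", "demand_features"),
    ("analytics.product_categories", "demand_features"),
    ("inventory.inventory", "demand_features"),
]
```

If the BI team's product_categories ever consumed one of Marcus's outputs, the added back-edge would surface immediately in this check.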


Stage 3: Query

Marcus writes the feature-engineering SQL in the Query Editor. The federated query engine lets a single statement span backends: DuckDB executes the window functions efficiently over S3-hosted Parquet files, while the transactional tables are read from PostgreSQL.

Feature Engineering SQL

-- Demand forecasting feature set: one row per product per day
-- Uses rolling aggregates, seasonality encoding, and price signals
CREATE TABLE analytics.derived.demand_features AS
WITH daily_sales AS (
    SELECT
        oi.product_id,
        DATE(o.order_date) AS sale_date,
        SUM(oi.quantity) AS units_sold,
        SUM(oi.quantity * oi.unit_price) AS revenue,
        COUNT(DISTINCT o.order_id) AS num_orders,
        AVG(oi.unit_price) AS avg_selling_price,
        AVG(oi.discount) AS avg_discount_pct
    FROM postgresql.shopify.order_items oi
    JOIN postgresql.shopify.orders o ON oi.order_id = o.order_id
    WHERE o.order_date >= DATE '2024-01-01'
    GROUP BY oi.product_id, DATE(o.order_date)
),
rolling_features AS (
    SELECT
        product_id,
        sale_date,
        units_sold,
        revenue,
        avg_selling_price,
        avg_discount_pct,
        -- Rolling aggregates
        AVG(units_sold) OVER (
            PARTITION BY product_id ORDER BY sale_date
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS sales_rolling_7d,
        AVG(units_sold) OVER (
            PARTITION BY product_id ORDER BY sale_date
            ROWS BETWEEN 13 PRECEDING AND CURRENT ROW
        ) AS sales_rolling_14d,
        AVG(units_sold) OVER (
            PARTITION BY product_id ORDER BY sale_date
            ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
        ) AS sales_rolling_30d,
        STDDEV(units_sold) OVER (
            PARTITION BY product_id ORDER BY sale_date
            ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
        ) AS sales_volatility_30d,
        -- Lag features
        LAG(units_sold, 7) OVER (PARTITION BY product_id ORDER BY sale_date) AS sales_lag_7d,
        LAG(units_sold, 14) OVER (PARTITION BY product_id ORDER BY sale_date) AS sales_lag_14d,
        -- Note: row-based lag approximates "same day last year"; exact only for
        -- products with uninterrupted daily sales history
        LAG(units_sold, 365) OVER (PARTITION BY product_id ORDER BY sale_date) AS sales_same_day_last_year
    FROM daily_sales
),
seasonality AS (
    SELECT
        product_id,
        sale_date,
        EXTRACT(DOW FROM sale_date) AS day_of_week,
        EXTRACT(MONTH FROM sale_date) AS month,
        CASE
            WHEN EXTRACT(DOW FROM sale_date) IN (0, 6) THEN 1 ELSE 0
        END AS is_weekend,
        CASE
            WHEN sale_date BETWEEN DATE '2025-11-28' AND DATE '2025-12-02' THEN 1  -- Black Friday
            WHEN sale_date BETWEEN DATE '2025-12-20' AND DATE '2025-12-26' THEN 1  -- Holiday
            WHEN sale_date = DATE '2025-02-14' THEN 1  -- Valentine's
            ELSE 0
        END AS is_holiday_period
    FROM rolling_features
)
SELECT
    rf.*,
    s.day_of_week,
    s.month,
    s.is_weekend,
    s.is_holiday_period,
    pc.category,
    pc.subcategory,
    pc.seasonal_flag,
    inv.quantity_on_hand,
    inv.reorder_point,
    CASE WHEN inv.quantity_on_hand <= inv.reorder_point THEN 1 ELSE 0 END AS is_low_stock
FROM rolling_features rf
JOIN seasonality s ON rf.product_id = s.product_id AND rf.sale_date = s.sale_date
LEFT JOIN postgresql.inventory.inventory inv ON rf.product_id = inv.product_id
LEFT JOIN postgresql.analytics.product_categories pc ON rf.product_id = pc.product_id
WHERE rf.sale_date >= DATE '2024-02-01';  -- Need 30 days for rolling features

Result: 14.2M rows (45K products x ~315 days with data). Execution time: 1m 47s.


Stage 4: Orchestration

Marcus builds a five-stage Temporal pipeline that runs the full forecasting workflow daily.

Pipeline DAG

┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│  INGEST  │──▶│ FEATURE  │──▶│  TRAIN   │──▶│ VALIDATE │──▶│  DEPLOY  │
│          │   │  ENG     │   │          │   │          │   │          │
│ Sync     │   │ Run SQL, │   │ Ray      │   │ MAPE <   │   │ Register │
│ latest   │   │ compute  │   │ Train    │   │ 15%?     │   │ model,   │
│ data     │   │ rolling  │   │ per-     │   │ Bias     │   │ update   │
│ from all │   │ features │   │ category │   │ check    │   │ Ray      │
│ sources  │   │          │   │ models   │   │          │   │ Serve    │
└──────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘
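
The Pipeline Service derives this execution order from each task's depends_on declarations. A sketch of that resolution using Kahn's algorithm (task dicts trimmed to IDs; not the Temporal API):

```python
# Resolve task execution order from depends_on declarations (Kahn's algorithm).
def execution_order(tasks):
    deps = {t["task_id"]: set(t.get("depends_on", [])) for t in tasks}
    order = []
    while deps:
        ready = sorted(t for t, d in deps.items() if not d)
        if not ready:
            raise ValueError("circular dependency between tasks")
        for t in ready:
            order.append(t)
            del deps[t]
        for d in deps.values():  # unblock tasks whose dependencies just ran
            d.difference_update(ready)
    return order

pipeline = [
    {"task_id": "sync_sources"},
    {"task_id": "feature_engineering", "depends_on": ["sync_sources"]},
    {"task_id": "model_training", "depends_on": ["feature_engineering"]},
    {"task_id": "model_validation", "depends_on": ["model_training"]},
    {"task_id": "model_deployment", "depends_on": ["model_validation"]},
]
order = execution_order(pipeline)
```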

Pipeline Configuration

{
  "pipeline_id": "demand-forecast-daily",
  "name": "Daily Demand Forecast Pipeline",
  "schedule": "0 2 * * *",
  "timeout": "2h",
  "tasks": [
    {
      "task_id": "sync_sources",
      "type": "ingestion_trigger",
      "config": {
        "connections": [
          "shopify-orders",
          "inventory-db",
          "google-analytics"
        ],
        "wait_for_completion": true,
        "timeout": "30m"
      }
    },
    {
      "task_id": "feature_engineering",
      "type": "sql_execute",
      "depends_on": ["sync_sources"],
      "config": {
        "query_ref": "analytics.queries.demand_features",
        "target_table": "analytics.derived.demand_features",
        "write_mode": "overwrite"
      },
      "retry_policy": {
        "max_attempts": 3,
        "backoff_coefficient": 2,
        "initial_interval": "60s"
      }
    },
    {
      "task_id": "model_training",
      "type": "ray_train",
      "depends_on": ["feature_engineering"],
      "config": {
        "training_script": "demand_forecast/train.py",
        "dataset": "analytics.derived.demand_features",
        "ray_config": {
          "num_workers": 4,
          "use_gpu": false,
          "resources_per_worker": { "CPU": 2, "memory": 4294967296 }
        },
        "hyperparameters": {
          "model_type": "lightgbm",
          "forecast_horizon": 14,
          "n_estimators": 500,
          "learning_rate": 0.05,
          "category_column": "category",
          "per_category_models": true
        },
        "mlflow_experiment": "demand-forecast-daily"
      }
    },
    {
      "task_id": "model_validation",
      "type": "model_evaluate",
      "depends_on": ["model_training"],
      "config": {
        "validation_window": "14d",
        "metrics": ["mape", "rmse", "bias"],
        "thresholds": {
          "mape_max": 0.15,
          "bias_range": [-0.05, 0.05]
        },
        "fail_on_threshold_breach": true,
        "holdout_split": "last_14_days"
      }
    },
    {
      "task_id": "model_deployment",
      "type": "ray_serve_update",
      "depends_on": ["model_validation"],
      "config": {
        "deployment_name": "demand-forecaster",
        "model_source": "latest_validated",
        "rollout_strategy": "blue_green",
        "health_check_timeout": "5m",
        "rollback_on_failure": true
      }
    }
  ],
  "notifications": {
    "on_failure": {
      "channel": "slack",
      "target": "#ml-ops-alerts",
      "include_logs": true
    },
    "on_success": {
      "channel": "slack",
      "target": "#data-pipelines",
      "include_metrics": true
    }
  },
  "failure_handling": {
    "task_level": {
      "sync_sources": "retry_then_fail",
      "feature_engineering": "retry_then_fail",
      "model_training": "retry_then_fail",
      "model_validation": "skip_deployment_keep_current",
      "model_deployment": "rollback_to_previous"
    }
  }
}
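
The retry_policy on feature_engineering expands to waits of 60s and then 120s between its three attempts. A sketch of how the exponential backoff parameters unroll (simplified: Temporal also supports a maximum-interval cap, omitted here):

```python
# Expand a Temporal-style retry policy into the wait before each retry.
def backoff_schedule(initial_interval_s, coefficient, max_attempts):
    # Attempt 1 runs immediately; waits apply before attempts 2..max_attempts.
    return [initial_interval_s * coefficient ** i for i in range(max_attempts - 1)]

waits = backoff_schedule(60, 2, 3)  # waits before attempt 2 and attempt 3
```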

Stage 5: Analysis

Before the pipeline goes to production, Marcus validates the training data thoroughly.

Data Quality Rules

{
  "suite_name": "demand_features_validation",
  "expectations": [
    {
      "type": "expect_column_values_to_be_between",
      "kwargs": { "column": "units_sold", "min_value": 0, "max_value": 10000 },
      "notes": "Flag extreme outliers -- max single-day product sales historically is ~5000"
    },
    {
      "type": "expect_column_values_to_be_between",
      "kwargs": { "column": "avg_selling_price", "min_value": 0.01, "max_value": 9999.99 }
    },
    {
      "type": "expect_column_values_to_not_be_null",
      "kwargs": { "column": "sales_rolling_7d" }
    },
    {
      "type": "expect_compound_columns_to_be_unique",
      "kwargs": { "column_list": ["product_id", "sale_date"] }
    }
  ]
}
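
Each expectation is a row-level predicate over the table. Plain-Python stand-ins for the range and compound-uniqueness checks (illustrative only, not the Great Expectations API):

```python
# Minimal stand-ins for two of the data quality checks above (not the GE API).
def expect_values_between(rows, column, min_value, max_value):
    return all(min_value <= r[column] <= max_value for r in rows)

def expect_compound_unique(rows, column_list):
    keys = [tuple(r[c] for c in column_list) for r in rows]
    return len(keys) == len(set(keys))

sample = [
    {"product_id": "PRD-1", "sale_date": "2025-12-15", "units_sold": 12},
    {"product_id": "PRD-1", "sale_date": "2025-12-16", "units_sold": 9},
]
```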

Product History Filtering

Marcus checks how many products have sufficient history for model training:

SELECT
    CASE
        WHEN day_count >= 365 THEN '12+ months'
        WHEN day_count >= 180 THEN '6-12 months'
        WHEN day_count >= 90  THEN '3-6 months'
        WHEN day_count >= 30  THEN '1-3 months'
        ELSE 'Less than 1 month'
    END AS history_bucket,
    COUNT(*) AS product_count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM (
    SELECT product_id, COUNT(DISTINCT sale_date) AS day_count
    FROM analytics.derived.demand_features
    GROUP BY product_id
) t
GROUP BY 1 ORDER BY MIN(day_count) DESC;
history_bucket      | product_count | pct
────────────────────┼───────────────┼──────
12+ months          | 28,412        | 62.7%
6-12 months         |  8,934        | 19.7%
3-6 months          |  4,213        |  9.3%
1-3 months          |  2,847        |  6.3%
Less than 1 month   |    885        |  2.0%

Products with less than 30 days of history (885 SKUs) are excluded from model training and handled by a category-level fallback model instead.

Feature Correlation Profiling

Marcus profiles the top correlations with units_sold to sanity-check his features:

Feature Correlations with units_sold (Pearson r):
─────────────────────────────────────────────────
sales_rolling_7d:          0.91  [expected -- autoregressive]
sales_rolling_14d:         0.87
sales_lag_7d:              0.84
sales_rolling_30d:         0.82
sales_same_day_last_year:  0.64  [good seasonal signal]
avg_discount_pct:          0.31  [promotions drive volume]
is_holiday_period:         0.28
is_weekend:                0.12
quantity_on_hand:          -0.08  [fast sellers run down stock]
avg_selling_price:         -0.14  [higher price, fewer units]

Stage 6: Productionization

Marcus trains category-specific models using Ray Train (distributed training) and deploys an ensemble to Ray Serve.

Distributed Training with Ray Train

The training runs across 4 workers, with each worker handling a subset of product categories:

Training Summary: demand-forecast-daily run #47
────────────────────────────────────────────────
Training Date:    2025-12-15 02:34:12 UTC
Duration:         18m 42s
Workers:          4 (CPU only)
Categories:       20 product categories
Models Trained:   20 (per-category LightGBM)
Total Samples:    14.2M rows

Per-Category Validation Results (14-day holdout):
─────────────────────────────────────────────────
Category               MAPE    RMSE    Bias     Status
Electronics            8.2%    12.4    -0.01    PASS
Clothing              11.4%    18.7     0.02    PASS
Home & Garden          9.1%    14.2    -0.03    PASS
Beauty                 7.8%     9.3     0.01    PASS
Sports & Outdoors     10.3%    15.8    -0.02    PASS
Books & Media          6.4%     7.1     0.00    PASS
Food & Grocery        12.1%    22.4     0.04    PASS
Toys & Games          13.8%    19.6    -0.01    PASS
Automotive             9.7%    11.2     0.01    PASS
Pet Supplies           8.9%    10.8    -0.02    PASS
... (10 more categories, all PASS)
─────────────────────────────────────────────────
Overall Weighted MAPE:  9.6%   [PASS: threshold 15%]
Overall Bias:          -0.004  [PASS: threshold +/-5%]
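
The validation gate reduces to computing MAPE and bias over the holdout and comparing them to the configured thresholds. A sketch assuming MAPE is the mean absolute relative error and bias the mean signed relative error (the platform's exact definitions may differ, e.g. in how zero actuals are handled):

```python
# Holdout validation gate: compute MAPE and bias, compare to thresholds.
def validate(actuals, predictions, mape_max=0.15, bias_range=(-0.05, 0.05)):
    rel_errors = [(p - a) / a for a, p in zip(actuals, predictions) if a != 0]
    mape = sum(abs(e) for e in rel_errors) / len(rel_errors)
    bias = sum(rel_errors) / len(rel_errors)  # signed: over/under-forecasting
    passed = mape <= mape_max and bias_range[0] <= bias <= bias_range[1]
    return mape, bias, passed

actual = [100, 200, 150, 120]
predicted = [108, 190, 155, 118]
mape, bias, ok = validate(actual, predicted)
```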

Ray Serve Deployment

{
  "deployment_name": "demand-forecaster",
  "model_type": "ensemble",
  "models": {
    "per_category": {
      "source": "mlflow://demand-forecast-daily/latest",
      "routing_key": "category"
    },
    "fallback": {
      "source": "mlflow://demand-forecast-global-fallback/latest",
      "description": "Used for new products with < 30 days history"
    }
  },
  "serving_config": {
    "num_replicas": 2,
    "max_concurrent_queries": 200,
    "autoscaling_config": {
      "min_replicas": 2,
      "max_replicas": 8,
      "target_num_ongoing_requests_per_replica": 25,
      "upscale_delay_s": 30,
      "downscale_delay_s": 300
    }
  },
  "batch_inference": {
    "enabled": true,
    "schedule": "0 5 * * *",
    "output_table": "analytics.forecasts.demand_14d",
    "description": "Daily batch predictions for all 45K products, 14-day horizon"
  }
}

Batch Inference Output

Every morning at 5 AM, the pipeline writes forecasts for all products:

-- Sample output from analytics.forecasts.demand_14d
SELECT product_id, product_name, category, forecast_date,
       predicted_units, prediction_interval_lower, prediction_interval_upper,
       model_version
FROM analytics.forecasts.demand_14d
WHERE forecast_date = CURRENT_DATE + INTERVAL '1' DAY
ORDER BY predicted_units DESC
LIMIT 5;
product_id | product_name                 | category    | forecast_date | predicted | lower | upper | model
───────────┼──────────────────────────────┼─────────────┼───────────────┼───────────┼───────┼───────┼───────
PRD-10482  | Wireless Noise-Cancel Buds   | Electronics | 2025-12-16    | 342       | 289   | 401   | v47
PRD-20193  | Organic Cotton T-Shirt       | Clothing    | 2025-12-16    | 287       | 241   | 339   | v47
PRD-30847  | Stainless Steel Water Bottle | Home        | 2025-12-16    | 264       | 218   | 314   | v47
PRD-15293  | Vitamin C Serum 30ml         | Beauty      | 2025-12-16    | 251       | 212   | 297   | v47
PRD-40122  | Resistance Band Set          | Sports      | 2025-12-16    | 228       | 187   | 273   | v47

Stage 7: Feedback

Marcus configures comprehensive production monitoring in the ML Workbench.

Monitoring Configuration

{
  "monitor_name": "demand-forecast-production",
  "monitors": [
    {
      "type": "accuracy_tracking",
      "metric": "mape",
      "granularity": "category",
      "evaluation_lag_days": 14,
      "alert_threshold": 0.15,
      "schedule": "0 8 * * *"
    },
    {
      "type": "feature_freshness",
      "checks": [
        { "table": "shopify.order_items", "max_age_hours": 6 },
        { "table": "inventory.inventory", "max_age_hours": 12 },
        { "table": "analytics.derived.demand_features", "max_age_hours": 24 }
      ]
    },
    {
      "type": "prediction_quality",
      "checks": [
        { "name": "coverage", "description": "All active products have forecasts", "threshold": 0.99 },
        { "name": "non_negative", "description": "No negative predictions", "threshold": 1.0 },
        { "name": "range_check", "description": "No prediction > 10x historical max", "threshold": 1.0 }
      ]
    },
    {
      "type": "automatic_retraining",
      "trigger": {
        "condition": "mape_overall > 0.12 for 3 consecutive days",
        "action": "trigger_pipeline",
        "pipeline_id": "demand-forecast-daily",
        "notification": "slack:#ml-ops-alerts"
      }
    }
  ]
}
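
The automatic_retraining condition fires only after three consecutive daily breaches, so a single bad evaluation does not trigger retraining churn. A sketch of that streak logic:

```python
# Fire retraining only after N consecutive daily MAPE breaches.
def should_retrain(daily_mape, threshold=0.12, consecutive=3):
    streak = 0
    for m in daily_mape:  # oldest to newest
        streak = streak + 1 if m > threshold else 0
        if streak >= consecutive:
            return True
    return False
```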

Production Dashboard (After 30 Days)

Demand Forecaster - Production Health
──────────────────────────────────────
Period: 2025-12-15 to 2026-01-14

Pipeline Runs:        31 (30 success, 1 retry-then-success)
Avg Training Time:    19m 12s
Avg Batch Inference:  7m 34s (45,291 products x 14 days)

Accuracy Trend (MAPE):
  Week 1:   9.6%  ██████████
  Week 2:   9.8%  ██████████
  Week 3:  10.4%  ██████████▌  (holiday period)
  Week 4:  11.1%  ███████████  (post-holiday normalization)

Feature Freshness SLA:
  shopify.order_items:   99.8% within 6h   [OK]
  inventory.inventory:   100% within 12h   [OK]
  demand_features:       100% within 24h   [OK]

Retraining Triggers:     0 (MAPE never exceeded 12% for 3 days)

Serving Metrics:
  Batch inference p50:   7m 34s
  API p50 latency:       45ms
  API p99 latency:       128ms
  Error rate:            0.01%

Stage 8: Experimentation

Marcus compares three modeling approaches to find the best architecture per product category.

Model Comparison Experiment

He logs all experiments to MLflow and evaluates on the same 14-day holdout period:

Experiment: demand-model-comparison
Models: LightGBM, LSTM, Prophet
Evaluation: 14-day holdout MAPE per category (20 categories)

Results Matrix (MAPE % -- lower is better):
──────────────────────────────────────────────────────────────
Category               LightGBM    LSTM      Prophet    Winner
──────────────────────────────────────────────────────────────
Electronics              8.2%      9.1%      11.3%     LightGBM
Clothing                11.4%      9.8%      13.2%     LSTM
Home & Garden            9.1%     10.2%      10.8%     LightGBM
Beauty                   7.8%      8.4%       9.6%     LightGBM
Sports & Outdoors       10.3%     11.7%      12.1%     LightGBM
Books & Media            6.4%      7.2%       6.8%     LightGBM
Food & Grocery          12.1%      9.4%      14.7%     LSTM
Toys & Games            13.8%     10.1%      11.4%     LSTM
Automotive               9.7%     10.8%      10.2%     LightGBM
Pet Supplies             8.9%      9.3%       9.1%     LightGBM
Jewelry                 10.5%      8.7%      12.3%     LSTM
Office Supplies          7.1%      8.9%       8.2%     LightGBM
Garden (Seasonal)       14.2%     11.3%       9.8%     Prophet
Holiday Decor           16.1%     12.4%      10.2%     Prophet
Swimwear                15.3%     11.8%       9.4%     Prophet
Winter Apparel          14.7%     10.9%      10.1%     LSTM
Baby Products            8.3%      9.1%       9.4%     LightGBM
Health Supplements       7.5%      8.2%       8.8%     LightGBM
Kitchen Appliances       9.8%     10.4%      11.2%     LightGBM
Outdoor Furniture       13.4%     11.1%       8.9%     Prophet
──────────────────────────────────────────────────────────────
Wins:                    11         5          4
──────────────────────────────────────────────────────────────

Key Findings

Observation                                                   | Impact
──────────────────────────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────
LightGBM wins 11/20 categories                                | Best default model for non-seasonal products
LSTM wins for high-variance categories (Food, Toys, Clothing) | Better at capturing complex temporal patterns
Prophet wins all 4 highly seasonal categories                 | Explicit seasonality decomposition matters for holiday-driven products
Training time: LightGBM 18m, LSTM 2h 14m, Prophet 45m         | LightGBM is ~7x faster than LSTM

Per-Category Model Routing

Based on these results, Marcus implements a routing strategy:

{
  "routing_strategy": "per_category",
  "default_model": "lightgbm",
  "overrides": [
    { "categories": ["Clothing", "Food & Grocery", "Toys & Games", "Winter Apparel"], "model": "lstm" },
    { "categories": ["Garden (Seasonal)", "Holiday Decor", "Swimwear", "Outdoor Furniture"], "model": "prophet" }
  ],
  "fallback_model": "lightgbm",
  "evaluation_schedule": "monthly",
  "auto_reassign": {
    "enabled": true,
    "condition": "alternative_model_mape < current_mape * 0.9 for 30 days",
    "notification": "slack:#ml-ops-alerts"
  }
}
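
At inference time this routing is a dictionary lookup with a default, combined with the <30-days-of-history fallback from the Ray Serve deployment config. A sketch (models as name strings; a real router would return loaded model handles):

```python
# Resolve which model serves a prediction for a given product.
# Override lists mirror the routing config above.
OVERRIDES = {
    "Clothing": "lstm", "Food & Grocery": "lstm",
    "Toys & Games": "lstm", "Winter Apparel": "lstm",
    "Garden (Seasonal)": "prophet", "Holiday Decor": "prophet",
    "Swimwear": "prophet", "Outdoor Furniture": "prophet",
}

def route_model(category, history_days, default="lightgbm", min_history=30):
    if history_days < min_history:
        return "global_fallback"  # new products: category-level fallback model
    return OVERRIDES.get(category, default)
```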

This routing reduced the overall weighted MAPE from 9.6% (LightGBM only) to 8.1% (routed ensemble) -- a 15.6% improvement.


Summary

Stage             | What Marcus Did                                            | Platform Components Used
──────────────────┼────────────────────────────────────────────────────────────┼──────────────────────────────────────────────
Ingestion         | Connected Shopify API, PostgreSQL, Google Analytics        | Ingestion Service (Airbyte), Data Workbench
Discovery         | Mapped join keys, found BI team's category table           | Catalog Service, Data Workbench
Query             | Built rolling aggregate features with federated SQL        | Query Engine (Trino + DuckDB), Data Workbench
Orchestration     | Built 5-stage daily pipeline with retry/rollback           | Pipeline Service (Temporal)
Analysis          | Validated data quality, profiled feature correlations      | Data Quality Service (Great Expectations)
Productionization | Trained per-category models, deployed to Ray Serve         | ML Service (Ray Train, MLflow, Ray Serve)
Feedback          | Configured accuracy tracking, freshness SLAs, auto-retrain | ML Workbench Monitoring
Experimentation   | Compared LightGBM/LSTM/Prophet, built routed ensemble      | ML Workbench Experiments (MLflow)

Related Walkthroughs