Retail & E-Commerce

End-to-end walkthroughs showing how a mid-size e-commerce company uses the MATIH Platform to unify customer data, predict demand, optimize revenue, and make AI-driven strategic decisions.

Industry Context

Retail and e-commerce businesses generate some of the most diverse data in any industry. A single customer interaction can touch transactional databases, clickstream analytics, payment processors, marketing platforms, inventory systems, and shipping providers. The challenge is not data scarcity -- it is data fragmentation.

Many retail teams operate with siloed tools: one dashboard for revenue, another for marketing attribution, a third for inventory. Data scientists copy CSVs between systems, and executives wait days for ad-hoc analyses. The MATIH Platform consolidates these workflows into a single governed environment where every persona -- from data scientist to CEO -- works from the same trusted data.

Company Profile: NovaMart

All walkthroughs in this section follow employees at NovaMart, a fictional mid-size e-commerce company with the following profile:

Attribute	Value
Annual Revenue	$180M
Active Customers	2.1M
Product SKUs	45,000
Order Volume	~8,000 orders/day
Channels	Web, mobile app, 12 retail stores
Data Team	4 data scientists, 2 ML engineers, 3 BI analysts, 1 VP Strategy

Sample Datasets

These are the core datasets used across all four walkthroughs. In a production deployment, these tables live in their respective source systems and are ingested into the platform via Airbyte connectors or file imports.

Dataset	Source	Rows	Description
`orders`	PostgreSQL	12.4M	Order headers -- order_id, customer_id, order_date, total_amount, status, channel
`order_items`	PostgreSQL	38.7M	Line items -- order_id, product_id, quantity, unit_price, discount
`customers`	PostgreSQL	2.1M	Customer profiles -- customer_id, email, signup_date, segment, lifetime_value
`products`	PostgreSQL	45K	Product catalog -- product_id, name, category, subcategory, cost, current_price
`inventory`	PostgreSQL	45K	Current stock levels -- product_id, warehouse_id, quantity_on_hand, reorder_point
`returns`	PostgreSQL	1.8M	Return records -- return_id, order_id, reason_code, refund_amount, return_date
`clickstream`	Snowflake	340M	Web/app events -- session_id, customer_id, event_type, page_url, timestamp
`marketing_campaigns`	Google Ads API	2.3K	Campaign performance -- campaign_id, spend, impressions, clicks, conversions
`supplier_shipments`	CSV Import	156K	Inbound shipments -- shipment_id, supplier_id, product_id, quantity, eta, actual_arrival
`customer_surveys`	CSV Import	48K	NPS and satisfaction scores -- customer_id, survey_date, nps_score, comments
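
The `orders` and `order_items` tables above share the `order_id` key, which makes a simple consistency check possible after ingestion. The following is a minimal sketch, not platform code. It assumes integer IDs, that `discount` is an absolute amount per line rather than a percentage, and that `total_amount` equals the sum of the line items with no tax or shipping added. None of these assumptions is stated in the dataset descriptions, and the function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int
    order_date: date
    total_amount: float
    status: str
    channel: str


@dataclass(frozen=True)
class OrderItem:
    order_id: int
    product_id: int
    quantity: int
    unit_price: float
    discount: float  # ASSUMPTION: absolute amount off the line, not a percentage


def line_totals_by_order(items):
    """Sum quantity * unit_price - discount for each order_id."""
    totals = defaultdict(float)
    for item in items:
        totals[item.order_id] += item.quantity * item.unit_price - item.discount
    return dict(totals)


def mismatched_orders(orders, items, tolerance=0.01):
    """Return order_ids whose header total disagrees with the summed line items."""
    totals = line_totals_by_order(items)
    return [
        o.order_id
        for o in orders
        if abs(o.total_amount - totals.get(o.order_id, 0.0)) > tolerance
    ]


# Hypothetical rows, for illustration only.
orders = [
    Order(1, 101, date(2024, 3, 1), 55.00, "shipped", "web"),
    Order(2, 102, date(2024, 3, 1), 40.00, "shipped", "mobile"),
]
items = [
    OrderItem(1, 9001, 2, 20.00, 5.00),  # 2 * 20.00 - 5.00 = 35.00
    OrderItem(1, 9002, 1, 20.00, 0.00),  # order 1 sums to 55.00
    OrderItem(2, 9003, 1, 45.00, 0.00),  # order 2 sums to 45.00, header says 40.00
]
print(mismatched_orders(orders, items))
```

With these sample rows only order 2 is reported, because its header total does not match its line items. If the real schema stores percentage discounts or includes tax in `total_amount`, the line formula would need to change accordingly.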

Data Sources

NovaMart's data lives in five systems. The platform connects to all of them through the Ingestion Service (Airbyte connectors) and the Query Engine (SQL federation).

Source	Type	Connector	Sync Mode	Frequency
NovaMart PostgreSQL	Transactional DB	Airbyte PostgreSQL CDC	Incremental (WAL)	Every 15 min
Snowflake DWH	Analytics Warehouse	Airbyte Snowflake	Incremental (timestamp)	Hourly
Shopify	E-Commerce API	Airbyte Shopify	Incremental (API cursor)	Hourly
Google Analytics / Ads	Marketing	Airbyte Google Ads	Full refresh	Daily
CSV Files	Manual exports	File Import (Data Workbench)	One-time / on-demand	As needed
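
The Frequency column above bounds how fresh each source can be inside the platform. As a rough sketch, the schedule can be written down as intervals and turned into syncs per day. The dictionary and function names below are hypothetical and do not represent an Airbyte or MATIH configuration format. The calculation also ignores how long each sync takes to run.

```python
from datetime import timedelta

# Intervals copied from the Frequency column; None marks on-demand imports.
SYNC_INTERVALS = {
    "NovaMart PostgreSQL": timedelta(minutes=15),
    "Snowflake DWH": timedelta(hours=1),
    "Shopify": timedelta(hours=1),
    "Google Analytics / Ads": timedelta(days=1),
    "CSV Files": None,
}


def syncs_per_day(interval):
    """Number of scheduled syncs in 24 hours, or None for on-demand sources."""
    if interval is None:
        return None
    return int(timedelta(days=1) / interval)


for source, interval in SYNC_INTERVALS.items():
    count = syncs_per_day(interval)
    if count is None:
        print(f"{source}: on-demand, no fixed freshness bound")
    else:
        # Worst case, data is one full interval old just before the next sync.
        print(f"{source}: {count} syncs/day, up to {interval} stale")
```

Under these assumptions, PostgreSQL order data can lag by up to 15 minutes, while marketing data can be up to a day old. Dashboards that combine the two sources inherit the slower bound.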

Data Flow Architecture

                           NovaMart Data Flow
  ┌──────────────────────────────────────────────────────────────────┐
  │                        DATA SOURCES                             │
  │  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌────────┐ ┌─────┐  │
  │  │PostgreSQL │ │ Snowflake │ │  Shopify  │ │ Google │ │ CSV │  │
  │  │  (OLTP)   │ │  (DWH)    │ │  (API)    │ │  Ads   │ │     │  │
  │  └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └───┬────┘ └──┬──┘  │
  └────────┼──────────────┼─────────────┼───────────┼─────────┼─────┘
           │              │             │           │         │
           ▼              ▼             ▼           ▼         ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │                   INGESTION SERVICE (Airbyte)                   │
  │         600+ connectors  |  CDC  |  Schema mapping              │
  └──────────────────────────────┬──────────────────────────────────┘
                                 │
                                 ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │                     PLATFORM DATA LAYER                         │
  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────────┐  │
  │  │ Catalog  │  │  Query   │  │ Quality  │  │  Governance    │  │
  │  │ Service  │  │  Engine  │  │ Service  │  │  Service       │  │
  │  │          │  │ (Trino)  │  │ (GX)     │  │ (masking,ACL)  │  │
  │  └──────────┘  └──────────┘  └──────────┘  └────────────────┘  │
  └──────────────────────────────┬──────────────────────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           ▼                     ▼                     ▼
  ┌──────────────┐    ┌──────────────┐     ┌──────────────────┐
  │ ML Workbench │    │ BI Workbench │     │ Agentic Workbench│
  │              │    │              │     │                  │
  │ Models,      │    │ Dashboards,  │     │ NL queries,      │
  │ Experiments, │    │ Semantic     │     │ Multi-agent,     │
  │ Feature      │    │ Layer,       │     │ Workflow         │
  │ Store        │    │ Reports      │     │ Generation       │
  └──────────────┘    └──────────────┘     └──────────────────┘

Business KPIs

NovaMart tracks these key performance indicators across all workbenches and dashboards. Each walkthrough shows how the platform computes, monitors, and acts on these metrics.

KPI	Definition	Current Value	Target
Gross Merchandise Value (GMV)	Total sales before returns and discounts	$15.2M/month	$18M/month
Average Order Value (AOV)	Revenue / number of orders	$67.40	$75.00
Customer Churn Rate	% of customers with no purchase in 90 days	18.3%	< 15%
Customer Lifetime Value (CLTV)	Predicted total revenue per customer over 3 years	$412	$500
Inventory Turnover	COGS / average inventory value	8.2x/year	10x/year
Conversion Rate	Orders / unique sessions	3.1%	4.0%
Return Rate	Returns / orders	14.6%	< 12%
Customer Acquisition Cost (CAC)	Marketing spend / new customers acquired	$34.20	< $30
Return on Ad Spend (ROAS)	Revenue from ads / ad spend	4.8x	6.0x
Net Promoter Score (NPS)	% promoters minus % detractors among survey respondents (-100 to 100)	42	50+
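
Several of the definitions above are simple ratios. The sketch below writes a few of them as plain Python functions and adds a helper for the relative change needed to reach a target. The function names are illustrative and are not the platform's semantic-layer API. The NPS function uses the standard 0-10 scale, where scores of 9-10 count as promoters and 0-6 as detractors.

```python
def average_order_value(revenue, order_count):
    """AOV = revenue / number of orders."""
    return revenue / order_count


def return_rate(return_count, order_count):
    """Return rate = returns / orders."""
    return return_count / order_count


def conversion_rate(order_count, unique_sessions):
    """Conversion rate = orders / unique sessions."""
    return order_count / unique_sessions


def customer_acquisition_cost(marketing_spend, new_customers):
    """CAC = marketing spend / new customers acquired."""
    return marketing_spend / new_customers


def return_on_ad_spend(ad_revenue, ad_spend):
    """ROAS = revenue from ads / ad spend."""
    return ad_revenue / ad_spend


def net_promoter_score(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6), on a -100..100 scale."""
    if not scores:
        raise ValueError("at least one survey score is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def gap_to_target(current, target):
    """Relative change needed to move from the current value to the target."""
    return (target - current) / current


# Current and target values taken from the KPI table.
print(f"AOV gap: {gap_to_target(67.40, 75.00):.1%}")
print(f"GMV gap: {gap_to_target(15.2, 18.0):.1%}")
print(f"Conversion gap: {gap_to_target(3.1, 4.0):.1%}")

# Hypothetical survey responses: 5 promoters, 3 detractors, 2 passives.
print(net_promoter_score([10, 9, 9, 8, 7, 6, 3, 10, 9, 5]))
```

For the values in the table, AOV needs to rise by about 11.3% and GMV by about 18.4% to reach their targets. For KPIs where lower is better (churn rate, return rate, CAC), `gap_to_target` returns a negative number.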

Persona Walkthroughs

Each walkthrough follows one persona through all eight lifecycle stages, using the NovaMart sample data and realistic scenarios. Start with the role closest to yours, or read all four to see how the platform enables cross-functional collaboration.

Walkthrough	Persona	Scenario	Primary Workbenches
Data Scientist Journey	Priya, Senior Data Scientist	Predicting customer churn to reduce the 18.3% churn rate	ML Workbench, Data Workbench
ML Engineer Journey	Marcus, ML Engineer	Building a production demand forecasting system for 45K SKUs	ML Workbench, Pipeline Service
BI Lead Journey	Sofia, BI Lead	Creating a real-time revenue command center for the executive team	BI Workbench, Semantic Layer
Executive Leadership Journey	David, VP of Strategy	Using AI-assisted analysis for strategic planning and board reporting	Agentic Workbench, BI Dashboards
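
Priya's scenario depends on the churn definition from the KPI table: a customer with no purchase in 90 days. A minimal labeling sketch follows, assuming orders arrive as `(customer_id, order_date)` pairs. It treats an order placed exactly 90 days before the as-of date as still active, and it counts customers with no orders at all as churned. Both choices are assumptions, since the KPI definition does not settle either case, and the function names are hypothetical.

```python
from datetime import date, timedelta


def churned_customers(customer_ids, orders, as_of, window_days=90):
    """Return the customers with no order in the window ending at as_of."""
    cutoff = as_of - timedelta(days=window_days)
    active = {
        customer_id
        for customer_id, order_date in orders
        if cutoff <= order_date <= as_of
    }
    return set(customer_ids) - active


def churn_rate(customer_ids, orders, as_of, window_days=90):
    """Share of customers labeled as churned (0.0 to 1.0)."""
    ids = set(customer_ids)
    if not ids:
        raise ValueError("at least one customer is required")
    return len(churned_customers(ids, orders, as_of, window_days)) / len(ids)


# Hypothetical data: customer 4 has never ordered.
customers = [1, 2, 3, 4]
orders = [
    (1, date(2024, 6, 15)),  # recent order: active
    (2, date(2024, 3, 31)),  # 91 days before as_of: churned
    (3, date(2024, 4, 1)),   # exactly 90 days before as_of: active
]
as_of = date(2024, 6, 30)

print(sorted(churned_customers(customers, orders, as_of)))
print(churn_rate(customers, orders, as_of))
```

Labels produced this way are only a starting point for a churn model. The walkthrough itself may define the training target differently, for example by predicting churn over a future window.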

How the Walkthroughs Connect

These four personas work on the same data at NovaMart. Their work products feed into each other:

  Priya (Data Scientist)              Marcus (ML Engineer)
  ┌──────────────────────┐            ┌──────────────────────┐
  │ Churn prediction     │            │ Demand forecasting   │
  │ model (AUC 0.87)     │───────────▶│ pipeline (daily)     │
  │                      │  model     │                      │
  │ Feature engineering  │  registry  │ Ray Serve deployment │
  └──────────┬───────────┘            └──────────┬───────────┘
             │ churn scores                      │ forecasts
             ▼                                   ▼
  ┌──────────────────────┐            ┌──────────────────────┐
  │ Sofia (BI Lead)      │            │ David (VP Strategy)  │
  │                      │◀───────────│                      │
  │ Revenue Command      │ dashboard  │ AI-driven strategic  │
  │ Center dashboards    │ access     │ scenario analysis    │
  │                      │            │                      │
  └──────────────────────┘            └──────────────────────┘

Priya's churn scores feed into Sofia's customer health dashboards, and her trained models reach Marcus through the model registry. Marcus's demand forecasts inform the inventory decisions that David reviews in strategic planning, and David also reads Sofia's dashboards directly. The semantic layer ensures all four personas use the same metric definitions.

Prerequisites

Before following these walkthroughs, ensure you have:

A running MATIH Platform instance (see Installation)
The NovaMart sample dataset loaded (available in the platform's sample data catalog)
Completed the Quickstart Tutorials for the workbenches you plan to use

Related Chapters

Data Ingestion -- Configuring Airbyte connectors and file imports
Query Engine -- SQL federation and the Trino query engine
Data Catalog -- Metadata management, profiling, and lineage
Pipelines -- Temporal-based orchestration
ML Service -- Model training, registry, and serving
AI Service -- Text-to-SQL and multi-agent chat
