Retail & E-Commerce
End-to-end walkthroughs showing how a mid-size e-commerce company uses the MATIH Platform to unify customer data, predict demand, optimize revenue, and make AI-driven strategic decisions.
Industry Context
Retail and e-commerce businesses generate some of the most diverse data in any industry. A single customer interaction can touch transactional databases, clickstream analytics, payment processors, marketing platforms, inventory systems, and shipping providers. The challenge is not data scarcity -- it is data fragmentation.
Most retail teams operate with siloed tools: one dashboard for revenue, another for marketing attribution, a third for inventory. Data scientists copy CSVs between systems. Executives wait days for ad-hoc analyses. The MATIH Platform consolidates these workflows into a single governed environment where every persona -- from data scientist to CEO -- works from the same trusted data.
Company Profile: NovaMart
All walkthroughs in this section follow employees at NovaMart, a fictional mid-size e-commerce company with the following profile:
| Attribute | Value |
|---|---|
| Annual Revenue | $180M |
| Active Customers | 2.1M |
| Product SKUs | 45,000 |
| Order Volume | ~8,000 orders/day |
| Channels | Web, mobile app, 12 retail stores |
| Data Team | 4 data scientists, 2 ML engineers, 3 BI analysts, 1 VP Strategy |
Sample Datasets
These are the core datasets used across all four walkthroughs. In a production deployment, these tables live in their respective source systems and are ingested into the platform via Airbyte connectors or file imports.
| Dataset | Source | Rows | Description |
|---|---|---|---|
| orders | PostgreSQL | 12.4M | Order headers -- order_id, customer_id, order_date, total_amount, status, channel |
| order_items | PostgreSQL | 38.7M | Line items -- order_id, product_id, quantity, unit_price, discount |
| customers | PostgreSQL | 2.1M | Customer profiles -- customer_id, email, signup_date, segment, lifetime_value |
| products | PostgreSQL | 45K | Product catalog -- product_id, name, category, subcategory, cost, current_price |
| inventory | PostgreSQL | 45K | Current stock levels -- product_id, warehouse_id, quantity_on_hand, reorder_point |
| returns | PostgreSQL | 1.8M | Return records -- return_id, order_id, reason_code, refund_amount, return_date |
| clickstream | Snowflake | 340M | Web/app events -- session_id, customer_id, event_type, page_url, timestamp |
| marketing_campaigns | Google Ads API | 2.3K | Campaign performance -- campaign_id, spend, impressions, clicks, conversions |
| supplier_shipments | CSV Import | 156K | Inbound shipments -- shipment_id, supplier_id, product_id, quantity, eta, actual_arrival |
| customer_surveys | CSV Import | 48K | NPS and satisfaction scores -- customer_id, survey_date, nps_score, comments |
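The tables above relate through shared keys (`order_id`, `customer_id`, `product_id`). The following is a minimal sketch of that relationship using Python's built-in `sqlite3` module rather than the platform's query engine. The toy rows, the `'completed'` status value, and the treatment of `discount` as an absolute amount per line item are all assumptions made for illustration; only the column names come from the table.

```python
import sqlite3

# In-memory stand-in for three of the NovaMart tables (column names from the table above).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, total_amount REAL, status TEXT, channel TEXT);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER,
                          quantity INTEGER, unit_price REAL, discount REAL);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT,
                       subcategory TEXT, cost REAL, current_price REAL);
""")

# Hypothetical toy rows, not real NovaMart data.
conn.executemany("INSERT INTO orders VALUES (?,?,?,?,?,?)", [
    (1, 101, "2024-01-05", 50.0, "completed", "web"),
    (2, 102, "2024-01-06", 30.0, "completed", "mobile"),
])
conn.executemany("INSERT INTO order_items VALUES (?,?,?,?,?)", [
    (1, 10, 2, 20.0, 0.0),
    (1, 11, 1, 10.0, 0.0),
    (2, 10, 1, 20.0, 0.0),
    (2, 11, 1, 10.0, 0.0),
])
conn.executemany("INSERT INTO products VALUES (?,?,?,?,?,?)", [
    (10, "Trail Shoe", "Footwear", "Running", 12.0, 20.0),
    (11, "Socks", "Apparel", "Accessories", 4.0, 10.0),
])


def revenue_by_category(connection):
    """Sum line-item revenue per product category for completed orders.

    Assumption: discount is an absolute amount per line item.
    """
    rows = connection.execute("""
        SELECT p.category,
               SUM(oi.quantity * oi.unit_price - oi.discount) AS revenue
        FROM order_items AS oi
        JOIN orders   AS o ON o.order_id   = oi.order_id
        JOIN products AS p ON p.product_id = oi.product_id
        WHERE o.status = 'completed'
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
    return dict(rows)


print(revenue_by_category(conn))
```

The same join shape applies when the tables are queried in place; only the connection and SQL dialect would change.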
Data Sources
NovaMart's data lives in five systems. The platform connects to all of them through the Ingestion Service (Airbyte connectors) and the Query Engine (SQL federation).
| Source | Type | Connector | Sync Mode | Frequency |
|---|---|---|---|---|
| NovaMart PostgreSQL | Transactional DB | Airbyte PostgreSQL CDC | Incremental (WAL) | Every 15 min |
| Snowflake DWH | Analytics Warehouse | Airbyte Snowflake | Incremental (timestamp) | Hourly |
| Shopify | E-Commerce API | Airbyte Shopify | Incremental (API cursor) | Hourly |
| Google Analytics / Ads | Marketing | Airbyte Google Ads | Full refresh | Daily |
| CSV Files | Manual exports | File Import (Data Workbench) | One-time / on-demand | As needed |
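Because the sources sync on different schedules, a metric that combines several of them is only as fresh as its slowest input. The sketch below encodes the table above as plain Python data and derives that bound. The `SourceSync` class, the dictionary keys, and `worst_case_staleness` are hypothetical names for illustration, not part of the platform or Airbyte APIs, and the calculation ignores how long each sync takes to run.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class SourceSync:
    """One row of the data sources table (hypothetical structure)."""
    name: str
    connector: str
    sync_mode: str
    interval: Optional[timedelta]  # None means on-demand, no schedule


SOURCES = {
    "postgres": SourceSync("NovaMart PostgreSQL", "Airbyte PostgreSQL CDC",
                           "Incremental (WAL)", timedelta(minutes=15)),
    "snowflake": SourceSync("Snowflake DWH", "Airbyte Snowflake",
                            "Incremental (timestamp)", timedelta(hours=1)),
    "shopify": SourceSync("Shopify", "Airbyte Shopify",
                          "Incremental (API cursor)", timedelta(hours=1)),
    "google_ads": SourceSync("Google Analytics / Ads", "Airbyte Google Ads",
                             "Full refresh", timedelta(days=1)),
    "csv": SourceSync("CSV Files", "File Import (Data Workbench)",
                      "One-time / on-demand", None),
}


def worst_case_staleness(source_keys: Iterable[str]) -> Optional[timedelta]:
    """Return the longest scheduled sync interval among the given sources.

    Returns None when any source is on-demand, since no bound can be stated.
    """
    intervals = [SOURCES[key].interval for key in source_keys]
    if any(interval is None for interval in intervals):
        return None
    return max(intervals)


# Conversion rate combines orders (PostgreSQL) with clickstream (Snowflake).
print(worst_case_staleness(["postgres", "snowflake"]))
```

Under these assumptions, a conversion-rate tile can lag by up to an hour even though orders arrive every 15 minutes, and anything built on CSV imports has no scheduled freshness guarantee.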
Data Flow Architecture
NovaMart Data Flow
┌──────────────────────────────────────────────────────────────────┐
│                           DATA SOURCES                           │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌────────┐ ┌─────┐ │
│  │PostgreSQL │  │ Snowflake │  │  Shopify  │  │ Google │ │ CSV │ │
│  │  (OLTP)   │  │   (DWH)   │  │   (API)   │  │  Ads   │ │     │ │
│  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘  └───┬────┘ └──┬──┘ │
└────────┼──────────────┼──────────────┼────────────┼─────────┼────┘
         │              │              │            │         │
         ▼              ▼              ▼            ▼         ▼
┌──────────────────────────────────────────────────────────────────┐
│                   INGESTION SERVICE (Airbyte)                    │
│              600+ connectors | CDC | Schema mapping              │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                       PLATFORM DATA LAYER                        │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────────┐   │
│   │ Catalog  │  │  Query   │  │ Quality  │  │   Governance   │   │
│   │ Service  │  │ Engine   │  │ Service  │  │    Service     │   │
│   │          │  │ (Trino)  │  │  (GX)    │  │ (masking, ACL) │   │
│   └──────────┘  └──────────┘  └──────────┘  └────────────────┘   │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           ▼                     ▼                     ▼
    ┌──────────────┐      ┌──────────────┐    ┌──────────────────┐
    │ ML Workbench │      │ BI Workbench │    │ Agentic Workbench│
    │              │      │              │    │                  │
    │ Models,      │      │ Dashboards,  │    │ NL queries,      │
    │ Experiments, │      │ Semantic     │    │ Multi-agent,     │
    │ Feature      │      │ Layer,       │    │ Workflow         │
    │ Store        │      │ Reports      │    │ Generation       │
    └──────────────┘      └──────────────┘    └──────────────────┘
Business KPIs
NovaMart tracks these key performance indicators across all workbenches and dashboards. Each walkthrough shows how the platform computes, monitors, and acts on these metrics.
| KPI | Definition | Current Value | Target |
|---|---|---|---|
| Gross Merchandise Value (GMV) | Total sales before returns and discounts | $15.2M/month | $18M/month |
| Average Order Value (AOV) | Revenue / number of orders | $67.40 | $75.00 |
| Customer Churn Rate | % of customers with no purchase in 90 days | 18.3% | < 15% |
| Customer Lifetime Value (CLTV) | Predicted total revenue per customer over 3 years | $412 | $500 |
| Inventory Turnover | COGS / average inventory value | 8.2x/year | 10x/year |
| Conversion Rate | Orders / unique sessions | 3.1% | 4.0% |
| Return Rate | Returns / orders | 14.6% | < 12% |
| Customer Acquisition Cost (CAC) | Marketing spend / new customers acquired | $34.20 | < $30 |
| Return on Ad Spend (ROAS) | Revenue from ads / ad spend | 4.8x | 6.0x |
| Net Promoter Score (NPS) | Customer satisfaction metric (-100 to 100) | 42 | 50+ |
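The ratio-based KPIs in the table reduce to simple arithmetic. The sketch below writes the definitions out as plain Python functions; the function names are illustrative and not part of the platform's semantic layer. As a rough consistency check, the row counts in the Sample Datasets table (1.8M returns, 12.4M orders) give a return rate of about 14.5%, close to the 14.6% listed above. The row counts are rounded, so an exact match is not expected.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Divide, raising a clear error instead of dividing by zero."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


def average_order_value(revenue: float, orders: int) -> float:
    """AOV = revenue / number of orders."""
    return ratio(revenue, orders)


def conversion_rate(orders: int, unique_sessions: int) -> float:
    """Conversion rate = orders / unique sessions."""
    return ratio(orders, unique_sessions)


def return_rate(returns: int, orders: int) -> float:
    """Return rate = returns / orders."""
    return ratio(returns, orders)


def customer_acquisition_cost(marketing_spend: float, new_customers: int) -> float:
    """CAC = marketing spend / new customers acquired."""
    return ratio(marketing_spend, new_customers)


def return_on_ad_spend(ad_revenue: float, ad_spend: float) -> float:
    """ROAS = revenue from ads / ad spend."""
    return ratio(ad_revenue, ad_spend)


def inventory_turnover(cogs: float, avg_inventory_value: float) -> float:
    """Inventory turnover = COGS / average inventory value."""
    return ratio(cogs, avg_inventory_value)


def relative_gap(current: float, target: float) -> float:
    """Fractional change needed to move from the current value to the target."""
    return ratio(target - current, current)


# Return rate from the rounded dataset row counts: 1.8M returns over 12.4M orders.
print(f"return rate ~ {return_rate(1_800_000, 12_400_000):.1%}")
# AOV must rise from $67.40 to $75.00 to hit the target.
print(f"AOV uplift needed ~ {relative_gap(67.40, 75.00):.1%}")
```

Keeping each definition in one place mirrors what the semantic layer is described as doing: every persona computes a KPI the same way.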
Persona Walkthroughs
Each walkthrough follows one persona through all eight lifecycle stages, using the NovaMart sample data and scenarios. Start with the role closest to yours, or read all four to see how the platform enables cross-functional collaboration.
| Walkthrough | Persona | Scenario | Primary Workbenches |
|---|---|---|---|
| Data Scientist Journey | Priya, Senior Data Scientist | Predicting customer churn to reduce the 18.3% churn rate | ML Workbench, Data Workbench |
| ML Engineer Journey | Marcus, ML Engineer | Building a production demand forecasting system for 45K SKUs | ML Workbench, Pipeline Service |
| BI Lead Journey | Sofia, BI Lead | Creating a real-time revenue command center for the executive team | BI Workbench, Semantic Layer |
| Executive Leadership Journey | David, VP of Strategy | Using AI-assisted analysis for strategic planning and board reporting | Agentic Workbench, BI Dashboards |
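Priya's scenario builds on the churn definition from the KPI table: the share of customers with no purchase in the last 90 days. A minimal sketch of that calculation follows, assuming each customer's most recent `order_date` has already been extracted from `orders`. The function name, the sample dates, and the boundary rule (a purchase exactly 90 days before the as-of date still counts as active) are assumptions for illustration.

```python
from datetime import date, timedelta
from typing import Mapping


def churn_rate(last_order_dates: Mapping[int, date], as_of: date,
               window_days: int = 90) -> float:
    """Fraction of customers whose latest order is older than the window.

    last_order_dates maps customer_id to that customer's most recent order_date.
    """
    if not last_order_dates:
        raise ValueError("need at least one customer")
    cutoff = as_of - timedelta(days=window_days)
    # A customer is churned when their last order falls strictly before the cutoff.
    churned = sum(1 for last in last_order_dates.values() if last < cutoff)
    return churned / len(last_order_dates)


# Hypothetical customers; the cutoff for 2024-06-30 with a 90-day window is 2024-04-01.
sample = {
    101: date(2024, 6, 1),    # active
    102: date(2024, 3, 15),   # churned
    103: date(2024, 4, 1),    # exactly on the cutoff, treated as active
    104: date(2023, 12, 20),  # churned
}
print(churn_rate(sample, as_of=date(2024, 6, 30)))
```

The same cutoff logic can also label individual customers as churned or active, which is the kind of target a churn prediction model would be trained on.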
How the Walkthroughs Connect
These four personas work on the same data at NovaMart. Their work products feed into each other:
Priya (Data Scientist)              Marcus (ML Engineer)
┌──────────────────────┐            ┌──────────────────────┐
│ Churn prediction     │            │ Demand forecasting   │
│ model (AUC 0.87)     │───────────▶│ pipeline (daily)     │
│                      │   model    │                      │
│ Feature engineering  │  registry  │ Ray Serve deployment │
└──────────┬───────────┘            └──────────┬───────────┘
           │ churn scores                      │ forecasts
           ▼                                   ▼
┌──────────────────────┐            ┌──────────────────────┐
│ Sofia (BI Lead)      │            │ David (VP Strategy)  │
│                      │◀───────────│                      │
│ Revenue Command      │ dashboard  │ AI-driven strategic  │
│ Center dashboards    │   access   │ scenario analysis    │
│                      │            │                      │
└──────────────────────┘            └──────────────────────┘
Priya's churn model scores feed into Sofia's customer health dashboards. Marcus's demand forecasts inform inventory decisions that David reviews in strategic planning. The semantic layer ensures all four personas use the same metric definitions.
Prerequisites
Before following these walkthroughs, ensure you have:
- A running MATIH Platform instance (see Installation)
- The NovaMart sample dataset loaded (available in the platform's sample data catalog)
- Completed the Quickstart Tutorials for the workbenches you plan to use
Related Chapters
- Data Ingestion -- Configuring Airbyte connectors and file imports
- Query Engine -- SQL federation and the Trino query engine
- Data Catalog -- Metadata management, profiling, and lineage
- Pipelines -- Temporal-based orchestration
- ML Service -- Model training, registry, and serving
- AI Service -- Text-to-SQL and multi-agent chat