SaaS & Technology

End-to-end walkthroughs showing how a B2B SaaS company uses the MATIH Platform to unify product analytics, predict churn, optimize product-led growth, and make data-driven decisions about pricing, features, and infrastructure investments.

Industry Context

SaaS and technology companies are inherently data-native, but that does not mean they are data-mature. Most SaaS teams face a paradox: they instrument everything yet struggle to answer fundamental business questions. Product events stream into Segment or RudderStack, billing data lives in Stripe, CRM records sit in Salesforce, infrastructure costs accumulate in AWS Cost Explorer, and support tickets flow through Zendesk. Each system has its own dashboard, its own definitions, and its own version of the truth.

The result is a fragmented analytics landscape where the product team calculates DAU differently from the finance team, "churn" means something different to sales versus customer success, and nobody can answer "what is the actual cost to serve this customer segment?" without a week of manual data wrangling.

Challenge	Impact	Platform Response
Product-led growth metrics	Feature adoption, activation funnels, and usage cohorts span multiple systems	Query Engine federates PostgreSQL, Snowflake, and Kafka-sourced event data in a single SQL query
Revenue analytics	MRR, ARR, NDR, and churn calculations vary by team and tool	Semantic Layer provides canonical SaaS metric definitions, validated against finance
Usage-based pricing	Metering, entitlement enforcement, and billing reconciliation require real-time data	Streaming ingestion from Kafka events, Pipeline Service for aggregation, BI dashboards for monitoring
Infrastructure cost optimization	Cloud spend grows faster than revenue; cost-per-user is opaque	Federated queries across AWS Cost Explorer, product usage, and billing data
Customer health scoring	Churn signals are spread across product, support, billing, and CRM	ML Workbench for predictive models, Pipeline Service for daily scoring, CRM sync for action
SOC2 and compliance	Enterprise customers require audit trails, data governance, and access controls	Governance Service with policy enforcement, column masking, and audit logging

Company Profile: CloudFlow

All walkthroughs in this section follow employees at CloudFlow, a fictional B2B project management SaaS company with the following profile:

Attribute	Value
Annual Recurring Revenue (ARR)	$20M
Monthly Recurring Revenue (MRR)	$1.72M
Active Users	100,000 across 4,200 workspaces
Plans	Free (0-5 users), Pro ($15/user/mo), Business ($29/user/mo), Enterprise (custom)
Channels	Web app, desktop (Electron), mobile (iOS/Android), API
Data Team	3 data scientists, 2 ML engineers, 2 BI analysts, 1 data engineering lead, CEO
Growth Model	Product-led growth with sales-assisted Enterprise tier
Compliance	SOC2 Type II, GDPR, CCPA

Sample Datasets

These are the core datasets used across all four walkthroughs. In production, these tables live in their respective source systems and are ingested into the platform via Airbyte connectors, Kafka consumers, or file imports.

Product Data

Dataset	Source	Rows	Description
`events`	Kafka (via Segment)	500M	User action events -- event_id, user_id, workspace_id, event_type, properties (JSON), timestamp, session_id
`users`	PostgreSQL	100K	User profiles -- user_id, email, workspace_id, role, signup_date, last_active_at, plan_type
`workspaces`	PostgreSQL	4,200	Workspace (account) records -- workspace_id, name, plan, created_at, owner_user_id, seat_count
`projects`	PostgreSQL	180K	Projects within workspaces -- project_id, workspace_id, name, created_at, task_count, last_activity
`feature_flags`	PostgreSQL	200	Feature flag configurations -- flag_id, name, rollout_percentage, target_segments, enabled, created_at

Business Data

Dataset	Source	Rows	Description
`subscriptions`	Stripe API	80K	Subscription records -- subscription_id, workspace_id, plan, mrr, start_date, status, trial_end
`invoices`	Stripe API	420K	Invoice history -- invoice_id, subscription_id, amount, currency, paid_at, period_start, period_end
`crm_accounts`	Salesforce	4,200	CRM account records -- account_id, workspace_id, csm_owner, health_score, renewal_date, arr
`crm_opportunities`	Salesforce	8,500	Sales pipeline -- opportunity_id, account_id, stage, amount, close_date, owner
`support_tickets`	Zendesk	50K	Support tickets -- ticket_id, workspace_id, user_id, subject, priority, status, created_at, resolved_at, satisfaction_rating

Operational Data

Dataset	Source	Rows	Description
`api_requests`	Kafka (internal)	1B	API call logs -- request_id, workspace_id, endpoint, method, status_code, latency_ms, timestamp
`infrastructure_costs`	AWS Cost Explorer	365K	Daily cost per service -- date, service_name, region, cost_usd, usage_quantity, usage_unit
`incidents`	PagerDuty API	1,200	Production incidents -- incident_id, severity, service, duration_minutes, created_at, resolved_at
`nps_surveys`	CSV Import	20K	NPS survey responses -- survey_id, user_id, workspace_id, score, comment, submitted_at
`ab_test_assignments`	PostgreSQL	2.4M	A/B test cohort assignments -- user_id, experiment_id, variant, assigned_at
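
One question these datasets are meant to answer is what it costs to serve a given customer segment. The sketch below shows one possible allocation: spread a period's total `cost_usd` (from `infrastructure_costs`) across plans in proportion to each workspace's `api_requests` volume, using the `plan` column from `workspaces`. Allocating by request share is an assumption made for illustration, as are the function name and the sample numbers; a real cost model would likely also weigh storage and compute.

```python
from collections import Counter


def cost_to_serve_by_plan(total_cost_usd, requests_by_workspace, plan_by_workspace):
    """Allocate total cloud cost to plans by share of API requests.

    ASSUMPTION: request volume is a fair proxy for resource consumption.
    """
    requests_by_plan = Counter()
    for workspace_id, request_count in requests_by_workspace.items():
        requests_by_plan[plan_by_workspace[workspace_id]] += request_count

    total_requests = sum(requests_by_plan.values())
    if total_requests == 0:
        return {}  # nothing to allocate against

    return {
        plan: total_cost_usd * count / total_requests
        for plan, count in requests_by_plan.items()
    }


# Illustrative inputs only (not taken from the CloudFlow sample data).
requests = {"ws_1": 600, "ws_2": 300, "ws_3": 100}
plans = {"ws_1": "Enterprise", "ws_2": "Pro", "ws_3": "Pro"}
allocation = cost_to_serve_by_plan(1000.0, requests, plans)
print(allocation)
```

Dividing each plan's allocated cost by its active users gives a per-plan version of the Infrastructure Cost per User KPI.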

Data Sources and Connectivity

                            CloudFlow Data Flow
 ┌────────────────────────────────────────────────────────────────────────┐
 │                              DATA SOURCES                              │
 │ ┌──────────┐ ┌──────────┐ ┌────────┐ ┌──────────┐ ┌────────┐ ┌───────┐ │
 │ │Product   │ │ Segment  │ │ Stripe │ │Salesforce│ │AWS     │ │Zendesk│ │
 │ │PostgreSQL│ │ (Kafka)  │ │Billing │ │  CRM     │ │Cost    │ │       │ │
 │ │          │ │          │ │  API   │ │          │ │Explorer│ │       │ │
 │ │users     │ │events    │ │invoices│ │accounts  │ │daily   │ │tickets│ │
 │ │workspaces│ │500M/mo   │ │subs    │ │opps      │ │costs   │ │       │ │
 │ │projects  │ │          │ │        │ │          │ │        │ │       │ │
 │ └────┬─────┘ └────┬─────┘ └───┬────┘ └────┬─────┘ └───┬────┘ └───┬───┘ │
 └──────┼────────────┼───────────┼───────────┼───────────┼──────────┼─────┘
        │            │           │           │           │          │
        ▼            ▼           ▼           ▼           ▼          ▼
 ┌────────────────────────────────────────────────────────────────────────┐
 │                      INGESTION SERVICE (Airbyte)                       │
 │        600+ connectors  |  CDC  |  Streaming  |  Schema mapping        │
 └─────────────────────────────────┬──────────────────────────────────────┘
                                   │
                                   ▼
 ┌────────────────────────────────────────────────────────────────────────┐
 │                          PLATFORM DATA LAYER                           │
 │      ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────────┐      │
 │      │ Catalog  │  │  Query   │  │ Quality  │  │  Governance    │      │
 │      │ Service  │  │  Engine  │  │ Service  │  │  Service       │      │
 │      │          │  │ (Trino)  │  │ (GX)     │  │ (SOC2, GDPR)   │      │
 │      └──────────┘  └──────────┘  └──────────┘  └────────────────┘      │
 └─────────────────────────────────┬──────────────────────────────────────┘
                                   │
            ┌──────────────────────┼──────────────────────┐
            ▼                      ▼                      ▼
 ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
 │  ML Workbench    │   │  BI Workbench    │   │ Agentic Workbench│
 │                  │   │                  │   │                  │
 │ Churn models,    │   │ SaaS dashboards, │   │ NL queries,      │
 │ Recommendations, │   │ PLG metrics,     │   │ Scenario models, │
 │ Feature store    │   │ Cohort analysis  │   │ Board prep       │
 └──────────────────┘   └──────────────────┘   └──────────────────┘

Source	Connector Type	Sync Mode	Frequency
Product PostgreSQL	Airbyte PostgreSQL CDC	Incremental (WAL)	Every 15 minutes
Segment Events (Kafka)	Airbyte Kafka connector	Streaming (real-time)	Continuous
Stripe Billing API	Airbyte Stripe connector	Incremental (API cursor)	Hourly
Salesforce CRM	Airbyte Salesforce connector	Incremental (API cursor)	Every 30 minutes
AWS Cost Explorer	Airbyte AWS Cost Explorer connector	Full refresh	Daily
Zendesk Support	Airbyte Zendesk connector	Incremental (API cursor)	Every 15 minutes
PagerDuty Incidents	Airbyte PagerDuty connector	Incremental	Hourly
NPS Surveys	File Import (Data Workbench)	One-time / on-demand	Quarterly
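
Because each source syncs on its own schedule, a dashboard that joins several of them is only as fresh as its slowest input. As a minimal sketch, the snippet below encodes the scheduled frequencies from the table and flags sources whose last successful sync is older than a tolerance multiple of that interval. The source keys, the `stale_sources` helper, and the 2x tolerance are hypothetical and not part of the platform's API.

```python
from datetime import datetime, timedelta

# Scheduled sync intervals from the connector table (keys are hypothetical names).
SYNC_INTERVALS = {
    "product_postgres": timedelta(minutes=15),
    "stripe": timedelta(hours=1),
    "salesforce": timedelta(minutes=30),
    "aws_cost_explorer": timedelta(days=1),
    "zendesk": timedelta(minutes=15),
    "pagerduty": timedelta(hours=1),
}


def stale_sources(last_synced, now, tolerance=2.0):
    """Return sources whose last sync is older than tolerance x their interval."""
    return sorted(
        source
        for source, synced_at in last_synced.items()
        if now - synced_at > SYNC_INTERVALS[source] * tolerance
    )


# Illustrative timestamps only.
now = datetime(2024, 1, 1, 12, 0)
last_synced = {
    "stripe": datetime(2024, 1, 1, 9, 0),               # 3h old, limit is 2h
    "zendesk": datetime(2024, 1, 1, 11, 45),            # 15m old, limit is 30m
    "aws_cost_explorer": datetime(2023, 12, 31, 0, 0),  # 36h old, limit is 48h
}
print(stale_sources(last_synced, now))
```

The streaming Kafka source and the on-demand NPS import are left out because they have no fixed interval to compare against.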

Business KPIs

CloudFlow tracks these key performance indicators across all workbenches and dashboards. Each walkthrough shows how the platform computes, monitors, and acts on these metrics.

Revenue Metrics

KPI	Definition	Current	Target
Monthly Recurring Revenue (MRR)	Sum of all active subscription monthly amounts	$1.72M	$2.1M
Annual Recurring Revenue (ARR)	MRR x 12	$20.6M (~$20M)	$25M
Net Dollar Retention (NDR)	(Starting MRR + expansion - contraction - churn) / Starting MRR	108%	120%
Logo Churn Rate	Workspaces cancelled / total workspaces (annual)	18%	< 12%
Revenue Churn Rate	MRR lost / starting MRR (annual)	25%	< 15%
Average Revenue Per Account (ARPA)	MRR / active workspaces	$410/mo	$500/mo
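
To make the definitions in this table concrete, here is a small Python sketch of the arithmetic. The `MrrMovement` structure, the function names, and the movement figures are assumptions chosen for illustration; only the $1.72M MRR and the 4,200 workspaces come from the CloudFlow profile.

```python
from dataclasses import dataclass


@dataclass
class MrrMovement:
    """MRR movements over one period, in USD (hypothetical structure)."""
    starting_mrr: float
    expansion: float
    contraction: float
    churned: float


def net_dollar_retention(m: MrrMovement) -> float:
    """(Starting MRR + expansion - contraction - churn) / Starting MRR."""
    if m.starting_mrr <= 0:
        raise ValueError("starting MRR must be positive")
    retained = m.starting_mrr + m.expansion - m.contraction - m.churned
    return retained / m.starting_mrr


def revenue_churn_rate(m: MrrMovement) -> float:
    """MRR lost / starting MRR."""
    if m.starting_mrr <= 0:
        raise ValueError("starting MRR must be positive")
    return m.churned / m.starting_mrr


def arr(mrr: float) -> float:
    """Annual Recurring Revenue = MRR x 12."""
    return mrr * 12


def arpa(mrr: float, active_workspaces: int) -> float:
    """Average Revenue Per Account = MRR / active workspaces."""
    if active_workspaces <= 0:
        raise ValueError("active_workspaces must be positive")
    return mrr / active_workspaces


# Illustrative movement figures, picked so the results match the table.
movement = MrrMovement(
    starting_mrr=1_000_000, expansion=380_000, contraction=50_000, churned=250_000
)
print(f"NDR: {net_dollar_retention(movement):.0%}")
print(f"Revenue churn: {revenue_churn_rate(movement):.0%}")
print(f"ARR: ${arr(1_720_000):,.0f}")
print(f"ARPA: ${arpa(1_720_000, 4_200):,.2f}/mo")
```

With the profile's MRR of $1.72M, ARR works out to $20.64M and ARPA to roughly $410 per workspace per month.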

Growth Metrics

KPI	Definition	Current	Target
Daily Active Users (DAU)	Unique users with >= 1 event per day	34,000	45,000
DAU/MAU Ratio	DAU / Monthly Active Users (stickiness)	0.41	0.55
Feature Adoption Rate	% of users who use a feature within 30 days of release	23%	40%
Activation Rate	% of signups completing 3 key actions within 7 days	31%	50%
Time to Value	Median days from signup to first project created	2.4 days	< 1 day
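
The growth definitions above can be sketched in a few lines of Python. This is a simplified illustration, not the platform's implementation: the 30-day trailing window for MAU, the three key action names, and the in-memory tuples standing in for the `events` table are all assumptions.

```python
from datetime import date, timedelta


def daily_active_users(events, day):
    """Unique users with at least one event on `day`."""
    return len({user_id for user_id, event_day in events if event_day == day})


def monthly_active_users(events, day, window_days=30):
    """Unique users active in the trailing window ending on `day`.

    ASSUMPTION: "monthly" means a trailing 30-day window.
    """
    start = day - timedelta(days=window_days - 1)
    return len({user_id for user_id, event_day in events if start <= event_day <= day})


def stickiness(events, day):
    """DAU / MAU, or 0.0 when there are no monthly actives."""
    mau = monthly_active_users(events, day)
    return daily_active_users(events, day) / mau if mau else 0.0


def activation_rate(signups, actions, key_actions, window_days=7):
    """Share of signups completing every key action within the window."""
    if not signups:
        return 0.0
    required = set(key_actions)
    activated = 0
    for user_id, signed_up in signups.items():
        deadline = signed_up + timedelta(days=window_days)
        completed = {
            action
            for uid, action, day in actions
            if uid == user_id and action in required and signed_up <= day <= deadline
        }
        if completed == required:
            activated += 1
    return activated / len(signups)


# Illustrative events as (user_id, event_date) pairs.
events = [
    ("u1", date(2024, 3, 30)), ("u2", date(2024, 3, 30)),
    ("u1", date(2024, 3, 10)), ("u3", date(2024, 3, 10)),
    ("u4", date(2024, 3, 1)), ("u5", date(2024, 2, 15)),
]
print(stickiness(events, date(2024, 3, 30)))
```

Here two of the four users active in the trailing 30 days were active on March 30, so the sketch reports a stickiness of 0.5.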

Unit Economics

KPI	Definition	Current	Target
Customer Acquisition Cost (CAC)	Total S&M spend / new paying workspaces	$2,800	< $2,000
LTV:CAC Ratio	Customer Lifetime Value / CAC	3.2x	> 5x
Burn Rate	Monthly net cash outflow	$420K/mo	< $300K/mo
Infrastructure Cost per User	Total cloud spend / active users	$1.82/mo	< $1.20/mo
Gross Margin	(Revenue - COGS) / Revenue	72%	> 80%
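
The unit-economics ratios follow directly from the definitions in the table, as the sketch below shows. The table does not define Customer Lifetime Value, so the `lifetime_value` formula here (ARPA x gross margin / monthly churn) is a common simplification used as an assumption, and all input figures are illustrative.

```python
def customer_acquisition_cost(sales_marketing_spend, new_paying_workspaces):
    """Total S&M spend / new paying workspaces."""
    if new_paying_workspaces <= 0:
        raise ValueError("new_paying_workspaces must be positive")
    return sales_marketing_spend / new_paying_workspaces


def gross_margin(revenue, cogs):
    """(Revenue - COGS) / Revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cogs) / revenue


def infrastructure_cost_per_user(cloud_spend, active_users):
    """Total cloud spend / active users."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return cloud_spend / active_users


def lifetime_value(arpa_monthly, margin, monthly_churn):
    """ASSUMPTION: LTV = ARPA x gross margin / monthly churn rate."""
    if monthly_churn <= 0:
        raise ValueError("monthly_churn must be positive")
    return arpa_monthly * margin / monthly_churn


def ltv_to_cac(ltv, cac):
    """Customer Lifetime Value / CAC."""
    return ltv / cac


# Illustrative inputs, chosen to reproduce the table's current CAC,
# gross margin, and infrastructure cost per user.
cac = customer_acquisition_cost(280_000, 100)
margin = gross_margin(1_720_000, 481_600)
cost_per_user = infrastructure_cost_per_user(182_000, 100_000)
ltv = lifetime_value(410, margin, monthly_churn=0.03)  # 3% churn is hypothetical
print(cac, round(margin, 2), cost_per_user, round(ltv_to_cac(ltv, cac), 2))
```

The LTV:CAC result depends heavily on the churn assumption, which is why a canonical LTV definition belongs in the Semantic Layer alongside MRR and NDR.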

SaaS Metrics Semantic Model

The Semantic Layer defines canonical formulas for all SaaS metrics, ensuring consistent calculations across dashboards, reports, and AI-generated analyses:

-- Semantic Layer metric definitions (simplified)
-- These are registered in the BI Workbench Semantic Layer
 
-- MRR Calculation
SELECT
    date_trunc('month', period_start)        AS month,
    SUM(CASE WHEN status = 'active' THEN mrr ELSE 0 END) AS mrr,
    SUM(CASE WHEN status = 'active'
         AND created_at >= date_trunc('month', period_start)
         THEN mrr ELSE 0 END)                AS new_mrr,
    SUM(CASE WHEN expansion_mrr > 0
         THEN expansion_mrr ELSE 0 END)      AS expansion_mrr,
    SUM(CASE WHEN status = 'cancelled'
         THEN prev_mrr ELSE 0 END)           AS churned_mrr
FROM subscriptions_monthly
GROUP BY 1;
 
-- Net Dollar Retention (12 months after cohort start, per signup cohort)
SELECT
    cohort_month,
    SUM(mrr_month_12) / NULLIF(SUM(mrr_month_0), 0) AS ndr_12m
FROM workspace_cohort_mrr
GROUP BY 1;
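
For readers who prefer to check the NDR logic outside SQL, the following Python sketch mirrors the second query: it sums month-0 and month-12 MRR per cohort and divides, returning `None` where the SQL's `NULLIF` would yield NULL. The tuple layout stands in for rows of `workspace_cohort_mrr` and the sample values are invented.

```python
from collections import defaultdict


def ndr_by_cohort(rows):
    """Compute 12-month NDR per cohort from (cohort_month, mrr_month_0, mrr_month_12) rows."""
    totals = defaultdict(lambda: [0.0, 0.0])  # cohort -> [sum month 0, sum month 12]
    for cohort_month, mrr_month_0, mrr_month_12 in rows:
        totals[cohort_month][0] += mrr_month_0
        totals[cohort_month][1] += mrr_month_12

    return {
        cohort: (month_12 / month_0 if month_0 else None)  # None mirrors NULLIF(..., 0)
        for cohort, (month_0, month_12) in sorted(totals.items())
    }


# Illustrative rows: two workspaces in one cohort, plus a cohort with no starting MRR.
rows = [
    ("2023-01", 1000.0, 1200.0),  # expanded
    ("2023-01", 500.0, 420.0),    # contracted
    ("2023-02", 0.0, 0.0),
]
print(ndr_by_cohort(rows))
```

Taking the ratio of the sums, rather than averaging per-workspace ratios, weights each workspace by its revenue, which is what the SQL above does.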

Persona Walkthroughs

Each walkthrough follows one persona through all eight lifecycle stages, using the CloudFlow sample data and realistic scenarios. Start with the role closest to yours, or read all four to see how the platform enables cross-functional collaboration.

Walkthrough	Persona	Scenario	Primary Workbenches
Data Scientist Journey	Zara Ahmed, Senior Data Scientist	Building a churn prediction model to identify at-risk B2B accounts before renewal	ML Workbench, Data Workbench
ML Engineer Journey	Raj Patel, ML Engineer	Building an intelligent feature recommendation engine using collaborative filtering	ML Workbench, Pipeline Service
BI Lead Journey	Emily Park, BI Lead	Creating the PLG analytics platform -- SaaS metrics, cohorts, and self-service analytics	BI Workbench, Semantic Layer
Executive Leadership Journey	Michael Torres, CEO	Strategic intelligence -- board prep, pricing analysis, and growth scenario modeling	Agentic Workbench, BI Dashboards

How the Walkthroughs Connect

These four personas work on the same data at CloudFlow. Their work products feed into each other:

  Zara (Data Scientist)              Raj (ML Engineer)
  ┌──────────────────────┐           ┌──────────────────────┐
  │ Churn prediction     │           │ Feature recommend-   │
  │ model (AUC 0.83)     │──────────▶│ ation engine         │
  │                      │  model    │                      │
  │ Feature engineering  │  registry │ Ray Serve deployment │
  └──────────┬───────────┘           └──────────┬───────────┘
             │ churn scores                     │ adoption data
             ▼                                  ▼
  ┌──────────────────────┐           ┌──────────────────────┐
  │ Emily (BI Lead)      │           │ Michael (CEO)        │
  │                      │◀──────────│                      │
  │ PLG metrics          │ dashboard │ Strategic scenario   │
  │ dashboards, cohort   │ access    │ analysis, board      │
  │ analysis             │           │ reporting            │
  └──────────────────────┘           └──────────────────────┘

Zara's churn model scores feed into Emily's customer health dashboards and trigger CSM outreach workflows. Raj's recommendation engine drives the feature adoption metrics that Emily tracks. Emily's SaaS metrics dashboards provide the board-ready numbers that Michael reviews. The semantic layer ensures all four personas calculate MRR, churn, and NDR the same way.
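
As a hedged sketch of how churn scores might trigger CSM outreach, the function below selects workspaces whose score meets a threshold and whose `renewal_date` (from `crm_accounts`) falls within an upcoming window, ordered from highest to lowest risk. The 0.7 threshold, the 90-day window, and the function name are hypothetical; the walkthroughs define the actual workflow.

```python
from datetime import date, timedelta


def accounts_for_outreach(scores, renewal_dates, today, threshold=0.7, window_days=90):
    """Return workspace IDs to contact, highest churn score first.

    ASSUMPTION: an account qualifies when its churn score is at or above
    `threshold` and its renewal falls within the next `window_days` days.
    """
    cutoff = today + timedelta(days=window_days)
    flagged = [
        (workspace_id, score)
        for workspace_id, score in scores.items()
        if score >= threshold and today <= renewal_dates[workspace_id] <= cutoff
    ]
    flagged.sort(key=lambda pair: pair[1], reverse=True)
    return [workspace_id for workspace_id, _ in flagged]


# Illustrative scores and renewal dates.
scores = {"ws_1": 0.90, "ws_2": 0.75, "ws_3": 0.95, "ws_4": 0.40}
renewal_dates = {
    "ws_1": date(2024, 2, 15),
    "ws_2": date(2024, 3, 1),
    "ws_3": date(2024, 9, 1),   # high risk, but renewal is outside the window
    "ws_4": date(2024, 1, 20),  # renews soon, but low risk
}
print(accounts_for_outreach(scores, renewal_dates, today=date(2024, 1, 1)))
```

Combining the score with the renewal window keeps the outreach list short enough for a CSM team to act on.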

Prerequisites

Before following these walkthroughs, ensure you have:

A running MATIH Platform instance (see Installation)
The CloudFlow sample dataset loaded (available in the platform's sample data catalog)
Completed the Quickstart Tutorials for the workbenches you plan to use

Related Chapters

Data Ingestion -- Configuring Airbyte connectors and Kafka streaming
Query Engine -- SQL federation across PostgreSQL, Snowflake, and event stores
Data Catalog -- Metadata management, profiling, and lineage
Pipelines -- Temporal-based orchestration for ETL and ML workflows
ML Service -- Model training, registry, and Ray Serve deployment
AI Service -- Text-to-SQL and multi-agent conversational analytics
