SaaS & Technology
End-to-end walkthroughs showing how a B2B SaaS company uses the MATIH Platform to unify product analytics, predict churn, optimize product-led growth, and make data-driven decisions about pricing, features, and infrastructure investments.
Industry Context
SaaS and technology companies are inherently data-native, but that does not mean they are data-mature. Most SaaS teams face a paradox: they instrument everything yet struggle to answer fundamental business questions. Product events stream into Segment or RudderStack, billing data lives in Stripe, CRM records sit in Salesforce, infrastructure costs accumulate in AWS Cost Explorer, and support tickets flow through Zendesk. Each system has its own dashboard, its own definitions, and its own version of the truth.
The result is a fragmented analytics landscape where the product team calculates DAU differently from the finance team, "churn" means something different to sales versus customer success, and nobody can answer "what is the actual cost to serve this customer segment?" without a week of manual data wrangling.
| Challenge | Impact | Platform Response |
|---|---|---|
| Product-led growth metrics | Feature adoption, activation funnels, and usage cohorts span multiple systems | Query Engine federates PostgreSQL, Snowflake, and Kafka-sourced event data in a single SQL query |
| Revenue analytics | MRR, ARR, NDR, and churn calculations vary by team and tool | Semantic Layer provides canonical SaaS metric definitions, validated against finance |
| Usage-based pricing | Metering, entitlement enforcement, and billing reconciliation require real-time data | Streaming ingestion from Kafka events, Pipeline Service for aggregation, BI dashboards for monitoring |
| Infrastructure cost optimization | Cloud spend grows faster than revenue; cost-per-user is opaque | Federated queries across AWS Cost Explorer, product usage, and billing data |
| Customer health scoring | Churn signals are spread across product, support, billing, and CRM | ML Workbench for predictive models, Pipeline Service for daily scoring, CRM sync for action |
| SOC2 and compliance | Enterprise customers require audit trails, data governance, and access controls | Governance Service with policy enforcement, column masking, and audit logging |
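The cost-to-serve question above can be approached by allocating total cloud spend to each customer segment in proportion to its share of API traffic. The following is a minimal sketch of that idea only. The function name `cost_to_serve`, the allocation rule, and every number in the example are hypothetical illustrations, not platform APIs or CloudFlow figures.

```python
def cost_to_serve(total_cost_usd, requests_by_segment, users_by_segment):
    """Allocate cloud spend to segments by their share of API requests.

    Proportional-to-traffic allocation is an assumed rule for illustration;
    a real model might weight by storage, compute time, or support load.
    """
    total_requests = sum(requests_by_segment.values())
    result = {}
    for segment, requests in requests_by_segment.items():
        # Guard against an empty period with no traffic at all.
        share = requests / total_requests if total_requests else 0.0
        segment_cost = total_cost_usd * share
        users = users_by_segment.get(segment, 0)
        result[segment] = {
            "cost_usd": segment_cost,
            "cost_per_user": segment_cost / users if users else None,
        }
    return result


# Hypothetical monthly inputs (not taken from the CloudFlow datasets).
example = cost_to_serve(
    total_cost_usd=182_000,
    requests_by_segment={"free": 200_000_000, "pro": 500_000_000, "enterprise": 300_000_000},
    users_by_segment={"free": 40_000, "pro": 45_000, "enterprise": 15_000},
)
for segment, row in example.items():
    print(segment, round(row["cost_usd"]), round(row["cost_per_user"], 2))
```

In practice the inputs would come from a federated query joining `infrastructure_costs`, `api_requests`, and `workspaces`; the Python version only makes the allocation arithmetic explicit.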
Company Profile: CloudFlow
All walkthroughs in this section follow employees at CloudFlow, a fictional B2B project management SaaS company with the following profile:
| Attribute | Value |
|---|---|
| Annual Recurring Revenue (ARR) | $20M |
| Monthly Recurring Revenue (MRR) | $1.72M |
| Active Users | 100,000 across 4,200 workspaces |
| Plans | Free (0-5 users), Pro ($29/user/mo), Enterprise (custom) |
| Channels | Web app, desktop (Electron), mobile (iOS/Android), API |
| Data Team | 3 data scientists, 2 ML engineers, 2 BI analysts, 1 data engineering lead, CEO |
| Growth Model | Product-led growth with sales-assisted Enterprise tier |
| Compliance | SOC2 Type II, GDPR, CCPA |
Sample Datasets
These are the core datasets used across all four walkthroughs. In production, these tables live in their respective source systems and are ingested into the platform via Airbyte connectors, Kafka consumers, or file imports.
Product Data
| Dataset | Source | Rows | Description |
|---|---|---|---|
| events | Kafka (via Segment) | 500M | User action events -- event_id, user_id, workspace_id, event_type, properties (JSON), timestamp, session_id |
| users | PostgreSQL | 100K | User profiles -- user_id, email, workspace_id, role, signup_date, last_active_at, plan_type |
| workspaces | PostgreSQL | 4,200 | Workspace (account) records -- workspace_id, name, plan, created_at, owner_user_id, seat_count |
| projects | PostgreSQL | 180K | Projects within workspaces -- project_id, workspace_id, name, created_at, task_count, last_activity |
| feature_flags | PostgreSQL | 200 | Feature flag configurations -- flag_id, name, rollout_percentage, target_segments, enabled, created_at |
Business Data
| Dataset | Source | Rows | Description |
|---|---|---|---|
| subscriptions | Stripe API | 80K | Subscription records -- subscription_id, workspace_id, plan, mrr, start_date, status, trial_end |
| invoices | Stripe API | 420K | Invoice history -- invoice_id, subscription_id, amount, currency, paid_at, period_start, period_end |
| crm_accounts | Salesforce | 4,200 | CRM account records -- account_id, workspace_id, csm_owner, health_score, renewal_date, arr |
| crm_opportunities | Salesforce | 8,500 | Sales pipeline -- opportunity_id, account_id, stage, amount, close_date, owner |
| support_tickets | Zendesk | 50K | Support tickets -- ticket_id, workspace_id, user_id, subject, priority, status, created_at, resolved_at, satisfaction_rating |
Operational Data
| Dataset | Source | Rows | Description |
|---|---|---|---|
| api_requests | Kafka (internal) | 1B | API call logs -- request_id, workspace_id, endpoint, method, status_code, latency_ms, timestamp |
| infrastructure_costs | AWS Cost Explorer | 365K | Daily cost per service -- date, service_name, region, cost_usd, usage_quantity, usage_unit |
| incidents | PagerDuty API | 1,200 | Production incidents -- incident_id, severity, service, duration_minutes, created_at, resolved_at |
| nps_surveys | CSV Import | 20K | NPS survey responses -- survey_id, user_id, workspace_id, score, comment, submitted_at |
| ab_test_assignments | PostgreSQL | 2.4M | A/B test cohort assignments -- user_id, experiment_id, variant, assigned_at |
Data Sources and Connectivity
CloudFlow Data Flow
┌────────────────────────────────────────────────────────────────────────┐
│                              DATA SOURCES                              │
│ ┌──────────┐ ┌──────────┐ ┌────────┐ ┌──────────┐ ┌────────┐ ┌───────┐ │
│ │Product   │ │Segment   │ │Stripe  │ │Salesforce│ │AWS     │ │Zendesk│ │
│ │PostgreSQL│ │(Kafka)   │ │Billing │ │CRM       │ │Cost    │ │       │ │
│ │          │ │          │ │API     │ │          │ │Explorer│ │       │ │
│ │users     │ │events    │ │invoices│ │accounts  │ │daily   │ │tickets│ │
│ │workspaces│ │500M/mo   │ │subs    │ │opps      │ │costs   │ │       │ │
│ │projects  │ │          │ │        │ │          │ │        │ │       │ │
│ └────┬─────┘ └────┬─────┘ └───┬────┘ └────┬─────┘ └───┬────┘ └───┬───┘ │
└──────┼────────────┼───────────┼───────────┼───────────┼──────────┼─────┘
       │            │           │           │           │          │
       ▼            ▼           ▼           ▼           ▼          ▼
┌────────────────────────────────────────────────────────────────────────┐
│                      INGESTION SERVICE (Airbyte)                       │
│           600+ connectors | CDC | Streaming | Schema mapping           │
└───────────────────────────────────┬────────────────────────────────────┘
                                    │
                                    ▼
┌────────────────────────────────────────────────────────────────────────┐
│                          PLATFORM DATA LAYER                           │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌────────────────┐   │
│   │ Catalog  │    │ Query    │    │ Quality  │    │   Governance   │   │
│   │ Service  │    │ Engine   │    │ Service  │    │    Service     │   │
│   │          │    │ (Trino)  │    │ (GX)     │    │  (SOC2, GDPR)  │   │
│   └──────────┘    └──────────┘    └──────────┘    └────────────────┘   │
└───────────────────────────────────┬────────────────────────────────────┘
                                    │
         ┌──────────────────────────┼──────────────────────────┐
         ▼                          ▼                          ▼
┌──────────────────┐       ┌──────────────────┐       ┌──────────────────┐
│ ML Workbench     │       │ BI Workbench     │       │ Agentic Workbench│
│                  │       │                  │       │                  │
│ Churn models,    │       │ SaaS dashboards, │       │ NL queries,      │
│ Recommendations, │       │ PLG metrics,     │       │ Scenario models, │
│ Feature store    │       │ Cohort analysis  │       │ Board prep       │
└──────────────────┘       └──────────────────┘       └──────────────────┘
| Source | Connector Type | Sync Mode | Frequency |
|---|---|---|---|
| Product PostgreSQL | Airbyte PostgreSQL CDC | Incremental (WAL) | Every 15 minutes |
| Segment Events (Kafka) | Airbyte Kafka connector | Streaming (real-time) | Continuous |
| Stripe Billing API | Airbyte Stripe connector | Incremental (API cursor) | Hourly |
| Salesforce CRM | Airbyte Salesforce connector | Incremental (API cursor) | Every 30 minutes |
| AWS Cost Explorer | Airbyte AWS Cost Explorer | Full refresh | Daily |
| Zendesk Support | Airbyte Zendesk connector | Incremental (API cursor) | Every 15 minutes |
| PagerDuty Incidents | Airbyte PagerDuty connector | Incremental | Hourly |
| NPS Surveys | File Import (Data Workbench) | One-time / on-demand | Quarterly |
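Because each source syncs on its own schedule, a metric that joins several sources can only be as fresh as its slowest input. The sketch below encodes the frequencies from the table and returns that bound. The names `SYNC_INTERVAL_MINUTES` and `worst_case_staleness` are hypothetical, and the bound ignores sync duration and downstream pipeline latency, so treat it as a rough lower limit on staleness rather than a guarantee.

```python
# Sync intervals in minutes, transcribed from the connectivity table.
# Continuous streaming is modeled as 0; the on-demand NPS import is omitted
# because it has no fixed interval.
SYNC_INTERVAL_MINUTES = {
    "Product PostgreSQL": 15,
    "Segment Events (Kafka)": 0,
    "Stripe Billing API": 60,
    "Salesforce CRM": 30,
    "AWS Cost Explorer": 24 * 60,
    "Zendesk Support": 15,
    "PagerDuty Incidents": 60,
}


def worst_case_staleness(sources):
    """Return the largest sync interval (minutes) among the given sources."""
    return max(SYNC_INTERVAL_MINUTES[source] for source in sources)


# A customer health score built from product, billing, CRM, and support data
# is bounded by the hourly Stripe sync.
health_inputs = ["Product PostgreSQL", "Stripe Billing API", "Salesforce CRM", "Zendesk Support"]
print(worst_case_staleness(health_inputs))  # 60
```

Any metric that includes AWS Cost Explorer data, such as cost per user, inherits the daily refresh regardless of how fresh the usage data is.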
Business KPIs
CloudFlow tracks these key performance indicators across all workbenches and dashboards. Each walkthrough shows how the platform computes, monitors, and acts on these metrics.
Revenue Metrics
| KPI | Definition | Current | Target |
|---|---|---|---|
| Monthly Recurring Revenue (MRR) | Sum of all active subscription monthly amounts | $1.72M | $2.1M |
| Annual Recurring Revenue (ARR) | MRR x 12 | $20M | $25M |
| Net Dollar Retention (NDR) | (Starting MRR + expansion - contraction - churn) / Starting MRR | 108% | 120% |
| Logo Churn Rate | Workspaces cancelled / total workspaces (annual) | 18% | < 12% |
| Revenue Churn Rate | MRR lost / starting MRR (annual) | 25% | < 15% |
| Average Revenue Per Account (ARPA) | MRR / active workspaces | $410/mo | $500/mo |
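The definitions in this table reduce to simple arithmetic, which the following sketch works through. The function names are illustrative, and the NDR inputs are hypothetical values chosen to reproduce the 108% figure; only the MRR and workspace count come from the company profile. Note that $1.72M x 12 is $20.64M, which the profile rounds to $20M.

```python
def arr_from_mrr(mrr):
    """ARR as defined in the table: MRR x 12."""
    return mrr * 12


def arpa(mrr, active_workspaces):
    """Average Revenue Per Account: MRR / active workspaces."""
    return mrr / active_workspaces


def net_dollar_retention(starting_mrr, expansion, contraction, churn):
    """(Starting MRR + expansion - contraction - churn) / Starting MRR."""
    if starting_mrr == 0:
        return None  # undefined for an empty starting cohort
    return (starting_mrr + expansion - contraction - churn) / starting_mrr


mrr = 1_720_000
print(arr_from_mrr(mrr))          # 20640000
print(round(arpa(mrr, 4200), 2))  # 409.52, shown as $410/mo in the table

# Hypothetical cohort: 25% revenue churn offset by 33% net expansion.
ndr = net_dollar_retention(
    starting_mrr=1_000_000, expansion=330_000, contraction=0, churn=250_000
)
print(f"{ndr:.0%}")  # 108%
```

Under these definitions, an NDR of 108% alongside 25% annual revenue churn implies that expansion net of contraction is roughly 33% of starting MRR.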
Growth Metrics
| KPI | Definition | Current | Target |
|---|---|---|---|
| Daily Active Users (DAU) | Unique users with >= 1 event per day | 34,000 | 45,000 |
| DAU/MAU Ratio | DAU / Monthly Active Users (stickiness) | 0.41 | 0.55 |
| Feature Adoption Rate | % of users who use a feature within 30 days of release | 23% | 40% |
| Activation Rate | % of signups completing 3 key actions within 7 days | 31% | 50% |
| Time to Value | Median days from signup to first project created | 2.4 days | < 1 day |
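As a rough illustration of the DAU and DAU/MAU definitions, the sketch below computes both from a list of `(user_id, date)` event pairs. The 30-day trailing window for MAU is an assumption (the table does not specify the window), and the function names and sample events are hypothetical. With the table's figures, a DAU of 34,000 and a ratio of 0.41 imply an MAU of roughly 83,000.

```python
from datetime import date, timedelta


def daily_active_users(events, day):
    """Unique users with at least one event on the given day."""
    return {user for user, event_day in events if event_day == day}


def monthly_active_users(events, day, window_days=30):
    """Unique users with at least one event in the trailing window (assumed 30 days)."""
    start = day - timedelta(days=window_days - 1)
    return {user for user, event_day in events if start <= event_day <= day}


def stickiness(events, day):
    """DAU / MAU for the given day, or None when there are no active users."""
    mau = monthly_active_users(events, day)
    if not mau:
        return None
    return len(daily_active_users(events, day)) / len(mau)


# Hypothetical events; u5 falls outside the 30-day window ending 2024-03-30.
events = [
    ("u1", date(2024, 3, 30)),
    ("u2", date(2024, 3, 30)),
    ("u3", date(2024, 3, 10)),
    ("u4", date(2024, 3, 5)),
    ("u1", date(2024, 3, 1)),
    ("u5", date(2024, 2, 1)),
]
print(stickiness(events, date(2024, 3, 30)))  # 0.5
```

Pinning down choices like the MAU window in one place is what keeps product and finance from reporting different DAU/MAU numbers from the same events.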
Unit Economics
| KPI | Definition | Current | Target |
|---|---|---|---|
| Customer Acquisition Cost (CAC) | Total S&M spend / new paying workspaces | $2,800 | < $2,000 |
| LTV:CAC Ratio | Customer Lifetime Value / CAC | 3.2x | > 5x |
| Burn Rate | Monthly net cash outflow | $420K/mo | < $300K/mo |
| Infrastructure Cost per User | Total cloud spend / active users | $1.82/mo | < $1.20/mo |
| Gross Margin | (Revenue - COGS) / Revenue | 72% | > 80% |
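The unit economics definitions can be checked against each other with a few lines of arithmetic. In the sketch below, the function names are illustrative and the derived values (implied LTV, implied monthly cloud spend, and the COGS figure) are back-calculated from the table under the assumption that revenue equals MRR; they are not separately reported numbers.

```python
def infra_cost_per_user(total_cloud_spend, active_users):
    """Total cloud spend / active users."""
    return total_cloud_spend / active_users


def gross_margin(revenue, cogs):
    """(Revenue - COGS) / Revenue."""
    return (revenue - cogs) / revenue


def implied_ltv(ltv_to_cac_ratio, cac):
    """Back out LTV from the LTV:CAC ratio and CAC."""
    return ltv_to_cac_ratio * cac


# LTV implied by a 3.2x ratio at a $2,800 CAC.
print(round(implied_ltv(3.2, 2_800)))  # 8960

# Monthly cloud spend implied by $1.82 per user across 100,000 users.
implied_spend = 1.82 * 100_000
print(round(implied_spend))  # 182000
print(round(infra_cost_per_user(implied_spend, 100_000), 2))  # 1.82

# Hypothetical COGS that yields the 72% margin on $1.72M monthly revenue.
print(round(gross_margin(1_720_000, 481_600), 2))  # 0.72
```

Reaching the $1.20 per-user target at the same user count would mean cutting the implied monthly cloud spend from about $182K to about $120K.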
SaaS Metrics Semantic Model
The Semantic Layer defines canonical formulas for all SaaS metrics, ensuring consistent calculations across dashboards, reports, and AI-generated analyses:
-- Semantic Layer metric definitions (simplified)
-- These are registered in the BI Workbench Semantic Layer
-- MRR Calculation
SELECT
    date_trunc('month', period_start) AS month,
    SUM(CASE WHEN status = 'active' THEN mrr ELSE 0 END) AS mrr,
    SUM(CASE WHEN status = 'active'
              AND created_at >= date_trunc('month', period_start)
             THEN mrr ELSE 0 END) AS new_mrr,
    SUM(CASE WHEN expansion_mrr > 0
             THEN expansion_mrr ELSE 0 END) AS expansion_mrr,
    SUM(CASE WHEN status = 'cancelled'
             THEN prev_mrr ELSE 0 END) AS churned_mrr
FROM subscriptions_monthly
GROUP BY 1;
-- Net Dollar Retention (trailing 12 months)
SELECT
    cohort_month,
    SUM(mrr_month_12) / NULLIF(SUM(mrr_month_0), 0) AS ndr_12m
FROM workspace_cohort_mrr
GROUP BY 1;
Persona Walkthroughs
Each walkthrough follows one persona through all eight lifecycle stages, using real CloudFlow data and scenarios. Start with the role closest to yours, or read all four to see how the platform enables cross-functional collaboration.
| Walkthrough | Persona | Scenario | Primary Workbenches |
|---|---|---|---|
| Data Scientist Journey | Zara Ahmed, Senior Data Scientist | Building a churn prediction model to identify at-risk B2B accounts before renewal | ML Workbench, Data Workbench |
| ML Engineer Journey | Raj Patel, ML Engineer | Building an intelligent feature recommendation engine using collaborative filtering | ML Workbench, Pipeline Service |
| BI Lead Journey | Emily Park, BI Lead | Creating the PLG analytics platform -- SaaS metrics, cohorts, and self-service analytics | BI Workbench, Semantic Layer |
| Executive Leadership Journey | Michael Torres, CEO | Strategic intelligence -- board prep, pricing analysis, and growth scenario modeling | Agentic Workbench, BI Dashboards |
How the Walkthroughs Connect
These four personas work on the same data at CloudFlow. Their work products feed into each other:
 Zara (Data Scientist)              Raj (ML Engineer)
┌──────────────────────┐           ┌──────────────────────┐
│ Churn prediction     │           │ Feature recommend-   │
│ model (AUC 0.83)     │──────────▶│ ation engine         │
│                      │   model   │                      │
│ Feature engineering  │ registry  │ Ray Serve deployment │
└──────────┬───────────┘           └──────────┬───────────┘
           │ churn scores                     │ adoption data
           ▼                                  ▼
┌──────────────────────┐           ┌──────────────────────┐
│ Emily (BI Lead)      │           │ Michael (CEO)        │
│                      │◀──────────│                      │
│ PLG metrics          │ dashboard │ Strategic scenario   │
│ dashboards, cohort   │  access   │ analysis, board      │
│ analysis             │           │ reporting            │
└──────────────────────┘           └──────────────────────┘
Zara's churn model scores feed into Emily's customer health dashboards and trigger CSM outreach workflows. Raj's recommendation engine drives the feature adoption metrics that Emily tracks. Emily's SaaS metrics dashboards provide the board-ready numbers that Michael reviews. The semantic layer ensures all four personas calculate MRR, churn, and NDR the same way.
Prerequisites
Before following these walkthroughs, ensure you have:
- A running MATIH Platform instance (see Installation)
- The CloudFlow sample dataset loaded (available in the platform's sample data catalog)
- Completed the Quickstart Tutorials for the workbenches you plan to use
Related Chapters
- Data Ingestion -- Configuring Airbyte connectors and Kafka streaming
- Query Engine -- SQL federation across PostgreSQL, Snowflake, and event stores
- Data Catalog -- Metadata management, profiling, and lineage
- Pipelines -- Temporal-based orchestration for ETL and ML workflows
- ML Service -- Model training, registry, and Ray Serve deployment
- AI Service -- Text-to-SQL and multi-agent conversational analytics