MATIH Platform is in active MVP development. Documentation reflects current implementation status.

SaaS & Technology

End-to-end walkthroughs showing how a B2B SaaS company uses the MATIH Platform to unify product analytics, predict churn, optimize product-led growth, and make data-driven decisions about pricing, features, and infrastructure investments.


Industry Context

SaaS and technology companies are inherently data-native, but that does not mean they are data-mature. Most SaaS teams face a paradox: they instrument everything yet struggle to answer fundamental business questions. Product events stream into Segment or RudderStack, billing data lives in Stripe, CRM records sit in Salesforce, infrastructure costs accumulate in AWS Cost Explorer, and support tickets flow through Zendesk. Each system has its own dashboard, its own definitions, and its own version of the truth.

The result is a fragmented analytics landscape where the product team calculates DAU differently from the finance team, "churn" means something different to sales versus customer success, and nobody can answer "what is the actual cost to serve this customer segment?" without a week of manual data wrangling.

| Challenge | Impact | Platform Response |
|---|---|---|
| Product-led growth metrics | Feature adoption, activation funnels, and usage cohorts span multiple systems | Query Engine federates PostgreSQL, Snowflake, and Kafka-sourced event data in a single SQL query |
| Revenue analytics | MRR, ARR, NDR, and churn calculations vary by team and tool | Semantic Layer provides canonical SaaS metric definitions, validated against finance |
| Usage-based pricing | Metering, entitlement enforcement, and billing reconciliation require real-time data | Streaming ingestion from Kafka events, Pipeline Service for aggregation, BI dashboards for monitoring |
| Infrastructure cost optimization | Cloud spend grows faster than revenue; cost-per-user is opaque | Federated queries across AWS Cost Explorer, product usage, and billing data |
| Customer health scoring | Churn signals are spread across product, support, billing, and CRM | ML Workbench for predictive models, Pipeline Service for daily scoring, CRM sync for action |
| SOC2 and compliance | Enterprise customers require audit trails, data governance, and access controls | Governance Service with policy enforcement, column masking, and audit logging |

Company Profile: CloudFlow

All walkthroughs in this section follow employees at CloudFlow, a fictional B2B project management SaaS company with the following profile:

| Attribute | Value |
|---|---|
| Annual Recurring Revenue (ARR) | $20M |
| Monthly Recurring Revenue (MRR) | $1.72M |
| Active Users | 100,000 across 4,200 workspaces |
| Plans | Free (0-5 users), Pro ($15/user/mo), Business ($29/user/mo), Enterprise (custom) |
| Channels | Web app, desktop (Electron), mobile (iOS/Android), API |
| Data Team | 3 data scientists, 2 ML engineers, 2 BI analysts, 1 data engineering lead, CEO |
| Growth Model | Product-led growth with sales-assisted Enterprise tier |
| Compliance | SOC2 Type II, GDPR, CCPA |

Sample Datasets

These are the core datasets used across all four walkthroughs. In production, these tables live in their respective source systems and are ingested into the platform via Airbyte connectors, Kafka consumers, or file imports.

Product Data

| Dataset | Source | Rows | Description |
|---|---|---|---|
| events | Kafka (via Segment) | 500M | User action events -- event_id, user_id, workspace_id, event_type, properties (JSON), timestamp, session_id |
| users | PostgreSQL | 100K | User profiles -- user_id, email, workspace_id, role, signup_date, last_active_at, plan_type |
| workspaces | PostgreSQL | 4,200 | Workspace (account) records -- workspace_id, name, plan, created_at, owner_user_id, seat_count |
| projects | PostgreSQL | 180K | Projects within workspaces -- project_id, workspace_id, name, created_at, task_count, last_activity |
| feature_flags | PostgreSQL | 200 | Feature flag configurations -- flag_id, name, rollout_percentage, target_segments, enabled, created_at |

Business Data

| Dataset | Source | Rows | Description |
|---|---|---|---|
| subscriptions | Stripe API | 80K | Subscription records -- subscription_id, workspace_id, plan, mrr, start_date, status, trial_end |
| invoices | Stripe API | 420K | Invoice history -- invoice_id, subscription_id, amount, currency, paid_at, period_start, period_end |
| crm_accounts | Salesforce | 4,200 | CRM account records -- account_id, workspace_id, csm_owner, health_score, renewal_date, arr |
| crm_opportunities | Salesforce | 8,500 | Sales pipeline -- opportunity_id, account_id, stage, amount, close_date, owner |
| support_tickets | Zendesk | 50K | Support tickets -- ticket_id, workspace_id, user_id, subject, priority, status, created_at, resolved_at, satisfaction_rating |

Operational Data

| Dataset | Source | Rows | Description |
|---|---|---|---|
| api_requests | Kafka (internal) | 1B | API call logs -- request_id, workspace_id, endpoint, method, status_code, latency_ms, timestamp |
| infrastructure_costs | AWS Cost Explorer | 365K | Daily cost per service -- date, service_name, region, cost_usd, usage_quantity, usage_unit |
| incidents | PagerDuty API | 1,200 | Production incidents -- incident_id, severity, service, duration_minutes, created_at, resolved_at |
| nps_surveys | CSV Import | 20K | NPS survey responses -- survey_id, user_id, workspace_id, score, comment, submitted_at |
| ab_test_assignments | PostgreSQL | 2.4M | A/B test cohort assignments -- user_id, experiment_id, variant, assigned_at |

Data Sources and Connectivity

                          CloudFlow Data Flow
 ┌────────────────────────────────────────────────────────────────────────┐
 │                              DATA SOURCES                              │
 │ ┌──────────┐ ┌────────┐ ┌────────┐ ┌──────────┐ ┌────────┐ ┌────────┐  │
 │ │Product   │ │Segment │ │Stripe  │ │Salesforce│ │AWS Cost│ │Zendesk │  │
 │ │PostgreSQL│ │(Kafka) │ │Billing │ │CRM       │ │Explorer│ │        │  │
 │ │          │ │        │ │API     │ │          │ │        │ │        │  │
 │ │users     │ │events  │ │invoices│ │accounts  │ │daily   │ │tickets │  │
 │ │workspaces│ │500M/mo │ │subs    │ │opps      │ │costs   │ │        │  │
 │ │projects  │ │        │ │        │ │          │ │        │ │        │  │
 │ └────┬─────┘ └───┬────┘ └───┬────┘ └────┬─────┘ └───┬────┘ └───┬────┘  │
 └──────┼───────────┼──────────┼───────────┼───────────┼──────────┼───────┘
        │           │          │           │           │          │
        ▼           ▼          ▼           ▼           ▼          ▼
 ┌────────────────────────────────────────────────────────────────────────┐
 │                      INGESTION SERVICE (Airbyte)                       │
 │        600+ connectors  |  CDC  |  Streaming  |  Schema mapping        │
 └───────────────────────────────────┬────────────────────────────────────┘
                                     │
                                     ▼
 ┌────────────────────────────────────────────────────────────────────────┐
 │                          PLATFORM DATA LAYER                           │
 │     ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────────┐     │
 │     │ Catalog  │  │  Query   │  │ Quality  │  │  Governance    │     │
 │     │ Service  │  │  Engine  │  │ Service  │  │  Service       │     │
 │     │          │  │ (Trino)  │  │ (GX)     │  │ (SOC2, GDPR)   │     │
 │     └──────────┘  └──────────┘  └──────────┘  └────────────────┘     │
 └───────────────────────────────────┬────────────────────────────────────┘
                                     │
           ┌─────────────────────────┼──────────────────────────┐
           ▼                         ▼                          ▼
 ┌──────────────────┐       ┌──────────────────┐       ┌──────────────────┐
 │  ML Workbench    │       │  BI Workbench    │       │ Agentic Workbench│
 │                  │       │                  │       │                  │
 │ Churn models,    │       │ SaaS dashboards, │       │ NL queries,      │
 │ Recommendations, │       │ PLG metrics,     │       │ Scenario models, │
 │ Feature store    │       │ Cohort analysis  │       │ Board prep       │
 └──────────────────┘       └──────────────────┘       └──────────────────┘

| Source | Connector Type | Sync Mode | Frequency |
|---|---|---|---|
| Product PostgreSQL | Airbyte PostgreSQL CDC | Incremental (WAL) | Every 15 minutes |
| Segment Events (Kafka) | Airbyte Kafka connector | Streaming (real-time) | Continuous |
| Stripe Billing API | Airbyte Stripe connector | Incremental (API cursor) | Hourly |
| Salesforce CRM | Airbyte Salesforce connector | Incremental (API cursor) | Every 30 minutes |
| AWS Cost Explorer | Airbyte AWS Cost Explorer | Full refresh | Daily |
| Zendesk Support | Airbyte Zendesk connector | Incremental (API cursor) | Every 15 minutes |
| PagerDuty Incidents | Airbyte PagerDuty connector | Incremental | Hourly |
| NPS Surveys | File Import (Data Workbench) | One-time / on-demand | Quarterly |
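
Because sources sync at different frequencies, a query that joins several of them is only as fresh as its slowest source. The sketch below captures the schedule as data and computes that worst case. `SourceSync`, `SYNC_SCHEDULE`, and `max_staleness` are hypothetical names used for illustration and are not part of the MATIH or Airbyte APIs; the intervals come from the connector table, continuous streaming is modeled as a zero interval, and the on-demand NPS file import is left out.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SourceSync:
    """Hypothetical record describing one row of the connector table."""
    source: str
    sync_mode: str
    interval: timedelta  # time between syncs; zero means continuous streaming


# Values copied from the connector table; the structure itself is illustrative.
SYNC_SCHEDULE = [
    SourceSync("Product PostgreSQL", "Incremental (WAL)", timedelta(minutes=15)),
    SourceSync("Segment Events (Kafka)", "Streaming", timedelta(0)),
    SourceSync("Stripe Billing API", "Incremental (API cursor)", timedelta(hours=1)),
    SourceSync("Salesforce CRM", "Incremental (API cursor)", timedelta(minutes=30)),
    SourceSync("AWS Cost Explorer", "Full refresh", timedelta(days=1)),
    SourceSync("Zendesk Support", "Incremental (API cursor)", timedelta(minutes=15)),
    SourceSync("PagerDuty Incidents", "Incremental", timedelta(hours=1)),
]


def max_staleness(sources: list[str]) -> timedelta:
    """Worst-case data age for a query joining the given sources.

    Ignores how long each sync takes, so it is a lower bound on real lag.
    """
    by_name = {s.source: s for s in SYNC_SCHEDULE}
    return max(by_name[name].interval for name in sources)
```

For example, a cost-to-serve query that joins AWS Cost Explorer, Stripe Billing API, and Product PostgreSQL data can be up to a day behind, since the cost data is refreshed daily.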

Business KPIs

CloudFlow tracks these key performance indicators across all workbenches and dashboards. Each walkthrough shows how the platform computes, monitors, and acts on these metrics.

Revenue Metrics

| KPI | Definition | Current | Target |
|---|---|---|---|
| Monthly Recurring Revenue (MRR) | Sum of all active subscription monthly amounts | $1.72M | $2.1M |
| Annual Recurring Revenue (ARR) | MRR x 12 | $20M | $25M |
| Net Dollar Retention (NDR) | (Starting MRR + expansion - contraction - churn) / Starting MRR | 108% | 120% |
| Logo Churn Rate | Workspaces cancelled / total workspaces (annual) | 18% | < 12% |
| Revenue Churn Rate | MRR lost / starting MRR (annual) | 25% | < 15% |
| Average Revenue Per Account (ARPA) | MRR / active workspaces | $410/mo | $500/mo |
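
The revenue definitions above are simple ratios, and it can help to check them against the CloudFlow profile. The following is a minimal sketch with hypothetical function names; the MRR and workspace counts come from the company profile, while the NDR inputs are made-up illustrative values chosen to reproduce 108%, not CloudFlow figures.

```python
def arr_from_mrr(mrr: float) -> float:
    """ARR as defined in the table: MRR x 12."""
    return mrr * 12


def arpa(mrr: float, active_workspaces: int) -> float:
    """Average Revenue Per Account: MRR / active workspaces."""
    return mrr / active_workspaces


def net_dollar_retention(starting_mrr: float, expansion: float,
                         contraction: float, churn: float) -> float:
    """NDR: (Starting MRR + expansion - contraction - churn) / Starting MRR."""
    return (starting_mrr + expansion - contraction - churn) / starting_mrr


mrr = 1_720_000          # from the company profile
workspaces = 4_200       # from the company profile

print(arr_from_mrr(mrr))                # 20640000, reported as roughly $20M
print(round(arpa(mrr, workspaces), 2))  # 409.52, reported as $410/mo

# Illustrative inputs only (not CloudFlow data): 33 of expansion against
# 25 of churn on a starting base of 100 yields an NDR of 1.08.
print(round(net_dollar_retention(100.0, 33.0, 0.0, 25.0), 2))
```

Note that the reported $20M ARR is a rounded figure: $1.72M x 12 is $20.64M.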

Growth Metrics

| KPI | Definition | Current | Target |
|---|---|---|---|
| Daily Active Users (DAU) | Unique users with >= 1 event per day | 34,000 | 45,000 |
| DAU/MAU Ratio | DAU / Monthly Active Users (stickiness) | 0.41 | 0.55 |
| Feature Adoption Rate | % of users who use a feature within 30 days of release | 23% | 40% |
| Activation Rate | % of signups completing 3 key actions within 7 days | 31% | 50% |
| Time to Value | Median days from signup to first project created | 2.4 days | < 1 day |
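
To make the growth definitions concrete, here is a small sketch that computes DAU, the DAU/MAU ratio, and activation rate from in-memory records. It rests on several assumptions that the table does not settle: MAU is counted over a calendar month, "3 key actions" means three distinct action types from a hypothetical `KEY_ACTIONS` set, and "within 7 days" includes the signup day. The function names and action names are illustrative only.

```python
from datetime import date

# Hypothetical key actions; the real activation criteria are not specified here.
KEY_ACTIONS = {"create_project", "invite_member", "create_task"}


def daily_active_users(events, day):
    """Unique users with at least one event on `day`.

    `events` is an iterable of (user_id, event_date) pairs.
    """
    return len({user for user, d in events if d == day})


def monthly_active_users(events, year, month):
    """Unique users with at least one event in the calendar month (assumption)."""
    return len({user for user, d in events if (d.year, d.month) == (year, month)})


def stickiness(events, day):
    """DAU / MAU for the calendar month containing `day`."""
    mau = monthly_active_users(events, day.year, day.month)
    return daily_active_users(events, day) / mau if mau else 0.0


def activation_rate(signups, actions, window_days=7):
    """Share of signups that performed every key action within the window.

    `signups` maps user_id -> signup date; `actions` is an iterable of
    (user_id, action_name, action_date) tuples.
    """
    done = {user: set() for user in signups}
    for user, action, day in actions:
        if user in signups and action in KEY_ACTIONS:
            if 0 <= (day - signups[user]).days <= window_days:
                done[user].add(action)
    activated = sum(1 for seen in done.values() if seen == KEY_ACTIONS)
    return activated / len(signups) if signups else 0.0
```

Pinning down choices like the MAU window in one place is exactly what the Semantic Layer is meant to do, so that product and finance do not each pick a different one.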

Unit Economics

| KPI | Definition | Current | Target |
|---|---|---|---|
| Customer Acquisition Cost (CAC) | Total S&M spend / new paying workspaces | $2,800 | < $2,000 |
| LTV:CAC Ratio | Customer Lifetime Value / CAC | 3.2x | > 5x |
| Burn Rate | Monthly net cash outflow | $420K/mo | < $300K/mo |
| Infrastructure Cost per User | Total cloud spend / active users | $1.82/mo | < $1.20/mo |
| Gross Margin | (Revenue - COGS) / Revenue | 72% | > 80% |
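
The unit economics definitions can be worked the same way. In this sketch the function names are hypothetical, and the monthly cloud spend and LTV are back-calculated from the table (about $182,000/mo from $1.82 x 100,000 users, and about $8,960 from 3.2 x $2,800); neither figure is stated in the CloudFlow profile.

```python
def ltv_to_cac(ltv: float, cac: float) -> float:
    """LTV:CAC ratio: Customer Lifetime Value / CAC."""
    return ltv / cac


def infra_cost_per_user(monthly_cloud_spend: float, active_users: int) -> float:
    """Infrastructure cost per user: total cloud spend / active users."""
    return monthly_cloud_spend / active_users


def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin: (Revenue - COGS) / Revenue."""
    return (revenue - cogs) / revenue


def spend_for_target(active_users: int, target_cost_per_user: float) -> float:
    """Monthly cloud spend that would meet a cost-per-user target."""
    return active_users * target_cost_per_user


# Back-calculated inputs (assumptions derived from the KPI table).
implied_cloud_spend = 182_000.0
implied_ltv = 8_960.0

print(round(ltv_to_cac(implied_ltv, 2_800), 1))                    # 3.2
print(round(infra_cost_per_user(implied_cloud_spend, 100_000), 2))  # 1.82

# Reaching $1.20 per user at the same user count implies cutting spend
# from about $182K to $120K per month.
print(implied_cloud_spend - spend_for_target(100_000, 1.20))
```

At a constant user count, the $1.20 target therefore corresponds to roughly $62K less cloud spend per month; growing active users at flat spend would move the ratio in the same direction.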

SaaS Metrics Semantic Model

The Semantic Layer defines canonical formulas for all SaaS metrics, ensuring consistent calculations across dashboards, reports, and AI-generated analyses:

-- Semantic Layer metric definitions (simplified)
-- These are registered in the BI Workbench Semantic Layer
 
-- MRR Calculation
SELECT
    date_trunc('month', period_start)        AS month,
    SUM(CASE WHEN status = 'active' THEN mrr ELSE 0 END) AS mrr,
    SUM(CASE WHEN status = 'active'
         AND created_at >= date_trunc('month', period_start)
         THEN mrr ELSE 0 END)                AS new_mrr,
    SUM(CASE WHEN expansion_mrr > 0
         THEN expansion_mrr ELSE 0 END)      AS expansion_mrr,
    SUM(CASE WHEN status = 'cancelled'
         THEN prev_mrr ELSE 0 END)           AS churned_mrr
FROM subscriptions_monthly
GROUP BY 1;
 
-- Net Dollar Retention (trailing 12 months)
SELECT
    cohort_month,
    SUM(mrr_month_12) / NULLIF(SUM(mrr_month_0), 0) AS ndr_12m
FROM workspace_cohort_mrr
GROUP BY 1;
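
For readers who want to sanity-check the NDR query outside the Query Engine, the same aggregation can be sketched in plain Python. This assumes rows shaped like the `workspace_cohort_mrr` columns used above (`cohort_month`, `mrr_month_0`, `mrr_month_12`) with no NULL values; the function name `ndr_by_cohort` is hypothetical. Returning `None` for a zero starting MRR mirrors what `NULLIF(..., 0)` does in the SQL.

```python
from collections import defaultdict


def ndr_by_cohort(rows):
    """Compute 12-month NDR per cohort.

    `rows` is an iterable of (cohort_month, mrr_month_0, mrr_month_12) tuples,
    one per workspace. Returns {cohort_month: ratio or None}.
    """
    start = defaultdict(float)  # SUM(mrr_month_0) per cohort
    end = defaultdict(float)    # SUM(mrr_month_12) per cohort
    for cohort, mrr_0, mrr_12 in rows:
        start[cohort] += mrr_0
        end[cohort] += mrr_12
    # A zero denominator yields None, as NULLIF does in the SQL version.
    return {
        cohort: (end[cohort] / start[cohort] if start[cohort] != 0 else None)
        for cohort in start
    }


rows = [
    ("2023-01", 1000.0, 1200.0),  # expanded
    ("2023-01", 500.0, 300.0),    # contracted
    ("2023-02", 0.0, 0.0),        # cohort with no starting MRR
]
print(ndr_by_cohort(rows))
```

In the sample rows, expansion in one workspace offsets contraction in another, so the 2023-01 cohort nets out to an NDR of exactly 1.0.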

Persona Walkthroughs

Each walkthrough follows one persona through all eight lifecycle stages, using the CloudFlow sample data and scenarios. Start with the role closest to yours, or read all four to see how the platform enables cross-functional collaboration.

| Walkthrough | Persona | Scenario | Primary Workbenches |
|---|---|---|---|
| Data Scientist Journey | Zara Ahmed, Senior Data Scientist | Building a churn prediction model to identify at-risk B2B accounts before renewal | ML Workbench, Data Workbench |
| ML Engineer Journey | Raj Patel, ML Engineer | Building an intelligent feature recommendation engine using collaborative filtering | ML Workbench, Pipeline Service |
| BI Lead Journey | Emily Park, BI Lead | Creating the PLG analytics platform -- SaaS metrics, cohorts, and self-service analytics | BI Workbench, Semantic Layer |
| Executive Leadership Journey | Michael Torres, CEO | Strategic intelligence -- board prep, pricing analysis, and growth scenario modeling | Agentic Workbench, BI Dashboards |

How the Walkthroughs Connect

These four personas work on the same data at CloudFlow. Their work products feed into each other:

  Zara (Data Scientist)              Raj (ML Engineer)
  ┌──────────────────────┐           ┌──────────────────────┐
  │ Churn prediction     │           │ Feature recommend-   │
  │ model (AUC 0.83)     │──────────▶│ ation engine         │
  │                      │  model    │                      │
  │ Feature engineering  │  registry │ Ray Serve deployment │
  └──────────┬───────────┘           └──────────┬───────────┘
             │ churn scores                     │ adoption data
             ▼                                  ▼
  ┌──────────────────────┐           ┌──────────────────────┐
  │ Emily (BI Lead)      │           │ Michael (CEO)        │
  │                      │◀──────────│                      │
  │ PLG metrics          │ dashboard │ Strategic scenario   │
  │ dashboards, cohort   │ access    │ analysis, board      │
  │ analysis             │           │ reporting            │
  └──────────────────────┘           └──────────────────────┘

Zara's churn model scores feed into Emily's customer health dashboards and trigger CSM outreach workflows. Raj's recommendation engine drives the feature adoption metrics that Emily tracks. Emily's SaaS metrics dashboards provide the board-ready numbers that Michael reviews. The semantic layer ensures all four personas calculate MRR, churn, and NDR the same way.
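
As an illustration of the hand-off from churn scores to CSM outreach, the sketch below filters scored accounts into a prioritized queue. Everything here is an assumption for illustration: the `AccountScore` record, the `outreach_queue` function, the 0.7 score threshold, and the 90-day renewal window are hypothetical and not taken from the walkthroughs. The `csm_owner` and `renewal_date` fields match columns in the `crm_accounts` dataset.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AccountScore:
    """Hypothetical join of a churn score with CRM account fields."""
    workspace_id: str
    csm_owner: str
    renewal_date: date
    churn_score: float  # model output in [0, 1]; higher means more at risk


def outreach_queue(scores, today, threshold=0.7, window_days=90):
    """Accounts at or above the threshold that renew within the window,
    ordered from highest to lowest churn score."""
    cutoff = today + timedelta(days=window_days)
    at_risk = [
        s for s in scores
        if s.churn_score >= threshold and today <= s.renewal_date <= cutoff
    ]
    return sorted(at_risk, key=lambda s: s.churn_score, reverse=True)


scores = [
    AccountScore("ws-1", "csm-a", date(2024, 7, 1), 0.91),
    AccountScore("ws-2", "csm-b", date(2024, 7, 15), 0.40),  # below threshold
    AccountScore("ws-3", "csm-a", date(2025, 1, 1), 0.95),   # renews too far out
    AccountScore("ws-4", "csm-c", date(2024, 8, 1), 0.75),
]
for account in outreach_queue(scores, today=date(2024, 6, 1)):
    print(account.workspace_id, account.csm_owner, account.churn_score)
```

Combining the score with the renewal date keeps the queue focused on accounts where outreach can still affect the renewal decision; where to set the threshold and window is a judgment call for the team.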


Prerequisites

Before following these walkthroughs, ensure you have:

  1. A running MATIH Platform instance (see Installation)
  2. The CloudFlow sample dataset loaded (available in the platform's sample data catalog)
  3. Completed the Quickstart Tutorials for the workbenches you plan to use

Related Chapters

  • Data Ingestion -- Configuring Airbyte connectors and Kafka streaming
  • Query Engine -- SQL federation across PostgreSQL, Snowflake, and event stores
  • Data Catalog -- Metadata management, profiling, and lineage
  • Pipelines -- Temporal-based orchestration for ETL and ML workflows
  • ML Service -- Model training, registry, and Ray Serve deployment
  • AI Service -- Text-to-SQL and multi-agent conversational analytics