MATIH Platform is in active MVP development. Documentation reflects current implementation status.
21. Industry Examples & Walkthroughs

Data Scientist Journey: User Churn Prediction for a B2B SaaS Product

Persona: Zara Ahmed, Senior Data Scientist at CloudFlow

Goal: Build a churn prediction model to identify at-risk B2B accounts before renewal, enabling the Customer Success team to intervene proactively and reduce the 18% annual logo churn rate.

Primary Workbenches: ML Workbench, Data Workbench

Supporting Services: Ingestion Service, Catalog Service, Query Engine, Pipeline Service, Data Quality Service, ML Service (Ray Serve)


Business Context

CloudFlow's churn problem is not uniform. The company's 18% annual logo churn rate masks a more alarming reality: 25% revenue churn. Larger accounts with higher ARR are churning at a disproportionate rate, and the Customer Success team only learns about at-risk accounts when the cancellation request arrives -- too late for meaningful intervention.

Zara's mandate is clear: build a model that identifies at-risk accounts 60 days before renewal, with enough lead time for the CSM team to engage, diagnose the problem, and offer targeted retention actions.

  Current State                         Target State
  ┌────────────────────────┐            ┌────────────────────────┐
  │ CSM learns about       │            │ Model flags account    │
  │ churn at cancellation  │            │ 60 days before renewal │
  │                        │            │                        │
  │ Reactive: 0 days lead  │  ────▶     │ Proactive: 60 day lead │
  │ time, 5% save rate     │            │ time, 35% save rate    │
  │                        │            │                        │
  │ Revenue churn: 25%     │            │ Revenue churn target:  │
  │                        │            │ < 15%                  │
  └────────────────────────┘            └────────────────────────┘

Stage 1: Ingestion

Zara begins by ensuring all the data sources she needs are connected and syncing. She opens the Data Workbench and navigates to the Ingestion panel to review existing connections and configure new ones.

Existing Connections

The data engineering team has already connected the primary data sources:

| Source | Connector | Sync Mode | Frequency | Status |
|---|---|---|---|---|
| Product PostgreSQL | Airbyte PostgreSQL CDC | Incremental (WAL) | Every 15 min | Active |
| Segment Events (Kafka) | Airbyte Kafka | Streaming | Continuous | Active |
| Stripe Billing | Airbyte Stripe | Incremental | Hourly | Active |

New Connections Needed

Zara needs two additional sources for her churn model -- CRM data (for renewal dates and CSM notes) and support ticket data (for sentiment signals):

Salesforce CRM connection:

{
  "source": "salesforce",
  "connector": "airbyte/source-salesforce",
  "config": {
    "streams": ["Account", "Opportunity", "Task", "Note"],
    "sync_mode": "incremental",
    "cursor_field": "SystemModstamp",
    "frequency": "every_30_minutes"
  },
  "schema_mapping": {
    "Account": {
      "destination_table": "crm_accounts",
      "primary_key": "Id",
      "field_mapping": {
        "Id": "account_id",
        "CloudFlow_Workspace_Id__c": "workspace_id",
        "OwnerId": "csm_owner",
        "Health_Score__c": "health_score",
        "Contract_End_Date__c": "renewal_date",
        "ARR__c": "arr"
      }
    }
  }
}

Zendesk Support connection:

{
  "source": "zendesk",
  "connector": "airbyte/source-zendesk-support",
  "config": {
    "streams": ["tickets", "ticket_comments", "satisfaction_ratings"],
    "sync_mode": "incremental",
    "cursor_field": "updated_at",
    "frequency": "every_15_minutes"
  }
}

Sync Configuration Strategy

For churn prediction, data freshness requirements vary by source:

| Data Type | Freshness Requirement | Rationale |
|---|---|---|
| Product events | Real-time (streaming) | Usage drop-offs are the strongest churn signal |
| Support tickets | < 15 minutes | Urgent tickets signal immediate dissatisfaction |
| Billing data | < 1 hour | Downgrade signals matter but are less time-sensitive |
| CRM data | < 30 minutes | CSM notes and health scores update infrequently |
| NPS surveys | Daily batch | Quarterly surveys, batch processing is sufficient |

Stage 2: Discovery

With data flowing, Zara switches to the Data Workbench Catalog to explore what is available, understand data quality, and discover existing assets she can build on.

Catalog Exploration

Zara searches the catalog for "user activity" and discovers 347 distinct event types in the events table. She profiles the event distribution:

-- Data Workbench: Catalog profiling query
SELECT
    event_type,
    COUNT(*)                                    AS event_count,
    COUNT(DISTINCT user_id)                     AS unique_users,
    COUNT(DISTINCT workspace_id)                AS unique_workspaces,
    MIN(timestamp)                              AS first_seen,
    MAX(timestamp)                              AS last_seen
FROM events
WHERE timestamp >= CURRENT_DATE - INTERVAL '90' DAY
GROUP BY event_type
ORDER BY event_count DESC
LIMIT 20;

Top events discovered:

| Event Type | Count (90d) | Unique Users | Signal Type |
|---|---|---|---|
| page_viewed | 142M | 78K | Engagement |
| task_created | 28M | 52K | Core action |
| task_completed | 19M | 44K | Core action |
| comment_added | 12M | 38K | Collaboration |
| file_uploaded | 8.4M | 31K | Depth |
| invite_sent | 620K | 12K | Expansion |
| integration_connected | 180K | 8.2K | Depth |
| export_requested | 95K | 6.1K | Possible churn signal |

Key Discovery: User Activity Profiling

Zara profiles user activity levels and finds a concerning distribution:

-- Profile monthly active days per user
SELECT
    CASE
        WHEN active_days >= 20 THEN 'Power User (20+ days)'
        WHEN active_days >= 10 THEN 'Regular (10-19 days)'
        WHEN active_days >= 5  THEN 'Light (5-9 days)'
        ELSE 'At Risk (< 5 days)'
    END                          AS user_segment,
    COUNT(*)                     AS user_count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM (
    SELECT user_id, COUNT(DISTINCT DATE(timestamp)) AS active_days
    FROM events
    WHERE timestamp >= CURRENT_DATE - INTERVAL '30' DAY
    GROUP BY user_id
) user_activity
GROUP BY 1
ORDER BY 2 DESC;

| Segment | Users | Percentage |
|---|---|---|
| At Risk (< 5 days) | 40,000 | 40% |
| Light (5-9 days) | 22,000 | 22% |
| Regular (10-19 days) | 21,000 | 21% |
| Power User (20+ days) | 17,000 | 17% |

40% of users are active fewer than 5 days per month -- a much larger at-risk population than the team assumed.

Lineage Discovery

Zara traces the data lineage for an existing health_score column in the crm_accounts table and discovers it was built by the product team using a simple formula:

event (raw)
  └──▶ session_aggregate (Pipeline Service, hourly)
         └──▶ user_activity_daily (Pipeline Service, daily)
                └──▶ workspace_health_score (Pipeline Service, daily)
                       └──▶ crm_accounts.health_score (Salesforce sync, bidirectional)

The existing health score uses only 3 signals (login frequency, task creation, team size). Zara's model will incorporate 15+ features for a more accurate prediction.


Stage 3: Query

Zara builds her churn feature set using the Query Engine, federating across PostgreSQL (product data), Snowflake (event warehouse), and Salesforce (CRM) in a single query session.

Feature Engineering: Usage Trends

Rolling usage trends are the strongest churn predictor. Zara computes 7-day, 14-day, and 30-day rolling averages and their rate of change:

-- Feature: Usage frequency trends with rolling windows
-- Federated query: PostgreSQL (users) + Snowflake (events)
WITH daily_usage AS (
    SELECT
        u.workspace_id,
        DATE(e.timestamp)                       AS activity_date,
        COUNT(DISTINCT e.user_id)               AS active_users,
        COUNT(*)                                AS total_events,
        COUNT(DISTINCT e.event_type)            AS distinct_actions,
        COUNT(CASE WHEN e.event_type IN ('task_created', 'task_completed',
              'comment_added', 'file_uploaded') THEN 1 END) AS core_actions
    FROM snowflake.events e
    JOIN postgresql.users u ON e.user_id = u.user_id
    WHERE e.timestamp >= CURRENT_DATE - INTERVAL '90' DAY
    GROUP BY u.workspace_id, DATE(e.timestamp)
),
rolling_features AS (
    SELECT
        workspace_id,
        activity_date,
        -- 7-day rolling averages
        AVG(active_users)   OVER (PARTITION BY workspace_id
            ORDER BY activity_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
            AS avg_dau_7d,
        AVG(core_actions)   OVER (PARTITION BY workspace_id
            ORDER BY activity_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
            AS avg_core_actions_7d,
        -- 30-day rolling averages
        AVG(active_users)   OVER (PARTITION BY workspace_id
            ORDER BY activity_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
            AS avg_dau_30d,
        AVG(core_actions)   OVER (PARTITION BY workspace_id
            ORDER BY activity_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
            AS avg_core_actions_30d
    FROM daily_usage
)
SELECT
    workspace_id,
    activity_date,
    avg_dau_7d,
    avg_dau_30d,
    -- Usage trend: ratio of recent to longer-term average
    -- < 1.0 means declining usage (churn signal)
    ROUND(avg_dau_7d / NULLIF(avg_dau_30d, 0), 3)        AS usage_trend_ratio,
    ROUND(avg_core_actions_7d / NULLIF(avg_core_actions_30d, 0), 3)
                                                           AS action_trend_ratio
FROM rolling_features
WHERE activity_date = CURRENT_DATE - INTERVAL '1' DAY;

Feature Engineering: Feature Adoption Depth

Accounts using fewer than 3 core features churn at 4x the rate of accounts using 5+:

-- Feature: Feature adoption depth per workspace
WITH feature_usage AS (
    SELECT
        u.workspace_id,
        CASE
            WHEN e.event_type LIKE 'board_%'       THEN 'kanban_boards'
            WHEN e.event_type LIKE 'gantt_%'        THEN 'gantt_charts'
            WHEN e.event_type LIKE 'timeline_%'     THEN 'timelines'
            WHEN e.event_type LIKE 'automation_%'   THEN 'automations'
            WHEN e.event_type LIKE 'form_%'         THEN 'forms'
            WHEN e.event_type LIKE 'dashboard_%'    THEN 'dashboards'
            WHEN e.event_type LIKE 'integration_%'  THEN 'integrations'
            WHEN e.event_type LIKE 'report_%'       THEN 'reports'
            WHEN e.event_type LIKE 'api_%'          THEN 'api_access'
            WHEN e.event_type = 'invite_sent'       THEN 'team_invites'
        END AS feature_category
    FROM snowflake.events e
    JOIN postgresql.users u ON e.user_id = u.user_id
    WHERE e.timestamp >= CURRENT_DATE - INTERVAL '30' DAY
      AND e.event_type NOT IN ('page_viewed', 'session_start', 'session_end')
)
SELECT
    workspace_id,
    COUNT(DISTINCT feature_category)          AS features_adopted,
    ARRAY_AGG(DISTINCT feature_category)      AS feature_list
FROM feature_usage
WHERE feature_category IS NOT NULL
GROUP BY workspace_id;

Feature Engineering: Collaboration and Support Signals

-- Feature: Team collaboration patterns + support ticket sentiment
SELECT
    w.workspace_id,
    -- Collaboration signals
    COUNT(DISTINCT CASE WHEN e.event_type = 'comment_added'
          THEN e.user_id END)                         AS users_commenting,
    COUNT(CASE WHEN e.event_type = 'invite_sent'
          THEN 1 END)                                 AS invites_sent_30d,
    w.seat_count,
    ROUND(COUNT(DISTINCT e.user_id) * 1.0
          / NULLIF(w.seat_count, 0), 2)               AS seat_utilization,
    -- Support signals (from Zendesk)
    COUNT(DISTINCT t.ticket_id)                       AS support_tickets_30d,
    AVG(t.satisfaction_rating)                         AS avg_csat,
    COUNT(CASE WHEN t.priority IN ('high', 'urgent')
          THEN 1 END)                                 AS urgent_tickets_30d,
    -- Billing signals (from Stripe)
    s.plan                                            AS current_plan,
    CASE WHEN s.mrr < LAG(s.mrr) OVER (PARTITION BY s.workspace_id
         ORDER BY s.period_start) THEN 1 ELSE 0 END  AS downgraded_last_period
FROM postgresql.workspaces w
JOIN snowflake.events e ON w.workspace_id = e.workspace_id
    AND e.timestamp >= CURRENT_DATE - INTERVAL '30' DAY
LEFT JOIN zendesk.support_tickets t ON w.workspace_id = t.workspace_id
    AND t.created_at >= CURRENT_DATE - INTERVAL '30' DAY
LEFT JOIN stripe.subscriptions s ON w.workspace_id = s.workspace_id
    AND s.status = 'active'
GROUP BY w.workspace_id, w.seat_count, s.plan, s.mrr, s.period_start;

Stage 4: Orchestration

Zara builds a daily pipeline using the Pipeline Service (Temporal) that computes all churn features, runs the model, and pushes scores to the CRM.

Pipeline Definition

{
  "pipeline": {
    "name": "churn-prediction-daily",
    "schedule": "0 6 * * *",
    "description": "Daily churn risk scoring for all active workspaces",
    "stages": [
      {
        "name": "extract_features",
        "type": "sql_transform",
        "query_ref": "churn_features_v3.sql",
        "output_table": "ml.churn_features_daily",
        "timeout_minutes": 30
      },
      {
        "name": "quality_gate",
        "type": "data_quality",
        "depends_on": ["extract_features"],
        "checks": [
          {
            "name": "event_data_freshness",
            "type": "freshness",
            "table": "events",
            "column": "timestamp",
            "max_stale_hours": 2,
            "severity": "critical",
            "on_failure": "abort_pipeline"
          },
          {
            "name": "feature_completeness",
            "type": "column_not_null",
            "table": "ml.churn_features_daily",
            "columns": ["workspace_id", "avg_dau_7d", "features_adopted"],
            "max_null_pct": 5,
            "severity": "warning"
          },
          {
            "name": "workspace_count",
            "type": "row_count",
            "table": "ml.churn_features_daily",
            "min_rows": 3500,
            "max_rows": 5000,
            "severity": "critical"
          }
        ]
      },
      {
        "name": "score_accounts",
        "type": "model_inference",
        "depends_on": ["quality_gate"],
        "model": "churn-predictor-v3",
        "model_registry": "ml_workbench",
        "input_table": "ml.churn_features_daily",
        "output_table": "ml.churn_scores_daily",
        "batch_size": 500
      },
      {
        "name": "flag_at_risk",
        "type": "sql_transform",
        "depends_on": ["score_accounts"],
        "query": "INSERT INTO ml.at_risk_accounts SELECT workspace_id, churn_probability, risk_tier, scored_at FROM ml.churn_scores_daily WHERE churn_probability >= 0.65",
        "output_table": "ml.at_risk_accounts"
      },
      {
        "name": "sync_to_crm",
        "type": "reverse_sync",
        "depends_on": ["flag_at_risk"],
        "destination": "salesforce",
        "object": "Account",
        "mapping": {
          "workspace_id": "CloudFlow_Workspace_Id__c",
          "churn_probability": "Churn_Risk_Score__c",
          "risk_tier": "Risk_Tier__c"
        }
      }
    ]
  }
}

Quality Gate: Event Freshness

The quality gate on event data freshness is critical. If Segment has a lag > 2 hours, the usage features will understate activity and produce false positives:

Pipeline Run: 2026-02-28 06:00 UTC
  ├── extract_features .............. PASSED (12m 34s)
  ├── quality_gate
  │   ├── event_data_freshness ..... PASSED (latest event: 05:58 UTC, lag: 2 min)
  │   ├── feature_completeness ..... PASSED (null rate: 1.2%)
  │   └── workspace_count .......... PASSED (4,187 workspaces scored)
  ├── score_accounts ............... PASSED (3m 12s)
  ├── flag_at_risk ................. PASSED (847 accounts flagged)
  └── sync_to_crm .................. PASSED (847 records synced)
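
Under the hood, the freshness gate is a timestamp comparison: find the newest event, compute the lag against the wall clock, and abort the run if it exceeds the threshold. A minimal Python sketch of that check (the function name and result shape are illustrative, not the Pipeline Service's actual API):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def check_freshness(latest_event_ts: datetime, max_stale_hours: float,
                    now: Optional[datetime] = None) -> dict:
    """Pass/fail result comparing event lag against the staleness threshold."""
    now = now or datetime.now(timezone.utc)
    lag = now - latest_event_ts
    return {
        "passed": lag <= timedelta(hours=max_stale_hours),
        "lag_minutes": round(lag.total_seconds() / 60, 1),
    }

# Mirrors the run above: latest event 2 minutes old, 2-hour threshold
now = datetime(2026, 2, 28, 6, 0, tzinfo=timezone.utc)
result = check_freshness(now - timedelta(minutes=2), max_stale_hours=2, now=now)
```

A check like this would run against `MAX(timestamp)` from the events table; on failure the pipeline aborts before scoring, which is exactly the behavior `on_failure: abort_pipeline` declares.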

Stage 5: Analysis

Before training the model, Zara performs deep analysis on churn patterns to validate her feature hypotheses and check for data quality issues.

Churn Pattern Analysis

-- Analyze churn rates by customer segment
SELECT
    CASE
        WHEN s.mrr >= 5000  THEN 'Enterprise ($5K+)'
        WHEN s.mrr >= 1000  THEN 'Mid-Market ($1K-$5K)'
        WHEN s.mrr >= 200   THEN 'SMB ($200-$1K)'
        ELSE 'Self-Serve (< $200)'
    END                                             AS segment,
    COUNT(DISTINCT w.workspace_id)                  AS total_accounts,
    COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
          THEN w.workspace_id END)                  AS churned,
    ROUND(COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
          THEN w.workspace_id END) * 100.0
          / COUNT(DISTINCT w.workspace_id), 1)      AS logo_churn_pct,
    SUM(CASE WHEN w.status = 'cancelled'
        THEN s.mrr ELSE 0 END)                      AS churned_mrr,
    ROUND(SUM(CASE WHEN w.status = 'cancelled' THEN s.mrr ELSE 0 END)
          * 100.0 / SUM(s.mrr), 1)                  AS revenue_churn_pct
FROM workspaces w
JOIN subscriptions s ON w.workspace_id = s.workspace_id
WHERE s.period_start >= CURRENT_DATE - INTERVAL '365' DAY
GROUP BY 1
ORDER BY revenue_churn_pct DESC;

| Segment | Accounts | Churned | Logo Churn | Churned MRR | Revenue Churn |
|---|---|---|---|---|---|
| Enterprise ($5K+) | 120 | 14 | 11.7% | $98K | 32.1% |
| Mid-Market ($1K-$5K) | 480 | 72 | 15.0% | $158K | 27.3% |
| SMB ($200-$1K) | 1,800 | 324 | 18.0% | $121K | 19.8% |
| Self-Serve (< $200) | 1,800 | 342 | 19.0% | $53K | 16.4% |

Key finding: Enterprise accounts have the lowest logo churn but the highest revenue churn -- losing one $7K/month account hurts more than losing ten $50/month accounts.

Feature Adoption and Churn Correlation

-- Churn rate by number of core features adopted
SELECT
    features_adopted,
    COUNT(*)                                        AS account_count,
    COUNT(CASE WHEN churned = 1 THEN 1 END)         AS churned_count,
    ROUND(COUNT(CASE WHEN churned = 1 THEN 1 END)
          * 100.0 / COUNT(*), 1)                    AS churn_rate
FROM ml.churn_features_daily
GROUP BY features_adopted
ORDER BY features_adopted;

| Features Adopted | Accounts | Churned | Churn Rate |
|---|---|---|---|
| 1-2 | 1,240 | 372 | 30.0% |
| 3-4 | 1,580 | 221 | 14.0% |
| 5-6 | 840 | 59 | 7.0% |
| 7+ | 540 | 19 | 3.5% |

Accounts using fewer than 3 core features churn at 4x the rate of those using 5+. This validates feature adoption depth as a top predictor.

Survivorship Bias Check

Zara checks for survivorship bias in the training data -- if she only includes currently active accounts, she will miss the patterns of accounts that already churned:

-- Ensure training data includes churned accounts from the past 12 months
SELECT
    CASE WHEN status = 'cancelled' THEN 'Churned' ELSE 'Active' END AS label,
    COUNT(*)                                                          AS count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1)               AS pct
FROM workspaces
WHERE created_at <= CURRENT_DATE - INTERVAL '90' DAY  -- exclude very new accounts
GROUP BY 1;

| Label | Count | Percentage |
|---|---|---|
| Active | 3,450 | 82.3% |
| Churned | 742 | 17.7% |

The class imbalance (82/18) is manageable. Zara will use stratified sampling and evaluate with AUC-PR in addition to AUC-ROC.


Stage 6: Productionization

Zara trains and deploys her churn model using the ML Workbench.

Model Training

# ML Workbench: Notebook - churn_model_training_v3.ipynb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, precision_recall_curve, auc
import lightgbm as lgb
import mlflow
 
# Load feature set from Query Engine
features_df = matih.query("""
    SELECT * FROM ml.churn_training_features
    WHERE snapshot_date BETWEEN DATE '2025-03-01' AND DATE '2026-02-28'
""")
 
# Feature columns
feature_cols = [
    'avg_dau_7d', 'avg_dau_14d', 'avg_dau_30d',
    'usage_trend_ratio_7d_30d', 'action_trend_ratio',
    'features_adopted', 'seat_utilization',
    'invites_sent_30d', 'users_commenting',
    'support_tickets_30d', 'urgent_tickets_30d', 'avg_csat',
    'days_since_last_login', 'mrr', 'account_age_days',
    'downgraded_last_period', 'nps_score'
]
 
X = features_df[feature_cols]
y = features_df['churned_within_60d']
 
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
 
# Train gradient boosting model
with mlflow.start_run(run_name="churn-gbm-v3"):
    model = lgb.LGBMClassifier(
        n_estimators=500,
        learning_rate=0.05,
        max_depth=6,
        num_leaves=31,
        min_child_samples=20,
        subsample=0.8,
        colsample_bytree=0.8,
        class_weight='balanced',
        random_state=42
    )
    model.fit(
        X_train, y_train,
        eval_set=[(X_test, y_test)],
        callbacks=[lgb.early_stopping(50)]
    )
 
    # Evaluate
    y_pred = model.predict_proba(X_test)[:, 1]
    auc_roc = roc_auc_score(y_test, y_pred)
    precision, recall, _ = precision_recall_curve(y_test, y_pred)
    auc_pr = auc(recall, precision)
 
    mlflow.log_metric("auc_roc", auc_roc)       # 0.83
    mlflow.log_metric("auc_pr", auc_pr)          # 0.61
    mlflow.log_metric("precision_at_80_recall", 0.52)
    mlflow.sklearn.log_model(model, "churn-predictor-v3")

Feature Importance

| Rank | Feature | Importance | Interpretation |
|---|---|---|---|
| 1 | usage_trend_ratio_7d_30d | 0.187 | Declining usage is the top signal |
| 2 | features_adopted | 0.142 | Low feature adoption = high risk |
| 3 | seat_utilization | 0.118 | Paying for unused seats = waste signal |
| 4 | days_since_last_login | 0.096 | Recency of engagement |
| 5 | support_tickets_30d | 0.083 | Frustration signal |
| 6 | avg_dau_7d | 0.074 | Raw activity level |
| 7 | nps_score | 0.068 | Direct satisfaction measure |
| 8 | downgraded_last_period | 0.062 | Explicit contraction signal |
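
The importance column can be reproduced by normalizing the fitted model's raw importances to sum to 1. A sketch, assuming `feature_cols` from the training notebook and a raw importance array such as LightGBM's gain-based `model.booster_.feature_importance(importance_type="gain")` (the helper name below is illustrative):

```python
import pandas as pd

def importance_table(importances, feature_cols):
    """Normalize raw importances and rank features, as in the table above."""
    imp = pd.Series(importances, index=feature_cols)
    return (imp / imp.sum()).sort_values(ascending=False).round(3)

# Toy example with made-up gain values
ranks = importance_table([30, 10, 60], ["seat_utilization", "avg_csat", "usage_trend"])
```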

Deploy to Ray Serve

Zara registers the model and deploys it for two use cases -- real-time scoring on login and daily batch scoring:

{
  "deployment": {
    "name": "churn-predictor-v3",
    "model_uri": "models:/churn-predictor-v3/production",
    "serving": {
      "realtime": {
        "endpoint": "/api/v1/ml/churn-score",
        "runtime": "ray_serve",
        "num_replicas": 2,
        "max_batch_size": 32,
        "latency_sla_ms": 200,
        "trigger": "user_login_event"
      },
      "batch": {
        "schedule": "0 6 * * *",
        "pipeline": "churn-prediction-daily",
        "output_table": "ml.churn_scores_daily"
      }
    },
    "crm_sync": {
      "destination": "salesforce",
      "object": "Account",
      "field": "Churn_Risk_Score__c",
      "update_frequency": "daily"
    }
  }
}
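
A consumer of the real-time endpoint (for example, the service handling the `user_login_event` trigger) could call it roughly as follows. The host, payload shape, and response format are assumptions -- the deployment config above defines only the route and the 200 ms latency SLA:

```python
import json
import urllib.request

# Hypothetical internal host; only the route comes from the deployment config
ENDPOINT = "https://matih.cloudflow.internal/api/v1/ml/churn-score"

def build_payload(workspace_id: str, features: dict) -> bytes:
    """Serialize a single-workspace scoring request as JSON."""
    return json.dumps({"workspace_id": workspace_id, "features": features}).encode()

def score_workspace(workspace_id: str, features: dict, timeout: float = 0.2) -> dict:
    """POST features to the churn-score endpoint; timeout mirrors the 200 ms SLA."""
    req = urllib.request.Request(
        ENDPOINT,
        data=build_payload(workspace_id, features),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping the client timeout at the SLA means a slow model replica fails fast rather than blocking the login path.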

Stage 7: Feedback

Zara configures comprehensive monitoring for her deployed model using the ML Workbench Model Registry.

Performance Monitoring

{
  "monitoring": {
    "model": "churn-predictor-v3",
    "metrics": [
      {
        "name": "weekly_auc_roc",
        "type": "model_performance",
        "metric": "auc_roc",
        "window": "7d",
        "threshold": 0.78,
        "alert": "slack://ml-alerts"
      },
      {
        "name": "monthly_lift_chart",
        "type": "model_performance",
        "metric": "lift_at_decile",
        "window": "30d",
        "report": "bi_workbench://churn-model-performance"
      },
      {
        "name": "csm_intervention_tracking",
        "type": "business_outcome",
        "metric": "save_rate_for_flagged_accounts",
        "window": "90d",
        "target": 0.35
      }
    ],
    "drift_detection": [
      {
        "feature": "usage_trend_ratio_7d_30d",
        "type": "distribution_shift",
        "method": "psi",
        "threshold": 0.15,
        "note": "Seasonal patterns expected in Dec/Jan -- adjust threshold"
      },
      {
        "feature": "avg_dau_7d",
        "type": "distribution_shift",
        "method": "psi",
        "threshold": 0.10
      }
    ],
    "alerts": [
      {
        "condition": "auc_roc < 0.78 for 2 consecutive weeks",
        "action": "trigger_retraining_pipeline",
        "notify": ["zara@cloudflow.io", "slack://ml-alerts"]
      },
      {
        "condition": "prediction_volume drops > 20%",
        "action": "page_on_call",
        "note": "Likely ingestion pipeline failure, not model issue"
      }
    ]
  }
}
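
PSI compares the binned distribution of a feature at serving time against its training baseline; values above roughly 0.1-0.25 indicate meaningful shift, which is why the thresholds above sit at 0.10-0.15. A minimal sketch of the calculation (bin count, outlier handling, and the smoothing constant are implementation choices, not platform specifics):

```python
import numpy as np

def psi(baseline, current, bins=10, eps=1e-4):
    """Population Stability Index between two samples of one feature."""
    # Bin edges from baseline quantiles, so each baseline bin holds ~1/bins of mass
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    cur = np.clip(current, edges[0], edges[-1])      # fold outliers into end bins
    b = np.histogram(baseline, edges)[0] / len(baseline)
    c = np.histogram(cur, edges)[0] / len(current)
    b, c = np.clip(b, eps, None), np.clip(c, eps, None)  # avoid log(0)
    return float(np.sum((c - b) * np.log(c / b)))
```

Identical distributions score near 0; a shift in the mean of `avg_dau_7d` of even half a standard deviation pushes PSI well past the 0.10 alert threshold.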

Tracking CSM Intervention Outcomes

The ultimate measure of model value is whether CSM interventions on flagged accounts actually reduce churn:

-- Monthly intervention outcome tracking
SELECT
    DATE_TRUNC('month', s.scored_at)        AS score_month,
    s.risk_tier,
    COUNT(DISTINCT s.workspace_id)          AS accounts_flagged,
    COUNT(DISTINCT CASE WHEN i.intervention_date IS NOT NULL
          THEN s.workspace_id END)          AS accounts_contacted,
    COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
          AND w.cancelled_at <= s.scored_at + INTERVAL '90' DAY
          THEN s.workspace_id END)          AS actually_churned,
    ROUND(COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
          AND w.cancelled_at <= s.scored_at + INTERVAL '90' DAY
          THEN s.workspace_id END) * 100.0
          / COUNT(DISTINCT s.workspace_id), 1) AS churn_rate
FROM ml.churn_scores_daily s
LEFT JOIN csm_interventions i ON s.workspace_id = i.workspace_id
    AND i.intervention_date BETWEEN s.scored_at AND s.scored_at + INTERVAL '14' DAY
LEFT JOIN workspaces w ON s.workspace_id = w.workspace_id
WHERE s.risk_tier = 'high'
GROUP BY 1, 2
ORDER BY 1;

Stage 8: Experimentation

Zara runs structured experiments to improve model accuracy and validate business impact.

Model Comparison

| Model | AUC-ROC | AUC-PR | P@80R | Latency (p50) | Notes |
|---|---|---|---|---|---|
| Logistic Regression (baseline) | 0.76 | 0.44 | 0.38 | 8ms | Interpretable but underfits non-linear patterns |
| Gradient Boosting (current) | 0.83 | 0.61 | 0.52 | 24ms | Best balance of accuracy and speed |
| Neural Network (MLP) | 0.85 | 0.65 | 0.56 | 45ms | Marginal gain, higher latency, less interpretable |
| Gradient Boosting + NPS | 0.86 | 0.67 | 0.58 | 26ms | Adding NPS survey data improves AUC by 0.03 |

Feature Ablation Study

Zara tests the impact of removing individual features to understand dependencies:

| Removed Feature | AUC Change | Conclusion |
|---|---|---|
| usage_trend_ratio | -0.04 | Critical -- keep |
| features_adopted | -0.03 | Critical -- keep |
| nps_score | -0.01 | Valuable but not critical (sparse data) |
| support_tickets_30d | -0.02 | Important signal for enterprise accounts |
| account_age_days | -0.005 | Minimal impact -- could remove for simplicity |
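
An ablation like this reduces to a loop that retrains with one feature held out and differences the test AUC against the full model. The sketch below uses a generic sklearn classifier on a numpy feature matrix; `ablation_deltas` is an illustrative helper, and Zara's actual runs would reuse the LightGBM configuration from Stage 6:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def ablation_deltas(X, y, feature_names,
                    make_model=lambda: LogisticRegression(max_iter=1000)):
    """Map each feature to the test-AUC change caused by removing it."""
    Xtr, Xte, ytr, yte = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    def fit_auc(cols):
        m = make_model().fit(Xtr[:, cols], ytr)
        return roc_auc_score(yte, m.predict_proba(Xte[:, cols])[:, 1])
    full_auc = fit_auc(list(range(X.shape[1])))
    return {name: round(fit_auc([j for j in range(X.shape[1]) if j != i]) - full_auc, 4)
            for i, name in enumerate(feature_names)}
```

A strongly negative delta marks a feature the model cannot compensate for; deltas near zero flag candidates for removal.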

A/B Test: CSM Outreach Strategies

Zara works with the Customer Success team to A/B test different intervention strategies for high-risk accounts:

{
  "experiment": {
    "name": "csm-outreach-strategy-q1-2026",
    "hypothesis": "Personalized outreach based on specific churn risk factors will have higher save rates than generic check-in calls",
    "variants": [
      {
        "name": "control",
        "description": "Standard quarterly check-in call",
        "allocation": 0.33
      },
      {
        "name": "risk_personalized",
        "description": "Outreach with specific risk factors: 'We noticed your team has not used [feature] recently -- can we schedule a training session?'",
        "allocation": 0.34
      },
      {
        "name": "value_demo",
        "description": "Proactive ROI report showing workspace productivity gains and cost savings",
        "allocation": 0.33
      }
    ],
    "primary_metric": "90_day_retention_rate",
    "secondary_metrics": ["expansion_mrr", "nps_change", "csm_time_per_account"],
    "duration_weeks": 12,
    "min_sample_per_variant": 100
  }
}

Early results (week 8):

| Variant | Accounts | Retained | Retention Rate | Expansion MRR |
|---|---|---|---|---|
| Control (standard check-in) | 112 | 81 | 72.3% | $2,100 |
| Risk-personalized outreach | 118 | 98 | 83.1% | $8,400 |
| Proactive value demo | 110 | 91 | 82.7% | $12,300 |

The risk-personalized outreach shows the highest retention rate, while the proactive value demo drives the most expansion revenue -- suggesting a combined strategy may be optimal.
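
Before acting on week-8 numbers, it is worth checking whether the retention gap is statistically meaningful. A stdlib-only two-proportion z-test on control vs risk-personalized (a back-of-envelope check, not the platform's experiment analysis):

```python
from math import sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for H0: the two retention rates are equal (pooled variance)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Control: 81/112 retained; risk-personalized: 98/118 retained
z = two_proportion_z(81, 112, 98, 118)
```

The statistic lands near 1.96, right at the two-sided 5% threshold -- borderline at week 8, which argues for letting the experiment run its full 12 weeks before changing the playbook.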


Summary

| Stage | Key Action | Platform Component | Outcome |
|---|---|---|---|
| 1. Ingestion | Connected Salesforce CRM and Zendesk for renewal and support signals | Ingestion Service (Airbyte) | 7 data sources flowing, real-time + batch |
| 2. Discovery | Found 347 event types, profiled 40% at-risk users, traced existing health_score lineage | Data Workbench Catalog | Identified data gaps and existing assets |
| 3. Query | Built 15+ churn features via federated SQL across 3 sources | Query Engine (Trino) | Usage trends, adoption, collaboration, support, billing features |
| 4. Orchestration | Daily pipeline with quality gates (event freshness, completeness) | Pipeline Service (Temporal) | Automated scoring with CRM sync |
| 5. Analysis | Uncovered 25% revenue churn driven by enterprise segment, 4x churn rate for low-adoption accounts | Data Workbench + Query Engine | Data-driven feature selection, survivorship bias check |
| 6. Productionization | Deployed gradient boosting model (AUC 0.83) to Ray Serve + daily batch scoring | ML Service (Ray Serve) | Real-time and batch scoring, Salesforce integration |
| 7. Feedback | Weekly AUC monitoring, feature drift detection, CSM intervention tracking | ML Workbench Model Registry | Automated alerts, retraining triggers, business outcome tracking |
| 8. Experimentation | Compared 4 model architectures, tested CSM outreach strategies via A/B test | ML Workbench Experiments | 83% retention with personalized outreach vs 72% baseline |

Related Walkthroughs