Data Scientist Journey: User Churn Prediction for a B2B SaaS Product
Persona: Zara Ahmed, Senior Data Scientist at CloudFlow
Goal: Build a churn prediction model to identify at-risk B2B accounts before renewal, enabling the Customer Success team to intervene proactively and reduce the 18% annual logo churn rate.
Primary Workbenches: ML Workbench, Data Workbench
Supporting Services: Ingestion Service, Catalog Service, Query Engine, Pipeline Service, Data Quality Service, ML Service (Ray Serve)
Business Context
CloudFlow's churn problem is not uniform. The company's 18% annual logo churn rate masks a more alarming reality: 25% revenue churn. Larger accounts with higher ARR are churning at a disproportionate rate, and the Customer Success team only learns about at-risk accounts when the cancellation request arrives -- too late for meaningful intervention.
Zara's mandate is clear: build a model that identifies at-risk accounts 60 days before renewal, with enough lead time for the CSM team to engage, diagnose the problem, and offer targeted retention actions.
Current State Target State
┌────────────────────────┐ ┌────────────────────────┐
│ CSM learns about │ │ Model flags account │
│ churn at cancellation │ │ 60 days before renewal │
│ │ │ │
│ Reactive: 0 days lead │ ────▶ │ Proactive: 60 day lead │
│ time, 5% save rate │ │ time, 35% save rate │
│ │ │ │
│ Revenue churn: 25% │ │ Revenue churn target: │
│ │ │ < 15% │
└────────────────────────┘          └────────────────────────┘
Stage 1: Ingestion
Zara begins by ensuring all the data sources she needs are connected and syncing. She opens the Data Workbench and navigates to the Ingestion panel to review existing connections and configure new ones.
Existing Connections
The data engineering team has already connected the primary data sources:
| Source | Connector | Sync Mode | Frequency | Status |
|---|---|---|---|---|
| Product PostgreSQL | Airbyte PostgreSQL CDC | Incremental (WAL) | Every 15 min | Active |
| Segment Events (Kafka) | Airbyte Kafka | Streaming | Continuous | Active |
| Stripe Billing | Airbyte Stripe | Incremental | Hourly | Active |
New Connections Needed
Zara needs two additional sources for her churn model -- CRM data (for renewal dates and CSM notes) and support ticket data (for sentiment signals):
Salesforce CRM connection:
{
"source": "salesforce",
"connector": "airbyte/source-salesforce",
"config": {
"streams": ["Account", "Opportunity", "Task", "Note"],
"sync_mode": "incremental",
"cursor_field": "SystemModstamp",
"frequency": "every_30_minutes"
},
"schema_mapping": {
"Account": {
"destination_table": "crm_accounts",
"primary_key": "Id",
"field_mapping": {
"Id": "account_id",
"CloudFlow_Workspace_Id__c": "workspace_id",
"OwnerId": "csm_owner",
"Health_Score__c": "health_score",
"Contract_End_Date__c": "renewal_date",
"ARR__c": "arr"
}
}
}
}
Zendesk Support connection:
{
"source": "zendesk",
"connector": "airbyte/source-zendesk-support",
"config": {
"streams": ["tickets", "ticket_comments", "satisfaction_ratings"],
"sync_mode": "incremental",
"cursor_field": "updated_at",
"frequency": "every_15_minutes"
}
}
Sync Configuration Strategy
For churn prediction, data freshness requirements vary by source:
| Data Type | Freshness Requirement | Rationale |
|---|---|---|
| Product events | Real-time (streaming) | Usage drop-offs are the strongest churn signal |
| Support tickets | < 15 minutes | Urgent tickets signal immediate dissatisfaction |
| Billing data | < 1 hour | Downgrade signals matter but are less time-sensitive |
| CRM data | < 30 minutes | CSM notes and health scores change infrequently; a 30-minute sync is sufficient |
| NPS surveys | Daily batch | Quarterly surveys, batch processing is sufficient |
Stage 2: Discovery
With data flowing, Zara switches to the Data Workbench Catalog to explore what is available, understand data quality, and discover existing assets she can build on.
Catalog Exploration
Zara searches the catalog for "user activity" and discovers 347 distinct event types
in the events table. She profiles the event distribution:
-- Data Workbench: Catalog profiling query
SELECT
event_type,
COUNT(*) AS event_count,
COUNT(DISTINCT user_id) AS unique_users,
COUNT(DISTINCT workspace_id) AS unique_workspaces,
MIN(timestamp) AS first_seen,
MAX(timestamp) AS last_seen
FROM events
WHERE timestamp >= CURRENT_DATE - INTERVAL '90' DAY
GROUP BY event_type
ORDER BY event_count DESC
LIMIT 20;
Top events discovered:
| Event Type | Count (90d) | Unique Users | Signal Type |
|---|---|---|---|
| page_viewed | 142M | 78K | Engagement |
| task_created | 28M | 52K | Core action |
| task_completed | 19M | 44K | Core action |
| comment_added | 12M | 38K | Collaboration |
| file_uploaded | 8.4M | 31K | Depth |
| invite_sent | 620K | 12K | Expansion |
| integration_connected | 180K | 8.2K | Depth |
| export_requested | 95K | 6.1K | Possible churn signal |
Key Discovery: User Activity Profiling
Zara profiles user activity levels and finds a concerning distribution:
-- Profile monthly active days per user
SELECT
CASE
WHEN active_days >= 20 THEN 'Power User (20+ days)'
WHEN active_days >= 10 THEN 'Regular (10-19 days)'
WHEN active_days >= 5 THEN 'Light (5-9 days)'
ELSE 'At Risk (< 5 days)'
END AS user_segment,
COUNT(*) AS user_count,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM (
SELECT user_id, COUNT(DISTINCT DATE(timestamp)) AS active_days
FROM events
WHERE timestamp >= CURRENT_DATE - INTERVAL '30' DAY
GROUP BY user_id
) user_activity
GROUP BY 1
ORDER BY 2 DESC;
| Segment | Users | Percentage |
|---|---|---|
| At Risk (< 5 days) | 40,000 | 40% |
| Light (5-9 days) | 22,000 | 22% |
| Regular (10-19 days) | 21,000 | 21% |
| Power User (20+ days) | 17,000 | 17% |
40% of users are active fewer than 5 days per month -- a much larger at-risk population than the team assumed.
Lineage Discovery
Zara traces the data lineage for an existing health_score column in the crm_accounts
table and discovers it was built by the product team using a simple formula:
event (raw)
└──▶ session_aggregate (Pipeline Service, hourly)
└──▶ user_activity_daily (Pipeline Service, daily)
└──▶ workspace_health_score (Pipeline Service, daily)
            └──▶ crm_accounts.health_score (Salesforce sync, bidirectional)
The existing health score uses only 3 signals (login frequency, task creation, team size). Zara's model will incorporate 15+ features for a more accurate prediction.
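For contrast with the 15+ feature model Zara is building, a 3-signal score like the existing one typically looks something like the following. This is a hypothetical reconstruction -- the weights, caps, and function name are illustrative assumptions, not CloudFlow's actual formula:

```python
def legacy_health_score(login_days_30d: int, tasks_created_30d: int,
                        team_size: int) -> float:
    """Hypothetical 3-signal health score: a weighted blend of login
    frequency, task creation, and team size, each capped and scaled
    to a 0-100 range. All weights and caps are illustrative."""
    login_component = min(login_days_30d / 20, 1.0)     # 20+ active days = max
    task_component = min(tasks_created_30d / 100, 1.0)  # 100+ tasks = max
    size_component = min(team_size / 25, 1.0)           # 25+ seats = max
    return round(100 * (0.5 * login_component
                        + 0.3 * task_component
                        + 0.2 * size_component), 1)

# A fairly active mid-sized workspace:
score = legacy_health_score(login_days_30d=18, tasks_created_30d=80,
                            team_size=10)  # 77.0
```

A fixed formula like this cannot capture trends (is usage rising or falling?), which is exactly the gap Zara's feature set targets.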
Stage 3: Query
Zara builds her churn feature set using the Query Engine, federating across PostgreSQL (product data), Snowflake (event warehouse), and Salesforce (CRM) in a single query session.
Feature Engineering: Usage Trends
Rolling usage trends are the strongest churn predictor. Zara computes 7-day, 14-day, and 30-day rolling averages and their rate of change:
-- Feature: Usage frequency trends with rolling windows
-- Federated query: PostgreSQL (users) + Snowflake (events)
WITH daily_usage AS (
SELECT
u.workspace_id,
DATE(e.timestamp) AS activity_date,
COUNT(DISTINCT e.user_id) AS active_users,
COUNT(*) AS total_events,
COUNT(DISTINCT e.event_type) AS distinct_actions,
COUNT(CASE WHEN e.event_type IN ('task_created', 'task_completed',
'comment_added', 'file_uploaded') THEN 1 END) AS core_actions
FROM snowflake.events e
JOIN postgresql.users u ON e.user_id = u.user_id
WHERE e.timestamp >= CURRENT_DATE - INTERVAL '90' DAY
GROUP BY u.workspace_id, DATE(e.timestamp)
),
rolling_features AS (
SELECT
workspace_id,
activity_date,
-- 7-day rolling averages
AVG(active_users) OVER (PARTITION BY workspace_id
ORDER BY activity_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
AS avg_dau_7d,
AVG(core_actions) OVER (PARTITION BY workspace_id
ORDER BY activity_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
AS avg_core_actions_7d,
-- 30-day rolling averages
AVG(active_users) OVER (PARTITION BY workspace_id
ORDER BY activity_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
AS avg_dau_30d,
AVG(core_actions) OVER (PARTITION BY workspace_id
ORDER BY activity_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
AS avg_core_actions_30d
FROM daily_usage
)
SELECT
workspace_id,
activity_date,
avg_dau_7d,
avg_dau_30d,
-- Usage trend: ratio of recent to longer-term average
-- < 1.0 means declining usage (churn signal)
ROUND(avg_dau_7d / NULLIF(avg_dau_30d, 0), 3) AS usage_trend_ratio,
ROUND(avg_core_actions_7d / NULLIF(avg_core_actions_30d, 0), 3)
AS action_trend_ratio
FROM rolling_features
WHERE activity_date = CURRENT_DATE - INTERVAL '1' DAY;
Feature Engineering: Feature Adoption Depth
Accounts using fewer than 3 core features churn at 4x the rate of accounts using 5+:
-- Feature: Feature adoption depth per workspace
WITH feature_usage AS (
SELECT
u.workspace_id,
CASE
WHEN e.event_type LIKE 'board_%' THEN 'kanban_boards'
WHEN e.event_type LIKE 'gantt_%' THEN 'gantt_charts'
WHEN e.event_type LIKE 'timeline_%' THEN 'timelines'
WHEN e.event_type LIKE 'automation_%' THEN 'automations'
WHEN e.event_type LIKE 'form_%' THEN 'forms'
WHEN e.event_type LIKE 'dashboard_%' THEN 'dashboards'
WHEN e.event_type LIKE 'integration_%' THEN 'integrations'
WHEN e.event_type LIKE 'report_%' THEN 'reports'
WHEN e.event_type LIKE 'api_%' THEN 'api_access'
WHEN e.event_type = 'invite_sent' THEN 'team_invites'
END AS feature_category
FROM snowflake.events e
JOIN postgresql.users u ON e.user_id = u.user_id
WHERE e.timestamp >= CURRENT_DATE - INTERVAL '30' DAY
AND e.event_type NOT IN ('page_viewed', 'session_start', 'session_end')
)
SELECT
workspace_id,
COUNT(DISTINCT feature_category) AS features_adopted,
ARRAY_AGG(DISTINCT feature_category) AS feature_list
FROM feature_usage
WHERE feature_category IS NOT NULL
GROUP BY workspace_id;
Feature Engineering: Collaboration and Support Signals
-- Feature: Team collaboration patterns + support ticket sentiment
SELECT
w.workspace_id,
-- Collaboration signals
COUNT(DISTINCT CASE WHEN e.event_type = 'comment_added'
THEN e.user_id END) AS users_commenting,
COUNT(CASE WHEN e.event_type = 'invite_sent'
THEN 1 END) AS invites_sent_30d,
w.seat_count,
ROUND(COUNT(DISTINCT e.user_id) * 1.0
/ NULLIF(w.seat_count, 0), 2) AS seat_utilization,
-- Support signals (from Zendesk)
COUNT(DISTINCT t.ticket_id) AS support_tickets_30d,
AVG(t.satisfaction_rating) AS avg_csat,
COUNT(CASE WHEN t.priority IN ('high', 'urgent')
THEN 1 END) AS urgent_tickets_30d,
-- Billing signals (from Stripe)
s.plan AS current_plan,
CASE WHEN s.mrr < LAG(s.mrr) OVER (PARTITION BY s.workspace_id
ORDER BY s.period_start) THEN 1 ELSE 0 END AS downgraded_last_period
FROM postgresql.workspaces w
JOIN snowflake.events e ON w.workspace_id = e.workspace_id
AND e.timestamp >= CURRENT_DATE - INTERVAL '30' DAY
LEFT JOIN zendesk.support_tickets t ON w.workspace_id = t.workspace_id
AND t.created_at >= CURRENT_DATE - INTERVAL '30' DAY
LEFT JOIN stripe.subscriptions s ON w.workspace_id = s.workspace_id
AND s.status = 'active'
GROUP BY w.workspace_id, w.seat_count, s.plan, s.mrr, s.period_start;
Stage 4: Orchestration
Zara builds a daily pipeline using the Pipeline Service (Temporal) that computes all churn features, runs the model, and pushes scores to the CRM.
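Conceptually, the Pipeline Service resolves each stage's `depends_on` declarations into an execution order before running anything. That resolution can be sketched with Python's standard `graphlib` (the stage names mirror the definition below; this is an illustration of the scheduling idea, not the Temporal runtime itself):

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on, as declared
# in the pipeline definition's `depends_on` fields.
stages = {
    "extract_features": set(),
    "quality_gate": {"extract_features"},
    "score_accounts": {"quality_gate"},
    "flag_at_risk": {"score_accounts"},
    "sync_to_crm": {"flag_at_risk"},
}

# static_order() yields stages with all dependencies satisfied first.
order = list(TopologicalSorter(stages).static_order())
# order == ['extract_features', 'quality_gate', 'score_accounts',
#           'flag_at_risk', 'sync_to_crm']
```

Because `quality_gate` sits between feature extraction and scoring, a critical check failure (e.g. stale events) halts everything downstream, including the CRM sync.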
Pipeline Definition
{
"pipeline": {
"name": "churn-prediction-daily",
"schedule": "0 6 * * *",
"description": "Daily churn risk scoring for all active workspaces",
"stages": [
{
"name": "extract_features",
"type": "sql_transform",
"query_ref": "churn_features_v3.sql",
"output_table": "ml.churn_features_daily",
"timeout_minutes": 30
},
{
"name": "quality_gate",
"type": "data_quality",
"depends_on": ["extract_features"],
"checks": [
{
"name": "event_data_freshness",
"type": "freshness",
"table": "events",
"column": "timestamp",
"max_stale_hours": 2,
"severity": "critical",
"on_failure": "abort_pipeline"
},
{
"name": "feature_completeness",
"type": "column_not_null",
"table": "ml.churn_features_daily",
"columns": ["workspace_id", "avg_dau_7d", "features_adopted"],
"max_null_pct": 5,
"severity": "warning"
},
{
"name": "workspace_count",
"type": "row_count",
"table": "ml.churn_features_daily",
"min_rows": 3500,
"max_rows": 5000,
"severity": "critical"
}
]
},
{
"name": "score_accounts",
"type": "model_inference",
"depends_on": ["quality_gate"],
"model": "churn-predictor-v3",
"model_registry": "ml_workbench",
"input_table": "ml.churn_features_daily",
"output_table": "ml.churn_scores_daily",
"batch_size": 500
},
{
"name": "flag_at_risk",
"type": "sql_transform",
"depends_on": ["score_accounts"],
"query": "INSERT INTO ml.at_risk_accounts SELECT workspace_id, churn_probability, risk_tier, scored_at FROM ml.churn_scores_daily WHERE churn_probability >= 0.65",
"output_table": "ml.at_risk_accounts"
},
{
"name": "sync_to_crm",
"type": "reverse_sync",
"depends_on": ["flag_at_risk"],
"destination": "salesforce",
"object": "Account",
"mapping": {
"workspace_id": "CloudFlow_Workspace_Id__c",
"churn_probability": "Churn_Risk_Score__c",
"risk_tier": "Risk_Tier__c"
}
}
]
}
}
Quality Gate: Event Freshness
The quality gate on event data freshness is critical. If Segment has a lag > 2 hours, the usage features will understate activity and produce false positives:
Pipeline Run: 2026-02-28 06:00 UTC
├── extract_features .............. PASSED (12m 34s)
├── quality_gate
│ ├── event_data_freshness ..... PASSED (latest event: 05:58 UTC, lag: 2 min)
│ ├── feature_completeness ..... PASSED (null rate: 1.2%)
│ └── workspace_count .......... PASSED (4,187 workspaces scored)
├── score_accounts ............... PASSED (3m 12s)
├── flag_at_risk ................. PASSED (847 accounts flagged)
└── sync_to_crm .................. PASSED (847 records synced)
Stage 5: Analysis
Before training the model, Zara performs deep analysis on churn patterns to validate her feature hypotheses and check for data quality issues.
Churn Pattern Analysis
-- Analyze churn rates by customer segment
SELECT
CASE
WHEN s.mrr >= 5000 THEN 'Enterprise ($5K+)'
WHEN s.mrr >= 1000 THEN 'Mid-Market ($1K-$5K)'
WHEN s.mrr >= 200 THEN 'SMB ($200-$1K)'
ELSE 'Self-Serve (< $200)'
END AS segment,
COUNT(DISTINCT w.workspace_id) AS total_accounts,
COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
THEN w.workspace_id END) AS churned,
ROUND(COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
THEN w.workspace_id END) * 100.0
/ COUNT(DISTINCT w.workspace_id), 1) AS logo_churn_pct,
SUM(CASE WHEN w.status = 'cancelled'
THEN s.mrr ELSE 0 END) AS churned_mrr,
ROUND(SUM(CASE WHEN w.status = 'cancelled' THEN s.mrr ELSE 0 END)
* 100.0 / SUM(s.mrr), 1) AS revenue_churn_pct
FROM workspaces w
JOIN subscriptions s ON w.workspace_id = s.workspace_id
WHERE s.period_start >= CURRENT_DATE - INTERVAL '365' DAY
GROUP BY 1
ORDER BY revenue_churn_pct DESC;
| Segment | Accounts | Churned | Logo Churn | Churned MRR | Revenue Churn |
|---|---|---|---|---|---|
| Enterprise ($5K+) | 120 | 14 | 11.7% | $98K | 32.1% |
| Mid-Market ($1K-$5K) | 480 | 72 | 15.0% | $158K | 27.3% |
| SMB ($200-$1K) | 1,800 | 324 | 18.0% | $121K | 19.8% |
| Self-Serve (< $200) | 1,800 | 342 | 19.0% | $53K | 16.4% |
Key finding: Enterprise accounts have the lowest logo churn but the highest revenue churn -- losing a single $5K+/month enterprise account erases as much MRR as losing more than 25 self-serve accounts.
Feature Adoption and Churn Correlation
-- Churn rate by number of core features adopted
SELECT
features_adopted,
COUNT(*) AS account_count,
COUNT(CASE WHEN churned = 1 THEN 1 END) AS churned_count,
ROUND(COUNT(CASE WHEN churned = 1 THEN 1 END)
* 100.0 / COUNT(*), 1) AS churn_rate
FROM ml.churn_features_daily
GROUP BY features_adopted
ORDER BY features_adopted;
| Features Adopted | Accounts | Churned | Churn Rate |
|---|---|---|---|
| 1-2 | 1,240 | 372 | 30.0% |
| 3-4 | 1,580 | 221 | 14.0% |
| 5-6 | 840 | 59 | 7.0% |
| 7+ | 540 | 19 | 3.5% |
Accounts using fewer than 3 core features churn at 4x the rate of those using 5+. This validates feature adoption depth as a top predictor.
Survivorship Bias Check
Zara checks for survivorship bias in the training data -- if she only includes currently active accounts, she will miss the patterns of accounts that already churned:
-- Ensure training data includes churned accounts from the past 12 months
SELECT
CASE WHEN status = 'cancelled' THEN 'Churned' ELSE 'Active' END AS label,
COUNT(*) AS count,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM workspaces
WHERE created_at <= CURRENT_DATE - INTERVAL '90' DAY -- exclude very new accounts
GROUP BY 1;
| Label | Count | Percentage |
|---|---|---|
| Active | 3,450 | 82.3% |
| Churned | 742 | 17.7% |
The class imbalance (82/18) is manageable. Zara will use stratified sampling and evaluate with AUC-PR in addition to AUC-ROC.
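AUC-PR matters here because its no-skill baseline equals the positive rate (~0.18), whereas AUC-ROC's baseline is 0.5 regardless of imbalance. A quick simulation illustrates this (synthetic labels at the observed class balance, not CloudFlow data):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(42)

# Simulated labels at roughly the observed 17.7% churn rate.
y = (rng.random(4192) < 0.177).astype(int)

# A no-skill model assigns random scores: AUC-ROC hovers near 0.5,
# while AUC-PR collapses to the positive rate (~0.18).
random_scores = rng.random(len(y))
baseline_auc_pr = average_precision_score(y, random_scores)
baseline_auc_roc = roc_auc_score(y, random_scores)
```

Against that ~0.18 baseline, the model's eventual AUC-PR of 0.61 represents a substantial lift, which a 0.83 AUC-ROC alone would understate.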
Stage 6: Productionization
Zara trains and deploys her churn model using the ML Workbench.
Model Training
# ML Workbench: Notebook - churn_model_training_v3.ipynb
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.metrics import roc_auc_score, precision_recall_curve, auc
import lightgbm as lgb
import mlflow
# Load feature set via the Query Engine client (handle name is illustrative)
features_df = query_engine.query("""
SELECT * FROM ml.churn_training_features
WHERE snapshot_date BETWEEN DATE '2025-03-01' AND DATE '2026-02-28'
""")
# Feature columns
feature_cols = [
'avg_dau_7d', 'avg_dau_14d', 'avg_dau_30d',
'usage_trend_ratio_7d_30d', 'action_trend_ratio',
'features_adopted', 'seat_utilization',
'invites_sent_30d', 'users_commenting',
'support_tickets_30d', 'urgent_tickets_30d', 'avg_csat',
'days_since_last_login', 'mrr', 'account_age_days',
'downgraded_last_period', 'nps_score'
]
X = features_df[feature_cols]
y = features_df['churned_within_60d']
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=y, random_state=42
)
# Train gradient boosting model
with mlflow.start_run(run_name="churn-gbm-v3"):
model = lgb.LGBMClassifier(
n_estimators=500,
learning_rate=0.05,
max_depth=6,
num_leaves=31,
min_child_samples=20,
subsample=0.8,
colsample_bytree=0.8,
class_weight='balanced',
random_state=42
)
model.fit(
X_train, y_train,
eval_set=[(X_test, y_test)],
callbacks=[lgb.early_stopping(50)]
)
# Evaluate
y_pred = model.predict_proba(X_test)[:, 1]
auc_roc = roc_auc_score(y_test, y_pred)
precision, recall, _ = precision_recall_curve(y_test, y_pred)
auc_pr = auc(recall, precision)
mlflow.log_metric("auc_roc", auc_roc) # 0.83
mlflow.log_metric("auc_pr", auc_pr) # 0.61
mlflow.log_metric("precision_at_80_recall", 0.52)
    mlflow.sklearn.log_model(model, "churn-predictor-v3")
Feature Importance
| Rank | Feature | Importance | Interpretation |
|---|---|---|---|
| 1 | usage_trend_ratio_7d_30d | 0.187 | Declining usage is the top signal |
| 2 | features_adopted | 0.142 | Low feature adoption = high risk |
| 3 | seat_utilization | 0.118 | Paying for unused seats = waste signal |
| 4 | days_since_last_login | 0.096 | Recency of engagement |
| 5 | support_tickets_30d | 0.083 | Frustration signal |
| 6 | avg_dau_7d | 0.074 | Raw activity level |
| 7 | nps_score | 0.068 | Direct satisfaction measure |
| 8 | downgraded_last_period | 0.062 | Explicit contraction signal |
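A normalized importance table like this can be read straight off the trained model. A toy sketch using scikit-learn's GradientBoostingClassifier on synthetic data (LGBMClassifier exposes the same `feature_importances_` attribute, though its default `importance_type` is raw split counts rather than normalized values; the feature names here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy stand-in for the churn feature matrix.
feature_names = ["usage_trend_ratio_7d_30d", "features_adopted",
                 "seat_utilization", "days_since_last_login"]
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=1, random_state=42)

model = GradientBoostingClassifier(random_state=42).fit(X, y)

# sklearn's feature_importances_ sums to 1.0, so it ranks directly.
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda kv: kv[1], reverse=True)
```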
Deploy to Ray Serve
Zara registers the model and deploys it for two use cases -- real-time scoring on login and daily batch scoring:
{
"deployment": {
"name": "churn-predictor-v3",
"model_uri": "models:/churn-predictor-v3/production",
"serving": {
"realtime": {
"endpoint": "/api/v1/ml/churn-score",
"runtime": "ray_serve",
"num_replicas": 2,
"max_batch_size": 32,
"latency_sla_ms": 200,
"trigger": "user_login_event"
},
"batch": {
"schedule": "0 6 * * *",
"pipeline": "churn-prediction-daily",
"output_table": "ml.churn_scores_daily"
}
},
"crm_sync": {
"destination": "salesforce",
"object": "Account",
"field": "Churn_Risk_Score__c",
"update_frequency": "daily"
}
}
}
Stage 7: Feedback
Zara configures comprehensive monitoring for her deployed model using the ML Workbench Model Registry.
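The drift checks in the configuration below flag feature shifts with the Population Stability Index (PSI). A minimal sketch of one common PSI formulation -- the binning scheme and clipping epsilon are implementation choices, not a platform API:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline distribution
    (e.g. the training window) and a live distribution, using
    quantile bins taken from the baseline. Common rule of thumb:
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant shift."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Widen the outer edges so every live value lands in some bin.
    edges[0] = min(edges[0], actual.min()) - 1e-9
    edges[-1] = max(edges[-1], actual.max()) + 1e-9
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(1.0, 0.2, 10_000)  # e.g. usage_trend_ratio history
stable = rng.normal(1.0, 0.2, 10_000)    # same distribution -> tiny PSI
shifted = rng.normal(0.8, 0.2, 10_000)   # broad usage decline -> large PSI
```

A real usage decline would push `usage_trend_ratio_7d_30d` scores down across many workspaces at once, tripping the 0.15 PSI threshold configured below.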
Performance Monitoring
{
"monitoring": {
"model": "churn-predictor-v3",
"metrics": [
{
"name": "weekly_auc_roc",
"type": "model_performance",
"metric": "auc_roc",
"window": "7d",
"threshold": 0.78,
"alert": "slack://ml-alerts"
},
{
"name": "monthly_lift_chart",
"type": "model_performance",
"metric": "lift_at_decile",
"window": "30d",
"report": "bi_workbench://churn-model-performance"
},
{
"name": "csm_intervention_tracking",
"type": "business_outcome",
"metric": "save_rate_for_flagged_accounts",
"window": "90d",
"target": 0.35
}
],
"drift_detection": [
{
"feature": "usage_trend_ratio_7d_30d",
"type": "distribution_shift",
"method": "psi",
"threshold": 0.15,
"note": "Seasonal patterns expected in Dec/Jan -- adjust threshold"
},
{
"feature": "avg_dau_7d",
"type": "distribution_shift",
"method": "psi",
"threshold": 0.10
}
],
"alerts": [
{
"condition": "auc_roc < 0.78 for 2 consecutive weeks",
"action": "trigger_retraining_pipeline",
"notify": ["zara@cloudflow.io", "slack://ml-alerts"]
},
{
"condition": "prediction_volume drops > 20%",
"action": "page_on_call",
"note": "Likely ingestion pipeline failure, not model issue"
}
]
}
}
Tracking CSM Intervention Outcomes
The ultimate measure of model value is whether CSM interventions on flagged accounts actually reduce churn:
-- Monthly intervention outcome tracking
SELECT
DATE_TRUNC('month', s.scored_at) AS score_month,
s.risk_tier,
COUNT(DISTINCT s.workspace_id) AS accounts_flagged,
COUNT(DISTINCT CASE WHEN i.intervention_date IS NOT NULL
THEN s.workspace_id END) AS accounts_contacted,
COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
AND w.cancelled_at <= s.scored_at + INTERVAL '90' DAY
THEN s.workspace_id END) AS actually_churned,
ROUND(COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
AND w.cancelled_at <= s.scored_at + INTERVAL '90' DAY
THEN s.workspace_id END) * 100.0
/ COUNT(DISTINCT s.workspace_id), 1) AS churn_rate
FROM ml.churn_scores_daily s
LEFT JOIN csm_interventions i ON s.workspace_id = i.workspace_id
AND i.intervention_date BETWEEN s.scored_at AND s.scored_at + INTERVAL '14' DAY
LEFT JOIN workspaces w ON s.workspace_id = w.workspace_id
WHERE s.risk_tier = 'high'
GROUP BY 1, 2
ORDER BY 1;
Stage 8: Experimentation
Zara runs structured experiments to improve model accuracy and validate business impact.
Model Comparison
| Model | AUC-ROC | AUC-PR | P@80R | Latency (p50) | Notes |
|---|---|---|---|---|---|
| Logistic Regression (baseline) | 0.76 | 0.44 | 0.38 | 8ms | Interpretable but underfits non-linear patterns |
| Gradient Boosting (current) | 0.83 | 0.61 | 0.52 | 24ms | Best balance of accuracy and speed |
| Neural Network (MLP) | 0.85 | 0.65 | 0.56 | 45ms | Marginal gain, higher latency, less interpretable |
| Gradient Boosting + NPS | 0.86 | 0.67 | 0.58 | 26ms | Adding NPS survey data improves AUC by 0.03 |
Feature Ablation Study
Zara tests the impact of removing individual features to understand dependencies:
| Removed Feature | AUC Change | Conclusion |
|---|---|---|
| usage_trend_ratio | -0.04 | Critical -- keep |
| features_adopted | -0.03 | Critical -- keep |
| nps_score | -0.01 | Valuable but not critical (sparse data) |
| support_tickets_30d | -0.02 | Important signal for enterprise accounts |
| account_age_days | -0.005 | Minimal impact -- could remove for simplicity |
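An ablation study like this is typically just a retrain loop over leave-one-out feature sets, comparing each run's AUC against the full model. A toy sketch on synthetic data (illustrative, not the actual churn features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, n_informative=4,
                           n_redundant=1, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

def auc_with(cols):
    """Train on the given feature columns and return held-out AUC-ROC."""
    m = GradientBoostingClassifier(random_state=42).fit(X_tr[:, cols], y_tr)
    return roc_auc_score(y_te, m.predict_proba(X_te[:, cols])[:, 1])

full_auc = auc_with(list(range(X.shape[1])))
# Drop one feature at a time; negative deltas mark features whose
# removal hurts the model.
ablation = {
    f"feature_{i}": auc_with([j for j in range(X.shape[1]) if j != i]) - full_auc
    for i in range(X.shape[1])
}
```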
A/B Test: CSM Outreach Strategies
Zara works with the Customer Success team to A/B test different intervention strategies for high-risk accounts:
{
"experiment": {
"name": "csm-outreach-strategy-q1-2026",
"hypothesis": "Personalized outreach based on specific churn risk factors will have higher save rates than generic check-in calls",
"variants": [
{
"name": "control",
"description": "Standard quarterly check-in call",
"allocation": 0.33
},
{
"name": "risk_personalized",
"description": "Outreach with specific risk factors: 'We noticed your team has not used [feature] recently -- can we schedule a training session?'",
"allocation": 0.34
},
{
"name": "value_demo",
"description": "Proactive ROI report showing workspace productivity gains and cost savings",
"allocation": 0.33
}
],
"primary_metric": "90_day_retention_rate",
"secondary_metrics": ["expansion_mrr", "nps_change", "csm_time_per_account"],
"duration_weeks": 12,
"min_sample_per_variant": 100
}
}
Early results (week 8):
| Variant | Accounts | Retained | Retention Rate | Expansion MRR |
|---|---|---|---|---|
| Control (standard check-in) | 112 | 81 | 72.3% | $2,100 |
| Risk-personalized outreach | 118 | 98 | 83.1% | $8,400 |
| Proactive value demo | 110 | 91 | 82.7% | $12,300 |
The risk-personalized outreach shows the highest retention rate, while the proactive value demo drives the most expansion revenue -- suggesting a combined strategy may be optimal.
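A quick back-of-the-envelope check on the headline comparison -- a pooled two-proportion z-test, sketched here rather than taken from any platform tooling -- suggests the retention lift of risk-personalized outreach over control is right at the edge of conventional significance at week 8, which supports letting the experiment run its full 12 weeks before declaring a winner:

```python
from math import erf, sqrt

def two_proportion_z(success_a: int, n_a: int,
                     success_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test with a pooled standard error.
    Returns (z statistic, p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Control: 81/112 retained vs risk-personalized: 98/118 retained.
z, p = two_proportion_z(81, 112, 98, 118)  # z ~ 1.96, p ~ 0.05
```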
Summary
| Stage | Key Action | Platform Component | Outcome |
|---|---|---|---|
| 1. Ingestion | Connected Salesforce CRM, Zendesk for support + billing signals | Ingestion Service (Airbyte) | 7 data sources flowing, real-time + batch |
| 2. Discovery | Found 347 event types, profiled 40% at-risk users, traced existing health_score lineage | Data Workbench Catalog | Identified data gaps and existing assets |
| 3. Query | Built 15+ churn features via federated SQL across 3 sources | Query Engine (Trino) | Usage trends, adoption, collaboration, support, billing features |
| 4. Orchestration | Daily pipeline with quality gates (event freshness, completeness) | Pipeline Service (Temporal) | Automated scoring with CRM sync |
| 5. Analysis | Uncovered 25% revenue churn driven by enterprise segment, 4x churn rate for low-adoption accounts | Data Workbench + Query Engine | Data-driven feature selection, survivorship bias check |
| 6. Productionization | Deployed gradient boosting model (AUC 0.83) to Ray Serve + daily batch scoring | ML Service (Ray Serve) | Real-time and batch scoring, Salesforce integration |
| 7. Feedback | Weekly AUC monitoring, feature drift detection, CSM intervention tracking | ML Workbench Model Registry | Automated alerts, retraining triggers, business outcome tracking |
| 8. Experimentation | Compared 4 model architectures, tested CSM outreach strategies via A/B test | ML Workbench Experiments | 83% retention with personalized outreach vs 72% baseline |
Related Walkthroughs
- ML Engineer Journey -- Raj builds the feature recommendation engine that improves adoption (Zara's #2 predictor)
- BI Lead Journey -- Emily builds the PLG dashboards that surface churn scores to stakeholders
- Executive Journey -- Michael uses churn data for strategic pricing and retention decisions
- SaaS & Technology Overview -- Industry context and dataset reference