Data Scientist Journey: User Churn Prediction for a B2B SaaS Product
Persona: Zara Ahmed, Senior Data Scientist at CloudFlow
Goal: Build a churn prediction model to identify at-risk B2B accounts before renewal, enabling the Customer Success team to intervene proactively and reduce the 18% annual logo churn rate.
Primary Workbenches: ML Workbench, Data Workbench
Supporting Services: Ingestion Service, Catalog Service, Query Engine, Pipeline Service, Data Quality Service, ML Service (Ray Serve)
Business Context
CloudFlow's churn problem is not uniform. The company's 18% annual logo churn rate masks a more alarming reality: 25% revenue churn. Larger accounts with higher ARR are churning at a disproportionate rate, and the Customer Success team only learns about at-risk accounts when the cancellation request arrives -- too late for meaningful intervention.
Zara's mandate is clear: build a model that identifies at-risk accounts 60 days before renewal, with enough lead time for the CSM team to engage, diagnose the problem, and offer targeted retention actions.
Current State Target State
┌────────────────────────┐ ┌────────────────────────┐
│ CSM learns about │ │ Model flags account │
│ churn at cancellation │ │ 60 days before renewal │
│ │ │ │
│ Reactive: 0 days lead │ ────▶ │ Proactive: 60 day lead │
│ time, 5% save rate │ │ time, 35% save rate │
│ │ │ │
│ Revenue churn: 25% │ │ Revenue churn target: │
│ │ │ < 15% │
└────────────────────────┘          └────────────────────────┘
Stage 1: Ingestion
Zara begins by ensuring all the data sources she needs are connected and syncing. She opens the Data Workbench and navigates to the Ingestion panel to review existing connections and configure new ones.
Existing Connections
The data engineering team has already connected the primary data sources:
| Source | Connector | Sync Mode | Frequency | Status |
|---|---|---|---|---|
| Product PostgreSQL | Airbyte PostgreSQL CDC | Incremental (WAL) | Every 15 min | Active |
| Segment Events (Kafka) | Airbyte Kafka | Streaming | Continuous | Active |
| Stripe Billing | Airbyte Stripe | Incremental | Hourly | Active |
New Connections Needed
Zara needs two additional sources for her churn model -- CRM data (for renewal dates and CSM notes) and support ticket data (for sentiment signals):
Salesforce CRM connection:
{
"source": "salesforce",
"connector": "airbyte/source-salesforce",
"config": {
"streams": ["Account", "Opportunity", "Task", "Note"],
"sync_mode": "incremental",
"cursor_field": "SystemModstamp",
"frequency": "every_30_minutes"
},
"schema_mapping": {
"Account": {
"destination_table": "crm_accounts",
"primary_key": "Id",
"field_mapping": {
"Id": "account_id",
"CloudFlow_Workspace_Id__c": "workspace_id",
"OwnerId": "csm_owner",
"Health_Score__c": "health_score",
"Contract_End_Date__c": "renewal_date",
"ARR__c": "arr"
}
}
}
}
Zendesk Support connection:
{
"source": "zendesk",
"connector": "airbyte/source-zendesk-support",
"config": {
"streams": ["tickets", "ticket_comments", "satisfaction_ratings"],
"sync_mode": "incremental",
"cursor_field": "updated_at",
"frequency": "every_15_minutes"
}
}
Sync Configuration Strategy
For churn prediction, data freshness requirements vary by source:
| Data Type | Freshness Requirement | Rationale |
|---|---|---|
| Product events | Real-time (streaming) | Usage drop-offs are the strongest churn signal |
| Support tickets | < 15 minutes | Urgent tickets signal immediate dissatisfaction |
| Billing data | < 1 hour | Downgrade signals matter but are less time-sensitive |
| CRM data | < 30 minutes | CSM notes and health scores change infrequently; a 30-minute sync is sufficient |
| NPS surveys | Daily batch | Quarterly surveys, batch processing is sufficient |
Stage 2: Discovery
With data flowing, Zara switches to the Data Workbench Catalog to explore what is available, understand data quality, and discover existing assets she can build on.
Catalog Exploration
Zara searches the catalog for "user activity" and discovers 347 distinct event types
in the events table. She profiles the event distribution:
-- Data Workbench: Catalog profiling query
SELECT
event_type,
COUNT(*) AS event_count,
COUNT(DISTINCT user_id) AS unique_users,
COUNT(DISTINCT workspace_id) AS unique_workspaces,
MIN(timestamp) AS first_seen,
MAX(timestamp) AS last_seen
FROM events
WHERE timestamp >= CURRENT_DATE - INTERVAL '90' DAY
GROUP BY event_type
ORDER BY event_count DESC
LIMIT 20;
Top events discovered:
| Event Type | Count (90d) | Unique Users | Signal Type |
|---|---|---|---|
| page_viewed | 142M | 78K | Engagement |
| task_created | 28M | 52K | Core action |
| task_completed | 19M | 44K | Core action |
| comment_added | 12M | 38K | Collaboration |
| file_uploaded | 8.4M | 31K | Depth |
| invite_sent | 620K | 12K | Expansion |
| integration_connected | 180K | 8.2K | Depth |
| export_requested | 95K | 6.1K | Possible churn signal |
Key Discovery: User Activity Profiling
Zara profiles user activity levels and finds a concerning distribution:
-- Profile monthly active days per user
SELECT
CASE
WHEN active_days >= 20 THEN 'Power User (20+ days)'
WHEN active_days >= 10 THEN 'Regular (10-19 days)'
WHEN active_days >= 5 THEN 'Light (5-9 days)'
ELSE 'At Risk (< 5 days)'
END AS user_segment,
COUNT(*) AS user_count,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM (
SELECT user_id, COUNT(DISTINCT DATE(timestamp)) AS active_days
FROM events
WHERE timestamp >= CURRENT_DATE - INTERVAL '30' DAY
GROUP BY user_id
) user_activity
GROUP BY 1
ORDER BY 2 DESC;
| Segment | Users | Percentage |
|---|---|---|
| At Risk (< 5 days) | 40,000 | 40% |
| Light (5-9 days) | 22,000 | 22% |
| Regular (10-19 days) | 21,000 | 21% |
| Power User (20+ days) | 17,000 | 17% |
40% of users are active fewer than 5 days per month -- a much larger at-risk population than the team assumed.
Lineage Discovery
Zara traces the data lineage for an existing health_score column in the crm_accounts
table and discovers it was built by the product team using a simple formula:
event (raw)
└──▶ session_aggregate (Pipeline Service, hourly)
└──▶ user_activity_daily (Pipeline Service, daily)
└──▶ workspace_health_score (Pipeline Service, daily)
            └──▶ crm_accounts.health_score (Salesforce sync, bidirectional)
The existing health score uses only 3 signals (login frequency, task creation, team size). Zara's model will incorporate 15+ features for a more accurate prediction.
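For contrast with the 15+ feature model Zara is building, a 3-signal score like the existing one typically looks something like the following. This is a hypothetical reconstruction -- the weights, caps, and function name are illustrative assumptions, not CloudFlow's actual formula:

```python
def legacy_health_score(login_days_30d: int, tasks_created_30d: int,
                        team_size: int) -> float:
    """Hypothetical 3-signal health score: a weighted blend of login
    frequency, task creation, and team size, each capped and scaled
    to a 0-100 range. All weights and caps are illustrative."""
    login_component = min(login_days_30d / 20, 1.0)     # 20+ active days = max
    task_component = min(tasks_created_30d / 100, 1.0)  # 100+ tasks = max
    size_component = min(team_size / 25, 1.0)           # 25+ seats = max
    return round(100 * (0.5 * login_component
                        + 0.3 * task_component
                        + 0.2 * size_component), 1)

# A fairly active mid-sized workspace:
score = legacy_health_score(login_days_30d=18, tasks_created_30d=80,
                            team_size=10)  # 77.0
```

A fixed formula like this cannot capture trends (is usage rising or falling?), which is exactly the gap Zara's feature set targets.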
Stage 3: Query
Zara builds her churn feature set using the Query Engine, federating across PostgreSQL (product data), Snowflake (event warehouse), and Salesforce (CRM) in a single query session.
Feature Engineering: Usage Trends
Rolling usage trends are the strongest churn predictor. Zara computes 7-day, 14-day, and 30-day rolling averages and their rate of change:
-- Feature: Usage frequency trends with rolling windows
-- Federated query: PostgreSQL (users) + Snowflake (events)
WITH daily_usage AS (
SELECT
u.workspace_id,
DATE(e.timestamp) AS activity_date,
COUNT(DISTINCT e.user_id) AS active_users,
COUNT(*) AS total_events,
COUNT(DISTINCT e.event_type) AS distinct_actions,
COUNT(CASE WHEN e.event_type IN ('task_created', 'task_completed',
'comment_added', 'file_uploaded') THEN 1 END) AS core_actions
FROM snowflake.events e
JOIN postgresql.users u ON e.user_id = u.user_id
WHERE e.timestamp >= CURRENT_DATE - INTERVAL '90' DAY
GROUP BY u.workspace_id, DATE(e.timestamp)
),
rolling_features AS (
SELECT
workspace_id,
activity_date,
-- 7-day rolling averages
AVG(active_users) OVER (PARTITION BY workspace_id
ORDER BY activity_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
AS avg_dau_7d,
AVG(core_actions) OVER (PARTITION BY workspace_id
ORDER BY activity_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
AS avg_core_actions_7d,
-- 30-day rolling averages
AVG(active_users) OVER (PARTITION BY workspace_id
ORDER BY activity_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
AS avg_dau_30d,
AVG(core_actions) OVER (PARTITION BY workspace_id
ORDER BY activity_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
AS avg_core_actions_30d
FROM daily_usage
)
SELECT
workspace_id,
activity_date,
avg_dau_7d,
avg_dau_30d,
-- Usage trend: ratio of recent to longer-term average
-- < 1.0 means declining usage (churn signal)
ROUND(avg_dau_7d / NULLIF(avg_dau_30d, 0), 3) AS usage_trend_ratio,
ROUND(avg_core_actions_7d / NULLIF(avg_core_actions_30d, 0), 3)
AS action_trend_ratio
FROM rolling_features
WHERE activity_date = CURRENT_DATE - INTERVAL '1' DAY;
Feature Engineering: Feature Adoption Depth
Accounts using fewer than 3 core features churn at 4x the rate of accounts using 5+:
-- Feature: Feature adoption depth per workspace
WITH feature_usage AS (
SELECT
u.workspace_id,
CASE
WHEN e.event_type LIKE 'board_%' THEN 'kanban_boards'
WHEN e.event_type LIKE 'gantt_%' THEN 'gantt_charts'
WHEN e.event_type LIKE 'timeline_%' THEN 'timelines'
WHEN e.event_type LIKE 'automation_%' THEN 'automations'
WHEN e.event_type LIKE 'form_%' THEN 'forms'
WHEN e.event_type LIKE 'dashboard_%' THEN 'dashboards'
WHEN e.event_type LIKE 'integration_%' THEN 'integrations'
WHEN e.event_type LIKE 'report_%' THEN 'reports'
WHEN e.event_type LIKE 'api_%' THEN 'api_access'
WHEN e.event_type = 'invite_sent' THEN 'team_invites'
END AS feature_category
FROM snowflake.events e
JOIN postgresql.users u ON e.user_id = u.user_id
WHERE e.timestamp >= CURRENT_DATE - INTERVAL '30' DAY
AND e.event_type NOT IN ('page_viewed', 'session_start', 'session_end')
)
SELECT
workspace_id,
COUNT(DISTINCT feature_category) AS features_adopted,
ARRAY_AGG(DISTINCT feature_category) AS feature_list
FROM feature_usage
WHERE feature_category IS NOT NULL
GROUP BY workspace_id;
Feature Engineering: Collaboration and Support Signals
-- Feature: Team collaboration patterns + support ticket sentiment
SELECT
w.workspace_id,
-- Collaboration signals
COUNT(DISTINCT CASE WHEN e.event_type = 'comment_added'
THEN e.user_id END) AS users_commenting,
COUNT(CASE WHEN e.event_type = 'invite_sent'
THEN 1 END) AS invites_sent_30d,
w.seat_count,
ROUND(COUNT(DISTINCT e.user_id) * 1.0
/ NULLIF(w.seat_count, 0), 2) AS seat_utilization,
-- Support signals (from Zendesk)
COUNT(DISTINCT t.ticket_id) AS support_tickets_30d,
AVG(t.satisfaction_rating) AS avg_csat,
COUNT(CASE WHEN t.priority IN ('high', 'urgent')
THEN 1 END) AS urgent_tickets_30d,
-- Billing signals (from Stripe)
s.plan AS current_plan,
CASE WHEN s.mrr < LAG(s.mrr) OVER (PARTITION BY s.workspace_id
ORDER BY s.period_start) THEN 1 ELSE 0 END AS downgraded_last_period
FROM postgresql.workspaces w
JOIN snowflake.events e ON w.workspace_id = e.workspace_id
AND e.timestamp >= CURRENT_DATE - INTERVAL '30' DAY
LEFT JOIN zendesk.support_tickets t ON w.workspace_id = t.workspace_id
AND t.created_at >= CURRENT_DATE - INTERVAL '30' DAY
LEFT JOIN stripe.subscriptions s ON w.workspace_id = s.workspace_id
AND s.status = 'active'
GROUP BY w.workspace_id, w.seat_count, s.plan, s.mrr, s.period_start;
Stage 4: Orchestration
Zara builds a daily pipeline using the Pipeline Service (Temporal) that computes all churn features, runs the model, and pushes scores to the CRM.
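Conceptually, the Pipeline Service resolves each stage's `depends_on` declarations into an execution order before running anything. That resolution can be sketched with Python's standard `graphlib` (the stage names mirror the definition below; this is an illustration of the scheduling idea, not the Temporal runtime itself):

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on, as declared
# in the pipeline definition's `depends_on` fields.
stages = {
    "extract_features": set(),
    "quality_gate": {"extract_features"},
    "score_accounts": {"quality_gate"},
    "flag_at_risk": {"score_accounts"},
    "sync_to_crm": {"flag_at_risk"},
}

# static_order() yields stages with all dependencies satisfied first.
order = list(TopologicalSorter(stages).static_order())
# order == ['extract_features', 'quality_gate', 'score_accounts',
#           'flag_at_risk', 'sync_to_crm']
```

Because `quality_gate` sits between feature extraction and scoring, a critical check failure (e.g. stale events) halts everything downstream, including the CRM sync.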
Pipeline Definition
{
"pipeline": {
"name": "churn-prediction-daily",
"schedule": "0 6 * * *",
"description": "Daily churn risk scoring for all active workspaces",
"stages": [
{
"name": "extract_features",
"type": "sql_transform",
"query_ref": "churn_features_v3.sql",
"output_table": "ml.churn_features_daily",
"timeout_minutes": 30
},
{
"name": "quality_gate",
"type": "data_quality",
"depends_on": ["extract_features"],
"checks": [
{
"name": "event_data_freshness",
"type": "freshness",
"table": "events",
"column": "timestamp",
"max_stale_hours": 2,
"severity": "critical",
"on_failure": "abort_pipeline"
},
{
"name": "feature_completeness",
"type": "column_not_null",
"table": "ml.churn_features_daily",
"columns": ["workspace_id", "avg_dau_7d", "features_adopted"],
"max_null_pct": 5,
"severity": "warning"
},
{
"name": "workspace_count",
"type": "row_count",
"table": "ml.churn_features_daily",
"min_rows": 3500,
"max_rows": 5000,
"severity": "critical"
}
]
},
{
"name": "score_accounts",
"type": "model_inference",
"depends_on": ["quality_gate"],
"model": "churn-predictor-v3",
"model_registry": "ml_workbench",
"input_table": "ml.churn_features_daily",
"output_table": "ml.churn_scores_daily",
"batch_size": 500
},
{
"name": "flag_at_risk",
"type": "sql_transform",
"depends_on": ["score_accounts"],
"query": "INSERT INTO ml.at_risk_accounts SELECT workspace_id, churn_probability, risk_tier, scored_at FROM ml.churn_scores_daily WHERE churn_probability >= 0.65",
"output_table": "ml.at_risk_accounts"
},
{
"name": "sync_to_crm",
"type": "reverse_sync",
"depends_on": ["flag_at_risk"],
"destination": "salesforce",
"object": "Account",
"mapping": {
"workspace_id": "CloudFlow_Workspace_Id__c",
"churn_probability": "Churn_Risk_Score__c",
"risk_tier": "Risk_Tier__c"
}
}
]
}
}
Quality Gate: Event Freshness
The quality gate on event data freshness is critical. If Segment has a lag > 2 hours, the usage features will understate activity and produce false positives:
Pipeline Run: 2026-02-28 06:00 UTC
├── extract_features .............. PASSED (12m 34s)
├── quality_gate
│ ├── event_data_freshness ..... PASSED (latest event: 05:58 UTC, lag: 2 min)
│ ├── feature_completeness ..... PASSED (null rate: 1.2%)
│ └── workspace_count .......... PASSED (4,187 workspaces scored)
├── score_accounts ............... PASSED (3m 12s)
├── flag_at_risk ................. PASSED (847 accounts flagged)
└── sync_to_crm .................. PASSED (847 records synced)
Stage 5: Analysis
Before training the model, Zara performs deep analysis on churn patterns to validate her feature hypotheses and check for data quality issues.
Churn Pattern Analysis
-- Analyze churn rates by customer segment
SELECT
CASE
WHEN s.mrr >= 5000 THEN 'Enterprise ($5K+)'
WHEN s.mrr >= 1000 THEN 'Mid-Market ($1K-$5K)'
WHEN s.mrr >= 200 THEN 'SMB ($200-$1K)'
ELSE 'Self-Serve (< $200)'
END AS segment,
COUNT(DISTINCT w.workspace_id) AS total_accounts,
COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
THEN w.workspace_id END) AS churned,
ROUND(COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
THEN w.workspace_id END) * 100.0
/ COUNT(DISTINCT w.workspace_id), 1) AS logo_churn_pct,
SUM(CASE WHEN w.status = 'cancelled'
THEN s.mrr ELSE 0 END) AS churned_mrr,
ROUND(SUM(CASE WHEN w.status = 'cancelled' THEN s.mrr ELSE 0 END)
* 100.0 / SUM(s.mrr), 1) AS revenue_churn_pct
FROM workspaces w
JOIN subscriptions s ON w.workspace_id = s.workspace_id
WHERE s.period_start >= CURRENT_DATE - INTERVAL '365' DAY
GROUP BY 1
ORDER BY revenue_churn_pct DESC;
| Segment | Accounts | Churned | Logo Churn | Churned MRR | Revenue Churn |
|---|---|---|---|---|---|
| Enterprise ($5K+) | 120 | 14 | 11.7% | $98K | 32.1% |
| Mid-Market ($1K-$5K) | 480 | 72 | 15.0% | $158K | 27.3% |
| SMB ($200-$1K) | 1,800 | 324 | 18.0% | $121K | 19.8% |
| Self-Serve (< $200) | 1,800 | 342 | 19.0% | $53K | 16.4% |
Key finding: Enterprise accounts have the lowest logo churn but the highest revenue churn -- losing a single $5K+/month enterprise account erases as much MRR as losing more than 25 self-serve accounts.
Feature Adoption and Churn Correlation
-- Churn rate by number of core features adopted
SELECT
features_adopted,
COUNT(*) AS account_count,
COUNT(CASE WHEN churned = 1 THEN 1 END) AS churned_count,
ROUND(COUNT(CASE WHEN churned = 1 THEN 1 END)
* 100.0 / COUNT(*), 1) AS churn_rate
FROM ml.churn_features_daily
GROUP BY features_adopted
ORDER BY features_adopted;
| Features Adopted | Accounts | Churned | Churn Rate |
|---|---|---|---|
| 1-2 | 1,240 | 372 | 30.0% |
| 3-4 | 1,580 | 221 | 14.0% |
| 5-6 | 840 | 59 | 7.0% |
| 7+ | 540 | 19 | 3.5% |
Accounts using fewer than 3 core features churn at 4x the rate of those using 5+. This validates feature adoption depth as a top predictor.
Survivorship Bias Check
Zara checks for survivorship bias in the training data -- if she only includes currently active accounts, she will miss the patterns of accounts that already churned:
-- Ensure training data includes churned accounts from the past 12 months
SELECT
CASE WHEN status = 'cancelled' THEN 'Churned' ELSE 'Active' END AS label,
COUNT(*) AS count,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM workspaces
WHERE created_at <= CURRENT_DATE - INTERVAL '90' DAY -- exclude very new accounts
GROUP BY 1;
| Label | Count | Percentage |
|---|---|---|
| Active | 3,450 | 82.3% |
| Churned | 742 | 17.7% |
The class imbalance (82/18) is manageable. Zara will use stratified sampling and evaluate with AUC-PR in addition to AUC-ROC.
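AUC-PR matters here because its no-skill baseline equals the positive rate (~0.18), whereas AUC-ROC's baseline is 0.5 regardless of imbalance. A quick simulation illustrates this (synthetic labels at the observed class balance, not CloudFlow data):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(42)

# Simulated labels at roughly the observed 17.7% churn rate.
y = (rng.random(4192) < 0.177).astype(int)

# A no-skill model assigns random scores: AUC-ROC hovers near 0.5,
# while AUC-PR collapses to the positive rate (~0.18).
random_scores = rng.random(len(y))
baseline_auc_pr = average_precision_score(y, random_scores)
baseline_auc_roc = roc_auc_score(y, random_scores)
```

Against that ~0.18 baseline, the model's eventual AUC-PR of 0.61 represents a substantial lift, which a 0.83 AUC-ROC alone would understate.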
Stage 6: Productionization
Zara trains and deploys her churn model using the ML Workbench.
Model Training
# ML Workbench: Notebook - churn_model_training_v3.ipynb
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.metrics import roc_auc_score, precision_recall_curve, auc
import lightgbm as lgb
import mlflow
# Load feature set via the Query Engine client (handle name is illustrative)
features_df = query_engine.query("""
SELECT * FROM ml.churn_training_features
WHERE snapshot_date BETWEEN DATE '2025-03-01' AND DATE '2026-02-28'
""")
# Feature columns
feature_cols = [
'avg_dau_7d', 'avg_dau_14d', 'avg_dau_30d',
'usage_trend_ratio_7d_30d', 'action_trend_ratio',
'features_adopted', 'seat_utilization',
'invites_sent_30d', 'users_commenting',
'support_tickets_30d', 'urgent_tickets_30d', 'avg_csat',
'days_since_last_login', 'mrr', 'account_age_days',
'downgraded_last_period', 'nps_score'
]
X = features_df[feature_cols]
y = features_df['churned_within_60d']
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=y, random_state=42
)
# Train gradient boosting model
with mlflow.start_run(run_name="churn-gbm-v3"):
model = lgb.LGBMClassifier(
n_estimators=500,
learning_rate=0.05,
max_depth=6,
num_leaves=31,
min_child_samples=20,
subsample=0.8,
colsample_bytree=0.8,
class_weight='balanced',
random_state=42
)
model.fit(
X_train, y_train,
eval_set=[(X_test, y_test)],
callbacks=[lgb.early_stopping(50)]
)
# Evaluate
y_pred = model.predict_proba(X_test)[:, 1]
auc_roc = roc_auc_score(y_test, y_pred)
precision, recall, _ = precision_recall_curve(y_test, y_pred)
auc_pr = auc(recall, precision)
mlflow.log_metric("auc_roc", auc_roc) # 0.83
mlflow.log_metric("auc_pr", auc_pr) # 0.61
mlflow.log_metric("precision_at_80_recall", 0.52)
    mlflow.sklearn.log_model(model, "churn-predictor-v3")
Feature Importance
| Rank | Feature | Importance | Interpretation |
|---|---|---|---|
| 1 | usage_trend_ratio_7d_30d | 0.187 | Declining usage is the top signal |
| 2 | features_adopted | 0.142 | Low feature adoption = high risk |
| 3 | seat_utilization | 0.118 | Paying for unused seats = waste signal |
| 4 | days_since_last_login | 0.096 | Recency of engagement |
| 5 | support_tickets_30d | 0.083 | Frustration signal |
| 6 | avg_dau_7d | 0.074 | Raw activity level |
| 7 | nps_score | 0.068 | Direct satisfaction measure |
| 8 | downgraded_last_period | 0.062 | Explicit contraction signal |
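A normalized importance table like this can be read straight off the trained model. A toy sketch using scikit-learn's GradientBoostingClassifier on synthetic data (LGBMClassifier exposes the same `feature_importances_` attribute, though its default `importance_type` is raw split counts rather than normalized values; the feature names here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy stand-in for the churn feature matrix.
feature_names = ["usage_trend_ratio_7d_30d", "features_adopted",
                 "seat_utilization", "days_since_last_login"]
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=1, random_state=42)

model = GradientBoostingClassifier(random_state=42).fit(X, y)

# sklearn's feature_importances_ sums to 1.0, so it ranks directly.
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda kv: kv[1], reverse=True)
```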
Deploy to Ray Serve
Zara registers the model and deploys it for two use cases -- real-time scoring on login and daily batch scoring:
{
"deployment": {
"name": "churn-predictor-v3",
"model_uri": "models:/churn-predictor-v3/production",
"serving": {
"realtime": {
"endpoint": "/api/v1/ml/churn-score",
"runtime": "ray_serve",
"num_replicas": 2,
"max_batch_size": 32,
"latency_sla_ms": 200,
"trigger": "user_login_event"
},
"batch": {
"schedule": "0 6 * * *",
"pipeline": "churn-prediction-daily",
"output_table": "ml.churn_scores_daily"
}
},
"crm_sync": {
"destination": "salesforce",
"object": "Account",
"field": "Churn_Risk_Score__c",
"update_frequency": "daily"
}
}
}
Stage 7: Feedback
Zara configures comprehensive monitoring for her deployed model using the ML Workbench Model Registry.
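The drift checks in the configuration below flag feature shifts with the Population Stability Index (PSI). A minimal sketch of one common PSI formulation -- the binning scheme and clipping epsilon are implementation choices, not a platform API:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline distribution
    (e.g. the training window) and a live distribution, using
    quantile bins taken from the baseline. Common rule of thumb:
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant shift."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Widen the outer edges so every live value lands in some bin.
    edges[0] = min(edges[0], actual.min()) - 1e-9
    edges[-1] = max(edges[-1], actual.max()) + 1e-9
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(1.0, 0.2, 10_000)  # e.g. usage_trend_ratio history
stable = rng.normal(1.0, 0.2, 10_000)    # same distribution -> tiny PSI
shifted = rng.normal(0.8, 0.2, 10_000)   # broad usage decline -> large PSI
```

A real usage decline would push `usage_trend_ratio_7d_30d` scores down across many workspaces at once, tripping the 0.15 PSI threshold configured below.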
Performance Monitoring
{
"monitoring": {
"model": "churn-predictor-v3",
"metrics": [
{
"name": "weekly_auc_roc",
"type": "model_performance",
"metric": "auc_roc",
"window": "7d",
"threshold": 0.78,
"alert": "slack://ml-alerts"
},
{
"name": "monthly_lift_chart",
"type": "model_performance",
"metric": "lift_at_decile",
"window": "30d",
"report": "bi_workbench://churn-model-performance"
},
{
"name": "csm_intervention_tracking",
"type": "business_outcome",
"metric": "save_rate_for_flagged_accounts",
"window": "90d",
"target": 0.35
}
],
"drift_detection": [
{
"feature": "usage_trend_ratio_7d_30d",
"type": "distribution_shift",
"method": "psi",
"threshold": 0.15,
"note": "Seasonal patterns expected in Dec/Jan -- adjust threshold"
},
{
"feature": "avg_dau_7d",
"type": "distribution_shift",
"method": "psi",
"threshold": 0.10
}
],
"alerts": [
{
"condition": "auc_roc < 0.78 for 2 consecutive weeks",
"action": "trigger_retraining_pipeline",
"notify": ["zara@cloudflow.io", "slack://ml-alerts"]
},
{
"condition": "prediction_volume drops > 20%",
"action": "page_on_call",
"note": "Likely ingestion pipeline failure, not model issue"
}
]
}
}
Tracking CSM Intervention Outcomes
The ultimate measure of model value is whether CSM interventions on flagged accounts actually reduce churn:
-- Monthly intervention outcome tracking
SELECT
DATE_TRUNC('month', s.scored_at) AS score_month,
s.risk_tier,
COUNT(DISTINCT s.workspace_id) AS accounts_flagged,
COUNT(DISTINCT CASE WHEN i.intervention_date IS NOT NULL
THEN s.workspace_id END) AS accounts_contacted,
COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
AND w.cancelled_at <= s.scored_at + INTERVAL '90' DAY
THEN s.workspace_id END) AS actually_churned,
ROUND(COUNT(DISTINCT CASE WHEN w.status = 'cancelled'
AND w.cancelled_at <= s.scored_at + INTERVAL '90' DAY
THEN s.workspace_id END) * 100.0
/ COUNT(DISTINCT s.workspace_id), 1) AS churn_rate
FROM ml.churn_scores_daily s
LEFT JOIN csm_interventions i ON s.workspace_id = i.workspace_id
AND i.intervention_date BETWEEN s.scored_at AND s.scored_at + INTERVAL '14' DAY
LEFT JOIN workspaces w ON s.workspace_id = w.workspace_id
WHERE s.risk_tier = 'high'
GROUP BY 1, 2
ORDER BY 1;
Stage 8: Experimentation
Zara runs structured experiments to improve model accuracy and validate business impact.
Model Comparison
| Model | AUC-ROC | AUC-PR | P@80R | Latency (p50) | Notes |
|---|---|---|---|---|---|
| Logistic Regression (baseline) | 0.76 | 0.44 | 0.38 | 8ms | Interpretable but underfits non-linear patterns |
| Gradient Boosting (current) | 0.83 | 0.61 | 0.52 | 24ms | Best balance of accuracy and speed |
| Neural Network (MLP) | 0.85 | 0.65 | 0.56 | 45ms | Marginal gain, higher latency, less interpretable |
| Gradient Boosting + NPS | 0.86 | 0.67 | 0.58 | 26ms | Adding NPS survey data improves AUC by 0.03 |
Feature Ablation Study
Zara tests the impact of removing individual features to understand dependencies:
| Removed Feature | AUC Change | Conclusion |
|---|---|---|
| usage_trend_ratio | -0.04 | Critical -- keep |
| features_adopted | -0.03 | Critical -- keep |
| nps_score | -0.01 | Valuable but not critical (sparse data) |
| support_tickets_30d | -0.02 | Important signal for enterprise accounts |
| account_age_days | -0.005 | Minimal impact -- could remove for simplicity |
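An ablation study like this is typically just a retrain loop over leave-one-out feature sets, comparing each run's AUC against the full model. A toy sketch on synthetic data (illustrative, not the actual churn features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, n_informative=4,
                           n_redundant=1, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

def auc_with(cols):
    """Train on the given feature columns and return held-out AUC-ROC."""
    m = GradientBoostingClassifier(random_state=42).fit(X_tr[:, cols], y_tr)
    return roc_auc_score(y_te, m.predict_proba(X_te[:, cols])[:, 1])

full_auc = auc_with(list(range(X.shape[1])))
# Drop one feature at a time; negative deltas mark features whose
# removal hurts the model.
ablation = {
    f"feature_{i}": auc_with([j for j in range(X.shape[1]) if j != i]) - full_auc
    for i in range(X.shape[1])
}
```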
A/B Test: CSM Outreach Strategies
Zara works with the Customer Success team to A/B test different intervention strategies for high-risk accounts:
{
"experiment": {
"name": "csm-outreach-strategy-q1-2026",
"hypothesis": "Personalized outreach based on specific churn risk factors will have higher save rates than generic check-in calls",
"variants": [
{
"name": "control",
"description": "Standard quarterly check-in call",
"allocation": 0.33
},
{
"name": "risk_personalized",
"description": "Outreach with specific risk factors: 'We noticed your team has not used [feature] recently -- can we schedule a training session?'",
"allocation": 0.34
},
{
"name": "value_demo",
"description": "Proactive ROI report showing workspace productivity gains and cost savings",
"allocation": 0.33
}
],
"primary_metric": "90_day_retention_rate",
"secondary_metrics": ["expansion_mrr", "nps_change", "csm_time_per_account"],
"duration_weeks": 12,
"min_sample_per_variant": 100
}
}
Early results (week 8):
| Variant | Accounts | Retained | Retention Rate | Expansion MRR |
|---|---|---|---|---|
| Control (standard check-in) | 112 | 81 | 72.3% | $2,100 |
| Risk-personalized outreach | 118 | 98 | 83.1% | $8,400 |
| Proactive value demo | 110 | 91 | 82.7% | $12,300 |
The risk-personalized outreach shows the highest retention rate, while the proactive value demo drives the most expansion revenue -- suggesting a combined strategy may be optimal.
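A quick back-of-the-envelope check on the headline comparison -- a pooled two-proportion z-test, sketched here rather than taken from any platform tooling -- suggests the retention lift of risk-personalized outreach over control is right at the edge of conventional significance at week 8, which supports letting the experiment run its full 12 weeks before declaring a winner:

```python
from math import erf, sqrt

def two_proportion_z(success_a: int, n_a: int,
                     success_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test with a pooled standard error.
    Returns (z statistic, p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Control: 81/112 retained vs risk-personalized: 98/118 retained.
z, p = two_proportion_z(81, 112, 98, 118)  # z ~ 1.96, p ~ 0.05
```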
Summary
| Stage | Key Action | Platform Component | Outcome |
|---|---|---|---|
| 1. Ingestion | Connected Salesforce CRM, Zendesk for support + billing signals | Ingestion Service (Airbyte) | 7 data sources flowing, real-time + batch |
| 2. Discovery | Found 347 event types, profiled 40% at-risk users, traced existing health_score lineage | Data Workbench Catalog | Identified data gaps and existing assets |
| 3. Query | Built 15+ churn features via federated SQL across 3 sources | Query Engine (Trino) | Usage trends, adoption, collaboration, support, billing features |
| 4. Orchestration | Daily pipeline with quality gates (event freshness, completeness) | Pipeline Service (Temporal) | Automated scoring with CRM sync |
| 5. Analysis | Uncovered 25% revenue churn driven by enterprise segment, 4x churn rate for low-adoption accounts | Data Workbench + Query Engine | Data-driven feature selection, survivorship bias check |
| 6. Productionization | Deployed gradient boosting model (AUC 0.83) to Ray Serve + daily batch scoring | ML Service (Ray Serve) | Real-time and batch scoring, Salesforce integration |
| 7. Feedback | Weekly AUC monitoring, feature drift detection, CSM intervention tracking | ML Workbench Model Registry | Automated alerts, retraining triggers, business outcome tracking |
| 8. Experimentation | Compared 4 model architectures, tested CSM outreach strategies via A/B test | ML Workbench Experiments | 83% retention with personalized outreach vs 72% baseline |
Related Walkthroughs
- ML Engineer Journey -- Raj builds the feature recommendation engine that improves adoption (Zara's #2 predictor)
- BI Lead Journey -- Emily builds the PLG dashboards that surface churn scores to stakeholders
- Executive Journey -- Michael uses churn data for strategic pricing and retention decisions
- SaaS & Technology Overview -- Industry context and dataset reference