MATIH Platform is in active MVP development. Documentation reflects current implementation status.
21. Industry Examples & Walkthroughs
SaaS & Technology

ML Engineer Journey: Intelligent Feature Recommendation Engine

Persona: Raj Patel, ML Engineer at CloudFlow
Goal: Build a recommendation system that suggests relevant product features to users based on their behavior patterns and similar user profiles, increasing feature adoption from 23% to 40% and reducing time-to-value for new users.

Primary Workbenches: ML Workbench, Data Workbench
Supporting Services: Ingestion Service, Catalog Service, Query Engine, Pipeline Service, ML Service (Ray Serve), AI Service


Business Context

CloudFlow has 10 core feature areas (kanban boards, gantt charts, timelines, automations, forms, dashboards, integrations, reports, API access, team management) -- but the average workspace only uses 3.4 of them. The product team ships features that users never discover. The onboarding flow is generic: every user sees the same getting-started wizard regardless of their role, team size, or industry.

Raj's mandate: build a recommendation engine that surfaces the right feature to the right user at the right time, driving adoption and stickiness. This directly feeds Zara's churn model -- feature adoption is the #2 predictor of retention.

  Current State                         Target State
  ┌────────────────────────┐            ┌────────────────────────┐
  │ Generic onboarding     │            │ Personalized feature   │
  │ Same flow for everyone │            │ recommendations        │
  │                        │            │                        │
  │ 3.4 features adopted   │  ────▶     │ 5.2 features adopted   │
  │ 23% adoption rate      │            │ 40% adoption rate      │
  │ 2.4 day time-to-value  │            │ < 1 day time-to-value  │
  └────────────────────────┘            └────────────────────────┘

Stage 1: Ingestion

Raj configures the data sources needed for the recommendation engine, focusing on user behavior signals and feature metadata.

Event Stream Configuration

The primary data source is the product event stream, already flowing through Kafka via Segment. Raj verifies the ingestion configuration:

{
  "source": "segment_kafka",
  "connector": "airbyte/source-kafka",
  "config": {
    "topic_pattern": "segment.cloudflow.events.*",
    "consumer_group": "matih-recommender-ingestion",
    "sync_mode": "streaming",
    "deserialization": "json",
    "event_filter": {
      "include_types": [
        "feature_used", "feature_discovered", "page_viewed",
        "task_created", "task_completed", "board_created",
        "automation_created", "integration_connected",
        "report_generated", "dashboard_created", "form_created",
        "gantt_viewed", "timeline_created", "api_key_generated"
      ],
      "exclude_types": ["heartbeat", "session_keepalive"]
    }
  }
}

Additional Sources

| Source | Purpose | Sync Mode | Frequency |
| --- | --- | --- | --- |
| Product PostgreSQL (users) | User profiles, roles, team size | CDC (incremental) | Every 15 min |
| Product PostgreSQL (feature_flags) | Feature flag configs, rollout state | CDC (incremental) | Every 15 min |
| Product PostgreSQL (workspaces) | Workspace metadata, plan, industry | CDC (incremental) | Every 15 min |
| A/B test assignments | Experiment cohort data | CDC (incremental) | Real-time |
| Zendesk (support tickets) | Feature-related support topics | Incremental | Every 15 min |

Event Schema

{
  "event_id": "evt_8f3a2b1c",
  "user_id": "usr_12345",
  "workspace_id": "ws_67890",
  "event_type": "feature_used",
  "properties": {
    "feature_category": "automations",
    "feature_name": "create_automation_rule",
    "context": "project_settings",
    "session_id": "sess_abc123",
    "session_sequence": 14,
    "time_on_page_seconds": 45,
    "is_first_use": true
  },
  "user_traits": {
    "role": "project_manager",
    "team_size": 12,
    "plan": "business",
    "signup_cohort": "2025-Q4"
  },
  "timestamp": "2026-02-28T14:23:17Z"
}

Stage 2: Discovery

Raj explores the data catalog to understand feature usage patterns, map the feature taxonomy, and identify cold-start challenges.

Event Volume Analysis

-- Profile event volume and feature coverage
SELECT
    DATE_TRUNC('week', timestamp)               AS week,
    COUNT(*)                                     AS total_events,
    COUNT(DISTINCT user_id)                      AS unique_users,
    COUNT(DISTINCT
        JSON_EXTRACT_SCALAR(properties, '$.feature_category')
    )                                            AS distinct_features,
    COUNT(CASE WHEN JSON_EXTRACT_SCALAR(properties, '$.is_first_use') = 'true'
          THEN 1 END)                            AS first_use_events
FROM events
WHERE event_type = 'feature_used'
  AND timestamp >= CURRENT_DATE - INTERVAL '84' DAY  -- 12 weeks (Trino intervals use DAY)
GROUP BY 1
ORDER BY 1;

Volume profile: 50M events/day across 347 distinct event types. Feature usage events account for approximately 12M events/day after filtering.

Feature-to-Capability Ontology

Raj builds a mapping from raw event types to feature capabilities, which becomes the item space for the recommendation engine:

-- Build feature ontology from event patterns and feature flags
SELECT
    ff.flag_id,
    ff.name                                      AS feature_name,
    ff.rollout_percentage,
    ff.target_segments,
    COUNT(DISTINCT e.user_id)                    AS users_30d,
    COUNT(DISTINCT e.workspace_id)               AS workspaces_30d,
    COUNT(*)                                     AS events_30d,
    MIN(e.timestamp)                             AS first_event,
    MAX(e.timestamp)                             AS last_event
FROM feature_flags ff
LEFT JOIN events e ON JSON_EXTRACT_SCALAR(e.properties, '$.feature_category')
    = ff.name
    AND e.timestamp >= CURRENT_DATE - INTERVAL '30' DAY
WHERE ff.enabled = true
GROUP BY ff.flag_id, ff.name, ff.rollout_percentage, ff.target_segments
ORDER BY users_30d DESC;

| Feature | Users (30d) | Workspaces | Rollout | Adoption Rate |
| --- | --- | --- | --- | --- |
| kanban_boards | 52,000 | 3,800 | 100% | 52% |
| task_management | 48,000 | 3,600 | 100% | 48% |
| file_sharing | 31,000 | 2,900 | 100% | 31% |
| automations | 8,200 | 1,400 | 100% | 8.2% |
| gantt_charts | 6,100 | 980 | 100% | 6.1% |
| custom_dashboards | 4,300 | 720 | 80% | 5.4% |
| api_access | 3,100 | 510 | 100% | 3.1% |
| forms | 2,800 | 480 | 100% | 2.8% |
| advanced_reports | 2,200 | 390 | 60% | 3.7% |
| timelines | 1,900 | 340 | 100% | 1.9% |

Cold-Start User Profiling

-- Profile users who signed up in the last 14 days (cold-start population)
SELECT
    u.role,
    w.plan,
    w.seat_count                                 AS team_size,
    COUNT(DISTINCT u.user_id)                    AS new_users,
    AVG(COALESCE(activity.event_count, 0))       AS avg_events,
    AVG(COALESCE(activity.features_used, 0))     AS avg_features_used
FROM users u
JOIN workspaces w ON u.workspace_id = w.workspace_id
LEFT JOIN (
    SELECT user_id,
           COUNT(*) AS event_count,
           COUNT(DISTINCT JSON_EXTRACT_SCALAR(properties, '$.feature_category'))
               AS features_used
    FROM events
    WHERE timestamp >= CURRENT_DATE - INTERVAL '14' DAY
    GROUP BY user_id
) activity ON u.user_id = activity.user_id
WHERE u.signup_date >= CURRENT_DATE - INTERVAL '14' DAY
GROUP BY u.role, w.plan, w.seat_count
ORDER BY new_users DESC;

Cold-start users (< 5 interactions) represent 35% of the active user base at any given time. The recommendation engine must handle these users using contextual features (role, team size, plan) rather than behavioral history.
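The routing implied here -- behavioral history for warm users, contextual features for cold-start users -- can be sketched as follows. The threshold value and function names are illustrative assumptions, not platform APIs:

```python
# Illustrative sketch: route sparse-history users to the contextual model.
COLD_START_THRESHOLD = 5  # interactions, per the profiling above

def choose_scorer(interaction_count: int) -> str:
    """Pick the scoring path based on how much history a user has."""
    if interaction_count < COLD_START_THRESHOLD:
        return "contextual"   # role, team size, plan
    return "collaborative"    # learned from behavioral history

def cold_start_share(interaction_counts: list[int]) -> float:
    """Fraction of users who would take the contextual path."""
    cold = sum(1 for c in interaction_counts if c < COLD_START_THRESHOLD)
    return cold / len(interaction_counts) if interaction_counts else 0.0

counts = [0, 2, 3, 12, 40, 7, 1, 88, 5, 60]
print(choose_scorer(2))          # -> contextual
print(cold_start_share(counts))  # 4 of 10 users -> 0.4
```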


Stage 3: Query

Raj builds the user-feature interaction matrix and contextual feature vectors that will feed the recommendation models.

User-Feature Interaction Matrix

-- Build interaction matrix: users x features with interaction strength
CREATE TABLE ml.user_feature_interactions AS
WITH interactions AS (
    SELECT
        e.user_id,
        JSON_EXTRACT_SCALAR(e.properties, '$.feature_category') AS feature,
        COUNT(*)                                     AS use_count,
        COUNT(DISTINCT DATE(e.timestamp))            AS active_days,
        MAX(e.timestamp)                             AS last_used,
        MIN(e.timestamp)                             AS first_used,
        -- Recency-weighted interaction score
        SUM(1.0 / (1 + DATE_DIFF('day', DATE(e.timestamp), CURRENT_DATE)))
                                                     AS recency_weighted_score
    FROM events e
    WHERE e.event_type = 'feature_used'
      AND e.timestamp >= CURRENT_DATE - INTERVAL '90' DAY
    GROUP BY e.user_id,
             JSON_EXTRACT_SCALAR(e.properties, '$.feature_category')
)
SELECT
    user_id,
    feature,
    use_count,
    active_days,
    last_used,
    first_used,
    recency_weighted_score,
    -- Normalize to a 0-1 interaction strength (each term capped at 1)
    ROUND(
        (0.4 * LEAST(use_count / 100.0, 1.0)) +
        (0.3 * LEAST(active_days / 30.0, 1.0)) +
        (0.3 * LEAST(recency_weighted_score / 10.0, 1.0)),
    3) AS interaction_strength
FROM interactions;
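As a sanity check, the scoring formula can be mirrored in Python. This is a sketch; capping every term (including the recency term) so the blend stays in [0, 1] is an assumption about intent:

```python
def recency_weight(days_ago: int) -> float:
    """Per-event recency term from the SQL: 1 / (1 + age_in_days)."""
    return 1.0 / (1 + days_ago)

def interaction_strength(use_count: int, active_days: int,
                         recency_weighted_score: float) -> float:
    """Blend capped usage volume, frequency, and recency into [0, 1]."""
    return round(
        0.4 * min(use_count / 100.0, 1.0)
        + 0.3 * min(active_days / 30.0, 1.0)
        + 0.3 * min(recency_weighted_score / 10.0, 1.0),
        3,
    )

print(recency_weight(0))                    # today's event -> 1.0
print(interaction_strength(250, 45, 18.0))  # heavy recent user -> 1.0
print(interaction_strength(10, 3, 0.5))     # light user, low score
```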

Session-Level Feature Usage Sequences

Understanding the order in which users discover features helps predict what they should try next:

-- Feature usage sequences within sessions
WITH feature_sequences AS (
    SELECT
        e.user_id,
        JSON_EXTRACT_SCALAR(e.properties, '$.session_id')       AS session_id,
        JSON_EXTRACT_SCALAR(e.properties, '$.feature_category') AS feature,
        ROW_NUMBER() OVER (
            PARTITION BY JSON_EXTRACT_SCALAR(e.properties, '$.session_id')
            ORDER BY e.timestamp)                    AS seq_num
    FROM events e
    WHERE e.event_type = 'feature_used'
      AND e.timestamp >= CURRENT_DATE - INTERVAL '30' DAY
)
-- Find common feature transitions (feature A -> feature B)
SELECT
    a.feature                                        AS from_feature,
    b.feature                                        AS to_feature,
    COUNT(*)                                         AS transition_count,
    COUNT(DISTINCT a.user_id)                        AS unique_users,
    ROUND(COUNT(*) * 1.0 /
        SUM(COUNT(*)) OVER (PARTITION BY a.feature), 3) AS transition_prob
FROM feature_sequences a
JOIN feature_sequences b
    ON a.session_id = b.session_id
    AND b.seq_num = a.seq_num + 1
    AND a.feature != b.feature
GROUP BY a.feature, b.feature
HAVING COUNT(*) >= 50
ORDER BY transition_count DESC
LIMIT 20;

| From Feature | To Feature | Transitions | Users | Probability |
| --- | --- | --- | --- | --- |
| kanban_boards | automations | 12,400 | 3,200 | 0.18 |
| task_management | gantt_charts | 8,900 | 2,800 | 0.14 |
| kanban_boards | custom_dashboards | 7,200 | 2,100 | 0.11 |
| file_sharing | forms | 5,800 | 1,900 | 0.09 |
| automations | advanced_reports | 4,100 | 1,200 | 0.15 |

Collaborative Filtering Signals

-- Users similar to you also use these features
-- Compute user similarity based on feature overlap (Jaccard index)
WITH user_features AS (
    SELECT DISTINCT user_id, feature
    FROM ml.user_feature_interactions
    WHERE interaction_strength >= 0.2
),
user_counts AS (
    SELECT user_id, COUNT(DISTINCT feature) AS feature_count
    FROM user_features
    GROUP BY user_id
),
user_pairs AS (
    SELECT
        a.user_id AS user_a,
        b.user_id AS user_b,
        COUNT(*)  AS shared_features
    FROM user_features a
    JOIN user_features b ON a.feature = b.feature AND a.user_id < b.user_id
    GROUP BY a.user_id, b.user_id
)
SELECT
    p.user_a,
    p.user_b,
    p.shared_features,
    ROUND(p.shared_features * 1.0 /
        (ca.feature_count + cb.feature_count - p.shared_features), 3)
                                                     AS jaccard_similarity
FROM user_pairs p
JOIN user_counts ca ON p.user_a = ca.user_id
JOIN user_counts cb ON p.user_b = cb.user_id
WHERE p.shared_features >= 3
ORDER BY jaccard_similarity DESC;
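The same similarity measure in Python, useful for spot-checking individual pairs (an illustrative helper, not part of the platform):

```python
def jaccard(features_a: set[str], features_b: set[str]) -> float:
    """Jaccard index: |shared| / |union|, matching the SQL above."""
    if not features_a and not features_b:
        return 0.0
    shared = len(features_a & features_b)
    return round(shared / (len(features_a) + len(features_b) - shared), 3)

a = {"kanban_boards", "automations", "gantt_charts", "forms"}
b = {"kanban_boards", "automations", "gantt_charts", "dashboards"}
print(jaccard(a, b))  # 3 shared of 5 in the union -> 0.6
```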

Context Features

-- User context vector for cold-start and contextual bandit
SELECT
    u.user_id,
    u.role,
    w.plan,
    w.seat_count,
    CASE
        WHEN w.seat_count <= 5   THEN 'small'
        WHEN w.seat_count <= 20  THEN 'medium'
        WHEN w.seat_count <= 100 THEN 'large'
        ELSE 'enterprise'
    END                                              AS team_size_bucket,
    DATE_DIFF('day', u.signup_date, CURRENT_DATE)    AS account_age_days,
    COALESCE(fi.features_used, 0)                    AS features_used_count,
    COALESCE(fi.total_events, 0)                     AS total_events_7d
FROM users u
JOIN workspaces w ON u.workspace_id = w.workspace_id
LEFT JOIN (
    SELECT user_id,
           COUNT(DISTINCT JSON_EXTRACT_SCALAR(properties, '$.feature_category'))
               AS features_used,
           COUNT(*) AS total_events
    FROM events
    WHERE timestamp >= CURRENT_DATE - INTERVAL '7' DAY
    GROUP BY user_id
) fi ON u.user_id = fi.user_id;

Stage 4: Orchestration

Raj builds a two-phase pipeline: daily batch processing for model training data and ANN index updates, plus a real-time serving path.

Training Pipeline

{
  "pipeline": {
    "name": "recommender-training-daily",
    "schedule": "0 4 * * *",
    "description": "Daily retraining of recommendation models and ANN index refresh",
    "stages": [
      {
        "name": "build_interaction_matrix",
        "type": "sql_transform",
        "query_ref": "interaction_matrix_v2.sql",
        "output_table": "ml.user_feature_interactions",
        "timeout_minutes": 45
      },
      {
        "name": "compute_transition_probs",
        "type": "sql_transform",
        "depends_on": ["build_interaction_matrix"],
        "query_ref": "feature_transitions.sql",
        "output_table": "ml.feature_transition_probs"
      },
      {
        "name": "build_context_vectors",
        "type": "sql_transform",
        "depends_on": ["build_interaction_matrix"],
        "query_ref": "user_context_vectors.sql",
        "output_table": "ml.user_context_vectors"
      },
      {
        "name": "quality_gate",
        "type": "data_quality",
        "depends_on": ["build_interaction_matrix", "compute_transition_probs", "build_context_vectors"],
        "checks": [
          {
            "name": "interaction_matrix_coverage",
            "type": "custom_sql",
            "query": "SELECT COUNT(DISTINCT user_id) FROM ml.user_feature_interactions",
            "min_value": 60000,
            "severity": "critical"
          },
          {
            "name": "feature_coverage",
            "type": "custom_sql",
            "query": "SELECT COUNT(DISTINCT feature) FROM ml.user_feature_interactions",
            "min_value": 8,
            "severity": "critical"
          }
        ]
      },
      {
        "name": "train_matrix_factorization",
        "type": "model_training",
        "depends_on": ["quality_gate"],
        "model_type": "als_matrix_factorization",
        "input_table": "ml.user_feature_interactions",
        "hyperparameters": {
          "factors": 64,
          "regularization": 0.01,
          "iterations": 15,
          "alpha": 40
        },
        "output": "models:/feature-recommender-mf/staging"
      },
      {
        "name": "train_contextual_bandit",
        "type": "model_training",
        "depends_on": ["quality_gate"],
        "model_type": "contextual_bandit",
        "input_tables": ["ml.user_context_vectors", "ml.recommendation_feedback"],
        "hyperparameters": {
          "exploration_rate": 0.1,
          "decay_factor": 0.995
        },
        "output": "models:/feature-recommender-bandit/staging"
      },
      {
        "name": "rebuild_ann_index",
        "type": "custom",
        "depends_on": ["train_matrix_factorization"],
        "command": "build_ann_index",
        "config": {
          "embedding_source": "models:/feature-recommender-mf/staging",
          "index_type": "hnsw",
          "metric": "cosine",
          "ef_construction": 200,
          "M": 16
        }
      },
      {
        "name": "evaluate_and_promote",
        "type": "model_evaluation",
        "depends_on": ["train_matrix_factorization", "train_contextual_bandit"],
        "metrics": ["ndcg@5", "hit_rate@3", "diversity_score"],
        "promotion_criteria": {
          "ndcg@5": "> 0.35",
          "hit_rate@3": "> 0.40"
        },
        "promote_to": "production"
      }
    ]
  }
}

Pipeline Execution Flow

Pipeline Run: 2026-02-28 04:00 UTC
  ├── build_interaction_matrix ......... PASSED (22m 18s, 4.2M rows)
  ├── compute_transition_probs ......... PASSED (8m 45s)
  ├── build_context_vectors ............ PASSED (5m 12s)
  ├── quality_gate
  │   ├── interaction_matrix_coverage .. PASSED (72,400 users)
  │   └── feature_coverage ............. PASSED (10 features)
  ├── train_matrix_factorization ....... PASSED (34m 22s)
  ├── train_contextual_bandit .......... PASSED (18m 05s)
  ├── rebuild_ann_index ................ PASSED (6m 44s)
  └── evaluate_and_promote
      ├── ndcg@5 ....................... 0.41 (threshold: 0.35) PASSED
      ├── hit_rate@3 ................... 0.47 (threshold: 0.40) PASSED
      ├── diversity_score .............. 0.72
      └── promotion .................... Promoted to production
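The evaluate_and_promote stage gates promotion on ranking metrics. A minimal sketch of ndcg@k and hit_rate@k over a single user's ranked list (relevance 1 = the user later adopted that feature; the platform's actual implementation is not shown here):

```python
import math

def dcg_at_k(relevances: list[int], k: int) -> float:
    """Discounted cumulative gain over the top-k ranked items."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: list[int], k: int) -> float:
    """NDCG@k: DCG normalized by the ideal (best possible) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def hit_rate_at_k(relevances: list[int], k: int) -> float:
    """1.0 if any of the top-k recommendations was relevant, else 0.0."""
    return 1.0 if any(relevances[:k]) else 0.0

# One user: the features ranked 2nd and 4th were later adopted.
rels = [0, 1, 0, 1, 0]
print(round(ndcg_at_k(rels, 5), 3))  # -> 0.651
print(hit_rate_at_k(rels, 3))        # -> 1.0
```

In production these are averaged across a held-out set of users before being compared against the promotion thresholds.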

Stage 5: Analysis

Raj analyzes recommendation quality, identifies biases, and validates that the system provides genuine value rather than just recommending popular features.

Popularity Bias Check

-- Check if recommendations are biased toward already-popular features
WITH recommendation_distribution AS (
    SELECT
        recommended_feature,
        COUNT(*)                                     AS times_recommended,
        COUNT(DISTINCT user_id)                      AS unique_users_shown
    FROM ml.recommendation_log
    WHERE generated_at >= CURRENT_DATE - INTERVAL '7' DAY
    GROUP BY recommended_feature
),
usage_distribution AS (
    SELECT
        feature,
        COUNT(DISTINCT user_id)                      AS current_users
    FROM ml.user_feature_interactions
    GROUP BY feature
)
SELECT
    r.recommended_feature,
    r.times_recommended,
    u.current_users,
    ROUND(r.times_recommended * 1.0 /
        SUM(r.times_recommended) OVER (), 3)         AS rec_share,
    ROUND(u.current_users * 1.0 /
        SUM(u.current_users) OVER (), 3)             AS usage_share,
    ROUND(r.times_recommended * 1.0 /
        SUM(r.times_recommended) OVER () -
        u.current_users * 1.0 /
        SUM(u.current_users) OVER (), 3)             AS popularity_bias
FROM recommendation_distribution r
JOIN usage_distribution u ON r.recommended_feature = u.feature
ORDER BY popularity_bias DESC;

| Feature | Rec Share | Usage Share | Bias | Assessment |
| --- | --- | --- | --- | --- |
| automations | 0.22 | 0.08 | +0.14 | Good: recommending underadopted, high-value feature |
| gantt_charts | 0.15 | 0.06 | +0.09 | Good: targeting relevant users |
| custom_dashboards | 0.14 | 0.05 | +0.09 | Good: pushing discovery |
| kanban_boards | 0.08 | 0.52 | -0.44 | Good: NOT over-recommending already-popular feature |
| task_management | 0.06 | 0.48 | -0.42 | Good: NOT over-recommending already-popular feature |

The model correctly avoids recommending features that users likely already know about, and promotes discovery of underadopted features with high value.
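The bias column above is simply recommendation share minus usage share. A small sketch for reproducing it (the dictionaries and counts are illustrative):

```python
def popularity_bias(times_recommended: dict[str, int],
                    current_users: dict[str, int]) -> dict[str, float]:
    """Recommendation share minus usage share per feature; positive means
    the model pushes the feature beyond its organic popularity."""
    total_recs = sum(times_recommended.values())
    total_users = sum(current_users.values())
    return {
        f: round(times_recommended[f] / total_recs
                 - current_users.get(f, 0) / total_users, 3)
        for f in times_recommended
    }

recs = {"automations": 22, "kanban_boards": 8}    # times recommended
usage = {"automations": 8, "kanban_boards": 52}   # current users
print(popularity_bias(recs, usage))
```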

Cold-Start User Performance

-- Recommendation quality for cold-start vs warm users
SELECT
    CASE
        WHEN user_event_count < 5   THEN 'Cold Start (< 5 events)'
        WHEN user_event_count < 50  THEN 'Warm (5-49 events)'
        ELSE 'Active (50+ events)'
    END                                              AS user_segment,
    COUNT(*)                                         AS recommendations,
    ROUND(AVG(CASE WHEN clicked = 1 THEN 1.0 ELSE 0 END), 3)
                                                     AS click_through_rate,
    ROUND(AVG(CASE WHEN adopted_7d = 1 THEN 1.0 ELSE 0 END), 3)
                                                     AS adoption_rate_7d
FROM ml.recommendation_feedback
WHERE generated_at >= CURRENT_DATE - INTERVAL '30' DAY
GROUP BY 1
ORDER BY 1;

| Segment | Recommendations | CTR | 7d Adoption |
| --- | --- | --- | --- |
| Cold Start (< 5 events) | 48,200 | 0.12 | 0.04 |
| Warm (5-49 events) | 82,400 | 0.19 | 0.09 |
| Active (50+ events) | 34,600 | 0.24 | 0.14 |

Cold-start users have lower CTR as expected, but the contextual bandit still outperforms the random baseline (0.03 CTR) by 4x. The role and team_size context features carry most of the signal for new users.

Recommendation Diversity Analysis

Raj ensures recommendations are diverse -- showing the same 3 features repeatedly does not help adoption:

-- Measure recommendation diversity per user (7-day window)
SELECT
    ROUND(unique_recs * 1.0 / total_recs, 2)         AS diversity_score,
    COUNT(*)                                         AS user_count
FROM (
    SELECT
        user_id,
        COUNT(DISTINCT recommended_feature)          AS unique_recs,
        COUNT(*)                                     AS total_recs
    FROM ml.recommendation_log
    WHERE generated_at >= CURRENT_DATE - INTERVAL '7' DAY
    GROUP BY user_id
) per_user
GROUP BY 1
ORDER BY 1;

Average diversity score: 0.72 -- a user shown 10 recommendations over a week sees roughly 7 distinct features. Target: > 0.60 to ensure sufficient exploration.
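The diversity score reduces to unique-over-total per user; a sketch:

```python
def diversity_score(recommended: list[str]) -> float:
    """Unique features shown / total recommendations in the window."""
    if not recommended:
        return 0.0
    return round(len(set(recommended)) / len(recommended), 2)

week = ["automations", "gantt_charts", "automations", "dashboards",
        "forms", "automations", "timelines", "api_access", "forms",
        "reports"]
print(diversity_score(week))  # 7 unique features in 10 recs -> 0.7
```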


Stage 6: Productionization

Raj deploys a two-stage recommendation system: candidate generation (ANN index lookup) followed by ranking (contextual bandit), served via Ray Serve.

System Architecture

                  Recommendation Serving Architecture

  User Login / Page Load
           │
           ▼
  ┌──────────────────┐     ┌────────────────────┐
  │  API Gateway     │────▶│  Ray Serve         │
  │  /api/v1/recs    │     │  (2 replicas)      │
  └──────────────────┘     │                    │
                           │  Stage 1: Candidate│
                           │  Generation        │
                           │  ┌──────────────┐  │
                           │  │ ANN Index    │  │
                           │  │ (HNSW)       │  │
                           │  │ Top 20       │  │
                           │  │ candidates   │  │
                           │  └──────┬───────┘  │
                           │         │          │
                           │         ▼          │
                           │  Stage 2: Ranking  │
                           │  ┌──────────────┐  │
                           │  │ Contextual   │  │
                           │  │ Bandit       │  │
                           │  │ Top 3        │  │
                           │  │ ranked       │  │
                           │  └──────┬───────┘  │
                           └─────────┼──────────┘
                                     │
                                     ▼
                           ┌───────────────────┐
                           │  In-App Display   │
                           │  - Tooltip        │
                           │  - Sidebar        │
                           │  - Onboarding     │
                           └───────────────────┘

Deployment Configuration

{
  "deployment": {
    "name": "feature-recommender-v2",
    "serving": {
      "runtime": "ray_serve",
      "endpoint": "/api/v1/recommendations",
      "num_replicas": 2,
      "max_concurrent_queries": 100,
      "autoscaling": {
        "min_replicas": 2,
        "max_replicas": 8,
        "target_qps_per_replica": 200
      }
    },
    "models": {
      "candidate_generator": {
        "model_uri": "models:/feature-recommender-mf/production",
        "ann_index": "indexes:/feature-recommender-hnsw/latest",
        "top_k": 20
      },
      "ranker": {
        "model_uri": "models:/feature-recommender-bandit/production",
        "output_k": 3,
        "exploration_rate": 0.1
      }
    },
    "performance_sla": {
      "p50_latency_ms": 35,
      "p99_latency_ms": 100,
      "availability": 0.999
    },
    "response_format": {
      "recommendations": [
        {
          "feature": "automations",
          "score": 0.87,
          "reason": "Teams similar to yours increased productivity 23% with automations",
          "cta_text": "Try Automations",
          "cta_url": "/features/automations/getting-started"
        }
      ]
    }
  }
}
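A toy end-to-end sketch of the two-stage flow: brute-force cosine similarity stands in for the HNSW index, and an epsilon-greedy re-rank stands in for the contextual bandit. All embeddings and scores below are made up for illustration:

```python
import math
import random

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def candidates(user_vec: list[float],
               feature_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Stage 1: top-k features by embedding similarity
    (brute force stands in for the HNSW index)."""
    ranked = sorted(feature_vecs,
                    key=lambda f: cosine(user_vec, feature_vecs[f]),
                    reverse=True)
    return ranked[:k]

def rank(cands: list[str], bandit_scores: dict[str, float],
         output_k: int, exploration_rate: float,
         rng: random.Random) -> list[str]:
    """Stage 2: epsilon-greedy re-rank -- usually exploit the bandit's
    scores, occasionally explore a random ordering."""
    if rng.random() < exploration_rate:
        shuffled = list(cands)
        rng.shuffle(shuffled)
        return shuffled[:output_k]
    return sorted(cands, key=lambda f: bandit_scores.get(f, 0.0),
                  reverse=True)[:output_k]

features = {
    "automations":       [0.9, 0.1],
    "custom_dashboards": [0.7, 0.3],
    "timelines":         [0.1, 0.9],
}
cands = candidates([0.8, 0.2], features, k=2)
top = rank(cands, {"custom_dashboards": 0.8, "automations": 0.6},
           output_k=2, exploration_rate=0.1, rng=random.Random(0))
print(cands)  # -> ['automations', 'custom_dashboards']
print(top)    # -> ['custom_dashboards', 'automations']
```

The real system retrieves 20 candidates and returns 3, but the shape is the same: cheap similarity search to narrow the item space, then a learned ranker that balances exploitation and exploration.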

API Response Example

{
  "user_id": "usr_12345",
  "recommendations": [
    {
      "feature": "automations",
      "score": 0.87,
      "reason": "Teams your size with Kanban boards typically adopt automations next",
      "cta_text": "Automate your workflow",
      "cta_url": "/features/automations/getting-started",
      "position": 1
    },
    {
      "feature": "custom_dashboards",
      "score": 0.72,
      "reason": "Track your project metrics in one view",
      "cta_text": "Create a Dashboard",
      "cta_url": "/features/dashboards/templates",
      "position": 2
    },
    {
      "feature": "integrations",
      "score": 0.68,
      "reason": "Connect CloudFlow with the tools your team already uses",
      "cta_text": "Browse Integrations",
      "cta_url": "/settings/integrations",
      "position": 3
    }
  ],
  "model_version": "feature-recommender-v2",
  "generated_at": "2026-02-28T14:23:17Z",
  "latency_ms": 28
}

Stage 7: Feedback

Raj configures a comprehensive feedback loop to track recommendation quality and business impact in real time.

Feedback Metrics

{
  "monitoring": {
    "model": "feature-recommender-v2",
    "realtime_metrics": [
      {
        "name": "click_through_rate",
        "type": "engagement",
        "definition": "recommendations clicked / recommendations shown",
        "window": "1h",
        "alert_threshold": "< 0.08",
        "target": 0.18
      },
      {
        "name": "feature_adoption_after_rec",
        "type": "business_outcome",
        "definition": "users who used recommended feature within 7 days / users shown recommendation",
        "window": "7d_rolling",
        "target": 0.12
      },
      {
        "name": "time_to_first_use",
        "type": "business_outcome",
        "definition": "median time from recommendation shown to first use of recommended feature",
        "window": "30d",
        "target_hours": 48
      }
    ],
    "quality_metrics": [
      {
        "name": "recommendation_diversity",
        "type": "quality",
        "definition": "unique features recommended / total recommendations per user per week",
        "target": 0.65
      },
      {
        "name": "novelty_score",
        "type": "quality",
        "definition": "avg inverse popularity of recommended features",
        "target": 0.40,
        "note": "Higher novelty means recommending less-obvious features"
      },
      {
        "name": "coverage",
        "type": "quality",
        "definition": "features that appear in at least 1% of recommendations / total features",
        "target": 0.80
      }
    ],
    "serving_metrics": [
      {
        "name": "p99_latency",
        "threshold_ms": 100,
        "alert": "pagerduty://ml-oncall"
      },
      {
        "name": "error_rate",
        "threshold": 0.001,
        "alert": "slack://ml-alerts"
      }
    ]
  }
}
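The novelty_score definition ("avg inverse popularity") can be made concrete as follows. The adoption rates are taken from the discovery table; the helper itself is an assumption about how the metric is computed:

```python
def novelty(recommended: list[str],
            adoption_rate: dict[str, float]) -> float:
    """Average inverse popularity (1 - adoption rate) of shown features."""
    if not recommended:
        return 0.0
    return round(sum(1 - adoption_rate.get(f, 0.0) for f in recommended)
                 / len(recommended), 2)

rates = {"kanban_boards": 0.52, "automations": 0.08, "gantt_charts": 0.06}
print(novelty(["automations", "gantt_charts"], rates))     # -> 0.93
print(novelty(["kanban_boards", "kanban_boards"], rates))  # -> 0.48
```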

Feedback Collection Pipeline

-- Track recommendation feedback (runs every hour)
INSERT INTO ml.recommendation_feedback
SELECT
    r.recommendation_id,
    r.user_id,
    r.recommended_feature,
    r.score,
    r.position,
    r.generated_at,
    -- Did user click the recommendation?
    CASE WHEN c.click_timestamp IS NOT NULL THEN 1 ELSE 0 END AS clicked,
    c.click_timestamp,
    -- Did user adopt the feature within 7 days of seeing the recommendation?
    CASE WHEN e.first_use_timestamp IS NOT NULL
         AND e.first_use_timestamp >= r.generated_at
         AND e.first_use_timestamp <= r.generated_at + INTERVAL '7' DAY
         THEN 1 ELSE 0 END AS adopted_7d,
    e.first_use_timestamp
FROM ml.recommendation_log r
LEFT JOIN ml.recommendation_clicks c
    ON r.recommendation_id = c.recommendation_id
LEFT JOIN (
    SELECT user_id,
           JSON_EXTRACT_SCALAR(properties, '$.feature_category') AS feature,
           MIN(timestamp) AS first_use_timestamp
    FROM events
    WHERE event_type = 'feature_used'
      AND JSON_EXTRACT_SCALAR(properties, '$.is_first_use') = 'true'
    GROUP BY user_id, JSON_EXTRACT_SCALAR(properties, '$.feature_category')
) e ON r.user_id = e.user_id AND r.recommended_feature = e.feature
WHERE r.generated_at >= CURRENT_TIMESTAMP - INTERVAL '8' DAY
  AND r.recommendation_id NOT IN (SELECT recommendation_id FROM ml.recommendation_feedback);

Stage 8: Experimentation

Raj runs structured experiments to optimize the recommendation strategy, placement, and personalization approach.

Experiment 1: Personalization Strategy

{
  "experiment": {
    "name": "rec-personalization-strategy",
    "hypothesis": "Hybrid collaborative + content-based filtering outperforms either alone",
    "variants": [
      {
        "name": "collaborative_only",
        "description": "Matrix factorization based on user-feature interactions only",
        "allocation": 0.25
      },
      {
        "name": "content_based",
        "description": "Recommend based on user role, team size, and industry similarity",
        "allocation": 0.25
      },
      {
        "name": "hybrid",
        "description": "Two-stage: MF candidates + contextual bandit ranking (current model)",
        "allocation": 0.25
      },
      {
        "name": "sequence_aware",
        "description": "Hybrid + feature transition probabilities as additional signal",
        "allocation": 0.25
      }
    ],
    "primary_metric": "feature_adoption_7d",
    "secondary_metrics": ["click_through_rate", "diversity_score", "p99_latency"],
    "duration_weeks": 4,
    "min_sample_per_variant": 5000
  }
}
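For equal 25% allocations, a common implementation is deterministic hash-based bucketing, so assignment is stable without storing state. A sketch (not necessarily how the platform assigns cohorts):

```python
import hashlib

VARIANTS = ["collaborative_only", "content_based", "hybrid", "sequence_aware"]

def assign_variant(user_id: str, experiment: str,
                   variants: list[str] = VARIANTS) -> str:
    """Deterministically bucket a user: hash (experiment, user_id) so the
    same user always sees the same variant within an experiment, while
    buckets stay independent across experiments."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

v1 = assign_variant("usr_12345", "rec-personalization-strategy")
v2 = assign_variant("usr_12345", "rec-personalization-strategy")
assert v1 == v2  # stable across calls
print(v1)
```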

Results (week 4):

| Strategy | CTR | 7d Adoption | Diversity | P99 Latency |
| --- | --- | --- | --- | --- |
| Collaborative only | 0.16 | 0.08 | 0.58 | 42 ms |
| Content-based | 0.14 | 0.07 | 0.71 | 22 ms |
| Hybrid (current) | 0.19 | 0.11 | 0.72 | 68 ms |
| Sequence-aware | 0.21 | 0.13 | 0.69 | 74 ms |

Sequence-aware hybrid shows the best adoption rate. Raj plans to promote it to production after validating latency at higher traffic volumes.

Experiment 2: Recommendation Placement

{
  "experiment": {
    "name": "rec-placement-test",
    "hypothesis": "In-context tooltips near related features outperform sidebar suggestions",
    "variants": [
      {
        "name": "onboarding_wizard",
        "description": "Recommendations shown during initial onboarding flow only",
        "allocation": 0.25
      },
      {
        "name": "sidebar_panel",
        "description": "Persistent sidebar with 'Recommended for You' section",
        "allocation": 0.25
      },
      {
        "name": "in_context_tooltip",
        "description": "Contextual tooltips appearing near related features during use",
        "allocation": 0.25
      },
      {
        "name": "email_digest",
        "description": "Weekly email with personalized feature recommendations",
        "allocation": 0.25
      }
    ],
    "primary_metric": "feature_adoption_30d",
    "secondary_metrics": ["user_satisfaction", "dismissal_rate"],
    "duration_weeks": 6
  }
}

Results (week 6):

| Placement | 30d Adoption | CTR | Dismissal Rate | User Satisfaction |
| --- | --- | --- | --- | --- |
| Onboarding wizard | 0.15 | 0.22 | 0.08 | 4.1/5 |
| Sidebar panel | 0.09 | 0.11 | 0.34 | 3.4/5 |
| In-context tooltip | 0.27 | 0.31 | 0.12 | 4.3/5 |
| Email digest | 0.06 | 0.04 | N/A | 3.8/5 |

In-context tooltips achieve 27% feature adoption -- 3x better than the sidebar and nearly 2x better than the onboarding wizard. Users respond best when they see recommendations at the moment they are doing related work.

Combined Impact

After deploying the optimized recommendation engine (sequence-aware hybrid model + in-context tooltip placement):

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Average features adopted per workspace | 3.4 | 4.8 | +41% |
| Feature adoption rate (30d) | 23% | 34% | +48% |
| Time to value (new users) | 2.4 days | 1.1 days | -54% |
| Churn rate (accounts with > 5 features) | 3.5% | 3.1% | -11% |

Summary

| Stage | Key Action | Platform Component | Outcome |
| --- | --- | --- | --- |
| 1. Ingestion | Configured event stream, user profiles, feature flags, A/B assignments | Ingestion Service (Airbyte + Kafka) | 50M events/day flowing with real-time and batch sources |
| 2. Discovery | Mapped 347 event types, built feature ontology, profiled cold-start users | Data Workbench Catalog | Identified 10 feature categories, 35% cold-start population |
| 3. Query | Built interaction matrix, feature sequences, collaborative signals, context vectors | Query Engine (Trino) | 4.2M interaction rows, transition probability graph |
| 4. Orchestration | Daily pipeline for model training, ANN index refresh, evaluation, promotion | Pipeline Service (Temporal) | Automated training with quality gates and auto-promotion |
| 5. Analysis | Validated no popularity bias, analyzed cold-start performance, checked diversity | Data Workbench + Query Engine | Model recommends underadopted features, 4x better than random for cold-start |
| 6. Productionization | Two-stage system (ANN + contextual bandit) on Ray Serve, P99 < 100ms | ML Service (Ray Serve) | Real-time recommendations via API, in-app tooltip delivery |
| 7. Feedback | CTR, adoption rate, diversity, novelty tracking; hourly feedback pipeline | ML Workbench Model Registry | Automated monitoring with retraining triggers |
| 8. Experimentation | Tested 4 personalization strategies, 4 placements; 27% adoption with in-context tooltips | ML Workbench Experiments | +41% feature adoption, -54% time to value |

Related Walkthroughs