ML Engineer Journey: Real-Time Fraud Detection System
Persona: Kenji, ML Engineer at Meridian Bank (Fraud Operations team, 4 years of experience)
Objective: Build and operate a real-time fraud detection pipeline processing 10K transactions per minute with sub-50ms scoring latency
Timeline: Continuous operation with monthly model refresh cycles
Datasets: transactions (50M), accounts (500K), fraud_cases (15K), payment_messages (12M)
Stage 1: Ingestion
Kenji's fraud detection system requires a hybrid ingestion strategy: real-time streaming for transaction scoring and batch ingestion for model retraining.
Streaming Ingestion from Payment Gateway
The payment gateway publishes every transaction to a Kafka topic. Kenji configures the Airbyte Kafka connector for real-time ingestion:
{
"source": {
"type": "kafka",
"name": "payment-gateway-stream",
"connection": {
"bootstrap_servers": "kafka.meridian.internal:9092",
"topic": "payments.transactions.authorized",
"consumer_group": "matih-fraud-ingestion",
"format": "avro",
"schema_registry": "http://schema-registry.meridian.internal:8081"
},
"sync_mode": "streaming",
"processing_guarantees": "exactly_once",
"batch_size": 1000,
"poll_interval_ms": 100
}
}
Batch Sources for Model Training
| Source | Connector | Sync Mode | Schedule | Purpose |
|---|---|---|---|---|
| Core Banking PostgreSQL | Airbyte PostgreSQL | CDC incremental | Every 15 min | Account profiles, balances |
| Fraud Case Management | Airbyte PostgreSQL | Incremental | Hourly | Confirmed fraud labels |
| IP Reputation API | Airbyte REST API | Full refresh | Daily | IP risk scores, geolocation |
| Device Fingerprint DB | Airbyte MongoDB | Incremental | Every 15 min | Device trust scores |
Ingestion Monitoring
Kenji monitors ingestion lag to ensure real-time freshness:
┌─────────────────────────────────────────────────────────────┐
│ Transaction Ingestion Health │
├─────────────┬──────────┬────────────┬───────────────────────┤
│ Source │ Lag │ Throughput │ Status │
├─────────────┼──────────┼────────────┼───────────────────────┤
│ Kafka │ 1.2s │ 10,247/min │ Healthy │
│ Core Banking│ 8m 14s │ 2,100/sync │ Healthy │
│ Fraud Cases │ 22m │ 48/sync │ Healthy │
│ IP Repute │ 6h 12m │ 1.2M/sync │ Healthy (daily batch) │
└─────────────┴──────────┴────────────┴───────────────────────┘
Stage 2: Discovery
Kenji maps the transaction data landscape and profiles the distributions that drive fraud detection.
Transaction Data Profiling
Using the Data Quality Service, Kenji profiles the transactions table:
| Column | Type | Completeness | Distribution | Notes |
|---|---|---|---|---|
| amount | DECIMAL | 100% | Mean: $42, P99: $5,200 | Heavy right skew |
| merchant_category | VARCHAR | 100% | 342 distinct categories | Top 5 account for 61% of volume |
| channel | VARCHAR | 100% | POS: 42%, Online: 38%, ATM: 12%, Mobile: 8% | Online growing 15% YoY |
| is_fraud | BOOLEAN | 100% | 0.12% positive rate (fraud) | ~12 frauds per 10K transactions |
| timestamp | TIMESTAMP | 100% | Peak: 11am-2pm, Trough: 3am-5am | Weekend patterns differ |
| device_id | VARCHAR | 84% | NULL for POS/ATM transactions | Online/mobile only |
| ip_address | VARCHAR | 76% | 89K distinct IPs | Masked by governance policy |
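The mean, percentile, and skew figures in the table come from standard column profiling. A minimal sketch of that computation, assuming the amounts are available as an in-memory sample (function names and the skew threshold are illustrative, not the Data Quality Service's API):

```python
import statistics

def profile_amounts(amounts):
    """Summarize a numeric column as in the profiling table:
    mean, a simple index-based P99, and a crude right-skew flag
    (mean well above median)."""
    s = sorted(amounts)
    p99 = s[min(len(s) - 1, (99 * len(s)) // 100)]
    mean = statistics.mean(s)
    return {
        "mean": round(mean, 2),
        "p99": p99,
        "right_skewed": mean > 1.5 * statistics.median(s),
    }

# Heavy-tailed toy sample: many small transactions, one large outlier
sample = [10] * 90 + [100] * 9 + [5000]
stats = profile_amounts(sample)  # p99 = 5000, right_skewed = True
```

The right-skew flag is the kind of signal that later motivates ratio features such as `amount_ratio_to_avg` rather than raw amounts.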
Existing Feature Discovery
Kenji discovers that a previous team built velocity features that are already materialized:
-- Existing velocity features found in catalog
SELECT table_name, column_name, description, last_updated
FROM catalog.column_metadata
WHERE schema_name = 'fraud_features'
ORDER BY last_updated DESC;
| Feature Table | Columns | Last Updated | Owner |
|---|---|---|---|
| txn_velocity_1h | account_id, txn_count_1h, amount_sum_1h, unique_merchants_1h | 2026-02-28 | fraud-ops |
| txn_velocity_24h | account_id, txn_count_24h, amount_sum_24h, amount_max_24h | 2026-02-28 | fraud-ops |
| txn_velocity_7d | account_id, txn_count_7d, amount_sum_7d, unique_countries_7d | 2026-02-28 | fraud-ops |
Kenji will build on these existing features rather than duplicating them.
Stage 3: Query
Kenji constructs the real-time feature queries that feed the fraud scoring model. These must execute in under 10ms to meet the overall 50ms SLA.
Real-Time Feature Query
-- Real-time fraud feature computation
-- Executed per-transaction at scoring time
-- Target: < 10ms execution
SELECT
t.txn_id,
t.account_id,
t.amount,
t.merchant_category,
t.channel,
-- Velocity features (pre-computed, lookup only)
v1.txn_count_1h,
v1.amount_sum_1h,
v1.unique_merchants_1h,
v24.txn_count_24h,
v24.amount_sum_24h,
v24.amount_max_24h,
v7.unique_countries_7d,
-- Amount anomaly features (computed inline)
t.amount / NULLIF(a.avg_txn_amount_90d, 0)
AS amount_ratio_to_avg,
CASE WHEN t.amount > a.p95_txn_amount_90d THEN 1 ELSE 0 END
AS exceeds_p95,
-- Merchant category pattern
CASE WHEN mc.fraud_rate > 0.005 THEN 1 ELSE 0 END
AS high_risk_merchant,
mc.fraud_rate AS merchant_category_fraud_rate,
-- Temporal features
EXTRACT(HOUR FROM t.timestamp) AS txn_hour,
CASE WHEN EXTRACT(DOW FROM t.timestamp) IN (0, 6) THEN 1 ELSE 0 END
AS is_weekend,
-- Device and location
COALESCE(df.trust_score, 0.5) AS device_trust_score,
COALESCE(ip.risk_score, 0.5) AS ip_risk_score,
-- Account profile
a.account_age_days,
a.total_txn_count,
a.has_fraud_history
FROM transactions t
JOIN accounts a ON t.account_id = a.account_id
LEFT JOIN fraud_features.txn_velocity_1h v1 ON t.account_id = v1.account_id
LEFT JOIN fraud_features.txn_velocity_24h v24 ON t.account_id = v24.account_id
LEFT JOIN fraud_features.txn_velocity_7d v7 ON t.account_id = v7.account_id
LEFT JOIN reference.merchant_category_risk mc ON t.merchant_category = mc.category
LEFT JOIN device_fingerprints df ON t.device_id = df.device_id
LEFT JOIN ip_reputation ip ON t.ip_hash = ip.ip_hash
WHERE t.txn_id = :current_txn_id;
Historical Pattern Analysis with DuckDB
For batch analysis of historical fraud patterns, Kenji uses DuckDB on S3-stored Parquet files:
-- Analyze fraud patterns by merchant category over time
-- Data source: S3/DuckDB (historical transaction archive)
SELECT
merchant_category,
DATE_TRUNC('month', timestamp) AS month,
COUNT(*) AS total_txns,
SUM(CASE WHEN is_fraud THEN 1 ELSE 0 END) AS fraud_count,
ROUND(100.0 * SUM(CASE WHEN is_fraud THEN 1 ELSE 0 END)
/ COUNT(*), 4) AS fraud_rate_pct,
AVG(CASE WHEN is_fraud THEN amount ELSE NULL END) AS avg_fraud_amount,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY
CASE WHEN is_fraud THEN amount ELSE NULL END) AS median_fraud_amount
FROM read_parquet('s3://meridian-data-lake/transactions/year=*/month=*/*.parquet')
WHERE timestamp >= '2025-01-01'
GROUP BY merchant_category, DATE_TRUNC('month', timestamp)
HAVING fraud_count > 10
ORDER BY fraud_rate_pct DESC;
Stage 4: Orchestration
Kenji builds a dual pipeline architecture: streaming for real-time scoring and batch for model retraining.
Streaming Pipeline Architecture
┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ ┌──────────┐
│ Kafka │──▶│ Feature │──▶│ Model │──▶│ Decision │──▶│ Alert │
│ Topic │ │ Computation │ │ Scoring │ │ Engine │ │ Router │
│ │ │ │ │ │ │ │ │ │
│ 10K/min │ │ Velocity │ │ Ensemble │ │ Score > │ │ Queue / │
│ │ │ lookups, │ │ inference │ │ 0.85 → │ │ Block / │
│ │ │ enrichment │ │ (Ray Serve) │ │ block │ │ Allow │
└──────────┘ └──────────────┘ └──────────────┘ └───────────┘ └──────────┘
1ms 8ms 12ms 2ms 1ms
Total: ~24ms
Batch Retraining Pipeline
{
"pipeline": {
"name": "fraud-model-retraining",
"schedule": "0 2 * * 0",
"owner": "kenji@meridian.bank",
"tags": ["fraud", "ml-training", "weekly"],
"stages": [
{
"name": "extract_training_data",
"type": "sql_transform",
"query": "SELECT * FROM fraud_features.training_dataset WHERE label_date >= CURRENT_DATE - INTERVAL '90 days'",
"output": "ml_staging.fraud_training_latest"
},
{
"name": "validate_labels",
"type": "data_quality",
"checks": [
{
"table": "ml_staging.fraud_training_latest",
"expectation": "expect_column_values_to_be_in_set",
"column": "is_fraud",
"value_set": [true, false],
"severity": "critical"
},
{
"table": "ml_staging.fraud_training_latest",
"expectation": "expect_table_row_count_to_be_between",
"min_value": 1000000,
"severity": "critical"
},
{
"table": "ml_staging.fraud_training_latest",
"expectation": "expect_column_proportion_of_unique_values_to_be_between",
"column": "is_fraud",
"min_value": 0.0005,
"max_value": 0.01,
"severity": "warning"
}
],
"on_failure": "halt_pipeline"
},
{
"name": "train_model",
"type": "ml_training",
"experiment": "fraud-detection-weekly",
"model_config": {
"ensemble": [
{"type": "xgboost", "weight": 0.6},
{"type": "neural_network", "weight": 0.4}
]
},
"depends_on": ["validate_labels"]
},
{
"name": "evaluate_model",
"type": "ml_evaluation",
"metrics": ["auc", "precision_at_1pct_fpr", "recall_at_2pct_fpr"],
"promotion_criteria": {
"auc": "> 0.95",
"precision_at_1pct_fpr": "> 0.60"
},
"depends_on": ["train_model"]
},
{
"name": "refresh_feature_store",
"type": "feature_store_ingest",
"source_table": "fraud_features.txn_velocity_1h",
"feature_group": "fraud_velocity",
"depends_on": ["evaluate_model"]
}
]
}
}
Exactly-Once Processing
Kenji configures exactly-once semantics to prevent duplicate scoring:
{
"processing_guarantees": {
"dedup_key": "txn_id",
"dedup_window_minutes": 60,
"checkpoint_interval_ms": 5000,
"state_backend": "redis",
"redis_key_prefix": "fraud:dedup:",
"on_duplicate": "skip_and_log"
}
}
Stage 5: Analysis
Kenji validates model inputs and fraud labels to ensure the detection system is working correctly.
Fraud Label Quality
-- Validate fraud labels against investigation outcomes
SELECT
detection_method,
COUNT(*) AS cases,
SUM(CASE WHEN resolution = 'confirmed_fraud' THEN 1 ELSE 0 END) AS true_positives,
SUM(CASE WHEN resolution = 'false_alarm' THEN 1 ELSE 0 END) AS false_positives,
ROUND(100.0 * SUM(CASE WHEN resolution = 'confirmed_fraud' THEN 1 ELSE 0 END)
/ COUNT(*), 1) AS precision_pct
FROM fraud_cases
WHERE investigation_date >= '2025-07-01'
GROUP BY detection_method
ORDER BY cases DESC;
| Detection Method | Cases | True Positives | False Positives | Precision |
|---|---|---|---|---|
| ML model v1 | 8,412 | 7,891 | 521 | 93.8% |
| Rules engine | 3,204 | 2,115 | 1,089 | 66.0% |
| Customer report | 2,847 | 2,741 | 106 | 96.3% |
| Manual review | 1,102 | 987 | 115 | 89.6% |
The rules engine's 34% false positive rate among flagged cases was one of the primary reasons for building the ML-based system.
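The precision column is simply true positives over all investigated cases (TP + FP). A one-line helper to sanity-check the table's figures:

```python
def precision_pct(true_positives, false_positives):
    """Precision as a percentage: confirmed fraud / all flagged cases."""
    return round(100.0 * true_positives / (true_positives + false_positives), 1)

# Figures from the label-quality table above
ml_precision = precision_pct(7891, 521)      # ML model v1 row -> 93.8
rules_precision = precision_pct(2115, 1089)  # rules engine row -> 66.0
```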
Distribution Shift Analysis
After a new mobile payment channel launched in Q4 2025, Kenji checks for distribution shift:
| Feature | Pre-Launch (Q3) | Post-Launch (Q4) | PSI | Status |
|---|---|---|---|---|
| channel distribution | POS:45%, Online:40%, ATM:15% | POS:42%, Online:38%, ATM:12%, Mobile:8% | 0.082 | Warning |
| txn_hour distribution | Peak 11am-2pm | Peak 11am-2pm + 8pm-10pm (mobile) | 0.041 | Acceptable |
| amount distribution | Mean 5,100 | Mean 5,200 | 0.008 | Stable |
| device_trust_score | Mean 0.72 | Mean 0.61 (new devices) | 0.094 | Warning |
The mobile channel introduces new device fingerprints with lower trust scores. Kenji flags this for the next model retrain.
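PSI compares bucketed proportions between a baseline window and a current window. A minimal sketch, assuming the two distributions arrive as aligned bucket proportions; the exact bucketing and zero-bucket floor behind the table's figures are not shown here, so this will not reproduce them digit-for-digit:

```python
import math

def psi(expected, actual, floor=1e-4):
    """Population Stability Index over aligned buckets.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 warning, > 0.25 shifted."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, floor), max(a, floor)  # guard against empty buckets
        total += (a - e) * math.log(a / e)
    return total

# Channel mix before/after the mobile launch (Mobile is a brand-new bucket)
pre = [0.45, 0.40, 0.15, 0.00]   # POS, Online, ATM, Mobile
post = [0.42, 0.38, 0.12, 0.08]
shift = psi(pre, post)
```

Note that every PSI term is non-negative, so a new channel with zero baseline share always pushes the index upward; the floor value dominates how hard.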
Stage 6: Productionization
Ensemble Model Deployment
Kenji deploys an ensemble combining gradient boosting (interpretable, fast) with a neural network (captures complex patterns):
{
"deployment": {
"name": "fraud-detection-ensemble-v3",
"serving_engine": "ray_serve",
"mode": "real_time",
"models": [
{
"name": "fraud-xgboost-v3",
"weight": 0.6,
"framework": "xgboost",
"resources": {"num_cpus": 2, "memory_gb": 4}
},
{
"name": "fraud-nn-v3",
"weight": 0.4,
"framework": "pytorch",
"resources": {"num_cpus": 2, "memory_gb": 4}
}
],
"sla": {
"p50_latency_ms": 15,
"p99_latency_ms": 50,
"availability": 0.9999,
"throughput_per_second": 200
},
"scaling": {
"min_replicas": 3,
"max_replicas": 12,
"target_ongoing_requests": 50,
"scale_up_threshold_seconds": 5,
"availability_zones": ["us-east-1a", "us-east-1b", "us-east-1c"]
},
"fallback": {
"enabled": true,
"type": "rules_engine",
"trigger": "model_latency > 100ms OR model_error",
"rules": [
{"condition": "amount > 5000 AND channel = 'online'", "action": "flag"},
{"condition": "txn_count_1h > 20", "action": "flag"},
{"condition": "country != account_country", "action": "flag"}
]
}
}
}
Deployment Verification
After deployment, Kenji verifies the serving infrastructure:
| Check | Expected | Actual | Status |
|---|---|---|---|
| Replicas running | 3 (minimum) | 3 | Pass |
| P50 latency | < 15ms | 11ms | Pass |
| P99 latency | < 50ms | 38ms | Pass |
| Throughput capacity | > 200 req/s | 847 req/s | Pass |
| Fallback rules loaded | 3 rules | 3 rules | Pass |
| Model version match | v3.0.0 | v3.0.0 | Pass |
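Server-side, Ray Serve composes the two model replicas; the scoring math itself is just the weighted blend from the deployment config plus the threshold from the decision engine in the streaming diagram. A sketch of that logic (function names are illustrative):

```python
def ensemble_score(xgb_score, nn_score, w_xgb=0.6, w_nn=0.4):
    """Blend the two model probabilities with the deployment weights."""
    return w_xgb * xgb_score + w_nn * nn_score

def decide(score, block_threshold=0.85):
    """Decision-engine step from the streaming diagram:
    score > 0.85 -> block, otherwise allow."""
    return "block" if score > block_threshold else "allow"

score = ensemble_score(0.9, 0.8)  # 0.6*0.9 + 0.4*0.8 = 0.86
action = decide(score)            # "block"
```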
Stage 7: Feedback
Kenji builds a comprehensive real-time monitoring dashboard in the BI Workbench.
Real-Time Monitoring Dashboard
┌─────────────────────────────────────────────────────────────────────┐
│ FRAUD DETECTION OPERATIONS │
│ Last updated: 2026-02-28 14:32:17 UTC │
├─────────────────────┬──────────────────────┬────────────────────────┤
│ Scoring Latency │ Model Confidence │ Fraud Rate (24h) │
│ │ │ │
│ P50: 11ms │ Mean: 0.12 │ Detected: 142 │
│ P95: 28ms │ Median: 0.04 │ Missed (est): 8 │
│ P99: 38ms │ > 0.85: 0.09% │ False Pos: 31 │
│ Max: 67ms │ > 0.50: 0.41% │ Catch Rate: 94.7% │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Throughput │ Fallback Events │ Alerts (24h) │
│ │ │ │
│ Current: 168/sec │ Today: 0 │ Latency: 0 │
│ Peak: 412/sec │ This Week: 2 │ Confidence: 1 │
│ Capacity: 847/sec │ Reason: timeout │ Drift: 0 │
│ Util: 19.8% │ │ Fraud Spike: 0 │
└─────────────────────┴──────────────────────┴────────────────────────┘
Alert Configuration
{
"alerts": [
{
"name": "scoring-latency-spike",
"condition": "p99_latency_5min > 100",
"severity": "critical",
"channels": ["pagerduty", "slack:#fraud-ops"],
"action": "Consider scaling replicas or activating fallback"
},
{
"name": "model-confidence-drop",
"condition": "avg_confidence_1h < 0.08 OR avg_confidence_1h > 0.25",
"severity": "warning",
"channels": ["slack:#fraud-ops"],
"action": "Investigate input distribution shift"
},
{
"name": "fraud-rate-anomaly",
"condition": "fraud_rate_1h > 3 * fraud_rate_7d_avg",
"severity": "critical",
"channels": ["pagerduty", "slack:#fraud-ops", "email:fraud-team"],
"action": "Potential fraud attack -- escalate to fraud investigation"
},
{
"name": "feature-drift-detected",
"condition": "any_feature_psi > 0.10",
"severity": "warning",
"channels": ["slack:#fraud-ops"],
"action": "Schedule model retrain if sustained > 48h"
}
]
}
Weekly Performance Report
Kenji generates automated weekly reports:
| Metric | Target | Week 8 | Week 9 | Week 10 | Trend |
|---|---|---|---|---|---|
| Fraud catch rate | > 95% | 94.2% | 94.7% | 95.1% | Improving |
| False positive rate | < 2% | 1.8% | 1.7% | 1.6% | Improving |
| P99 latency | < 50ms | 42ms | 38ms | 39ms | Stable |
| Availability | > 99.99% | 99.998% | 100% | 99.997% | Stable |
| Fallback activations | 0 | 1 | 2 | 0 | Acceptable |
| Dollar amount protected | -- | $1.24M | $1.31M | $1.42M | Growing |
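The alert conditions in the configuration earlier in this stage are simple predicates over precomputed windowed metrics. A sketch of two of them, assuming the metrics store supplies the rates:

```python
def fraud_rate_anomaly(fraud_rate_1h, fraud_rate_7d_avg, multiplier=3.0):
    """fraud-rate-anomaly: 1h fraud rate exceeds 3x the 7-day average."""
    return fraud_rate_1h > multiplier * fraud_rate_7d_avg

def confidence_drop(avg_confidence_1h, low=0.08, high=0.25):
    """model-confidence-drop: mean confidence outside its normal band."""
    return avg_confidence_1h < low or avg_confidence_1h > high
```

For example, `fraud_rate_anomaly(0.004, 0.0012)` fires (0.004 is above 3 x 0.0012), while a mean confidence of 0.12 sits inside the normal band and stays quiet.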
Stage 8: Experimentation
Kenji tests new features and model architectures to continuously improve fraud detection.
Shadow Mode Testing
Before any production change, Kenji runs new models in shadow mode -- scoring every transaction without affecting decisions:
{
"experiment": {
"name": "device-fingerprint-feature-test",
"type": "shadow",
"production_model": "fraud-detection-ensemble-v3",
"shadow_model": "fraud-detection-ensemble-v4-candidate",
"shadow_features_added": [
"device_fingerprint_match_score",
"behavioral_biometric_score",
"typing_pattern_anomaly"
],
"duration_days": 14,
"evaluation": {
"metrics": ["auc", "precision_at_1pct_fpr", "recall_at_2pct_fpr"],
"comparison_window": "daily"
}
}
}
Shadow Mode Results
| Metric | Production (v3) | Shadow (v4 candidate) | Delta |
|---|---|---|---|
| AUC | 0.961 | 0.974 | +1.4% |
| Precision @ 1% FPR | 62.3% | 71.8% | +15.2% |
| Recall @ 2% FPR | 89.1% | 93.4% | +4.8% |
| Latency P99 | 38ms | 44ms | +6ms |
| New fraud patterns caught | -- | 23 additional cases / week | -- |
The device fingerprinting features add significant value, catching 23 additional fraud cases per week that the current model misses, with acceptable latency impact.
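The daily AUC comparison in shadow mode only needs the fraud labels and the two score streams. A minimal rank-based (Mann-Whitney) AUC, fine for spot checks though O(n^2), not a stand-in for the production metric job:

```python
def auc(labels, scores):
    """Probability that a randomly chosen fraud case outscores a
    randomly chosen legitimate one; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Toy example: one fraud case is mis-ranked below a legitimate one
labels = [1, 1, 0, 0]
prod_scores = [0.9, 0.4, 0.5, 0.1]
prod_auc = auc(labels, prod_scores)  # 0.75
```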
A/B Test with Traffic Split
After shadow validation, Kenji runs a controlled A/B test:
┌─────────────────────────────────────────────────┐
│ Transaction Flow │
└────────────────────┬────────────────────────────┘
│
┌──────────┴──────────┐
│ Traffic Splitter │
│ (by account_id │
│ hash, stable) │
└──────────┬──────────┘
┌─────┴─────┐
┌─────▼─────┐ ┌───▼─────────┐
│ Control │ │ Treatment │
│ (95%) │ │ (5%) │
│ │ │ │
│ v3 │ │ v4 cand. │
│ Ensemble │ │ + device │
│ │ │ features │
└─────┬─────┘ └──────┬──────┘
│ │
┌─────▼──────────────▼──────┐
│ Both make real decisions │
│ Results tracked per group │
└───────────────────────────┘
A/B Test Results (Day 7 of 14)
| Metric | Control (v3, 95%) | Treatment (v4, 5%) | Statistical Significance |
|---|---|---|---|
| Fraud catch rate | 94.8% | 97.2% | p = 0.018 (significant) |
| False positive rate | 1.7% | 1.4% | p = 0.042 (significant) |
| Customer friction (blocks) | 0.41% of txns | 0.35% of txns | p = 0.11 (not yet) |
| Avg scoring latency | 24ms | 29ms | -- (within SLA) |
| Revenue protected (est.) | $186K/day | $198K/day | -- |
Results show statistically significant improvements in both catch rate and false positive rate. Kenji prepares the promotion plan to gradually ramp v4 from 5% to 100% over the next two weeks.
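The "stable by account_id hash" split in the diagram can be implemented with any deterministic hash. A sketch (the salt value is an assumption; any experiment-specific constant works):

```python
import hashlib

def assign_arm(account_id, treatment_pct=5, salt="fraud-ab-v4"):
    """Deterministic split: the same account always lands in the same
    arm, so a customer never flips between v3 and v4 mid-experiment."""
    digest = hashlib.sha256(f"{salt}:{account_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 100 < treatment_pct else "control"
```

A useful property for the ramp plan: raising `treatment_pct` from 5 toward 100 only adds accounts to treatment, because each account's bucket value is fixed; everyone already on v4 stays on v4.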
Ensemble vs Single Model Comparison
In parallel, Kenji tests whether the neural network component adds value over XGBoost alone:
| Configuration | AUC | P@1%FPR | Latency P99 | Verdict |
|---|---|---|---|---|
| XGBoost only | 0.954 | 58.1% | 22ms | Fast but lower precision |
| Neural Net only | 0.948 | 64.2% | 41ms | Better precision, slower |
| Ensemble (current) | 0.961 | 62.3% | 38ms | Best AUC overall |
| Ensemble + device features (v4) | 0.974 | 71.8% | 44ms | Best across all metrics |
The ensemble consistently outperforms single models, justifying the additional infrastructure cost.
Key Takeaways
| Stage | Key Action | Platform Component |
|---|---|---|
| Ingestion | Kafka streaming + batch hybrid architecture | Airbyte connectors, Kafka connector |
| Discovery | Profiled transaction distributions, reused existing velocity features | Catalog Service, Data Quality Service |
| Query | Real-time feature queries (< 10ms) + DuckDB historical analysis | Query Engine, DuckDB on S3 |
| Orchestration | Dual pipeline: streaming scoring + weekly batch retraining | Pipeline Service (Temporal) |
| Analysis | Validated fraud labels (66% precision on rules engine), detected distribution shift | Data Quality Service |
| Productionization | Ensemble deployment to Ray Serve, 3-AZ, fallback rules engine | Ray Serve, model serving |
| Feedback | Real-time ops dashboard, latency/drift/fraud-rate alerts | BI Workbench, alerting |
| Experimentation | Shadow mode then 5% A/B test for new device fingerprint features | Experiment framework |
Related Walkthroughs
- Data Scientist Journey: Credit Risk Scoring -- Amir builds models on the same transaction data
- BI Lead Journey: Regulatory Reporting -- Rachel reports on fraud metrics to regulators
- Executive Journey: Strategic Risk Analytics -- Elena monitors fraud KPIs at portfolio level
- Financial Services Overview -- Industry context and sample datasets