ML Engineer Journey: Real-Time Fraud Detection System
Persona: Kenji, ML Engineer at Meridian Bank (Fraud Operations team, 4 years of experience)
Objective: Build and operate a real-time fraud detection pipeline processing 10K transactions per minute with sub-50ms scoring latency
Timeline: Continuous operation with monthly model refresh cycles
Datasets: transactions (50M), accounts (500K), fraud_cases (15K), payment_messages (12M)
Stage 1: Ingestion
Kenji's fraud detection system requires a hybrid ingestion strategy: real-time streaming for transaction scoring and batch ingestion for model retraining.
Streaming Ingestion from Payment Gateway
The payment gateway publishes every transaction to a Kafka topic. Kenji configures the Airbyte Kafka connector for real-time ingestion:
{
"source": {
"type": "kafka",
"name": "payment-gateway-stream",
"connection": {
"bootstrap_servers": "kafka.meridian.internal:9092",
"topic": "payments.transactions.authorized",
"consumer_group": "matih-fraud-ingestion",
"format": "avro",
"schema_registry": "http://schema-registry.meridian.internal:8081"
},
"sync_mode": "streaming",
"processing_guarantees": "exactly_once",
"batch_size": 1000,
"poll_interval_ms": 100
}
}
Batch Sources for Model Training
| Source | Connector | Sync Mode | Schedule | Purpose |
|---|---|---|---|---|
| Core Banking PostgreSQL | Airbyte PostgreSQL | CDC incremental | Every 15 min | Account profiles, balances |
| Fraud Case Management | Airbyte PostgreSQL | Incremental | Hourly | Confirmed fraud labels |
| IP Reputation API | Airbyte REST API | Full refresh | Daily | IP risk scores, geolocation |
| Device Fingerprint DB | Airbyte MongoDB | Incremental | Every 15 min | Device trust scores |
Ingestion Monitoring
Kenji monitors ingestion lag to ensure real-time freshness:
┌─────────────────────────────────────────────────────────────┐
│ Transaction Ingestion Health │
├─────────────┬──────────┬────────────┬───────────────────────┤
│ Source │ Lag │ Throughput │ Status │
├─────────────┼──────────┼────────────┼───────────────────────┤
│ Kafka │ 1.2s │ 10,247/min │ Healthy │
│ Core Banking│ 8m 14s │ 2,100/sync │ Healthy │
│ Fraud Cases │ 22m │ 48/sync │ Healthy │
│ IP Repute │ 6h 12m │ 1.2M/sync │ Healthy (daily batch) │
└─────────────┴──────────┴────────────┴───────────────────────┘
Stage 2: Discovery
Kenji maps the transaction data landscape and profiles the distributions that drive fraud detection.
Transaction Data Profiling
Using the Data Quality Service, Kenji profiles the transactions table:
| Column | Type | Completeness | Distribution | Notes |
|---|---|---|---|---|
| amount | DECIMAL | 100% | Mean: $42, P99: $5,200 | Heavy right skew |
| merchant_category | VARCHAR | 100% | 342 distinct categories | Top 5 account for 61% of volume |
| channel | VARCHAR | 100% | POS: 42%, Online: 38%, ATM: 12%, Mobile: 8% | Online growing 15% YoY |
| is_fraud | BOOLEAN | 100% | 0.12% positive rate (fraud) | ~12 frauds per 10K transactions |
| timestamp | TIMESTAMP | 100% | Peak: 11am-2pm, Trough: 3am-5am | Weekend patterns differ |
| device_id | VARCHAR | 84% | NULL for POS/ATM transactions | Online/mobile only |
| ip_address | VARCHAR | 76% | 89K distinct IPs | Masked by governance policy |
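The mean, percentile, and skew figures in the table come from standard column profiling. A minimal sketch of that computation, assuming the amounts are available as an in-memory sample (function names and the skew threshold are illustrative, not the Data Quality Service's API):

```python
import statistics

def profile_amounts(amounts):
    """Summarize a numeric column as in the profiling table:
    mean, a simple index-based P99, and a crude right-skew flag
    (mean well above median)."""
    s = sorted(amounts)
    p99 = s[min(len(s) - 1, (99 * len(s)) // 100)]
    mean = statistics.mean(s)
    return {
        "mean": round(mean, 2),
        "p99": p99,
        "right_skewed": mean > 1.5 * statistics.median(s),
    }

# Heavy-tailed toy sample: many small transactions, one large outlier
sample = [10] * 90 + [100] * 9 + [5000]
stats = profile_amounts(sample)  # p99 = 5000, right_skewed = True
```

The right-skew flag is the kind of signal that later motivates ratio features such as `amount_ratio_to_avg` rather than raw amounts.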
Existing Feature Discovery
Kenji discovers that a previous team built velocity features that are already materialized:
-- Existing velocity features found in catalog
SELECT table_name, column_name, description, last_updated
FROM catalog.column_metadata
WHERE schema_name = 'fraud_features'
ORDER BY last_updated DESC;
| Feature Table | Columns | Last Updated | Owner |
|---|---|---|---|
| txn_velocity_1h | account_id, txn_count_1h, amount_sum_1h, unique_merchants_1h | 2026-02-28 | fraud-ops |
| txn_velocity_24h | account_id, txn_count_24h, amount_sum_24h, amount_max_24h | 2026-02-28 | fraud-ops |
| txn_velocity_7d | account_id, txn_count_7d, amount_sum_7d, unique_countries_7d | 2026-02-28 | fraud-ops |
Kenji will build on these existing features rather than duplicating them.
Stage 3: Query
Kenji constructs the real-time feature queries that feed the fraud scoring model. These must execute in under 10ms to meet the overall 50ms SLA.
Real-Time Feature Query
-- Real-time fraud feature computation
-- Executed per-transaction at scoring time
-- Target: < 10ms execution
SELECT
t.txn_id,
t.account_id,
t.amount,
t.merchant_category,
t.channel,
-- Velocity features (pre-computed, lookup only)
v1.txn_count_1h,
v1.amount_sum_1h,
v1.unique_merchants_1h,
v24.txn_count_24h,
v24.amount_sum_24h,
v24.amount_max_24h,
v7.unique_countries_7d,
-- Amount anomaly features (computed inline)
t.amount / NULLIF(a.avg_txn_amount_90d, 0)
AS amount_ratio_to_avg,
CASE WHEN t.amount > a.p95_txn_amount_90d THEN 1 ELSE 0 END
AS exceeds_p95,
-- Merchant category pattern
CASE WHEN mc.fraud_rate > 0.005 THEN 1 ELSE 0 END
AS high_risk_merchant,
mc.fraud_rate AS merchant_category_fraud_rate,
-- Temporal features
EXTRACT(HOUR FROM t.timestamp) AS txn_hour,
CASE WHEN EXTRACT(DOW FROM t.timestamp) IN (0, 6) THEN 1 ELSE 0 END
AS is_weekend,
-- Device and location
COALESCE(df.trust_score, 0.5) AS device_trust_score,
COALESCE(ip.risk_score, 0.5) AS ip_risk_score,
-- Account profile
a.account_age_days,
a.total_txn_count,
a.has_fraud_history
FROM transactions t
JOIN accounts a ON t.account_id = a.account_id
LEFT JOIN fraud_features.txn_velocity_1h v1 ON t.account_id = v1.account_id
LEFT JOIN fraud_features.txn_velocity_24h v24 ON t.account_id = v24.account_id
LEFT JOIN fraud_features.txn_velocity_7d v7 ON t.account_id = v7.account_id
LEFT JOIN reference.merchant_category_risk mc ON t.merchant_category = mc.category
LEFT JOIN device_fingerprints df ON t.device_id = df.device_id
LEFT JOIN ip_reputation ip ON t.ip_hash = ip.ip_hash
WHERE t.txn_id = :current_txn_id;
Historical Pattern Analysis with DuckDB
For batch analysis of historical fraud patterns, Kenji uses DuckDB on S3-stored Parquet files:
-- Analyze fraud patterns by merchant category over time
-- Data source: S3/DuckDB (historical transaction archive)
SELECT
merchant_category,
DATE_TRUNC('month', timestamp) AS month,
COUNT(*) AS total_txns,
SUM(CASE WHEN is_fraud THEN 1 ELSE 0 END) AS fraud_count,
ROUND(100.0 * SUM(CASE WHEN is_fraud THEN 1 ELSE 0 END)
/ COUNT(*), 4) AS fraud_rate_pct,
AVG(CASE WHEN is_fraud THEN amount ELSE NULL END) AS avg_fraud_amount,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY
CASE WHEN is_fraud THEN amount ELSE NULL END) AS median_fraud_amount
FROM read_parquet('s3://meridian-data-lake/transactions/year=*/month=*/*.parquet')
WHERE timestamp >= '2025-01-01'
GROUP BY merchant_category, DATE_TRUNC('month', timestamp)
HAVING fraud_count > 10
ORDER BY fraud_rate_pct DESC;
Stage 4: Orchestration
Kenji builds a dual pipeline architecture: streaming for real-time scoring and batch for model retraining.
Streaming Pipeline Architecture
┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ ┌──────────┐
│ Kafka │──▶│ Feature │──▶│ Model │──▶│ Decision │──▶│ Alert │
│ Topic │ │ Computation │ │ Scoring │ │ Engine │ │ Router │
│ │ │ │ │ │ │ │ │ │
│ 10K/min │ │ Velocity │ │ Ensemble │ │ Score > │ │ Queue / │
│ │ │ lookups, │ │ inference │ │ 0.85 → │ │ Block / │
│ │ │ enrichment │ │ (Ray Serve) │ │ block │ │ Allow │
└──────────┘ └──────────────┘ └──────────────┘ └───────────┘ └──────────┘
1ms 8ms 12ms 2ms 1ms
Total: ~24ms
Batch Retraining Pipeline
{
"pipeline": {
"name": "fraud-model-retraining",
"schedule": "0 2 * * 0",
"owner": "kenji@meridian.bank",
"tags": ["fraud", "ml-training", "weekly"],
"stages": [
{
"name": "extract_training_data",
"type": "sql_transform",
"query": "SELECT * FROM fraud_features.training_dataset WHERE label_date >= CURRENT_DATE - INTERVAL '90 days'",
"output": "ml_staging.fraud_training_latest"
},
{
"name": "validate_labels",
"type": "data_quality",
"checks": [
{
"table": "ml_staging.fraud_training_latest",
"expectation": "expect_column_values_to_be_in_set",
"column": "is_fraud",
"value_set": [true, false],
"severity": "critical"
},
{
"table": "ml_staging.fraud_training_latest",
"expectation": "expect_table_row_count_to_be_between",
"min_value": 1000000,
"severity": "critical"
},
{
"table": "ml_staging.fraud_training_latest",
"expectation": "expect_column_proportion_of_unique_values_to_be_between",
"column": "is_fraud",
"min_value": 0.0005,
"max_value": 0.01,
"severity": "warning"
}
],
"on_failure": "halt_pipeline"
},
{
"name": "train_model",
"type": "ml_training",
"experiment": "fraud-detection-weekly",
"model_config": {
"ensemble": [
{"type": "xgboost", "weight": 0.6},
{"type": "neural_network", "weight": 0.4}
]
},
"depends_on": ["validate_labels"]
},
{
"name": "evaluate_model",
"type": "ml_evaluation",
"metrics": ["auc", "precision_at_1pct_fpr", "recall_at_2pct_fpr"],
"promotion_criteria": {
"auc": "> 0.95",
"precision_at_1pct_fpr": "> 0.60"
},
"depends_on": ["train_model"]
},
{
"name": "refresh_feature_store",
"type": "feature_store_ingest",
"source_table": "fraud_features.txn_velocity_1h",
"feature_group": "fraud_velocity",
"depends_on": ["evaluate_model"]
}
]
}
}
Exactly-Once Processing
Kenji configures exactly-once semantics to prevent duplicate scoring:
{
"processing_guarantees": {
"dedup_key": "txn_id",
"dedup_window_minutes": 60,
"checkpoint_interval_ms": 5000,
"state_backend": "redis",
"redis_key_prefix": "fraud:dedup:",
"on_duplicate": "skip_and_log"
}
}
Stage 5: Analysis
Kenji validates model inputs and fraud labels to ensure the detection system is working correctly.
Fraud Label Quality
-- Validate fraud labels against investigation outcomes
SELECT
detection_method,
COUNT(*) AS cases,
SUM(CASE WHEN resolution = 'confirmed_fraud' THEN 1 ELSE 0 END) AS true_positives,
SUM(CASE WHEN resolution = 'false_alarm' THEN 1 ELSE 0 END) AS false_positives,
ROUND(100.0 * SUM(CASE WHEN resolution = 'confirmed_fraud' THEN 1 ELSE 0 END)
/ COUNT(*), 1) AS precision_pct
FROM fraud_cases
WHERE investigation_date >= '2025-07-01'
GROUP BY detection_method
ORDER BY cases DESC;
| Detection Method | Cases | True Positives | False Positives | Precision |
|---|---|---|---|---|
| ML model v1 | 8,412 | 7,891 | 521 | 93.8% |
| Rules engine | 3,204 | 2,115 | 1,089 | 66.0% |
| Customer report | 2,847 | 2,741 | 106 | 96.3% |
| Manual review | 1,102 | 987 | 115 | 89.6% |
The rules engine's 34% false positive rate among flagged cases was one of the primary reasons for building the ML-based system.
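The precision column is simply true positives over all investigated cases (TP + FP). A one-line helper to sanity-check the table's figures:

```python
def precision_pct(true_positives, false_positives):
    """Precision as a percentage: confirmed fraud / all flagged cases."""
    return round(100.0 * true_positives / (true_positives + false_positives), 1)

# Figures from the label-quality table above
ml_precision = precision_pct(7891, 521)      # ML model v1 row -> 93.8
rules_precision = precision_pct(2115, 1089)  # rules engine row -> 66.0
```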
Distribution Shift Analysis
After a new mobile payment channel launched in Q4 2025, Kenji checks for distribution shift:
| Feature | Pre-Launch (Q3) | Post-Launch (Q4) | PSI | Status |
|---|---|---|---|---|
| channel distribution | POS:45%, Online:40%, ATM:15% | POS:42%, Online:38%, ATM:12%, Mobile:8% | 0.082 | Warning |
| txn_hour distribution | Peak 11am-2pm | Peak 11am-2pm + 8pm-10pm (mobile) | 0.041 | Acceptable |
| amount distribution | Mean 5,100 | Mean 5,200 | 0.008 | Stable |
| device_trust_score | Mean 0.72 | Mean 0.61 (new devices) | 0.094 | Warning |
The mobile channel introduces new device fingerprints with lower trust scores. Kenji flags this for the next model retrain.
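PSI compares bucketed proportions between a baseline window and a current window. A minimal sketch, assuming the two distributions arrive as aligned bucket proportions; the exact bucketing and zero-bucket floor behind the table's figures are not shown here, so this will not reproduce them digit-for-digit:

```python
import math

def psi(expected, actual, floor=1e-4):
    """Population Stability Index over aligned buckets.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 warning, > 0.25 shifted."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, floor), max(a, floor)  # guard against empty buckets
        total += (a - e) * math.log(a / e)
    return total

# Channel mix before/after the mobile launch (Mobile is a brand-new bucket)
pre = [0.45, 0.40, 0.15, 0.00]   # POS, Online, ATM, Mobile
post = [0.42, 0.38, 0.12, 0.08]
shift = psi(pre, post)
```

Note that every PSI term is non-negative, so a new channel with zero baseline share always pushes the index upward; the floor value dominates how hard.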
Stage 6: Productionization
Ensemble Model Deployment
Kenji deploys an ensemble combining gradient boosting (interpretable, fast) with a neural network (captures complex patterns):
{
"deployment": {
"name": "fraud-detection-ensemble-v3",
"serving_engine": "ray_serve",
"mode": "real_time",
"models": [
{
"name": "fraud-xgboost-v3",
"weight": 0.6,
"framework": "xgboost",
"resources": {"num_cpus": 2, "memory_gb": 4}
},
{
"name": "fraud-nn-v3",
"weight": 0.4,
"framework": "pytorch",
"resources": {"num_cpus": 2, "memory_gb": 4}
}
],
"sla": {
"p50_latency_ms": 15,
"p99_latency_ms": 50,
"availability": 0.9999,
"throughput_per_second": 200
},
"scaling": {
"min_replicas": 3,
"max_replicas": 12,
"target_ongoing_requests": 50,
"scale_up_threshold_seconds": 5,
"availability_zones": ["us-east-1a", "us-east-1b", "us-east-1c"]
},
"fallback": {
"enabled": true,
"type": "rules_engine",
"trigger": "model_latency > 100ms OR model_error",
"rules": [
{"condition": "amount > 5000 AND channel = 'online'", "action": "flag"},
{"condition": "txn_count_1h > 20", "action": "flag"},
{"condition": "country != account_country", "action": "flag"}
]
}
}
}
Deployment Verification
After deployment, Kenji verifies the serving infrastructure:
| Check | Expected | Actual | Status |
|---|---|---|---|
| Replicas running | 3 (minimum) | 3 | Pass |
| P50 latency | < 15ms | 11ms | Pass |
| P99 latency | < 50ms | 38ms | Pass |
| Throughput capacity | > 200 req/s | 847 req/s | Pass |
| Fallback rules loaded | 3 rules | 3 rules | Pass |
| Model version match | v3.0.0 | v3.0.0 | Pass |
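Server-side, Ray Serve composes the two model replicas; the scoring math itself is just the weighted blend from the deployment config plus the threshold from the decision engine in the streaming diagram. A sketch of that logic (function names are illustrative):

```python
def ensemble_score(xgb_score, nn_score, w_xgb=0.6, w_nn=0.4):
    """Blend the two model probabilities with the deployment weights."""
    return w_xgb * xgb_score + w_nn * nn_score

def decide(score, block_threshold=0.85):
    """Decision-engine step from the streaming diagram:
    score > 0.85 -> block, otherwise allow."""
    return "block" if score > block_threshold else "allow"

score = ensemble_score(0.9, 0.8)  # 0.6*0.9 + 0.4*0.8 = 0.86
action = decide(score)            # "block"
```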
Stage 7: Feedback
Kenji builds a comprehensive real-time monitoring dashboard in the BI Workbench.
Real-Time Monitoring Dashboard
┌─────────────────────────────────────────────────────────────────────┐
│ FRAUD DETECTION OPERATIONS │
│ Last updated: 2026-02-28 14:32:17 UTC │
├─────────────────────┬──────────────────────┬────────────────────────┤
│ Scoring Latency │ Model Confidence │ Fraud Rate (24h) │
│ │ │ │
│ P50: 11ms │ Mean: 0.12 │ Detected: 142 │
│ P95: 28ms │ Median: 0.04 │ Missed (est): 8 │
│ P99: 38ms │ > 0.85: 0.09% │ False Pos: 31 │
│ Max: 67ms │ > 0.50: 0.41% │ Catch Rate: 94.7% │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Throughput │ Fallback Events │ Alerts (24h) │
│ │ │ │
│ Current: 168/sec │ Today: 0 │ Latency: 0 │
│ Peak: 412/sec │ This Week: 2 │ Confidence: 1 │
│ Capacity: 847/sec │ Reason: timeout │ Drift: 0 │
│ Util: 19.8% │ │ Fraud Spike: 0 │
└─────────────────────┴──────────────────────┴────────────────────────┘
Alert Configuration
{
"alerts": [
{
"name": "scoring-latency-spike",
"condition": "p99_latency_5min > 100",
"severity": "critical",
"channels": ["pagerduty", "slack:#fraud-ops"],
"action": "Consider scaling replicas or activating fallback"
},
{
"name": "model-confidence-drop",
"condition": "avg_confidence_1h < 0.08 OR avg_confidence_1h > 0.25",
"severity": "warning",
"channels": ["slack:#fraud-ops"],
"action": "Investigate input distribution shift"
},
{
"name": "fraud-rate-anomaly",
"condition": "fraud_rate_1h > 3 * fraud_rate_7d_avg",
"severity": "critical",
"channels": ["pagerduty", "slack:#fraud-ops", "email:fraud-team"],
"action": "Potential fraud attack -- escalate to fraud investigation"
},
{
"name": "feature-drift-detected",
"condition": "any_feature_psi > 0.10",
"severity": "warning",
"channels": ["slack:#fraud-ops"],
"action": "Schedule model retrain if sustained > 48h"
}
]
}
Weekly Performance Report
Kenji generates automated weekly reports:
| Metric | Target | Week 8 | Week 9 | Week 10 | Trend |
|---|---|---|---|---|---|
| Fraud catch rate | > 95% | 94.2% | 94.7% | 95.1% | Improving |
| False positive rate | < 2% | 1.8% | 1.7% | 1.6% | Improving |
| P99 latency | < 50ms | 42ms | 38ms | 39ms | Stable |
| Availability | > 99.99% | 99.998% | 100% | 99.997% | Stable |
| Fallback activations | 0 | 1 | 2 | 0 | Acceptable |
| Dollar amount protected | -- | $1.24M | $1.31M | $1.42M | Growing |
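The alert conditions in the configuration earlier in this stage are simple predicates over precomputed windowed metrics. A sketch of two of them, assuming the metrics store supplies the rates:

```python
def fraud_rate_anomaly(fraud_rate_1h, fraud_rate_7d_avg, multiplier=3.0):
    """fraud-rate-anomaly: 1h fraud rate exceeds 3x the 7-day average."""
    return fraud_rate_1h > multiplier * fraud_rate_7d_avg

def confidence_drop(avg_confidence_1h, low=0.08, high=0.25):
    """model-confidence-drop: mean confidence outside its normal band."""
    return avg_confidence_1h < low or avg_confidence_1h > high
```

For example, `fraud_rate_anomaly(0.004, 0.0012)` fires (0.004 is above 3 x 0.0012), while a mean confidence of 0.12 sits inside the normal band and stays quiet.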
Stage 8: Experimentation
Kenji tests new features and model architectures to continuously improve fraud detection.
Shadow Mode Testing
Before any production change, Kenji runs new models in shadow mode -- scoring every transaction without affecting decisions:
{
"experiment": {
"name": "device-fingerprint-feature-test",
"type": "shadow",
"production_model": "fraud-detection-ensemble-v3",
"shadow_model": "fraud-detection-ensemble-v4-candidate",
"shadow_features_added": [
"device_fingerprint_match_score",
"behavioral_biometric_score",
"typing_pattern_anomaly"
],
"duration_days": 14,
"evaluation": {
"metrics": ["auc", "precision_at_1pct_fpr", "recall_at_2pct_fpr"],
"comparison_window": "daily"
}
}
}
Shadow Mode Results
| Metric | Production (v3) | Shadow (v4 candidate) | Delta |
|---|---|---|---|
| AUC | 0.961 | 0.974 | +1.4% |
| Precision @ 1% FPR | 62.3% | 71.8% | +15.2% |
| Recall @ 2% FPR | 89.1% | 93.4% | +4.8% |
| Latency P99 | 38ms | 44ms | +6ms |
| New fraud patterns caught | -- | 23 additional cases / week | -- |
The device fingerprinting features add significant value, catching 23 additional fraud cases per week that the current model misses, with acceptable latency impact.
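The daily AUC comparison in shadow mode only needs the fraud labels and the two score streams. A minimal rank-based (Mann-Whitney) AUC, fine for spot checks though O(n^2), not a stand-in for the production metric job:

```python
def auc(labels, scores):
    """Probability that a randomly chosen fraud case outscores a
    randomly chosen legitimate one; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Toy example: one fraud case is mis-ranked below a legitimate one
labels = [1, 1, 0, 0]
prod_scores = [0.9, 0.4, 0.5, 0.1]
prod_auc = auc(labels, prod_scores)  # 0.75
```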
A/B Test with Traffic Split
After shadow validation, Kenji runs a controlled A/B test:
┌─────────────────────────────────────────────────┐
│ Transaction Flow │
└────────────────────┬────────────────────────────┘
│
┌──────────┴──────────┐
│ Traffic Splitter │
│ (by account_id │
│ hash, stable) │
└──────────┬──────────┘
┌─────┴─────┐
┌─────▼─────┐ ┌───▼─────────┐
│ Control │ │ Treatment │
│ (95%) │ │ (5%) │
│ │ │ │
│ v3 │ │ v4 cand. │
│ Ensemble │ │ + device │
│ │ │ features │
└─────┬─────┘ └──────┬──────┘
│ │
┌─────▼──────────────▼──────┐
│ Both make real decisions │
│ Results tracked per group │
└───────────────────────────┘
A/B Test Results (Day 7 of 14)
| Metric | Control (v3, 95%) | Treatment (v4, 5%) | Statistical Significance |
|---|---|---|---|
| Fraud catch rate | 94.8% | 97.2% | p = 0.018 (significant) |
| False positive rate | 1.7% | 1.4% | p = 0.042 (significant) |
| Customer friction (blocks) | 0.41% of txns | 0.35% of txns | p = 0.11 (not yet) |
| Avg scoring latency | 24ms | 29ms | -- (within SLA) |
| Revenue protected (est.) | $186K/day | $198K/day | -- |
Results show statistically significant improvements in both catch rate and false positive rate. Kenji prepares the promotion plan to gradually ramp v4 from 5% to 100% over the next two weeks.
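The "stable by account_id hash" split in the diagram can be implemented with any deterministic hash. A sketch (the salt value is an assumption; any experiment-specific constant works):

```python
import hashlib

def assign_arm(account_id, treatment_pct=5, salt="fraud-ab-v4"):
    """Deterministic split: the same account always lands in the same
    arm, so a customer never flips between v3 and v4 mid-experiment."""
    digest = hashlib.sha256(f"{salt}:{account_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 100 < treatment_pct else "control"
```

A useful property for the ramp plan: raising `treatment_pct` from 5 toward 100 only adds accounts to treatment, because each account's bucket value is fixed; everyone already on v4 stays on v4.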
Ensemble vs Single Model Comparison
In parallel, Kenji tests whether the neural network component adds value over XGBoost alone:
| Configuration | AUC | P@1%FPR | Latency P99 | Verdict |
|---|---|---|---|---|
| XGBoost only | 0.954 | 58.1% | 22ms | Fast but lower precision |
| Neural Net only | 0.948 | 64.2% | 41ms | Better precision, slower |
| Ensemble (current) | 0.961 | 62.3% | 38ms | Best AUC overall |
| Ensemble + device features (v4) | 0.974 | 71.8% | 44ms | Best across all metrics |
The ensemble consistently outperforms single models, justifying the additional infrastructure cost.
Key Takeaways
| Stage | Key Action | Platform Component |
|---|---|---|
| Ingestion | Kafka streaming + batch hybrid architecture | Airbyte connectors, Kafka connector |
| Discovery | Profiled transaction distributions, reused existing velocity features | Catalog Service, Data Quality Service |
| Query | Real-time feature queries (< 10ms) + DuckDB historical analysis | Query Engine, DuckDB on S3 |
| Orchestration | Dual pipeline: streaming scoring + weekly batch retraining | Pipeline Service (Temporal) |
| Analysis | Validated fraud labels (66% precision on rules engine), detected distribution shift | Data Quality Service |
| Productionization | Ensemble deployment to Ray Serve, 3-AZ, fallback rules engine | Ray Serve, model serving |
| Feedback | Real-time ops dashboard, latency/drift/fraud-rate alerts | BI Workbench, alerting |
| Experimentation | Shadow mode then 5% A/B test for new device fingerprint features | Experiment framework |
Related Walkthroughs
- Data Scientist Journey: Credit Risk Scoring -- Amir builds models on the same transaction data
- BI Lead Journey: Regulatory Reporting -- Rachel reports on fraud metrics to regulators
- Executive Journey: Strategic Risk Analytics -- Elena monitors fraud KPIs at portfolio level
- Financial Services Overview -- Industry context and sample datasets