MATIH Platform is in active MVP development. Documentation reflects current implementation status.

ML Engineer Journey: Real-Time Fraud Detection System

Persona: Kenji, ML Engineer at Meridian Bank (Fraud Operations team, 4 years experience)
Objective: Build and operate a real-time fraud detection pipeline processing 10K transactions per minute with sub-50ms scoring latency
Timeline: Continuous operation with monthly model refresh cycles
Datasets: transactions (50M), accounts (500K), fraud_cases (15K), payment_messages (12M)


Stage 1: Ingestion

Kenji's fraud detection system requires a hybrid ingestion strategy: real-time streaming for transaction scoring and batch ingestion for model retraining.

Streaming Ingestion from Payment Gateway

The payment gateway publishes every transaction to a Kafka topic. Kenji configures the Airbyte Kafka connector for real-time ingestion:

{
  "source": {
    "type": "kafka",
    "name": "payment-gateway-stream",
    "connection": {
      "bootstrap_servers": "kafka.meridian.internal:9092",
      "topic": "payments.transactions.authorized",
      "consumer_group": "matih-fraud-ingestion",
      "format": "avro",
      "schema_registry": "http://schema-registry.meridian.internal:8081"
    },
    "sync_mode": "streaming",
    "processing_guarantees": "exactly_once",
    "batch_size": 1000,
    "poll_interval_ms": 100
  }
}

Batch Sources for Model Training

| Source | Connector | Sync Mode | Schedule | Purpose |
| --- | --- | --- | --- | --- |
| Core Banking PostgreSQL | Airbyte PostgreSQL | CDC incremental | Every 15 min | Account profiles, balances |
| Fraud Case Management | Airbyte PostgreSQL | Incremental | Hourly | Confirmed fraud labels |
| IP Reputation API | Airbyte REST API | Full refresh | Daily | IP risk scores, geolocation |
| Device Fingerprint DB | Airbyte MongoDB | Incremental | Every 15 min | Device trust scores |

Ingestion Monitoring

Kenji monitors ingestion lag to ensure real-time freshness:

┌─────────────────────────────────────────────────────────────┐
│           Transaction Ingestion Health                       │
├─────────────┬──────────┬────────────┬───────────────────────┤
│ Source      │ Lag      │ Throughput │ Status                │
├─────────────┼──────────┼────────────┼───────────────────────┤
│ Kafka       │ 1.2s     │ 10,247/min │ Healthy               │
│ Core Banking│ 8m 14s   │ 2,100/sync │ Healthy               │
│ Fraud Cases │ 22m      │ 48/sync    │ Healthy               │
│ IP Repute   │ 6h 12m   │ 1.2M/sync  │ Healthy (daily batch) │
└─────────────┴──────────┴────────────┴───────────────────────┘
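
The health column above can be read as a freshness check per source. The following is a minimal sketch, assuming a source counts as healthy while its lag stays within one scheduled sync interval; the 5-second budget for the Kafka stream and the names `MAX_LAG` and `lag_status` are illustrative assumptions, not platform settings.

```python
from datetime import timedelta

# Hypothetical freshness budgets: one sync interval per batch source,
# plus an assumed 5-second budget for the Kafka stream.
MAX_LAG = {
    "Kafka": timedelta(seconds=5),
    "Core Banking": timedelta(minutes=15),
    "Fraud Cases": timedelta(hours=1),
    "IP Repute": timedelta(days=1),
}

def lag_status(source: str, lag: timedelta) -> str:
    """Return 'Healthy' while the lag stays within the source's budget."""
    return "Healthy" if lag <= MAX_LAG[source] else "Lagging"

# Lag values copied from the ingestion health table.
observed = {
    "Kafka": timedelta(seconds=1.2),
    "Core Banking": timedelta(minutes=8, seconds=14),
    "Fraud Cases": timedelta(minutes=22),
    "IP Repute": timedelta(hours=6, minutes=12),
}

statuses = {name: lag_status(name, lag) for name, lag in observed.items()}
```

Under these assumed budgets, the 6h 12m lag on the IP reputation feed is still healthy because that source only refreshes daily.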

Stage 2: Discovery

Kenji maps the transaction data landscape and profiles the distributions that drive fraud detection.

Transaction Data Profiling

Using the Data Quality Service, Kenji profiles the transactions table:

| Column | Type | Completeness | Distribution | Notes |
| --- | --- | --- | --- | --- |
| amount | DECIMAL | 100% | Mean: $127, Median: $42, P99: $5,200 | Heavy right skew |
| merchant_category | VARCHAR | 100% | 342 distinct categories | Top 5 account for 61% of volume |
| channel | VARCHAR | 100% | POS: 42%, Online: 38%, ATM: 12%, Mobile: 8% | Online growing 15% YoY |
| is_fraud | BOOLEAN | 100% | 0.12% positive rate (fraud) | ~12 frauds per 10K transactions |
| timestamp | TIMESTAMP | 100% | Peak: 11am-2pm, Trough: 3am-5am | Weekend patterns differ |
| device_id | VARCHAR | 84% | NULL for POS/ATM transactions | Online/mobile only |
| ip_address | VARCHAR | 76% | 89K distinct IPs | Masked by governance policy |

Existing Feature Discovery

Kenji discovers that a previous team built velocity features that are already materialized:

-- Existing velocity features found in catalog
SELECT table_name, column_name, description, last_updated
FROM catalog.column_metadata
WHERE schema_name = 'fraud_features'
ORDER BY last_updated DESC;

| Feature Table | Columns | Last Updated | Owner |
| --- | --- | --- | --- |
| txn_velocity_1h | account_id, txn_count_1h, amount_sum_1h, unique_merchants_1h | 2026-02-28 | fraud-ops |
| txn_velocity_24h | account_id, txn_count_24h, amount_sum_24h, amount_max_24h | 2026-02-28 | fraud-ops |
| txn_velocity_7d | account_id, txn_count_7d, amount_sum_7d, unique_countries_7d | 2026-02-28 | fraud-ops |

Kenji will build on these existing features rather than duplicating them.


Stage 3: Query

Kenji constructs the real-time feature queries that feed the fraud scoring model. These must execute in under 10ms to meet the overall 50ms SLA.

Real-Time Feature Query

-- Real-time fraud feature computation
-- Executed per-transaction at scoring time
-- Target: < 10ms execution
 
SELECT
    t.txn_id,
    t.account_id,
    t.amount,
    t.merchant_category,
    t.channel,
    -- Velocity features (pre-computed, lookup only)
    v1.txn_count_1h,
    v1.amount_sum_1h,
    v1.unique_merchants_1h,
    v24.txn_count_24h,
    v24.amount_sum_24h,
    v24.amount_max_24h,
    v7.unique_countries_7d,
    -- Amount anomaly features (computed inline)
    t.amount / NULLIF(a.avg_txn_amount_90d, 0)
        AS amount_ratio_to_avg,
    CASE WHEN t.amount > a.p95_txn_amount_90d THEN 1 ELSE 0 END
        AS exceeds_p95,
    -- Merchant category pattern
    CASE WHEN mc.fraud_rate > 0.005 THEN 1 ELSE 0 END
        AS high_risk_merchant,
    mc.fraud_rate AS merchant_category_fraud_rate,
    -- Temporal features
    EXTRACT(HOUR FROM t.timestamp) AS txn_hour,
    CASE WHEN EXTRACT(DOW FROM t.timestamp) IN (0, 6) THEN 1 ELSE 0 END
        AS is_weekend,
    -- Device and location
    COALESCE(df.trust_score, 0.5) AS device_trust_score,
    COALESCE(ip.risk_score, 0.5) AS ip_risk_score,
    -- Account profile
    a.account_age_days,
    a.total_txn_count,
    a.has_fraud_history
FROM transactions t
JOIN accounts a ON t.account_id = a.account_id
LEFT JOIN fraud_features.txn_velocity_1h v1 ON t.account_id = v1.account_id
LEFT JOIN fraud_features.txn_velocity_24h v24 ON t.account_id = v24.account_id
LEFT JOIN fraud_features.txn_velocity_7d v7 ON t.account_id = v7.account_id
LEFT JOIN reference.merchant_category_risk mc ON t.merchant_category = mc.category
LEFT JOIN device_fingerprints df ON t.device_id = df.device_id
LEFT JOIN ip_reputation ip ON t.ip_hash = ip.ip_hash
WHERE t.txn_id = :current_txn_id;
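
The inline expressions in the query above (amount ratio, P95 flag, temporal flags, and neutral defaults) can be mirrored in application code, for example to unit-test feature logic. This is a minimal sketch; the function name `inline_features` and its argument layout are assumptions for illustration and are not part of the platform.

```python
from datetime import datetime
from typing import Optional

def inline_features(
    amount: float,
    avg_txn_amount_90d: Optional[float],
    p95_txn_amount_90d: float,
    timestamp: datetime,
    device_trust_score: Optional[float] = None,
    ip_risk_score: Optional[float] = None,
) -> dict:
    """Mirror the inline (non-lookup) feature expressions of the SQL query."""
    # NULLIF(avg, 0): a zero or missing average yields NULL (None)
    # instead of a division error.
    ratio = amount / avg_txn_amount_90d if avg_txn_amount_90d else None
    return {
        "amount_ratio_to_avg": ratio,
        "exceeds_p95": int(amount > p95_txn_amount_90d),
        "txn_hour": timestamp.hour,
        # PostgreSQL DOW uses 0 = Sunday and 6 = Saturday;
        # Python's weekday() uses 5 = Saturday and 6 = Sunday.
        "is_weekend": int(timestamp.weekday() >= 5),
        # COALESCE(..., 0.5): unknown devices and IPs get a neutral score.
        "device_trust_score": 0.5 if device_trust_score is None else device_trust_score,
        "ip_risk_score": 0.5 if ip_risk_score is None else ip_risk_score,
    }

example = inline_features(254.0, 127.0, 200.0, datetime(2026, 2, 28, 14, 0), None, 0.9)
```

Note the weekday mapping: a direct port of `IN (0, 6)` to Python's `weekday()` would treat Monday as a weekend day.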

Historical Pattern Analysis with DuckDB

For batch analysis of historical fraud patterns, Kenji uses DuckDB on S3-stored Parquet files:

-- Analyze fraud patterns by merchant category over time
-- Data source: S3/DuckDB (historical transaction archive)
 
SELECT
    merchant_category,
    DATE_TRUNC('month', timestamp) AS month,
    COUNT(*) AS total_txns,
    SUM(CASE WHEN is_fraud THEN 1 ELSE 0 END) AS fraud_count,
    ROUND(100.0 * SUM(CASE WHEN is_fraud THEN 1 ELSE 0 END)
        / COUNT(*), 4) AS fraud_rate_pct,
    AVG(CASE WHEN is_fraud THEN amount ELSE NULL END) AS avg_fraud_amount,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY
        CASE WHEN is_fraud THEN amount ELSE NULL END) AS median_fraud_amount
FROM read_parquet('s3://meridian-data-lake/transactions/year=*/month=*/*.parquet')
WHERE timestamp >= '2025-01-01'
GROUP BY merchant_category, DATE_TRUNC('month', timestamp)
HAVING fraud_count > 10
ORDER BY fraud_rate_pct DESC;

Stage 4: Orchestration

Kenji builds a dual pipeline architecture: streaming for real-time scoring and batch for model retraining.

Streaming Pipeline Architecture

┌──────────┐   ┌──────────────┐   ┌──────────────┐   ┌───────────┐   ┌──────────┐
│  Kafka   │──▶│   Feature    │──▶│    Model     │──▶│  Decision │──▶│  Alert   │
│  Topic   │   │  Computation │   │   Scoring    │   │  Engine   │   │  Router  │
│          │   │              │   │              │   │           │   │          │
│ 10K/min  │   │ Velocity     │   │ Ensemble     │   │ Score >   │   │ Queue /  │
│          │   │ lookups,     │   │ inference    │   │ 0.85 →    │   │ Block /  │
│          │   │ enrichment   │   │ (Ray Serve)  │   │ block     │   │ Allow    │
└──────────┘   └──────────────┘   └──────────────┘   └───────────┘   └──────────┘
    1ms             8ms                12ms               2ms             1ms
                                                                    Total: ~24ms
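
The per-stage timings in the diagram can be checked against the latency budget with simple arithmetic. The sketch below uses the stage values from the diagram and the 50ms scoring target; treating the stage timings as additive is an assumption, since the diagram does not say whether they are averages or percentiles.

```python
# Per-stage latencies (ms) as shown in the streaming pipeline diagram.
STAGE_LATENCY_MS = {
    "Kafka topic": 1,
    "Feature computation": 8,
    "Model scoring": 12,
    "Decision engine": 2,
    "Alert router": 1,
}
SLA_MS = 50  # end-to-end scoring latency target

total_ms = sum(STAGE_LATENCY_MS.values())
headroom_ms = SLA_MS - total_ms
slowest_stage = max(STAGE_LATENCY_MS, key=STAGE_LATENCY_MS.get)

print(f"total={total_ms}ms headroom={headroom_ms}ms slowest={slowest_stage}")
```

The pipeline uses roughly half of the budget, and model scoring is the stage where a regression would consume the remaining headroom fastest.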

Batch Retraining Pipeline

{
  "pipeline": {
    "name": "fraud-model-retraining",
    "schedule": "0 2 * * 0",
    "owner": "kenji@meridian.bank",
    "tags": ["fraud", "ml-training", "weekly"],
    "stages": [
      {
        "name": "extract_training_data",
        "type": "sql_transform",
        "query": "SELECT * FROM fraud_features.training_dataset WHERE label_date >= CURRENT_DATE - INTERVAL '90 days'",
        "output": "ml_staging.fraud_training_latest"
      },
      {
        "name": "validate_labels",
        "type": "data_quality",
        "checks": [
          {
            "table": "ml_staging.fraud_training_latest",
            "expectation": "expect_column_values_to_be_in_set",
            "column": "is_fraud",
            "value_set": [true, false],
            "severity": "critical"
          },
          {
            "table": "ml_staging.fraud_training_latest",
            "expectation": "expect_table_row_count_to_be_between",
            "min_value": 1000000,
            "severity": "critical"
          },
          {
            "table": "ml_staging.fraud_training_latest",
            "expectation": "expect_column_mean_to_be_between",
            "column": "is_fraud",
            "min_value": 0.0005,
            "max_value": 0.01,
            "severity": "warning"
          }
        ],
        "on_failure": "halt_pipeline"
      },
      {
        "name": "train_model",
        "type": "ml_training",
        "experiment": "fraud-detection-weekly",
        "model_config": {
          "ensemble": [
            {"type": "xgboost", "weight": 0.6},
            {"type": "neural_network", "weight": 0.4}
          ]
        },
        "depends_on": ["validate_labels"]
      },
      {
        "name": "evaluate_model",
        "type": "ml_evaluation",
        "metrics": ["auc", "precision_at_1pct_fpr", "recall_at_2pct_fpr"],
        "promotion_criteria": {
          "auc": "> 0.95",
          "precision_at_1pct_fpr": "> 0.60"
        },
        "depends_on": ["train_model"]
      },
      {
        "name": "refresh_feature_store",
        "type": "feature_store_ingest",
        "source_table": "fraud_features.txn_velocity_1h",
        "feature_group": "fraud_velocity",
        "depends_on": ["evaluate_model"]
      }
    ]
  }
}
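
The `promotion_criteria` in the `evaluate_model` stage are expressed as comparison strings. How the Pipeline Service parses them is not documented here, so the following is only a sketch of one plausible interpretation; `meets_criteria` is a hypothetical helper, and metrics are assumed to be fractions (0.623 rather than 62.3%).

```python
import operator

# Comparison operators that may appear in a criteria string such as "> 0.95".
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

def meets_criteria(metrics: dict, criteria: dict) -> bool:
    """Return True only if every criterion is satisfied by the metrics."""
    for name, rule in criteria.items():
        symbol, threshold = rule.split()
        if name not in metrics:
            return False  # a missing metric never qualifies for promotion
        if not OPS[symbol](metrics[name], float(threshold)):
            return False
    return True

criteria = {"auc": "> 0.95", "precision_at_1pct_fpr": "> 0.60"}

ensemble_ok = meets_criteria({"auc": 0.961, "precision_at_1pct_fpr": 0.623}, criteria)
xgboost_only_ok = meets_criteria({"auc": 0.954, "precision_at_1pct_fpr": 0.581}, criteria)
```

Failing closed on a missing metric is a deliberate choice in this sketch: a candidate that was never evaluated should not be promoted.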

Exactly-Once Processing

Kenji configures exactly-once semantics to prevent duplicate scoring:

{
  "processing_guarantees": {
    "dedup_key": "txn_id",
    "dedup_window_minutes": 60,
    "checkpoint_interval_ms": 5000,
    "state_backend": "redis",
    "redis_key_prefix": "fraud:dedup:",
    "on_duplicate": "skip_and_log"
  }
}
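
The configuration above amounts to a keyed deduplication window: the first occurrence of a `txn_id` is scored, and repeats within 60 minutes are skipped and logged. The sketch below illustrates that logic with an in-memory dictionary standing in for Redis; the class `DedupWindow` is hypothetical, and a real deployment would rely on an atomic set-if-absent operation with a TTL in the state backend rather than process memory.

```python
import time

class DedupWindow:
    """In-memory stand-in for a Redis-backed dedup window (illustrative only)."""

    def __init__(self, window_seconds=3600, key_prefix="fraud:dedup:", clock=time.monotonic):
        self._window = window_seconds
        self._prefix = key_prefix
        self._clock = clock
        self._expires_at = {}  # key -> expiry time on the injected clock

    def first_seen(self, txn_id: str) -> bool:
        """Return True if txn_id has not been seen within the window."""
        now = self._clock()
        key = self._prefix + txn_id
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate inside the window: skip and log
        self._expires_at[key] = now + self._window
        return True

dedup = DedupWindow(window_seconds=60 * 60)
for txn_id in ["txn-1001", "txn-1002", "txn-1001"]:
    if dedup.first_seen(txn_id):
        print(f"score {txn_id}")
    else:
        print(f"duplicate skipped: {txn_id}")
```

This sketch never evicts expired keys, which a TTL-based store does automatically. It also only deduplicates within the window, so a replay arriving more than 60 minutes later would be scored again.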

Stage 5: Analysis

Kenji validates model inputs and fraud labels to ensure the detection system is working correctly.

Fraud Label Quality

-- Validate fraud labels against investigation outcomes
SELECT
    detection_method,
    COUNT(*) AS cases,
    SUM(CASE WHEN resolution = 'confirmed_fraud' THEN 1 ELSE 0 END) AS true_positives,
    SUM(CASE WHEN resolution = 'false_alarm' THEN 1 ELSE 0 END) AS false_positives,
    ROUND(100.0 * SUM(CASE WHEN resolution = 'confirmed_fraud' THEN 1 ELSE 0 END)
        / COUNT(*), 1) AS precision_pct
FROM fraud_cases
WHERE investigation_date >= '2025-07-01'
GROUP BY detection_method
ORDER BY cases DESC;
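
| Detection Method | Cases | True Positives | False Positives | Precision |
| --- | --- | --- | --- | --- |
| ML model v1 | 8,412 | 7,891 | 521 | 93.8% |
| Rules engine | 3,204 | 2,115 | 1,089 | 66.0% |
| Customer report | 2,847 | 2,741 | 106 | 96.3% |
| Manual review | 1,102 | 987 | 115 | 89.6% |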

Only 66.0% of rules-engine alerts turn out to be confirmed fraud, so roughly one in three (34%) is a false alarm. This low precision is one of the primary reasons for building the ML-based system.

Distribution Shift Analysis

After a new mobile payment channel launched in Q4 2025, Kenji checks for distribution shift:

| Feature | Pre-Launch (Q3) | Post-Launch (Q4) | PSI | Status |
| --- | --- | --- | --- | --- |
| channel distribution | POS: 45%, Online: 40%, ATM: 15% | POS: 42%, Online: 38%, ATM: 12%, Mobile: 8% | 0.082 | Warning |
| txn_hour distribution | Peak 11am-2pm | Peak 11am-2pm + 8pm-10pm (mobile) | 0.041 | Acceptable |
| amount distribution | Mean $131, P99 $5,100 | Mean $127, P99 $5,200 | 0.008 | Stable |
| device_trust_score | Mean 0.72 | Mean 0.61 (new devices) | 0.094 | Warning |

The mobile channel introduces new device fingerprints with lower trust scores. Kenji flags this for the next model retrain.
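
For reference, the Population Stability Index (PSI) compares the share of records per bucket before and after a change, summing `(actual - expected) * ln(actual / expected)` over all buckets. The sketch below is a generic implementation, not the platform's. How the platform bins values and treats a bucket that is empty before launch (Mobile) is not stated, so the `epsilon` floor here is an assumption and the result will not necessarily match the PSI column above.

```python
import math

def psi(expected: dict, actual: dict, epsilon: float = 1e-4) -> float:
    """Population Stability Index over categorical bucket shares.

    Shares are floored at `epsilon` so that a bucket missing from one
    side does not produce log(0) or a division by zero.
    """
    total = 0.0
    for bucket in set(expected) | set(actual):
        e = max(expected.get(bucket, 0.0), epsilon)
        a = max(actual.get(bucket, 0.0), epsilon)
        total += (a - e) * math.log(a / e)
    return total

# Channel shares from the distribution shift table.
pre_launch = {"POS": 0.45, "Online": 0.40, "ATM": 0.15}
post_launch = {"POS": 0.42, "Online": 0.38, "ATM": 0.12, "Mobile": 0.08}

channel_psi = psi(pre_launch, post_launch)
```

A bucket that appears from nothing dominates the sum, and its contribution grows as `epsilon` shrinks, so the smoothing choice should be fixed and documented before PSI thresholds are used for alerting.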


Stage 6: Productionization

Ensemble Model Deployment

Kenji deploys an ensemble combining gradient boosting (interpretable, fast) with a neural network (captures complex patterns):

{
  "deployment": {
    "name": "fraud-detection-ensemble-v3",
    "serving_engine": "ray_serve",
    "mode": "real_time",
    "models": [
      {
        "name": "fraud-xgboost-v3",
        "weight": 0.6,
        "framework": "xgboost",
        "resources": {"num_cpus": 2, "memory_gb": 4}
      },
      {
        "name": "fraud-nn-v3",
        "weight": 0.4,
        "framework": "pytorch",
        "resources": {"num_cpus": 2, "memory_gb": 4}
      }
    ],
    "sla": {
      "p50_latency_ms": 15,
      "p99_latency_ms": 50,
      "availability": 0.9999,
      "throughput_per_second": 200
    },
    "scaling": {
      "min_replicas": 3,
      "max_replicas": 12,
      "target_ongoing_requests": 50,
      "scale_up_threshold_seconds": 5,
      "availability_zones": ["us-east-1a", "us-east-1b", "us-east-1c"]
    },
    "fallback": {
      "enabled": true,
      "type": "rules_engine",
      "trigger": "model_latency > 100ms OR model_error",
      "rules": [
        {"condition": "amount > 5000 AND channel = 'online'", "action": "flag"},
        {"condition": "txn_count_1h > 20", "action": "flag"},
        {"condition": "country != account_country", "action": "flag"}
      ]
    }
  }
}
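
The deployment combines the two model scores by weight and falls back to static rules when the model is slow or fails. The following sketch shows that decision path under stated assumptions: the 0.85 block threshold is taken from the streaming pipeline diagram, the function names are hypothetical, and returning "allow" when no fallback rule fires is an assumption, since the configuration only defines the "flag" action.

```python
def ensemble_score(xgb_score, nn_score, w_xgb=0.6, w_nn=0.4):
    """Weighted average of the two model scores (weights from the deployment config)."""
    return w_xgb * xgb_score + w_nn * nn_score

def fallback_flags(txn: dict) -> bool:
    """Apply the three fallback rules; True if any rule flags the transaction."""
    return (
        (txn["amount"] > 5000 and txn["channel"] == "online")
        or txn["txn_count_1h"] > 20
        or txn["country"] != txn["account_country"]
    )

def decide(txn, xgb_score, nn_score, latency_ms, block_threshold=0.85):
    """Return 'block', 'flag', or 'allow' for one transaction."""
    model_failed = xgb_score is None or nn_score is None
    if model_failed or latency_ms > 100:
        # Fallback trigger: model_latency > 100ms OR model_error.
        return "flag" if fallback_flags(txn) else "allow"
    if ensemble_score(xgb_score, nn_score) > block_threshold:
        return "block"
    return "allow"

txn = {
    "amount": 6000,
    "channel": "online",
    "txn_count_1h": 2,
    "country": "US",
    "account_country": "US",
}
normal_path = decide(txn, xgb_score=0.9, nn_score=0.8, latency_ms=12)
fallback_path = decide(txn, xgb_score=None, nn_score=None, latency_ms=12)
```

Because the fallback rules can only flag, a model outage degrades to manual review volume rather than automatic blocking.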

Deployment Verification

After deployment, Kenji verifies the serving infrastructure:

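| Check | Expected | Actual | Status |
| --- | --- | --- | --- |
| Replicas running | 3 (minimum) | 3 | Pass |
| P50 latency | < 15ms | 11ms | Pass |
| P99 latency | < 50ms | 38ms | Pass |
| Throughput capacity | > 200 req/s | 847 req/s | Pass |
| Fallback rules loaded | 3 rules | 3 rules | Pass |
| Model version match | v3.0.0 | v3.0.0 | Pass |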

Stage 7: Feedback

Kenji builds a comprehensive real-time monitoring dashboard in the BI Workbench.

Real-Time Monitoring Dashboard

┌─────────────────────────────────────────────────────────────────────┐
│                 FRAUD DETECTION OPERATIONS                          │
│                 Last updated: 2026-02-28 14:32:17 UTC               │
├─────────────────────┬──────────────────────┬────────────────────────┤
│  Scoring Latency    │  Model Confidence    │  Fraud Rate (24h)      │
│                     │                      │                        │
│  P50:  11ms         │  Mean:   0.12        │  Detected:    142      │
│  P95:  28ms         │  Median: 0.04        │  Missed (est): 8       │
│  P99:  38ms         │  > 0.85:  0.09%      │  False Pos:    31      │
│  Max:  67ms         │  > 0.50:  0.41%      │  Catch Rate: 94.7%     │
├─────────────────────┼──────────────────────┼────────────────────────┤
│  Throughput         │  Fallback Events     │  Alerts (24h)          │
│                     │                      │                        │
│  Current: 168/sec   │  Today:      0       │  Latency:        0     │
│  Peak:    412/sec   │  This Week:  2       │  Confidence:     1     │
│  Capacity: 847/sec  │  Reason: timeout     │  Drift:          0     │
│  Util:    19.8%     │                      │  Fraud Spike:    0     │
└─────────────────────┴──────────────────────┴────────────────────────┘

Alert Configuration

{
  "alerts": [
    {
      "name": "scoring-latency-spike",
      "condition": "p99_latency_5min > 100",
      "severity": "critical",
      "channels": ["pagerduty", "slack:#fraud-ops"],
      "action": "Consider scaling replicas or activating fallback"
    },
    {
      "name": "model-confidence-drop",
      "condition": "avg_confidence_1h < 0.08 OR avg_confidence_1h > 0.25",
      "severity": "warning",
      "channels": ["slack:#fraud-ops"],
      "action": "Investigate input distribution shift"
    },
    {
      "name": "fraud-rate-anomaly",
      "condition": "fraud_rate_1h > 3 * fraud_rate_7d_avg",
      "severity": "critical",
      "channels": ["pagerduty", "slack:#fraud-ops", "email:fraud-team"],
      "action": "Potential fraud attack -- escalate to fraud investigation"
    },
    {
      "name": "feature-drift-detected",
      "condition": "any_feature_psi > 0.10",
      "severity": "warning",
      "channels": ["slack:#fraud-ops"],
      "action": "Schedule model retrain if sustained > 48h"
    }
  ]
}
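
To make the alert conditions concrete, the sketch below evaluates the four rules against a metrics snapshot. The snapshot layout and the function `firing_alerts` are assumptions for illustration; the thresholds are those in the configuration above, and the sample values come from the dashboard and the distribution shift table, with both fraud rates set to the 0.12% baseline.

```python
def firing_alerts(m: dict) -> list:
    """Return the names of alert rules whose condition holds for snapshot m."""
    rules = {
        "scoring-latency-spike": m["p99_latency_5min"] > 100,
        "model-confidence-drop": (
            m["avg_confidence_1h"] < 0.08 or m["avg_confidence_1h"] > 0.25
        ),
        "fraud-rate-anomaly": m["fraud_rate_1h"] > 3 * m["fraud_rate_7d_avg"],
        "feature-drift-detected": max(m["feature_psi"].values(), default=0.0) > 0.10,
    }
    return [name for name, fired in rules.items() if fired]

snapshot = {
    "p99_latency_5min": 38,        # ms
    "avg_confidence_1h": 0.12,
    "fraud_rate_1h": 0.0012,
    "fraud_rate_7d_avg": 0.0012,
    "feature_psi": {
        "channel": 0.082,
        "txn_hour": 0.041,
        "amount": 0.008,
        "device_trust_score": 0.094,
    },
}

quiet = firing_alerts(snapshot)
attack = firing_alerts({**snapshot, "fraud_rate_1h": 0.005})
```

With these values no alert fires: the highest PSI (0.094) sits just under the 0.10 drift threshold, which suggests `device_trust_score` is the feature most likely to trigger the drift alert next.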

Weekly Performance Report

Kenji generates automated weekly reports:

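| Metric | Target | Week 8 | Week 9 | Week 10 | Trend |
| --- | --- | --- | --- | --- | --- |
| Fraud catch rate | > 95% | 94.2% | 94.7% | 95.1% | Improving |
| False positive rate | < 2% | 1.8% | 1.7% | 1.6% | Improving |
| P99 latency | < 50ms | 42ms | 38ms | 39ms | Stable |
| Availability | > 99.99% | 99.998% | 100% | 99.997% | Stable |
| Fallback activations | 0 | 1 | 2 | 0 | Acceptable |
| Dollar amount protected | -- | $1.24M | $1.31M | $1.42M | Growing |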

Stage 8: Experimentation

Kenji tests new features and model architectures to continuously improve fraud detection.

Shadow Mode Testing

Before any production change, Kenji runs new models in shadow mode -- scoring every transaction without affecting decisions:

{
  "experiment": {
    "name": "device-fingerprint-feature-test",
    "type": "shadow",
    "production_model": "fraud-detection-ensemble-v3",
    "shadow_model": "fraud-detection-ensemble-v4-candidate",
    "shadow_features_added": [
      "device_fingerprint_match_score",
      "behavioral_biometric_score",
      "typing_pattern_anomaly"
    ],
    "duration_days": 14,
    "evaluation": {
      "metrics": ["auc", "precision_at_1pct_fpr", "recall_at_2pct_fpr"],
      "comparison_window": "daily"
    }
  }
}
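
The key property of shadow mode is that the candidate model sees the same inputs but can never influence, delay the outcome of, or break the production decision. A minimal sketch of that pattern follows; `score_with_shadow` and the callable model interface are hypothetical, and in practice the shadow call would usually run asynchronously so it adds no latency to the production path.

```python
def score_with_shadow(features, production_model, shadow_model, shadow_log):
    """Return the production score; record the shadow score for offline comparison."""
    production_score = production_model(features)
    try:
        shadow_score = shadow_model(features)
        shadow_log.append({"production": production_score, "shadow": shadow_score})
    except Exception as exc:  # a failing candidate must not affect production
        shadow_log.append({"production": production_score, "shadow_error": repr(exc)})
    return production_score

# Stand-in models for illustration; real models would be served endpoints.
def v3_model(features):
    return 0.10

def v4_candidate(features):
    return 0.40

log = []
decision_score = score_with_shadow({"amount": 42.0}, v3_model, v4_candidate, log)
```

Only `decision_score` feeds the decision engine; the log entries are what the daily comparison window in the experiment configuration would aggregate.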

Shadow Mode Results

| Metric | Production (v3) | Shadow (v4 candidate) | Delta |
| --- | --- | --- | --- |
| AUC | 0.961 | 0.974 | +1.4% |
| Precision @ 1% FPR | 62.3% | 71.8% | +15.2% |
| Recall @ 2% FPR | 89.1% | 93.4% | +4.8% |
| Latency P99 | 38ms | 44ms | +6ms |
| New fraud patterns caught | -- | 23 additional cases / week | -- |

The device fingerprinting features add significant value, catching 23 additional fraud cases per week that the current model misses, with acceptable latency impact.
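
The percentage deltas in the shadow results are relative improvements over the production value, not percentage-point differences. For example, precision rises by 9.5 points (62.3% to 71.8%), which is a 15.2% relative gain. A short check, with `relative_delta_pct` as an illustrative helper:

```python
def relative_delta_pct(baseline: float, candidate: float) -> float:
    """Relative change of candidate over baseline, in percent, to one decimal."""
    return round(100.0 * (candidate - baseline) / baseline, 1)

auc_delta = relative_delta_pct(0.961, 0.974)
precision_delta = relative_delta_pct(62.3, 71.8)
recall_delta = relative_delta_pct(89.1, 93.4)

print(auc_delta, precision_delta, recall_delta)  # 1.4 15.2 4.8
```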

A/B Test with Traffic Split

After shadow validation, Kenji runs a controlled A/B test:

┌─────────────────────────────────────────────────┐
│            Transaction Flow                      │
└────────────────────┬────────────────────────────┘

          ┌──────────┴──────────┐
          │   Traffic Splitter   │
          │   (by account_id     │
          │    hash, stable)     │
          └──────────┬──────────┘
               ┌─────┴─────┐
         ┌─────▼─────┐ ┌───▼─────────┐
         │ Control   │ │ Treatment   │
         │ (95%)     │ │ (5%)        │
         │           │ │             │
         │ v3        │ │ v4 cand.    │
         │ Ensemble  │ │ + device    │
         │           │ │   features  │
         └─────┬─────┘ └──────┬──────┘
               │              │
         ┌─────▼──────────────▼──────┐
         │  Both make real decisions  │
         │  Results tracked per group │
         └───────────────────────────┘
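
Splitting by a hash of `account_id` keeps each account in the same group for the whole test, so a customer never alternates between models. The sketch below shows one common way to do this; the salt value, bucket count, and function name are assumptions, not the platform's implementation.

```python
import hashlib

def assign_group(account_id: str, treatment_pct: float = 5.0, salt: str = "fraud-v4-ab") -> str:
    """Deterministically assign an account to 'treatment' or 'control'."""
    # A cryptographic hash is stable across processes and restarts,
    # unlike Python's built-in hash(), which is randomized per process.
    digest = hashlib.sha256(f"{salt}:{account_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # buckets 0..9999
    return "treatment" if bucket < treatment_pct * 100 else "control"

groups = [assign_group(f"acct-{i}") for i in range(20_000)]
treatment_share = groups.count("treatment") / len(groups)
```

Because assignment depends only on the salt and the account ID, ramping from 5% to a higher percentage keeps every account already in treatment there. Changing the salt starts a fresh, independent split for the next experiment.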

A/B Test Results (Day 7 of 14)

| Metric | Control (v3, 95%) | Treatment (v4, 5%) | Statistical Significance |
| --- | --- | --- | --- |
| Fraud catch rate | 94.8% | 97.2% | p = 0.018 (significant) |
| False positive rate | 1.7% | 1.4% | p = 0.042 (significant) |
| Customer friction (blocks) | 0.41% of txns | 0.35% of txns | p = 0.11 (not yet) |
| Avg scoring latency | 24ms | 29ms | -- (within SLA) |
| Revenue protected (est.) | $186K/day | $14.2K/day (scaled: $198K) | -- |

Results show statistically significant improvements in both catch rate and false positive rate. Kenji prepares the promotion plan to gradually ramp v4 from 5% to 100% over the next two weeks.

Ensemble vs Single Model Comparison

In parallel, Kenji tests whether the neural network component adds value over XGBoost alone:

| Configuration | AUC | P@1%FPR | Latency P99 | Verdict |
| --- | --- | --- | --- | --- |
| XGBoost only | 0.954 | 58.1% | 22ms | Fast but lower precision |
| Neural Net only | 0.948 | 64.2% | 41ms | Better precision, slower |
| Ensemble (current) | 0.961 | 62.3% | 38ms | Best AUC overall |
| Ensemble + device features (v4) | 0.974 | 71.8% | 44ms | Best across all metrics |

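The current ensemble achieves a higher AUC than either single model (0.961 vs. 0.954 and 0.948), although the neural network alone has slightly higher precision at 1% FPR (64.2% vs. 62.3%). Adding the device features in v4 gives the best AUC and precision of all configurations, at the cost of the highest P99 latency (44ms, still within the 50ms SLA). Kenji considers these gains sufficient to justify the additional infrastructure cost.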


Key Takeaways

| Stage | Key Action | Platform Component |
| --- | --- | --- |
| Ingestion | Kafka streaming + batch hybrid architecture | Airbyte connectors, Kafka connector |
| Discovery | Profiled transaction distributions, reused existing velocity features | Catalog Service, Data Quality Service |
| Query | Real-time feature queries (< 10ms) + DuckDB historical analysis | Query Engine, DuckDB on S3 |
| Orchestration | Dual pipeline: streaming scoring + weekly batch retraining | Pipeline Service (Temporal) |
| Analysis | Validated fraud labels (66% precision on rules engine), detected distribution shift | Data Quality Service |
| Productionization | Ensemble deployment to Ray Serve, 3-AZ, fallback rules engine | Ray Serve, model serving |
| Feedback | Real-time ops dashboard, latency/drift/fraud-rate alerts | BI Workbench, alerting |
| Experimentation | Shadow mode then 5% A/B test for new device fingerprint features | Experiment framework |
