Data Scientist Journey: Building a Credit Risk Scoring Model
Persona: Amir, Senior Data Scientist at Meridian Bank (Risk Analytics team, 6 years experience)
Objective: Build a probability-of-default (PD) model for consumer loan applications to replace the legacy logistic regression scorecard
Timeline: 4-week sprint from data exploration to champion-challenger deployment
Datasets: credit_applications (200K), accounts (500K), transactions (50M), bureau_scores (450K), historical_defaults (35K)
Stage 1: Ingestion
Amir begins by connecting the data sources needed for credit risk modeling. He needs both live application data and historical outcomes to train a supervised model.
Connecting Core Banking PostgreSQL
In the Data Workbench, Amir opens the Ingestion panel and selects the Airbyte PostgreSQL connector. He configures the connection to the core banking system:
{
"source": {
"type": "postgres",
"name": "core-banking-prod",
"connection": {
"host": "core-banking.meridian.internal",
"port": 5432,
"database": "meridian_core",
"schema": "lending",
"replication_method": "CDC",
"publication": "matih_publication"
},
"tables": [
"accounts",
"credit_applications",
"payment_history",
"loan_disbursements"
],
"sync_schedule": {
"frequency": "daily",
"time": "02:00",
"timezone": "America/New_York"
}
}
}
Importing Credit Bureau Scores
The bureau data arrives as a monthly CSV file from three bureaus. Amir uses the File Import component in the Data Workbench:
| Field | Value |
|---|---|
| File format | CSV (pipe-delimited) |
| File size | ~45 MB per month |
| Schema detection | Automatic with overrides |
| Dedup key | customer_id + score_date + bureau |
| Destination | risk_analytics.bureau_scores |
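The dedup key guarantees that a re-delivered monthly file cannot create duplicate score rows. A minimal sketch of the same rule in plain Python (the sample file contents and column names are illustrative, based on the table above):

```python
import csv
import io

# Hypothetical sample of the pipe-delimited bureau file, with one duplicate row.
raw = """customer_id|score_date|bureau|score
C001|2026-01-31|equifax|712
C001|2026-01-31|equifax|712
C002|2026-01-31|experian|655
"""

def dedup_bureau_rows(fileobj):
    """Keep the first row seen for each (customer_id, score_date, bureau) key."""
    seen, rows = set(), []
    for row in csv.DictReader(fileobj, delimiter="|"):
        key = (row["customer_id"], row["score_date"], row["bureau"])
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows

rows = dedup_bureau_rows(io.StringIO(raw))
print(len(rows))  # 2 unique rows survive the dedup
```

The platform's File Import component applies this key on ingest; the sketch just makes the semantics concrete.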
Connecting Snowflake for Historical Defaults
Historical default data spanning 10 years lives in the Snowflake analytics warehouse. Amir configures an incremental sync:
{
"source": {
"type": "snowflake",
"name": "snowflake-analytics",
"connection": {
"account": "meridian.us-east-1",
"warehouse": "ANALYTICS_WH",
"database": "RISK_ANALYTICS",
"schema": "HISTORICAL"
},
"tables": ["loan_defaults", "recovery_outcomes", "workout_history"],
"sync_schedule": {
"frequency": "daily",
"time": "03:00",
"timezone": "America/New_York"
},
"cursor_field": "updated_at"
}
}
After configuring all three sources, Amir triggers an initial full sync. The sync status panel shows progress:
| Source | Tables | Rows Synced | Duration | Status |
|---|---|---|---|---|
| core-banking-prod | 4 | 751,200 | 8m 12s | Complete |
| bureau-scores-csv | 1 | 450,000 | 1m 45s | Complete |
| snowflake-analytics | 3 | 85,400 | 3m 22s | Complete |
Stage 2: Discovery
With data synced, Amir switches to the Data Catalog to explore what is available and assess data quality.
Catalog Exploration
Searching the catalog for "credit" returns 14 tables across 3 schemas. Amir focuses on these:
| Table | Rows | Last Synced |
|---|---|---|
| risk_analytics.credit_applications | 200,432 | 2026-02-28 02:15 UTC |
| risk_analytics.bureau_scores | 450,118 | 2026-02-28 03:00 UTC |
| lending.payment_history | 3,842,901 | 2026-02-28 02:18 UTC |
| historical.loan_defaults | 35,219 | 2026-02-28 03:22 UTC |
Data Profiling
Amir runs the automated profiler on credit_applications. The Data Quality Service returns:
| Column | Type | Completeness | Distinct | Min | Max | Issues |
|---|---|---|---|---|---|---|
| application_id | VARCHAR | 100% | 200,432 | -- | -- | None |
| customer_id | VARCHAR | 100% | 178,201 | -- | -- | 12.4% repeat applicants |
| income_reported | DECIMAL | 92.1% | 14,872 | $12,000 | $2,400,000 | 7.9% missing |
| employment_length | INTEGER | 98.3% | 52 | 0 | 87 | Outliers > 50 years |
| requested_amount | DECIMAL | 100% | 3,441 | $1,000 | $500,000 | Right-skewed |
| debt_to_income | DECIMAL | 89.7% | 8,219 | 0.01 | 14.2 | 10.3% missing, extreme values |
| ssn | VARCHAR | 100% | 178,201 | -- | -- | PII - masked by policy |
Amir notes two issues to address in feature engineering: missing income_reported values (7.9%) and employment_length outliers exceeding 50 years.
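Both issues have standard remediations that Amir applies later in feature engineering. A hedged sketch of one reasonable approach (median imputation with a missingness flag, plus capping; the record layout is illustrative):

```python
# Hypothetical handling of the two profiler findings: median-impute missing
# income_reported and cap employment_length at 50 years.

def clean_application(row, income_median):
    cleaned = dict(row)
    if cleaned["income_reported"] is None:
        cleaned["income_reported"] = income_median   # impute missing income
        cleaned["income_was_missing"] = 1            # keep a missingness indicator
    else:
        cleaned["income_was_missing"] = 0
    # Cap outliers above 50 years, matching the LEAST(..., 50) used in the SQL later
    cleaned["employment_length"] = min(cleaned["employment_length"], 50)
    return cleaned

apps = [
    {"income_reported": 55000, "employment_length": 12},
    {"income_reported": None,  "employment_length": 87},
]
known = sorted(a["income_reported"] for a in apps if a["income_reported"] is not None)
median = known[len(known) // 2]
cleaned = [clean_application(a, median) for a in apps]
print(cleaned[1])
```

Keeping the missingness flag as a feature lets the model learn whether unreported income is itself a risk signal.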
Lineage Tracing
The catalog's lineage view shows the data flow from application to outcome:
credit_applications ──▶ loan_disbursements ──▶ payment_history ──▶ loan_defaults
(apply) (approve) (repay) (default)
200,432               142,301              3,842,901            35,219
This lineage confirms Amir can join application features to default outcomes, giving him the target variable for the model.
Stage 3: Query
Amir uses the Query Engine to build the risk feature set. He needs to federate across PostgreSQL (live data) and Snowflake (historical outcomes).
Building the Risk Feature Set
-- Credit Risk Feature Set: Federated query across PostgreSQL + Snowflake
-- Governance note: SSN is auto-masked; only hashed customer_id used for joins
WITH payment_features AS (
-- Payment behavior features from core banking (PostgreSQL)
SELECT
ca.customer_id,
ca.application_id,
ca.income_reported,
ca.employment_length,
ca.requested_amount,
ca.debt_to_income,
-- Payment delinquency features
COUNT(CASE WHEN ph.days_past_due > 30 THEN 1 END)
AS dpd_30_count_12m,
COUNT(CASE WHEN ph.days_past_due > 60 THEN 1 END)
AS dpd_60_count_12m,
COUNT(CASE WHEN ph.days_past_due > 90 THEN 1 END)
AS dpd_90_count_12m,
MAX(ph.days_past_due) AS max_dpd_12m,
-- Utilization features
AVG(ph.balance / NULLIF(ph.credit_limit, 0))
AS avg_utilization_12m,
MAX(ph.balance / NULLIF(ph.credit_limit, 0))
AS max_utilization_12m
FROM risk_analytics.credit_applications ca
LEFT JOIN lending.payment_history ph
ON ca.customer_id = ph.customer_id
AND ph.payment_date >= ca.application_date - INTERVAL '12 months'
AND ph.payment_date < ca.application_date
GROUP BY ca.customer_id, ca.application_id,
ca.income_reported, ca.employment_length,
ca.requested_amount, ca.debt_to_income
),
bureau_features AS (
    -- Latest bureau score strictly before the customer's application date
    SELECT DISTINCT ON (bs.customer_id)
        bs.customer_id,
        bs.score AS bureau_score,
        bs.num_inquiries AS inquiries_6m,
        bs.delinquencies AS bureau_delinquencies
    FROM risk_analytics.bureau_scores bs
    JOIN risk_analytics.credit_applications ca
        ON ca.customer_id = bs.customer_id
        AND bs.score_date < ca.application_date
    ORDER BY bs.customer_id, bs.score_date DESC
),
default_labels AS (
-- Default outcome from Snowflake (historical data)
SELECT
ld.loan_id,
ld.customer_id,
1 AS defaulted,
ld.loss_given_default
FROM historical.loan_defaults ld
)
SELECT
pf.application_id,
pf.customer_id,
-- Application features
pf.income_reported,
LEAST(pf.employment_length, 50) AS employment_length_capped,
pf.requested_amount,
pf.requested_amount / NULLIF(pf.income_reported, 0)
AS loan_to_income,
pf.debt_to_income,
-- Payment behavior
pf.dpd_30_count_12m,
pf.dpd_60_count_12m,
pf.dpd_90_count_12m,
pf.max_dpd_12m,
pf.avg_utilization_12m,
pf.max_utilization_12m,
-- Bureau features
bf.bureau_score,
bf.inquiries_6m,
bf.bureau_delinquencies,
-- Target variable
COALESCE(dl.defaulted, 0) AS defaulted
FROM payment_features pf
LEFT JOIN bureau_features bf ON pf.customer_id = bf.customer_id
LEFT JOIN default_labels dl ON pf.customer_id = dl.customer_id;
The Query Engine executes this across both sources, returning 200,432 rows: two identifiers, 14 model features, and the target variable. Amir saves this as a materialized view risk_analytics.credit_risk_features.
Stage 4: Orchestration
Amir builds a daily pipeline to keep the risk feature set fresh. In the Pipeline Service, he defines a 4-stage DAG:
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Extract │───▶│ Validate │───▶│ Engineer │───▶│ Load │
│ │ │ │ │ Features │ │ Feature │
│ Sync all │ │ Quality │ │ │ │ Store │
│ 3 sources│ │ gates │ │ Run SQL │ │ │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
5 min          2 min          8 min          3 min
Pipeline Configuration
{
"pipeline": {
"name": "credit-risk-feature-pipeline",
"schedule": "0 4 * * *",
"owner": "amir@meridian.bank",
"tags": ["risk", "credit", "ml-features"],
"stages": [
{
"name": "extract",
"type": "sync_trigger",
"sources": ["core-banking-prod", "bureau-scores-csv", "snowflake-analytics"],
"timeout_minutes": 30
},
{
"name": "validate",
"type": "data_quality",
"checks": [
{
"table": "credit_applications",
"expectation": "expect_column_values_to_not_be_null",
"column": "customer_id",
"severity": "critical"
},
{
"table": "credit_applications",
"expectation": "expect_column_values_to_be_between",
"column": "income_reported",
"min_value": 0,
"max_value": 10000000,
"mostly": 0.92,
"severity": "warning"
},
{
"table": "bureau_scores",
"expectation": "expect_table_row_count_to_be_between",
"min_value": 400000,
"max_value": 600000,
"severity": "critical"
}
],
"on_failure": "halt_pipeline"
},
{
"name": "engineer_features",
"type": "sql_transform",
"query_ref": "risk_analytics.credit_risk_features",
"materialization": "table",
"depends_on": ["validate"]
},
{
"name": "load_feature_store",
"type": "feature_store_ingest",
"source_table": "risk_analytics.credit_risk_features",
"feature_group": "credit_risk_v2",
"entity_key": "application_id",
"timestamp_column": "application_date",
"depends_on": ["engineer_features"]
}
]
}
}
The data quality gate in stage 2 is critical: if customer_id has nulls or bureau_scores row count drops unexpectedly, the pipeline halts and sends an alert rather than propagating bad data into the feature store.
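The halt-on-failure semantics of that gate can be sketched as a small function. The check names, thresholds, and stats dictionary below are illustrative, taken from the config above; the real Pipeline Service runner is not shown in this walkthrough:

```python
# Minimal sketch of the stage-2 quality gate: run each check, halt the
# pipeline on a critical failure, and collect warnings otherwise.

class PipelineHalted(Exception):
    pass

def run_quality_gate(checks, stats):
    """checks: list of (name, severity, predicate); stats: table-level metrics."""
    warnings = []
    for name, severity, predicate in checks:
        if predicate(stats):
            continue
        if severity == "critical":
            raise PipelineHalted(f"critical check failed: {name}")
        warnings.append(name)
    return warnings

stats = {"customer_id_nulls": 0, "income_completeness": 0.921, "bureau_rows": 450_118}
checks = [
    ("customer_id not null", "critical", lambda s: s["customer_id_nulls"] == 0),
    ("income mostly present", "warning", lambda s: s["income_completeness"] >= 0.92),
    ("bureau row count in range", "critical",
     lambda s: 400_000 <= s["bureau_rows"] <= 600_000),
]
print(run_quality_gate(checks, stats))  # [] -- all checks pass, pipeline proceeds
```

The "mostly": 0.92 setting in the config maps to the completeness threshold here: up to 8% missing income values trigger only a warning, never a halt.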
Stage 5: Analysis
Before model training, Amir validates the feature set for modeling fitness.
Class Imbalance Check
SELECT
defaulted,
COUNT(*) AS count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 2) AS pct
FROM risk_analytics.credit_risk_features
GROUP BY defaulted;
| defaulted | count | pct |
|---|---|---|
| 0 | 194,419 | 97.0% |
| 1 | 6,013 | 3.0% |
The 3% default rate confirms significant class imbalance. Amir plans to address it with class-weighted loss during training and to evaluate SMOTE oversampling as an alternative.
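The scale_pos_weight value that appears later in the XGBoost experiment config follows directly from this class ratio (counts from the table above):

```python
# scale_pos_weight is conventionally the ratio of negative to positive examples.
non_default = 194_419
default = 6_013

scale_pos_weight = non_default / default
print(round(scale_pos_weight, 1))  # 32.3, matching the experiment config
```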
Data Leakage Validation
Amir verifies no features contain information from after the application decision:
| Feature | Temporal Check | Result |
|---|---|---|
| dpd_30_count_12m | Uses only pre-application payment history | Pass |
| bureau_score | Latest score before application date | Pass |
| avg_utilization_12m | 12-month window ending at application | Pass |
| loss_given_default | Post-default metric | Excluded (target leakage) |
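The temporal check reduces to one invariant: every feature's observation window must end before the application date it is joined to. A hedged sketch of that check (the window bookkeeping and dates here are hypothetical):

```python
from datetime import date

# Hypothetical leakage check: flag any feature whose observation window
# extends to or past the application date.

def leakage_flags(application_date, feature_windows):
    """feature_windows: {feature_name: (window_start, window_end)} -> {name: leaks?}"""
    return {
        name: end >= application_date
        for name, (start, end) in feature_windows.items()
    }

flags = leakage_flags(
    date(2025, 3, 1),
    {
        "dpd_30_count_12m": (date(2024, 3, 1), date(2025, 2, 28)),
        "bureau_score": (date(2025, 2, 15), date(2025, 2, 15)),
        "loss_given_default": (date(2025, 9, 1), date(2025, 9, 1)),  # post-default
    },
)
print(flags)  # only loss_given_default is flagged
```

This is why loss_given_default is excluded from the feature set: it is only observable after the default event the model is trying to predict.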
Feature Correlation Analysis
The Data Quality Service generates a correlation matrix. Key findings:
| Feature Pair | Correlation | Action |
|---|---|---|
| bureau_score / dpd_30_count_12m | -0.72 | Expected -- keep both (different signal sources) |
| debt_to_income / loan_to_income | 0.84 | High collinearity -- consider dropping one |
| max_utilization_12m / avg_utilization_12m | 0.91 | Keep max only for interpretability |
| income_reported / requested_amount | 0.31 | Low -- independent features |
Amir decides to drop avg_utilization_12m (collinear with max) and keep both debt_to_income and loan_to_income (different denominators capture different risk signals).
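The collinearity screen behind these decisions is just pairwise Pearson correlation with a cutoff (0.8 here, consistent with Amir's choices). A self-contained sketch with toy columns:

```python
import math

# Pearson correlation from scratch, used to flag collinear feature pairs
# above a chosen threshold.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy columns: ys is a near-linear function of xs, zs is unrelated.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
zs = [5, 1, 4, 2, 6, 3]

r_collinear = pearson(xs, ys)
r_independent = pearson(xs, zs)
print(r_collinear > 0.8, abs(r_independent) > 0.8)  # True False
```

Note that a high correlation alone does not force a drop: the bureau_score / dpd_30_count pair is kept because the two come from different data sources and can disagree informatively.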
Stage 6: Productionization
Model Training and Comparison
In the ML Workbench, Amir sets up an experiment to compare models:
{
"experiment": {
"name": "credit-risk-pd-model-v2",
"tracking_uri": "mlflow://meridian.matih.internal",
"runs": [
{
"name": "logistic-regression-baseline",
"model_type": "sklearn.linear_model.LogisticRegression",
"params": {
"class_weight": "balanced",
"C": 0.1,
"max_iter": 1000
}
},
{
"name": "xgboost-candidate",
"model_type": "xgboost.XGBClassifier",
"params": {
"n_estimators": 500,
"max_depth": 6,
"learning_rate": 0.05,
"scale_pos_weight": 32.3,
"subsample": 0.8,
"colsample_bytree": 0.8,
"eval_metric": "auc"
}
}
],
"evaluation": {
"split": "temporal",
"train_end": "2025-06-30",
"test_start": "2025-07-01",
"metrics": ["auc", "gini", "ks_statistic", "log_loss", "precision_at_10pct"]
}
}
}
Model Comparison Results
| Metric | Logistic Regression | XGBoost | Improvement |
|---|---|---|---|
| AUC | 0.781 | 0.843 | +7.9% |
| Gini Coefficient | 0.562 | 0.686 | +22.1% |
| KS Statistic | 0.389 | 0.512 | +31.6% |
| Precision @ 10% recall | 42.1% | 58.7% | +39.4% |
| Log Loss | 0.142 | 0.108 | -23.9% |
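The Gini coefficient in this table is a direct transform of AUC, which makes it a convenient sanity check on reported metrics:

```python
# For a binary classifier, Gini = 2 * AUC - 1.
def gini_from_auc(auc):
    return 2 * auc - 1

print(round(gini_from_auc(0.781), 3))  # 0.562 -- logistic regression row
print(round(gini_from_auc(0.843), 3))  # 0.686 -- XGBoost row
```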
Model Registration with Fairness Metrics
Amir registers the XGBoost model in the Model Registry with a mandatory model card:
{
"model_card": {
"name": "credit-risk-pd-xgboost-v2",
"version": "2.0.0",
"owner": "amir@meridian.bank",
"use_case": "Probability of default scoring for consumer loan applications",
"training_data": {
"source": "risk_analytics.credit_risk_features",
"rows": 160346,
"date_range": "2020-01-01 to 2025-06-30",
"target_rate": "3.0%"
},
"performance": {
"test_auc": 0.843,
"test_gini": 0.686,
"test_ks": 0.512
},
"fairness_metrics": {
"disparate_impact_ratio": {
"gender": 0.92,
"age_group": 0.88,
"ethnicity": 0.85
},
"equal_opportunity_difference": {
"gender": 0.03,
"age_group": 0.07,
"ethnicity": 0.09
},
"threshold": "DIR > 0.80 (four-fifths rule)"
},
"governance": {
"masked_columns": ["ssn", "date_of_birth"],
"audit_logging": true,
"regulatory_framework": "SR 11-7",
"approval_required": true,
"approved_by": null,
"approval_date": null
}
}
}
Deployment to Ray Serve
Amir deploys the model for batch scoring of the loan portfolio:
{
"deployment": {
"model": "credit-risk-pd-xgboost-v2",
"version": "2.0.0",
"serving_engine": "ray_serve",
"mode": "batch",
"schedule": "0 6 * * *",
"input": "risk_analytics.credit_risk_features",
"output": "risk_analytics.pd_scores",
"resources": {
"num_replicas": 2,
"num_cpus": 4,
"memory_gb": 8
},
"governance": {
"decision_audit_table": "audit.model_decisions",
"log_inputs": true,
"log_predictions": true,
"log_feature_importance": true
}
}
}
Stage 7: Feedback
Amir sets up ongoing monitoring to detect model degradation.
Model Performance Monitoring
In the ML Workbench monitoring tab, Amir configures weekly performance tracking:
| Metric | Baseline | Week 1 | Week 2 | Week 3 | Week 4 | Alert Threshold |
|---|---|---|---|---|---|---|
| AUC | 0.843 | 0.841 | 0.839 | 0.835 | 0.832 | < 0.75 |
| Gini | 0.686 | 0.682 | 0.678 | 0.670 | 0.664 | < 0.55 |
| PSI (Population Stability Index) | 0.000 | 0.012 | 0.018 | 0.024 | 0.031 | > 0.10 |
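PSI compares the binned score distribution in production against the development sample. A hedged sketch of the standard formula (the two five-bin distributions below are illustrative, not Amir's actual data):

```python
import math

# Population Stability Index over binned score distributions:
# PSI = sum over bins of (actual_i - expected_i) * ln(actual_i / expected_i).

def psi(expected, actual):
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected, actual)
        if e > 0 and a > 0  # skip empty bins rather than divide by zero
    )

# Hypothetical 5-bin score distributions: development sample vs current week.
dev = [0.20, 0.20, 0.20, 0.20, 0.20]
cur = [0.22, 0.21, 0.20, 0.19, 0.18]

value = psi(dev, cur)
print(round(value, 4))
```

A common rule of thumb, consistent with the table's alert threshold: PSI below 0.10 means no significant shift, 0.10 to 0.25 warrants investigation, and above 0.25 indicates a material population change.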
Feature Drift Detection
{
"drift_monitor": {
"name": "credit-risk-feature-drift",
"model": "credit-risk-pd-xgboost-v2",
"schedule": "weekly",
"reference_window": "2025-01-01 to 2025-06-30",
"features_monitored": [
{
"feature": "income_reported",
"method": "ks_test",
"threshold": 0.05
},
{
"feature": "debt_to_income",
"method": "psi",
"threshold": 0.10
},
{
"feature": "bureau_score",
"method": "wasserstein",
"threshold": 15.0
}
],
"alerts": {
"channel": "slack",
"target": "#risk-model-alerts",
"severity_mapping": {
"single_feature_drift": "warning",
"multiple_feature_drift": "critical",
"performance_degradation": "critical"
}
}
}
}
Regulatory Model Validation Metrics
For SR 11-7 compliance, Amir generates quarterly validation reports:
| Validation Test | Q1 Result | Status |
|---|---|---|
| Discriminatory power (AUC) | 0.843 | Pass (> 0.70) |
| Calibration (Hosmer-Lemeshow) | p = 0.34 | Pass (p > 0.05) |
| Stability (PSI vs development) | 0.031 | Pass (< 0.10) |
| Concentration (Herfindahl) | 0.08 | Pass (< 0.15) |
| Back-testing (predicted vs actual PD) | 3.2% vs 3.0% | Pass (within 10%) |
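The back-testing row compares the portfolio's average predicted PD against the realized default rate. One reading of the "within 10%" tolerance, as a relative deviation from the realized rate (the helper is illustrative, values from the Q1 table):

```python
# Back-test: predicted PD must be within 10% (relative) of the realized rate.

def backtest_passes(predicted_pd, realized_rate, tolerance=0.10):
    return abs(predicted_pd - realized_rate) / realized_rate <= tolerance

print(backtest_passes(0.032, 0.030))  # True: |3.2% - 3.0%| / 3.0% is about 6.7%
```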
Stage 8: Experimentation
Amir runs a champion-challenger test to validate the new model against the existing scorecard in production.
Champion-Challenger Design
┌─────────────────────────────────────────────────────────┐
│ Incoming Loan Applications │
│ (100% traffic) │
└────────────────────────┬────────────────────────────────┘
│
┌──────────┴──────────┐
│ Random Assignment │
│ (by application) │
└──────────┬──────────┘
┌────┴────┐
┌─────▼─────┐ ┌─▼───────────┐
│ Champion │ │ Challenger │
│ (70%) │ │ (30%) │
│ │ │ │
│ Current │ │ XGBoost v2 │
│ Scorecard │ │ AUC: 0.843 │
│ AUC: 0.78 │ │ │
└─────┬─────┘ └──────┬──────┘
│ │
┌─────▼──────────────▼──────┐
│ Both scores logged for │
│ every application │
│ Decision uses assigned │
│ model only │
└───────────────────────────┘
Test Configuration
{
"experiment": {
"name": "credit-risk-champion-challenger-q1-2026",
"type": "champion_challenger",
"champion": {
"model": "legacy-scorecard-v1",
"traffic_pct": 70
},
"challenger": {
"model": "credit-risk-pd-xgboost-v2",
"traffic_pct": 30
},
"assignment": {
"method": "random",
"unit": "application_id",
"stratify_by": ["loan_product", "branch_region"]
},
"duration_weeks": 12,
"primary_metric": "auc_on_realized_defaults",
"guardrail_metrics": [
{
"name": "approval_rate",
"max_deviation": 0.05
},
{
"name": "disparate_impact_ratio",
"min_value": 0.80
}
],
"governance": {
"regulatory_review_required": true,
"reviewer": "model-validation-team@meridian.bank",
"sr_11_7_documentation": true
}
}
}
Interim Results (Week 6 of 12)
| Metric | Champion (Scorecard) | Challenger (XGBoost) | Difference |
|---|---|---|---|
| Applications scored | 14,210 | 6,090 | -- |
| AUC (realized defaults) | 0.779 | 0.838 | +7.6% |
| Approval rate | 68.2% | 71.4% | +3.2pp |
| Expected loss (annualized) | $4.2M | $3.1M | -$1.1M |
| False decline rate | 8.4% | 5.1% | -3.3pp |
| Disparate impact (gender) | 0.89 | 0.91 | +0.02 |
| Disparate impact (ethnicity) | 0.83 | 0.85 | +0.02 |
The challenger model shows higher approval rates with lower expected losses, meaning it better separates good borrowers from risky ones. Fairness metrics are within regulatory thresholds. Amir prepares the model validation documentation for the regulatory review committee before full production rollout.
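The 70/30 random assignment by application_id is typically made deterministic by hashing the unit key with an experiment-specific salt, so the same application always lands in the same arm and the split is reproducible. A sketch of that common pattern (the salt string and function are illustrative, not the platform's actual assignment service):

```python
import hashlib

# Deterministic 70/30 assignment: hash the application_id with an experiment
# salt, map the digest to [0, 1), and bucket into an arm.

def assign_arm(application_id, salt="champion-challenger-q1-2026", challenger_pct=30):
    digest = hashlib.sha256(f"{salt}:{application_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "challenger" if bucket < challenger_pct / 100 else "champion"

arms = [assign_arm(f"APP-2026-{i:08d}") for i in range(10_000)]
share = arms.count("challenger") / len(arms)
print(round(share, 3))  # close to 0.30 over many applications

# The same id always maps to the same arm, which is what makes the
# "both scores logged, decision uses assigned model only" design auditable.
assert assign_arm("APP-2026-00142891") == assign_arm("APP-2026-00142891")
```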
Governance Audit Trail
Every model decision is logged to audit.model_decisions with full traceability:
SELECT
decision_id,
application_id,
model_name,
model_version,
prediction_score,
decision,
feature_importance_top5,
created_at
FROM audit.model_decisions
WHERE application_id = 'APP-2026-00142891'
ORDER BY created_at DESC
LIMIT 1;
| Field | Value |
|---|---|
| decision_id | DEC-9f3a2b... |
| application_id | APP-2026-00142891 |
| model_name | credit-risk-pd-xgboost-v2 |
| model_version | 2.0.0 |
| prediction_score | 0.042 (4.2% PD) |
| decision | APPROVE |
| feature_importance_top5 | bureau_score (0.23), dpd_30_count (0.18), debt_to_income (0.15), utilization (0.12), employment_length (0.09) |
Key Takeaways
| Stage | Key Action | Platform Component |
|---|---|---|
| Ingestion | Connected 3 sources with scheduled syncs | Data Workbench, Airbyte connectors |
| Discovery | Profiled data, found quality issues, traced lineage | Catalog Service, Data Quality Service |
| Query | Built federated feature set across PostgreSQL + Snowflake | Query Engine (SQL federation) |
| Orchestration | Daily pipeline with quality gates | Pipeline Service (Temporal) |
| Analysis | Validated class balance, leakage, correlations | Data Quality Service, SQL analytics |
| Productionization | Trained, registered, deployed model with fairness metrics | ML Workbench, MLflow, Ray Serve |
| Feedback | Weekly performance monitoring, drift detection | ML monitoring, alerting |
| Experimentation | Champion-challenger test with regulatory governance | Experiment framework, Governance Service |
Related Walkthroughs
- ML Engineer Journey: Real-Time Fraud Detection -- Kenji builds the fraud scoring system that protects the same transactions
- BI Lead Journey: Regulatory Reporting -- Rachel builds dashboards that consume Amir's model outputs
- Executive Journey: Strategic Risk Analytics -- Elena uses model results for portfolio-level decisions
- Financial Services Overview -- Industry context and sample datasets