Data Scientist Journey: Building a Credit Risk Scoring Model
Persona: Amir, Senior Data Scientist at Meridian Bank (Risk Analytics team, 6 years experience)
Objective: Build a probability-of-default (PD) model for consumer loan applications to replace the legacy logistic regression scorecard
Timeline: 4-week sprint from data exploration to champion-challenger deployment
Datasets: credit_applications (200K), accounts (500K), transactions (50M), bureau_scores (450K), historical_defaults (35K)
Stage 1: Ingestion
Amir begins by connecting the data sources needed for credit risk modeling. He needs both live application data and historical outcomes to train a supervised model.
Connecting Core Banking PostgreSQL
In the Data Workbench, Amir opens the Ingestion panel and selects the Airbyte PostgreSQL connector. He configures the connection to the core banking system:
{
"source": {
"type": "postgres",
"name": "core-banking-prod",
"connection": {
"host": "core-banking.meridian.internal",
"port": 5432,
"database": "meridian_core",
"schema": "lending",
"replication_method": "CDC",
"publication": "matih_publication"
},
"tables": [
"accounts",
"credit_applications",
"payment_history",
"loan_disbursements"
],
"sync_schedule": {
"frequency": "daily",
"time": "02:00",
"timezone": "America/New_York"
}
}
}
Importing Credit Bureau Scores
The bureau data arrives as a monthly CSV file from three bureaus. Amir uses the File Import component in the Data Workbench:
| Field | Value |
|---|---|
| File format | CSV (pipe-delimited) |
| File size | ~45 MB per month |
| Schema detection | Automatic with overrides |
| Dedup key | customer_id + score_date + bureau |
| Destination | risk_analytics.bureau_scores |
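The dedup key guarantees that a re-delivered monthly file cannot create duplicate score rows. A minimal sketch of the same rule in plain Python (the sample file contents and column names are illustrative, based on the table above):

```python
import csv
import io

# Hypothetical sample of the pipe-delimited bureau file, with one duplicate row.
raw = """customer_id|score_date|bureau|score
C001|2026-01-31|equifax|712
C001|2026-01-31|equifax|712
C002|2026-01-31|experian|655
"""

def dedup_bureau_rows(fileobj):
    """Keep the first row seen for each (customer_id, score_date, bureau) key."""
    seen, rows = set(), []
    for row in csv.DictReader(fileobj, delimiter="|"):
        key = (row["customer_id"], row["score_date"], row["bureau"])
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows

rows = dedup_bureau_rows(io.StringIO(raw))
print(len(rows))  # 2 unique rows survive the dedup
```

The platform's File Import component applies this key on ingest; the sketch just makes the semantics concrete.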
Connecting Snowflake for Historical Defaults
Historical default data spanning 10 years lives in the Snowflake analytics warehouse. Amir configures an incremental sync:
{
"source": {
"type": "snowflake",
"name": "snowflake-analytics",
"connection": {
"account": "meridian.us-east-1",
"warehouse": "ANALYTICS_WH",
"database": "RISK_ANALYTICS",
"schema": "HISTORICAL"
},
"tables": ["loan_defaults", "recovery_outcomes", "workout_history"],
"sync_schedule": {
"frequency": "daily",
"time": "03:00",
"timezone": "America/New_York"
},
"cursor_field": "updated_at"
}
}
After configuring all three sources, Amir triggers an initial full sync. The sync status panel shows progress:
| Source | Tables | Rows Synced | Duration | Status |
|---|---|---|---|---|
| core-banking-prod | 4 | 751,200 | 8m 12s | Complete |
| bureau-scores-csv | 1 | 450,000 | 1m 45s | Complete |
| snowflake-analytics | 3 | 85,400 | 3m 22s | Complete |
Stage 2: Discovery
With data synced, Amir switches to the Data Catalog to explore what is available and assess data quality.
Catalog Exploration
Searching the catalog for "credit" returns 14 tables across 3 schemas. Amir focuses on these:
| Table | Rows | Last Synced |
|---|---|---|
| risk_analytics.credit_applications | 200,432 | 2026-02-28 02:15 UTC |
| risk_analytics.bureau_scores | 450,118 | 2026-02-28 03:00 UTC |
| lending.payment_history | 3,842,901 | 2026-02-28 02:18 UTC |
| historical.loan_defaults | 35,219 | 2026-02-28 03:22 UTC |
Data Profiling
Amir runs the automated profiler on credit_applications. The Data Quality Service returns:
| Column | Type | Completeness | Distinct | Min | Max | Issues |
|---|---|---|---|---|---|---|
| application_id | VARCHAR | 100% | 200,432 | -- | -- | None |
| customer_id | VARCHAR | 100% | 178,201 | -- | -- | 12.4% repeat applicants |
| income_reported | DECIMAL | 92.1% | 14,872 | $12,000 | $2,400,000 | 7.9% missing |
| employment_length | INTEGER | 98.3% | 52 | 0 | 87 | Outliers > 50 years |
| requested_amount | DECIMAL | 100% | 3,441 | $1,000 | $500,000 | Right-skewed |
| debt_to_income | DECIMAL | 89.7% | 8,219 | 0.01 | 14.2 | 10.3% missing, extreme values |
| ssn | VARCHAR | 100% | 178,201 | -- | -- | PII - masked by policy |
Amir notes two issues to address in feature engineering: missing income_reported values (7.9%) and employment_length outliers exceeding 50 years.
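Both issues have standard remediations that Amir applies later in feature engineering. A hedged sketch of one reasonable approach (median imputation with a missingness flag, plus capping; the record layout is illustrative):

```python
# Hypothetical handling of the two profiler findings: median-impute missing
# income_reported and cap employment_length at 50 years.

def clean_application(row, income_median):
    cleaned = dict(row)
    if cleaned["income_reported"] is None:
        cleaned["income_reported"] = income_median   # impute missing income
        cleaned["income_was_missing"] = 1            # keep a missingness indicator
    else:
        cleaned["income_was_missing"] = 0
    # Cap outliers above 50 years, matching the LEAST(..., 50) used in the SQL later
    cleaned["employment_length"] = min(cleaned["employment_length"], 50)
    return cleaned

apps = [
    {"income_reported": 55000, "employment_length": 12},
    {"income_reported": None,  "employment_length": 87},
]
known = sorted(a["income_reported"] for a in apps if a["income_reported"] is not None)
median = known[len(known) // 2]
cleaned = [clean_application(a, median) for a in apps]
print(cleaned[1])
```

Keeping the missingness flag as a feature lets the model learn whether unreported income is itself a risk signal.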
Lineage Tracing
The catalog's lineage view shows the data flow from application to outcome:
credit_applications ──▶ loan_disbursements ──▶ payment_history ──▶ loan_defaults
(apply) (approve) (repay) (default)
200,432               142,301              3,842,901            35,219
This lineage confirms Amir can join application features to default outcomes, giving him the target variable for the model.
Stage 3: Query
Amir uses the Query Engine to build the risk feature set. He needs to federate across PostgreSQL (live data) and Snowflake (historical outcomes).
Building the Risk Feature Set
-- Credit Risk Feature Set: Federated query across PostgreSQL + Snowflake
-- Governance note: SSN is auto-masked; only hashed customer_id used for joins
WITH payment_features AS (
-- Payment behavior features from core banking (PostgreSQL)
SELECT
ca.customer_id,
ca.application_id,
ca.income_reported,
ca.employment_length,
ca.requested_amount,
ca.debt_to_income,
-- Payment delinquency features
COUNT(CASE WHEN ph.days_past_due > 30 THEN 1 END)
AS dpd_30_count_12m,
COUNT(CASE WHEN ph.days_past_due > 60 THEN 1 END)
AS dpd_60_count_12m,
COUNT(CASE WHEN ph.days_past_due > 90 THEN 1 END)
AS dpd_90_count_12m,
MAX(ph.days_past_due) AS max_dpd_12m,
-- Utilization features
AVG(ph.balance / NULLIF(ph.credit_limit, 0))
AS avg_utilization_12m,
MAX(ph.balance / NULLIF(ph.credit_limit, 0))
AS max_utilization_12m
FROM risk_analytics.credit_applications ca
LEFT JOIN lending.payment_history ph
ON ca.customer_id = ph.customer_id
AND ph.payment_date >= ca.application_date - INTERVAL '12 months'
AND ph.payment_date < ca.application_date
GROUP BY ca.customer_id, ca.application_id,
ca.income_reported, ca.employment_length,
ca.requested_amount, ca.debt_to_income
),
bureau_features AS (
    -- Latest bureau score strictly before the customer's application date
    SELECT DISTINCT ON (bs.customer_id)
        bs.customer_id,
        bs.score AS bureau_score,
        bs.num_inquiries AS inquiries_6m,
        bs.delinquencies AS bureau_delinquencies
    FROM risk_analytics.bureau_scores bs
    JOIN risk_analytics.credit_applications ca
        ON ca.customer_id = bs.customer_id
        AND bs.score_date < ca.application_date
    ORDER BY bs.customer_id, bs.score_date DESC
),
default_labels AS (
-- Default outcome from Snowflake (historical data)
SELECT
ld.loan_id,
ld.customer_id,
1 AS defaulted,
ld.loss_given_default
FROM historical.loan_defaults ld
)
SELECT
pf.application_id,
pf.customer_id,
-- Application features
pf.income_reported,
LEAST(pf.employment_length, 50) AS employment_length_capped,
pf.requested_amount,
pf.requested_amount / NULLIF(pf.income_reported, 0)
AS loan_to_income,
pf.debt_to_income,
-- Payment behavior
pf.dpd_30_count_12m,
pf.dpd_60_count_12m,
pf.dpd_90_count_12m,
pf.max_dpd_12m,
pf.avg_utilization_12m,
pf.max_utilization_12m,
-- Bureau features
bf.bureau_score,
bf.inquiries_6m,
bf.bureau_delinquencies,
-- Target variable
COALESCE(dl.defaulted, 0) AS defaulted
FROM payment_features pf
LEFT JOIN bureau_features bf ON pf.customer_id = bf.customer_id
LEFT JOIN default_labels dl ON pf.customer_id = dl.customer_id;
The Query Engine executes this across both sources, returning 200,432 rows: two identifiers, 14 model features, and the target variable. Amir saves this as a materialized view risk_analytics.credit_risk_features.
Stage 4: Orchestration
Amir builds a daily pipeline to keep the risk feature set fresh. In the Pipeline Service, he defines a 4-stage DAG:
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Extract │───▶│ Validate │───▶│ Engineer │───▶│ Load │
│ │ │ │ │ Features │ │ Feature │
│ Sync all │ │ Quality │ │ │ │ Store │
│ 3 sources│ │ gates │ │ Run SQL │ │ │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
5 min          2 min          8 min          3 min
Pipeline Configuration
{
"pipeline": {
"name": "credit-risk-feature-pipeline",
"schedule": "0 4 * * *",
"owner": "amir@meridian.bank",
"tags": ["risk", "credit", "ml-features"],
"stages": [
{
"name": "extract",
"type": "sync_trigger",
"sources": ["core-banking-prod", "bureau-scores-csv", "snowflake-analytics"],
"timeout_minutes": 30
},
{
"name": "validate",
"type": "data_quality",
"checks": [
{
"table": "credit_applications",
"expectation": "expect_column_values_to_not_be_null",
"column": "customer_id",
"severity": "critical"
},
{
"table": "credit_applications",
"expectation": "expect_column_values_to_be_between",
"column": "income_reported",
"min_value": 0,
"max_value": 10000000,
"mostly": 0.92,
"severity": "warning"
},
{
"table": "bureau_scores",
"expectation": "expect_table_row_count_to_be_between",
"min_value": 400000,
"max_value": 600000,
"severity": "critical"
}
],
"on_failure": "halt_pipeline"
},
{
"name": "engineer_features",
"type": "sql_transform",
"query_ref": "risk_analytics.credit_risk_features",
"materialization": "table",
"depends_on": ["validate"]
},
{
"name": "load_feature_store",
"type": "feature_store_ingest",
"source_table": "risk_analytics.credit_risk_features",
"feature_group": "credit_risk_v2",
"entity_key": "application_id",
"timestamp_column": "application_date",
"depends_on": ["engineer_features"]
}
]
}
}
The data quality gate in stage 2 is critical: if customer_id has nulls or bureau_scores row count drops unexpectedly, the pipeline halts and sends an alert rather than propagating bad data into the feature store.
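The halt-on-failure semantics of that gate can be sketched as a small function. The check names, thresholds, and stats dictionary below are illustrative, taken from the config above; the real Pipeline Service runner is not shown in this walkthrough:

```python
# Minimal sketch of the stage-2 quality gate: run each check, halt the
# pipeline on a critical failure, and collect warnings otherwise.

class PipelineHalted(Exception):
    pass

def run_quality_gate(checks, stats):
    """checks: list of (name, severity, predicate); stats: table-level metrics."""
    warnings = []
    for name, severity, predicate in checks:
        if predicate(stats):
            continue
        if severity == "critical":
            raise PipelineHalted(f"critical check failed: {name}")
        warnings.append(name)
    return warnings

stats = {"customer_id_nulls": 0, "income_completeness": 0.921, "bureau_rows": 450_118}
checks = [
    ("customer_id not null", "critical", lambda s: s["customer_id_nulls"] == 0),
    ("income mostly present", "warning", lambda s: s["income_completeness"] >= 0.92),
    ("bureau row count in range", "critical",
     lambda s: 400_000 <= s["bureau_rows"] <= 600_000),
]
print(run_quality_gate(checks, stats))  # [] -- all checks pass, pipeline proceeds
```

The "mostly": 0.92 setting in the config maps to the completeness threshold here: up to 8% missing income values trigger only a warning, never a halt.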
Stage 5: Analysis
Before model training, Amir validates the feature set for modeling fitness.
Class Imbalance Check
SELECT
defaulted,
COUNT(*) AS count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 2) AS pct
FROM risk_analytics.credit_risk_features
GROUP BY defaulted;
| defaulted | count | pct |
|---|---|---|
| 0 | 194,419 | 97.0% |
| 1 | 6,013 | 3.0% |
The 3% default rate confirms significant class imbalance. Amir plans to address it with class-weighted loss during training and to evaluate SMOTE oversampling as an alternative.
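The scale_pos_weight value that appears later in the XGBoost experiment config follows directly from this class ratio (counts from the table above):

```python
# scale_pos_weight is conventionally the ratio of negative to positive examples.
non_default = 194_419
default = 6_013

scale_pos_weight = non_default / default
print(round(scale_pos_weight, 1))  # 32.3, matching the experiment config
```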
Data Leakage Validation
Amir verifies no features contain information from after the application decision:
| Feature | Temporal Check | Result |
|---|---|---|
| dpd_30_count_12m | Uses only pre-application payment history | Pass |
| bureau_score | Latest score before application date | Pass |
| avg_utilization_12m | 12-month window ending at application | Pass |
| loss_given_default | Post-default metric | Excluded (target leakage) |
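The temporal check reduces to one invariant: every feature's observation window must end before the application date it is joined to. A hedged sketch of that check (the window bookkeeping and dates here are hypothetical):

```python
from datetime import date

# Hypothetical leakage check: flag any feature whose observation window
# extends to or past the application date.

def leakage_flags(application_date, feature_windows):
    """feature_windows: {feature_name: (window_start, window_end)} -> {name: leaks?}"""
    return {
        name: end >= application_date
        for name, (start, end) in feature_windows.items()
    }

flags = leakage_flags(
    date(2025, 3, 1),
    {
        "dpd_30_count_12m": (date(2024, 3, 1), date(2025, 2, 28)),
        "bureau_score": (date(2025, 2, 15), date(2025, 2, 15)),
        "loss_given_default": (date(2025, 9, 1), date(2025, 9, 1)),  # post-default
    },
)
print(flags)  # only loss_given_default is flagged
```

This is why loss_given_default is excluded from the feature set: it is only observable after the default event the model is trying to predict.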
Feature Correlation Analysis
The Data Quality Service generates a correlation matrix. Key findings:
| Feature Pair | Correlation | Action |
|---|---|---|
| bureau_score / dpd_30_count_12m | -0.72 | Expected -- keep both (different signal sources) |
| debt_to_income / loan_to_income | 0.84 | High collinearity -- consider dropping one |
| max_utilization_12m / avg_utilization_12m | 0.91 | Keep max only for interpretability |
| income_reported / requested_amount | 0.31 | Low -- independent features |
Amir decides to drop avg_utilization_12m (collinear with max) and keep both debt_to_income and loan_to_income (different denominators capture different risk signals).
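The collinearity screen behind these decisions is just pairwise Pearson correlation with a cutoff (0.8 here, consistent with Amir's choices). A self-contained sketch with toy columns:

```python
import math

# Pearson correlation from scratch, used to flag collinear feature pairs
# above a chosen threshold.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy columns: ys is a near-linear function of xs, zs is unrelated.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
zs = [5, 1, 4, 2, 6, 3]

r_collinear = pearson(xs, ys)
r_independent = pearson(xs, zs)
print(r_collinear > 0.8, abs(r_independent) > 0.8)  # True False
```

Note that a high correlation alone does not force a drop: the bureau_score / dpd_30_count pair is kept because the two come from different data sources and can disagree informatively.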
Stage 6: Productionization
Model Training and Comparison
In the ML Workbench, Amir sets up an experiment to compare models:
{
"experiment": {
"name": "credit-risk-pd-model-v2",
"tracking_uri": "mlflow://meridian.matih.internal",
"runs": [
{
"name": "logistic-regression-baseline",
"model_type": "sklearn.linear_model.LogisticRegression",
"params": {
"class_weight": "balanced",
"C": 0.1,
"max_iter": 1000
}
},
{
"name": "xgboost-candidate",
"model_type": "xgboost.XGBClassifier",
"params": {
"n_estimators": 500,
"max_depth": 6,
"learning_rate": 0.05,
"scale_pos_weight": 32.3,
"subsample": 0.8,
"colsample_bytree": 0.8,
"eval_metric": "auc"
}
}
],
"evaluation": {
"split": "temporal",
"train_end": "2025-06-30",
"test_start": "2025-07-01",
"metrics": ["auc", "gini", "ks_statistic", "log_loss", "precision_at_10pct"]
}
}
}
Model Comparison Results
| Metric | Logistic Regression | XGBoost | Improvement |
|---|---|---|---|
| AUC | 0.781 | 0.843 | +7.9% |
| Gini Coefficient | 0.562 | 0.686 | +22.1% |
| KS Statistic | 0.389 | 0.512 | +31.6% |
| Precision @ 10% recall | 42.1% | 58.7% | +39.4% |
| Log Loss | 0.142 | 0.108 | -23.9% |
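The Gini coefficient in this table is a direct transform of AUC, which makes it a convenient sanity check on reported metrics:

```python
# For a binary classifier, Gini = 2 * AUC - 1.
def gini_from_auc(auc):
    return 2 * auc - 1

print(round(gini_from_auc(0.781), 3))  # 0.562 -- logistic regression row
print(round(gini_from_auc(0.843), 3))  # 0.686 -- XGBoost row
```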
Model Registration with Fairness Metrics
Amir registers the XGBoost model in the Model Registry with a mandatory model card:
{
"model_card": {
"name": "credit-risk-pd-xgboost-v2",
"version": "2.0.0",
"owner": "amir@meridian.bank",
"use_case": "Probability of default scoring for consumer loan applications",
"training_data": {
"source": "risk_analytics.credit_risk_features",
"rows": 160346,
"date_range": "2020-01-01 to 2025-06-30",
"target_rate": "3.0%"
},
"performance": {
"test_auc": 0.843,
"test_gini": 0.686,
"test_ks": 0.512
},
"fairness_metrics": {
"disparate_impact_ratio": {
"gender": 0.92,
"age_group": 0.88,
"ethnicity": 0.85
},
"equal_opportunity_difference": {
"gender": 0.03,
"age_group": 0.07,
"ethnicity": 0.09
},
"threshold": "DIR > 0.80 (four-fifths rule)"
},
"governance": {
"masked_columns": ["ssn", "date_of_birth"],
"audit_logging": true,
"regulatory_framework": "SR 11-7",
"approval_required": true,
"approved_by": null,
"approval_date": null
}
}
}
Deployment to Ray Serve
Amir deploys the model for batch scoring of the loan portfolio:
{
"deployment": {
"model": "credit-risk-pd-xgboost-v2",
"version": "2.0.0",
"serving_engine": "ray_serve",
"mode": "batch",
"schedule": "0 6 * * *",
"input": "risk_analytics.credit_risk_features",
"output": "risk_analytics.pd_scores",
"resources": {
"num_replicas": 2,
"num_cpus": 4,
"memory_gb": 8
},
"governance": {
"decision_audit_table": "audit.model_decisions",
"log_inputs": true,
"log_predictions": true,
"log_feature_importance": true
}
}
}
Stage 7: Feedback
Amir sets up ongoing monitoring to detect model degradation.
Model Performance Monitoring
In the ML Workbench monitoring tab, Amir configures weekly performance tracking:
| Metric | Baseline | Week 1 | Week 2 | Week 3 | Week 4 | Alert Threshold |
|---|---|---|---|---|---|---|
| AUC | 0.843 | 0.841 | 0.839 | 0.835 | 0.832 | < 0.75 |
| Gini | 0.686 | 0.682 | 0.678 | 0.670 | 0.664 | < 0.55 |
| PSI (Population Stability Index) | 0.000 | 0.012 | 0.018 | 0.024 | 0.031 | > 0.10 |
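PSI compares the binned score distribution in production against the development sample. A hedged sketch of the standard formula (the two five-bin distributions below are illustrative, not Amir's actual data):

```python
import math

# Population Stability Index over binned score distributions:
# PSI = sum over bins of (actual_i - expected_i) * ln(actual_i / expected_i).

def psi(expected, actual):
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected, actual)
        if e > 0 and a > 0  # skip empty bins rather than divide by zero
    )

# Hypothetical 5-bin score distributions: development sample vs current week.
dev = [0.20, 0.20, 0.20, 0.20, 0.20]
cur = [0.22, 0.21, 0.20, 0.19, 0.18]

value = psi(dev, cur)
print(round(value, 4))
```

A common rule of thumb, consistent with the table's alert threshold: PSI below 0.10 means no significant shift, 0.10 to 0.25 warrants investigation, and above 0.25 indicates a material population change.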
Feature Drift Detection
{
"drift_monitor": {
"name": "credit-risk-feature-drift",
"model": "credit-risk-pd-xgboost-v2",
"schedule": "weekly",
"reference_window": "2025-01-01 to 2025-06-30",
"features_monitored": [
{
"feature": "income_reported",
"method": "ks_test",
"threshold": 0.05
},
{
"feature": "debt_to_income",
"method": "psi",
"threshold": 0.10
},
{
"feature": "bureau_score",
"method": "wasserstein",
"threshold": 15.0
}
],
"alerts": {
"channel": "slack",
"target": "#risk-model-alerts",
"severity_mapping": {
"single_feature_drift": "warning",
"multiple_feature_drift": "critical",
"performance_degradation": "critical"
}
}
}
}
Regulatory Model Validation Metrics
For SR 11-7 compliance, Amir generates quarterly validation reports:
| Validation Test | Q1 Result | Status |
|---|---|---|
| Discriminatory power (AUC) | 0.843 | Pass (> 0.70) |
| Calibration (Hosmer-Lemeshow) | p = 0.34 | Pass (p > 0.05) |
| Stability (PSI vs development) | 0.031 | Pass (< 0.10) |
| Concentration (Herfindahl) | 0.08 | Pass (< 0.15) |
| Back-testing (predicted vs actual PD) | 3.2% vs 3.0% | Pass (within 10%) |
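The back-testing row compares the portfolio's average predicted PD against the realized default rate. One reading of the "within 10%" tolerance, as a relative deviation from the realized rate (the helper is illustrative, values from the Q1 table):

```python
# Back-test: predicted PD must be within 10% (relative) of the realized rate.

def backtest_passes(predicted_pd, realized_rate, tolerance=0.10):
    return abs(predicted_pd - realized_rate) / realized_rate <= tolerance

print(backtest_passes(0.032, 0.030))  # True: |3.2% - 3.0%| / 3.0% is about 6.7%
```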
Stage 8: Experimentation
Amir runs a champion-challenger test to validate the new model against the existing scorecard in production.
Champion-Challenger Design
┌─────────────────────────────────────────────────────────┐
│ Incoming Loan Applications │
│ (100% traffic) │
└────────────────────────┬────────────────────────────────┘
│
┌──────────┴──────────┐
│ Random Assignment │
│ (by application) │
└──────────┬──────────┘
┌────┴────┐
┌─────▼─────┐ ┌─▼───────────┐
│ Champion │ │ Challenger │
│ (70%) │ │ (30%) │
│ │ │ │
│ Current │ │ XGBoost v2 │
│ Scorecard │ │ AUC: 0.843 │
│ AUC: 0.78 │ │ │
└─────┬─────┘ └──────┬──────┘
│ │
┌─────▼──────────────▼──────┐
│ Both scores logged for │
│ every application │
│ Decision uses assigned │
│ model only │
└───────────────────────────┘
Test Configuration
{
"experiment": {
"name": "credit-risk-champion-challenger-q1-2026",
"type": "champion_challenger",
"champion": {
"model": "legacy-scorecard-v1",
"traffic_pct": 70
},
"challenger": {
"model": "credit-risk-pd-xgboost-v2",
"traffic_pct": 30
},
"assignment": {
"method": "random",
"unit": "application_id",
"stratify_by": ["loan_product", "branch_region"]
},
"duration_weeks": 12,
"primary_metric": "auc_on_realized_defaults",
"guardrail_metrics": [
{
"name": "approval_rate",
"max_deviation": 0.05
},
{
"name": "disparate_impact_ratio",
"min_value": 0.80
}
],
"governance": {
"regulatory_review_required": true,
"reviewer": "model-validation-team@meridian.bank",
"sr_11_7_documentation": true
}
}
}
Interim Results (Week 6 of 12)
| Metric | Champion (Scorecard) | Challenger (XGBoost) | Difference |
|---|---|---|---|
| Applications scored | 14,210 | 6,090 | -- |
| AUC (realized defaults) | 0.779 | 0.838 | +7.6% |
| Approval rate | 68.2% | 71.4% | +3.2pp |
| Expected loss (annualized) | $4.2M | $3.1M | -$1.1M |
| False decline rate | 8.4% | 5.1% | -3.3pp |
| Disparate impact (gender) | 0.89 | 0.91 | +0.02 |
| Disparate impact (ethnicity) | 0.83 | 0.85 | +0.02 |
The challenger model shows higher approval rates with lower expected losses, meaning it better separates good borrowers from risky ones. Fairness metrics are within regulatory thresholds. Amir prepares the model validation documentation for the regulatory review committee before full production rollout.
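The 70/30 random assignment by application_id is typically made deterministic by hashing the unit key with an experiment-specific salt, so the same application always lands in the same arm and the split is reproducible. A sketch of that common pattern (the salt string and function are illustrative, not the platform's actual assignment service):

```python
import hashlib

# Deterministic 70/30 assignment: hash the application_id with an experiment
# salt, map the digest to [0, 1), and bucket into an arm.

def assign_arm(application_id, salt="champion-challenger-q1-2026", challenger_pct=30):
    digest = hashlib.sha256(f"{salt}:{application_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "challenger" if bucket < challenger_pct / 100 else "champion"

arms = [assign_arm(f"APP-2026-{i:08d}") for i in range(10_000)]
share = arms.count("challenger") / len(arms)
print(round(share, 3))  # close to 0.30 over many applications

# The same id always maps to the same arm, which is what makes the
# "both scores logged, decision uses assigned model only" design auditable.
assert assign_arm("APP-2026-00142891") == assign_arm("APP-2026-00142891")
```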
Governance Audit Trail
Every model decision is logged to audit.model_decisions with full traceability:
SELECT
decision_id,
application_id,
model_name,
model_version,
prediction_score,
decision,
feature_importance_top5,
created_at
FROM audit.model_decisions
WHERE application_id = 'APP-2026-00142891'
ORDER BY created_at DESC
LIMIT 1;
| Field | Value |
|---|---|
| decision_id | DEC-9f3a2b... |
| application_id | APP-2026-00142891 |
| model_name | credit-risk-pd-xgboost-v2 |
| model_version | 2.0.0 |
| prediction_score | 0.042 (4.2% PD) |
| decision | APPROVE |
| feature_importance_top5 | bureau_score (0.23), dpd_30_count (0.18), debt_to_income (0.15), utilization (0.12), employment_length (0.09) |
Key Takeaways
| Stage | Key Action | Platform Component |
|---|---|---|
| Ingestion | Connected 3 sources with scheduled syncs | Data Workbench, Airbyte connectors |
| Discovery | Profiled data, found quality issues, traced lineage | Catalog Service, Data Quality Service |
| Query | Built federated feature set across PostgreSQL + Snowflake | Query Engine (SQL federation) |
| Orchestration | Daily pipeline with quality gates | Pipeline Service (Temporal) |
| Analysis | Validated class balance, leakage, correlations | Data Quality Service, SQL analytics |
| Productionization | Trained, registered, deployed model with fairness metrics | ML Workbench, MLflow, Ray Serve |
| Feedback | Weekly performance monitoring, drift detection | ML monitoring, alerting |
| Experimentation | Champion-challenger test with regulatory governance | Experiment framework, Governance Service |
Related Walkthroughs
- ML Engineer Journey: Real-Time Fraud Detection -- Kenji builds the fraud scoring system that protects the same transactions
- BI Lead Journey: Regulatory Reporting -- Rachel builds dashboards that consume Amir's model outputs
- Executive Journey: Strategic Risk Analytics -- Elena uses model results for portfolio-level decisions
- Financial Services Overview -- Industry context and sample datasets