Data Scientist Journey: Predicting 30-Day Hospital Readmissions

Persona: Dr. Maya Chen, Clinical Data Scientist at Pinnacle Health System
Goal: Build a readmission risk model to reduce penalties under the CMS Hospital Readmissions Reduction Program (HRRP)
Primary Workbenches: ML Workbench, Data Workbench

Background

Pinnacle Health's 30-day readmission rate is 14.2% -- above the 12% CMS target. Under the HRRP, excess readmissions for targeted conditions (acute myocardial infarction, heart failure, pneumonia, COPD, hip/knee replacement, CABG) result in payment reductions of up to 3% of base operating Medicare payments. For Pinnacle Health, this translates to an estimated $12.6M annual penalty at current rates.

Dr. Maya Chen has been tasked with building a predictive model that identifies high-risk patients at discharge so care coordinators can intervene with transitional care programs (follow-up calls, home visits, medication reconciliation).

Stage 1: Ingestion

Maya begins by connecting the clinical data sources she needs. She opens the Data Workbench and navigates to the Ingestion panel.

EHR Connection via FHIR API

Pinnacle Health runs Epic at 7 hospitals and Cerner at 5. Both expose patient data via FHIR R4 APIs. Maya configures the FHIR connector to pull the resources she needs:

{
  "source_type": "fhir",
  "connection_name": "pinnacle_ehr_epic",
  "config": {
    "fhir_base_url": "https://epic.pinnaclehealth.org/fhir/r4",
    "auth_type": "oauth2_client_credentials",
    "client_id": "${EPIC_CLIENT_ID}",
    "client_secret": "${EPIC_CLIENT_SECRET}",
    "resources": [
      "Patient",
      "Encounter",
      "Condition",
      "Procedure",
      "DiagnosticReport",
      "Observation",
      "MedicationRequest"
    ],
    "sync_mode": "incremental",
    "since_parameter": "lastUpdated",
    "page_size": 500
  },
  "schedule": {
    "frequency": "every_15_minutes"
  },
  "destination": {
    "schema": "clinical_ehr",
    "table_prefix": "epic_"
  }
}
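To make the incremental sync concrete, the sketch below builds one polling request per resource type. It assumes the connector turns sync_mode and page_size into the standard FHIR search parameters _lastUpdated and _count; the function name build_incremental_url and the watermark timestamp are hypothetical, and the connector's actual request logic is not documented here.

```python
from urllib.parse import urlencode

def build_incremental_url(base_url: str, resource: str, since_iso: str, page_size: int) -> str:
    """Build a FHIR search URL for resources changed after `since_iso` (hypothetical helper)."""
    query = urlencode({
        "_lastUpdated": f"gt{since_iso}",  # standard FHIR search parameter; "gt" means "after"
        "_count": page_size,               # standard FHIR page-size parameter
    })
    return f"{base_url.rstrip('/')}/{resource}?{query}"

# Example: poll Encounter resources changed since the last successful sync
url = build_incremental_url(
    "https://epic.pinnaclehealth.org/fhir/r4", "Encounter", "2025-11-15T06:00:00Z", 500
)
```

Each sync would then store the newest timestamp it saw and pass it as since_iso on the next run.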

Claims Database

The claims clearinghouse runs on PostgreSQL. Maya configures CDC-based replication:

{
  "source_type": "postgresql",
  "connection_name": "pinnacle_claims",
  "config": {
    "host": "claims-db.pinnaclehealth.org",
    "port": 5432,
    "database": "claims_warehouse",
    "replication_method": "cdc",
    "tables": ["claims", "claim_lines", "denials", "payer_contracts"],
    "schema": "public"
  },
  "schedule": {
    "frequency": "hourly"
  },
  "destination": {
    "schema": "claims_data"
  }
}

Lab Results and Patient Surveys

Maya also connects the Lab Information System (LIS) via FHIR for laboratory observations, and imports REDCap patient satisfaction survey exports as CSV files through the Data Workbench file import feature.

Source	Records	Sync Status
Epic EHR (7 hospitals)	142K patients, 1.4M encounters	Syncing every 15 min
Cerner EHR (5 hospitals)	58K patients, 600K encounters	Syncing every 15 min
Claims Clearinghouse	3M claims	Hourly CDC
Lab Information System	5M results	Every 30 min
REDCap Surveys (CSV)	24K responses	Imported weekly

Stage 2: Discovery

With data flowing, Maya switches to the Data Workbench Catalog to explore and understand the available clinical datasets.

Browsing the Clinical Catalog

Maya searches the catalog for admission-related tables. The catalog shows:

clinical_ehr.epic_encounters
  ├── Columns: 34
  ├── Rows: 1,400,000
  ├── Last synced: 2 minutes ago
  ├── Tags: [PHI] [HIPAA] [Clinical] [Encounter]
  ├── Quality score: 87/100
  └── Lineage: Epic FHIR → Ingestion → clinical_ehr schema

clinical_ehr.cerner_encounters
  ├── Columns: 31
  ├── Rows: 600,000
  ├── Tags: [PHI] [HIPAA] [Clinical] [Encounter]
  └── Quality score: 82/100

Data Profiling Findings

Maya runs profiling on the unified encounters view and discovers critical quality issues:

Column	Issue	Impact
`discharge_disposition`	15% missing values	Cannot reliably apply index-admission exclusions (expired, left AMA, transferred)
`primary_diagnosis_code`	ICD-10 vs ICD-9 mix (Cerner legacy)	Comorbidity index calculation requires consistent coding
`attending_physician_id`	8% NULL for ED encounters	Physician-level analysis incomplete
`length_of_stay`	23 encounters with negative values	Data entry errors in discharge timestamps
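The kinds of checks behind these findings can be sketched in a few lines. This is an illustration rather than the Data Workbench profiler itself: the row layout (a list of dicts), the function names, and the code-shape heuristic are all assumptions, and the heuristic expects codes stored with their decimal point.

```python
import re

def missing_rate(rows, column):
    """Fraction of rows where `column` is None or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) in (None, ""))
    return missing / len(rows)

def negative_los(rows):
    """Encounter IDs whose length of stay is negative."""
    return [r["encounter_id"] for r in rows
            if r.get("length_of_stay") is not None and r["length_of_stay"] < 0]

def code_system(code):
    """Rough guess at the coding system from the shape of a dotted code."""
    code = code.strip().upper()
    if re.fullmatch(r"\d{3}(\.\d{1,2})?", code):
        return "ICD-9"       # numeric ICD-9-CM, e.g. 428.0
    if re.fullmatch(r"E\d{3}(\.\d)?", code):
        return "ICD-9"       # ICD-9-CM external-cause codes, e.g. E849.0
    if re.fullmatch(r"V\d{2}(\.[0-9A-Z]{1,4})?", code):
        return "AMBIGUOUS"   # V-prefixed codes exist in both systems
    if re.fullmatch(r"[A-Z]\d[0-9A-Z](\.[0-9A-Z]{1,4})?", code):
        return "ICD-10"      # e.g. I50.9
    return "UNKNOWN"

sample = [
    {"encounter_id": "e1", "discharge_disposition": "HOME", "length_of_stay": 3},
    {"encounter_id": "e2", "discharge_disposition": None, "length_of_stay": -1},
]
```

Codes flagged AMBIGUOUS or UNKNOWN would still need the encounter date or a source-system flag to resolve.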

Lineage Tracing

Maya traces the data flow from encounter through diagnosis and procedure codes:

encounter (EHR)
  ├──▶ diagnosis_codes (ICD-10-CM)
  │      └──▶ drg_assignment (MS-DRG grouper)
  │             └──▶ claims.drg_code
  ├──▶ procedure_codes (ICD-10-PCS / CPT)
  │      └──▶ claims.claim_lines
  └──▶ lab_results (LOINC coded)
         └──▶ feature_store.clinical_features

HIPAA Tagging

Maya verifies that the Governance Service has automatically tagged PHI columns. She reviews the tags in the catalog:

Table	PHI Columns	Masking Applied
`patients`	mrn, first_name, last_name, ssn, birth_date, zip_code, phone, email	Hash (names), Redact (SSN), Generalize (DOB to year, ZIP to 3-digit)
`encounters`	patient_id (FK to PHI)	Tokenized for research datasets
`lab_results`	patient_id (FK to PHI)	Tokenized for research datasets
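The masking types in the table can be illustrated with a short sketch. The Governance Service's real implementation is not shown in this walkthrough, so the function names, the keyed-hash choice, and the redaction placeholder below are assumptions.

```python
import hashlib
import hmac

def hash_name(name: str, secret_key: bytes) -> str:
    """Keyed hash: the same name always maps to the same opaque value."""
    normalized = name.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()

def redact(_value) -> str:
    """Replace the value entirely (used for SSN in the table above)."""
    return "[REDACTED]"

def generalize_birth_date(iso_date: str) -> str:
    """Keep only the year of a YYYY-MM-DD date."""
    return iso_date[:4]

def generalize_zip(zip_code: str) -> str:
    """Keep only the first three digits of a ZIP code."""
    return zip_code[:3]

demo_key = b"demo-key"  # placeholder; a real key would come from a secrets manager
masked_name = hash_name("Jane Doe", demo_key)
```

A full Safe Harbor implementation needs more than this: three-digit ZIP areas with 20,000 or fewer residents are replaced with 000, and ages over 89 are grouped into a single category.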

Stage 3: Query

Maya moves to the Query Engine to build her readmission cohort. The core clinical logic requires careful SQL to match CMS methodology.

Building the Readmission Cohort

-- CMS-aligned readmission cohort identification
-- Step 1: Identify qualifying index admissions
WITH index_admissions AS (
    SELECT
        e.encounter_id,
        e.patient_id,
        e.facility_id,
        e.admit_date,
        e.discharge_date,
        e.discharge_disposition,
        e.primary_diagnosis_code,
        d.drg_code,
        d.drg_weight,
        DATEDIFF(day, e.admit_date, e.discharge_date) AS length_of_stay
    FROM clinical_unified.encounters e
    JOIN claims_data.claims c ON e.encounter_id = c.encounter_id
    JOIN claims_data.drg_assignments d ON c.claim_id = d.claim_id
    WHERE e.encounter_type = 'INPATIENT'
      AND e.discharge_date BETWEEN DATE '2025-01-01' AND DATE '2025-12-31'
      AND e.discharge_disposition NOT IN ('EXPIRED', 'LEFT_AMA', 'TRANSFER_ACUTE')
      AND e.age_at_admission >= 18
      -- Simplified DRG-based exclusion list applied to index stays. Note that the CMS
      -- Planned Readmission Algorithm classifies the readmission stay, not the index stay.
      AND d.drg_code NOT IN (
          '001', '002', '003', '004', '005',  -- Organ transplant
          '461', '462', '463', '464', '465',  -- Joint replacement
          '216', '217', '218', '219', '220'   -- Cardiac procedures
      )
),
 
-- Step 2: Identify 30-day readmissions
readmissions AS (
    SELECT
        ia.encounter_id AS index_encounter_id,
        ia.patient_id,
        ia.discharge_date AS index_discharge_date,
        ra.encounter_id AS readmit_encounter_id,
        ra.admit_date AS readmit_date,
        DATEDIFF(day, ia.discharge_date, ra.admit_date) AS days_to_readmit,
        ra.primary_diagnosis_code AS readmit_diagnosis
    FROM index_admissions ia
    JOIN clinical_unified.encounters ra
        ON ia.patient_id = ra.patient_id
        AND ra.encounter_type = 'INPATIENT'
        AND ra.admit_date > ia.discharge_date
        AND ra.admit_date <= ia.discharge_date + INTERVAL '30' DAY
        AND ra.encounter_id != ia.encounter_id
    -- Take only the first readmission per index admission
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY ia.encounter_id
        ORDER BY ra.admit_date
    ) = 1
)
 
-- Step 3: Build the analysis cohort
SELECT
    ia.*,
    CASE WHEN r.readmit_encounter_id IS NOT NULL THEN 1 ELSE 0 END AS readmitted_30d,
    r.days_to_readmit,
    r.readmit_diagnosis
FROM index_admissions ia
LEFT JOIN readmissions r ON ia.encounter_id = r.index_encounter_id;
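The window logic in Step 2 is easy to get wrong at the boundaries, so a small Python sketch of the same rule may help. It mirrors the SQL conditions: admit strictly after discharge, at most 30 days later, earliest match wins. The function name and sample dates are illustrative only.

```python
from datetime import date

def first_readmission(index_discharge, later_admits, window_days=30):
    """Return (readmit_date, days_to_readmit) for the earliest qualifying admit, else None."""
    candidates = [d for d in later_admits
                  if 0 < (d - index_discharge).days <= window_days]
    if not candidates:
        return None
    first = min(candidates)
    return first, (first - index_discharge).days

discharge = date(2025, 3, 1)
admits = [date(2025, 3, 1), date(2025, 3, 20), date(2025, 3, 10), date(2025, 4, 15)]
result = first_readmission(discharge, admits)
```

A same-day admit is ignored and an admit on day 30 counts, matching the > and <= comparisons in the query.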

Federating EHR with Claims for Comorbidity Indices

-- Compute Elixhauser comorbidity flags and van Walraven score for each patient
-- Deduplicate to one row per patient and category first, so a category coded on
-- several encounters counts once and categories that share a weight are all summed
WITH patient_categories AS (
    SELECT DISTINCT
        e.patient_id,
        dx.category,
        dx.vanwalraven_weight
    FROM clinical_unified.encounters e
    JOIN clinical_unified.diagnosis_codes dc ON e.encounter_id = dc.encounter_id
    JOIN reference.elixhauser_icd10_mapping dx ON dc.icd10_code = dx.icd10_code
    WHERE e.discharge_date >= CURRENT_DATE - INTERVAL '365' DAY
)
SELECT
    patient_id,
    COUNT(CASE WHEN category = 'CHF' THEN 1 END) > 0 AS has_chf,
    COUNT(CASE WHEN category = 'COPD' THEN 1 END) > 0 AS has_copd,
    COUNT(CASE WHEN category = 'DIABETES_UNCOMP' THEN 1 END) > 0 AS has_diabetes,
    COUNT(CASE WHEN category = 'RENAL_FAILURE' THEN 1 END) > 0 AS has_renal_failure,
    COUNT(CASE WHEN category = 'DEPRESSION' THEN 1 END) > 0 AS has_depression,
    COUNT(DISTINCT category) AS elixhauser_count,
    -- Van Walraven weighted score
    SUM(vanwalraven_weight) AS elixhauser_vanwalraven_score
FROM patient_categories
GROUP BY patient_id;
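The scoring rule itself is small enough to sketch in Python: each Elixhauser category counts once per patient, and the weights of the distinct categories are summed. The three weights below are illustrative values only; the authoritative ones are assumed to live in reference.elixhauser_icd10_mapping.

```python
def vanwalraven_score(categories, weights):
    """Return (category_count, weighted_score) for one patient.

    `categories` may repeat when a condition is coded on several encounters.
    """
    distinct = {c for c in categories if c in weights}  # deduplicate by category
    return len(distinct), sum(weights[c] for c in distinct)

# Illustrative weights; load the real ones from the reference mapping table
ILLUSTRATIVE_WEIGHTS = {"CHF": 7, "PARALYSIS": 7, "COPD": 3}

count, score = vanwalraven_score(
    ["CHF", "CHF", "PARALYSIS", "COPD"], ILLUSTRATIVE_WEIGHTS
)
```

Deduplicating by category rather than by weight value matters: two categories that share a weight must both contribute, so the example above scores 17, not 10.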

Stage 4: Orchestration

Maya builds a daily pipeline to keep her feature set current. She defines this in the Pipeline Service (Temporal-based orchestration).

Daily Readmission Feature Pipeline

{
  "pipeline_name": "readmission_feature_pipeline",
  "schedule": "0 6 * * *",
  "description": "Daily extraction of clinical features for readmission prediction",
  "steps": [
    {
      "step_id": "extract_clinical_features",
      "type": "sql_transform",
      "query_ref": "sql/readmission_features.sql",
      "destination": "features.readmission_clinical_v2",
      "write_mode": "overwrite_partition",
      "partition_key": "discharge_date"
    },
    {
      "step_id": "compute_comorbidity_index",
      "type": "sql_transform",
      "depends_on": ["extract_clinical_features"],
      "query_ref": "sql/elixhauser_comorbidity.sql",
      "destination": "features.patient_comorbidity_index"
    },
    {
      "step_id": "quality_checks",
      "type": "data_quality",
      "depends_on": ["compute_comorbidity_index"],
      "suite": "readmission_feature_quality",
      "checks": [
        { "type": "not_null", "columns": ["patient_id", "encounter_id", "discharge_date"] },
        { "type": "accepted_values", "column": "discharge_disposition", "values": ["HOME", "SNF", "HHA", "REHAB", "LTCH"] },
        { "type": "range", "column": "length_of_stay", "min": 0, "max": 365 },
        { "type": "range", "column": "age_at_admission", "min": 18, "max": 120 },
        { "type": "unique_compound", "columns": ["patient_id", "encounter_id"] }
      ],
      "on_failure": "alert_and_quarantine"
    },
    {
      "step_id": "load_feature_store",
      "type": "feature_store_load",
      "depends_on": ["quality_checks"],
      "feature_group": "readmission_risk_features",
      "entity_key": "encounter_id",
      "timestamp_key": "discharge_date"
    },
    {
      "step_id": "hipaa_audit_log",
      "type": "audit_event",
      "depends_on": ["load_feature_store"],
      "event_type": "phi_data_access",
      "details": {
        "pipeline": "readmission_feature_pipeline",
        "data_accessed": ["encounters", "diagnosis_codes", "lab_results", "claims"],
        "phi_columns_accessed": ["patient_id", "encounter_id"],
        "purpose": "readmission_risk_model_training",
        "irb_approval": "IRB-2025-0142"
      }
    }
  ]
}
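The depends_on fields define a dependency graph, and the run order follows from a topological sort. The sketch below uses Python's standard graphlib module to show the idea; it is not the Pipeline Service's scheduler, and the execution_order helper is hypothetical.

```python
from graphlib import TopologicalSorter

def execution_order(steps):
    """Order step_ids so every step runs after the steps it depends on."""
    graph = {s["step_id"]: set(s.get("depends_on", [])) for s in steps}
    # static_order() raises graphlib.CycleError if the dependencies loop
    return list(TopologicalSorter(graph).static_order())

steps = [
    {"step_id": "hipaa_audit_log", "depends_on": ["load_feature_store"]},
    {"step_id": "load_feature_store", "depends_on": ["quality_checks"]},
    {"step_id": "quality_checks", "depends_on": ["compute_comorbidity_index"]},
    {"step_id": "compute_comorbidity_index", "depends_on": ["extract_clinical_features"]},
    {"step_id": "extract_clinical_features"},
]
order = execution_order(steps)
```

Because this pipeline is a simple chain, only one order is valid, regardless of how the steps are listed in the definition.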

Pipeline Execution Monitoring

Pipeline: readmission_feature_pipeline
Run: 2025-11-15 06:00:00 UTC
Status: COMPLETED

  extract_clinical_features  ████████████████████ 100%  [3m 22s]  187,432 rows
  compute_comorbidity_index  ████████████████████ 100%  [1m 48s]  142,876 rows
  quality_checks             ████████████████████ 100%  [0m 34s]  3 warnings
  load_feature_store         ████████████████████ 100%  [0m 52s]  187,432 features
  hipaa_audit_log            ████████████████████ 100%  [0m 02s]  1 event logged

Warnings:
  - 847 encounters missing discharge_disposition (quarantined)
  - 12 encounters with LOS > 60 days (flagged for review)
  - 3 duplicate encounter_ids from EHR merge (deduplicated)
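For readers who want to see what the quality_checks step evaluates, here is a minimal Python sketch of the five rules from the pipeline definition. The run_checks function and the row format are assumptions, as is the choice to treat a missing value as a failed range check; the platform's own check engine may behave differently.

```python
ALLOWED_DISPOSITIONS = {"HOME", "SNF", "HHA", "REHAB", "LTCH"}

def run_checks(rows):
    """Return a list of (check_name, encounter_id) tuples for every failed check."""
    failures = []
    seen_keys = set()
    for r in rows:
        eid = r.get("encounter_id")
        for col in ("patient_id", "encounter_id", "discharge_date"):
            if r.get(col) is None:
                failures.append((f"not_null:{col}", eid))
        if r.get("discharge_disposition") not in ALLOWED_DISPOSITIONS:
            failures.append(("accepted_values:discharge_disposition", eid))
        los = r.get("length_of_stay")
        if los is None or not 0 <= los <= 365:
            failures.append(("range:length_of_stay", eid))
        age = r.get("age_at_admission")
        if age is None or not 18 <= age <= 120:
            failures.append(("range:age_at_admission", eid))
        key = (r.get("patient_id"), eid)
        if key in seen_keys:
            failures.append(("unique_compound:patient_id,encounter_id", eid))
        seen_keys.add(key)
    return failures

good = {"patient_id": "p1", "encounter_id": "e1", "discharge_date": "2025-11-14",
        "discharge_disposition": "HOME", "length_of_stay": 4, "age_at_admission": 71}
bad = {"patient_id": "p2", "encounter_id": "e2", "discharge_date": "2025-11-14",
       "discharge_disposition": None, "length_of_stay": -2, "age_at_admission": 66}
```

Rows that appear in the returned list would be the ones routed to quarantine under the alert_and_quarantine setting.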

Stage 5: Analysis

Maya explores the readmission cohort in the ML Workbench notebook environment, profiling outcomes and checking for bias before model training.

Cohort Analysis

-- Readmission rate by facility
SELECT
    f.facility_name,
    COUNT(*) AS total_discharges,
    SUM(readmitted_30d) AS readmissions,
    ROUND(100.0 * SUM(readmitted_30d) / COUNT(*), 1) AS readmission_rate,
    AVG(length_of_stay) AS avg_los
FROM features.readmission_clinical_v2 rc
JOIN reference.facilities f ON rc.facility_id = f.facility_id
GROUP BY f.facility_name
ORDER BY readmission_rate DESC;

Results:

Facility	Discharges	Readmissions	Rate	Avg LOS
Pinnacle Downtown	28,412	4,574	16.1%	5.2
Pinnacle East	22,108	3,360	15.2%	4.9
Pinnacle Memorial	18,934	2,536	13.4%	4.6
Pinnacle Community	15,210	2,006	13.2%	4.4
System Average	187,432	26,615	14.2%	4.8

Demographic Bias Check

Maya checks for disparities across demographic groups to ensure the model does not perpetuate bias:

Group	Readmission Rate	Index Admissions	Notes
White	13.8%	112,459	Baseline
Black	16.2%	37,486	2.4pp above baseline -- investigate social determinants
Hispanic	14.9%	22,425	Within range
Asian	12.1%	9,372	Below baseline
Age 18-44	9.8%	28,115	Lower risk cohort
Age 45-64	13.4%	56,229	Moderate risk
Age 65+	16.9%	103,088	Highest risk -- aligns with CMS focus
Dual-eligible (Medicare+Medicaid)	19.3%	31,264	Social determinants impact

Feature Importance Profiling

Maya validates that her candidate features have clinical face validity:

Feature	Type	Completeness	Distribution	Clinical Rationale
`elixhauser_vanwalraven_score`	Numeric	99.8%	Mean 4.2, SD 6.1	Comorbidity burden predicts readmission
`length_of_stay`	Numeric	100%	Median 3, IQR 2-6	Shorter LOS may indicate premature discharge
`prior_admissions_12m`	Count	100%	Median 1, range 0-14	Prior utilization is strongest predictor
`discharge_disposition`	Categorical	85%	5 categories	Discharge to SNF vs home affects risk
`ed_visit_prior_30d`	Binary	100%	22% positive	ED use signals instability
`lab_abnormal_count`	Count	94%	Median 1, range 0-12	Unresolved lab abnormalities at discharge

Stage 6: Productionization

Maya trains her model and deploys it for clinical use through the ML Workbench.

Model Training

# Readmission Risk Model -- Random Forest with Clinical Features
# Model Card: pinnacle_readmission_rf_v2
# Assumes X_train (2-D NumPy array) and y_train (1-D NumPy array of 0/1 labels)
# were prepared earlier in the notebook.

import numpy as np
import mlflow
import mlflow.sklearn
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

with mlflow.start_run(run_name="readmission_rf_v2"):
    model = RandomForestClassifier(
        n_estimators=500,
        max_depth=12,
        min_samples_leaf=50,
        class_weight="balanced",
        random_state=42
    )

    # 5-fold cross-validation; each fold fits a fresh clone of the estimator
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    cv_scores = []
    for train_idx, val_idx in cv.split(X_train, y_train):
        fold_model = clone(model)
        fold_model.fit(X_train[train_idx], y_train[train_idx])
        y_pred = fold_model.predict_proba(X_train[val_idx])[:, 1]
        cv_scores.append(roc_auc_score(y_train[val_idx], y_pred))

    mlflow.log_metric("cv_c_statistic_mean", np.mean(cv_scores))  # 0.72
    mlflow.log_metric("cv_c_statistic_std", np.std(cv_scores))    # 0.014
    mlflow.log_params({
        "n_estimators": 500,
        "max_depth": 12,
        "cohort_size": len(X_train),
        "positive_rate": float(y_train.mean()),
        "feature_count": X_train.shape[1]
    })

    # Refit on the full training set so the logged model is not just the last fold's fit
    model.fit(X_train, y_train)
    mlflow.sklearn.log_model(model, "readmission_rf_v2")

Model Registration with Clinical Model Card

{
  "model_name": "pinnacle_readmission_rf_v2",
  "version": "2.1.0",
  "model_card": {
    "intended_use": {
      "primary": "Identify patients at high risk of 30-day readmission at time of discharge",
      "users": ["Care coordinators", "Discharge planners", "Case managers"],
      "out_of_scope": ["Pediatric patients (< 18)", "Obstetric admissions", "Psychiatric admissions"]
    },
    "performance": {
      "c_statistic": 0.72,
      "sensitivity_at_20pct_alert_rate": 0.48,
      "ppv_at_20pct_alert_rate": 0.34,
      "calibration": "Well-calibrated across risk deciles (Hosmer-Lemeshow p=0.42)"
    },
    "fairness_analysis": {
      "metric": "equalized_odds",
      "results": {
        "white_vs_black_tpr_gap": 0.03,
        "white_vs_hispanic_tpr_gap": 0.01,
        "age_65plus_vs_under65_tpr_gap": 0.05
      },
      "assessment": "Within acceptable thresholds (< 0.10 TPR gap)"
    },
    "limitations": [
      "Trained on Pinnacle Health data only -- may not generalize to other health systems",
      "Does not account for social determinants of health beyond insurance type",
      "Performance degrades for rare DRGs with < 100 training examples"
    ],
    "regulatory": {
      "irb_approval": "IRB-2025-0142",
      "hipaa_compliance": "Model trained on de-identified dataset (Safe Harbor method)",
      "clinical_validation": "Reviewed by Dr. Sarah Park, Chief Quality Officer"
    }
  }
}
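The sensitivity and PPV figures in the model card are linked through the alert rate and the readmission prevalence, which gives a quick consistency check. The sketch below assumes the evaluation cohort has the same 14.2% readmission rate quoted in the Background section; the function name is illustrative.

```python
def ppv_from_sensitivity(sensitivity, prevalence, alert_rate):
    """PPV = true positives / alerts = (sensitivity * prevalence) / alert_rate."""
    return sensitivity * prevalence / alert_rate

# Model card values: sensitivity 0.48 when the top 20% of patients are flagged
ppv = ppv_from_sensitivity(sensitivity=0.48, prevalence=0.142, alert_rate=0.20)
```

Under that assumption the result rounds to 0.34, in line with ppv_at_20pct_alert_rate in the model card.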

Deployment and Clinical Integration

Maya deploys the model for daily batch scoring of discharged patients:

Model Deployment: pinnacle_readmission_rf_v2
  ├── Type: Batch scoring (daily at 07:00)
  ├── Input: All discharges from previous 24 hours
  ├── Output: risk_score (0.0-1.0), risk_tier (LOW/MEDIUM/HIGH/VERY_HIGH)
  ├── Thresholds:
  │     LOW:       score < 0.10           (no intervention)
  │     MEDIUM:    0.10 <= score < 0.20   (automated follow-up call)
  │     HIGH:      0.20 <= score < 0.35   (care coordinator outreach within 48h)
  │     VERY_HIGH: score >= 0.35          (immediate care coordinator + home visit)
  └── Alerts: Care coordinators notified via EHR inbox for HIGH/VERY_HIGH
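The tier thresholds can be expressed as a small function. This is a sketch, not the deployed scoring code; it assumes each boundary value (0.10, 0.20, 0.35) belongs to the higher tier, and the name risk_tier is illustrative.

```python
def risk_tier(score: float) -> str:
    """Map a risk score in [0, 1] to its intervention tier."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < 0.10:
        return "LOW"
    if score < 0.20:
        return "MEDIUM"
    if score < 0.35:
        return "HIGH"
    return "VERY_HIGH"

tiers = [risk_tier(s) for s in (0.05, 0.10, 0.20, 0.35)]
```

Writing the cut-offs as a chain of upper bounds keeps the tiers gap-free and non-overlapping.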

Column-Level Masking Configuration

The model scoring pipeline accesses PHI-adjacent data. Maya configures governance rules to ensure only authorized roles see patient identifiers in the output:

{
  "masking_policy": "readmission_scores_output",
  "table": "ml_output.readmission_risk_scores",
  "rules": [
    {
      "column": "patient_id",
      "mask_type": "visible",
      "roles": ["care_coordinator", "discharge_planner", "attending_physician"],
      "default_mask": "tokenize"
    },
    {
      "column": "patient_name",
      "mask_type": "visible",
      "roles": ["care_coordinator", "discharge_planner"],
      "default_mask": "redact"
    },
    {
      "column": "risk_score",
      "mask_type": "visible",
      "roles": ["care_coordinator", "discharge_planner", "quality_analyst", "data_scientist"],
      "default_mask": "visible"
    },
    {
      "column": "contributing_factors",
      "mask_type": "visible",
      "roles": ["care_coordinator", "attending_physician", "data_scientist"],
      "default_mask": "redact"
    }
  ],
  "audit_trail": {
    "log_every_access": true,
    "phi_access_alert_threshold": 100,
    "retention_days": 2190
  }
}
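One way to read the policy: roles listed for a column see the raw value, and every other role gets that column's default_mask. The sketch below applies that reading to a single row. The apply_policy function, the token format, and the choice to drop columns without a rule are assumptions, not documented Governance Service behavior.

```python
import hashlib

def apply_policy(row, rules, role):
    """Return a copy of `row` with each column masked according to `role`."""
    masked = {}
    for rule in rules:
        col = rule["column"]
        if col not in row:
            continue
        if role in rule["roles"] or rule["default_mask"] == "visible":
            masked[col] = row[col]
        elif rule["default_mask"] == "tokenize":
            # Illustrative token only; a real tokenizer would use a keyed or vaulted scheme
            digest = hashlib.sha256(str(row[col]).encode("utf-8")).hexdigest()
            masked[col] = "tok_" + digest[:12]
        else:  # "redact"
            masked[col] = "[REDACTED]"
    return masked  # columns with no rule are omitted (deny by default)

rules = [
    {"column": "patient_id", "roles": ["care_coordinator"], "default_mask": "tokenize"},
    {"column": "patient_name", "roles": ["care_coordinator"], "default_mask": "redact"},
    {"column": "risk_score", "roles": ["care_coordinator"], "default_mask": "visible"},
]
row = {"patient_id": "P-1001", "patient_name": "Jane Doe", "risk_score": 0.41}
analyst_view = apply_policy(row, rules, "quality_analyst")
coordinator_view = apply_policy(row, rules, "care_coordinator")
```

The analyst still sees the score and can join on a stable token, while the coordinator sees the identifiers needed for outreach.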

Stage 7: Feedback

Maya sets up monitoring to track model performance over time using the ML Workbench monitoring capabilities.

Performance Monitoring Dashboard

Readmission Model Performance -- November 2025
================================================

C-Statistic (Discrimination):
  Week 1:  0.723  ████████████████████████████▌
  Week 2:  0.718  ████████████████████████████▎
  Week 3:  0.721  ████████████████████████████▍
  Week 4:  0.715  ████████████████████████████
  Alert threshold: 0.680  ──────────────────────▎

Calibration by Risk Decile:
  Decile 1 (lowest risk):  Predicted 2.1%  Observed 2.3%  OK
  Decile 5 (medium risk):  Predicted 11.4% Observed 12.0% OK
  Decile 10 (highest risk): Predicted 38.2% Observed 36.8% OK

Intervention Outcomes:
  HIGH/VERY_HIGH patients with intervention:    412 scored, 348 contacted
  Readmission rate (intervened):                18.4%
  Readmission rate (not intervened, matched):   26.1%
  Estimated readmissions prevented:             27 patients
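The "readmissions prevented" figure can be reproduced from the other dashboard numbers. The dashboard's actual method is not stated, so the calculation below is one plausible reading: the rate difference between matched groups multiplied by the number of patients contacted.

```python
def readmissions_prevented(contacted, rate_intervened, rate_comparison):
    """Expected readmissions avoided among contacted patients."""
    return contacted * (rate_comparison - rate_intervened)

estimate = readmissions_prevented(contacted=348, rate_intervened=0.184, rate_comparison=0.261)
```

This gives roughly 26.8, which rounds to the 27 patients shown. The estimate depends on how well the matched comparison group resembles the intervened group.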

Alerting Configuration

{
  "monitor_name": "readmission_model_performance",
  "model": "pinnacle_readmission_rf_v2",
  "alerts": [
    {
      "metric": "c_statistic_weekly",
      "condition": "below",
      "threshold": 0.68,
      "action": "email",
      "recipients": ["maya.chen@pinnaclehealth.org", "quality@pinnaclehealth.org"],
      "message": "Readmission model discrimination dropped below clinical threshold"
    },
    {
      "metric": "calibration_slope",
      "condition": "outside_range",
      "range": [0.8, 1.2],
      "action": "email",
      "recipients": ["maya.chen@pinnaclehealth.org"],
      "message": "Model calibration drift detected -- review for retraining"
    },
    {
      "metric": "feature_drift_psi",
      "condition": "above",
      "threshold": 0.25,
      "action": "slack_and_email",
      "recipients": ["maya.chen@pinnaclehealth.org"],
      "message": "Population Stability Index indicates significant input drift"
    }
  ],
  "schedule": "weekly"
}
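The feature_drift_psi alert relies on the Population Stability Index, which compares a feature's binned distribution at training time with its current distribution. A minimal sketch follows, assuming pre-binned counts; how the monitoring service bins features and handles empty bins is not specified here, so the eps floor is an assumption.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index over matching histogram bins."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("expected and actual must have the same number of bins")
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # floor avoids log(0) on empty bins
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total

stable = psi([10, 20, 30], [10, 20, 30])
shifted = psi([50, 50], [90, 10])
```

Identical distributions give a PSI of zero, while the strongly shifted example lands well above the 0.25 threshold used in the alert.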

Stage 8: Experimentation

Maya runs structured experiments to improve the model and measure clinical impact.

Model Comparison Experiment

Maya compares three modeling approaches head-to-head:

Model	C-Statistic	Sensitivity @20% Alert	PPV @20% Alert	Notes
LACE Index (baseline)	0.65	0.38	0.24	Rule-based, no training needed
Random Forest v2 (current)	0.72	0.48	0.34	Production model
Gradient Boosting (candidate)	0.74	0.52	0.37	Better discrimination, more complex
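The C-statistic in this comparison is the probability that a randomly chosen readmitted patient receives a higher score than a randomly chosen non-readmitted patient, with ties counted as half. The sketch below computes it directly from that definition. It is quadratic in cohort size, so it is meant for illustration; scikit-learn's roc_auc_score gives the same quantity efficiently.

```python
def c_statistic(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

example = c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the example, three of the four positive-negative pairs are ordered correctly, giving 0.75.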

Social Determinants Feature Experiment

Maya tests whether adding social determinants of health (SDOH) improves prediction:

Experiment: SDOH Feature Addition
=================================

New features tested:
  - area_deprivation_index (ADI) from census tract
  - food_desert_indicator from USDA Food Access Atlas
  - transportation_access_score from community survey
  - housing_instability_flag from social work screening

Results:
  Base model C-statistic:          0.72
  + ADI alone:                     0.73  (+0.01)
  + All SDOH features:             0.74  (+0.02)
  + SDOH + Gradient Boosting:      0.76  (+0.04)

Bias impact:
  White-Black TPR gap (base):      0.03
  White-Black TPR gap (+ SDOH):    0.01  (improved equity)

Decision: Proceed with SDOH + Gradient Boosting for v3

Clinical Impact Study

Maya collaborates with the quality team to measure real-world impact:

Prospective Clinical Impact -- Q3 2025
========================================

Study design: Cluster randomized, parallel arms
  Intervention: 6 hospitals (model-guided care coordination)
  Control: 6 hospitals (standard discharge process)

Results:
  Intervention hospitals readmission rate:  12.8%  (was 14.4%)
  Control hospitals readmission rate:       14.6%  (was 14.0%)
  Difference-in-differences:                -2.2 percentage points
  Estimated annual readmissions prevented:  ~380 patients
  Estimated CMS penalty reduction:          $3.1M annually

Statistical significance: p = 0.003 (cluster-adjusted)
NNT (number needed to treat): 45 high-risk patients intervened
    to prevent 1 readmission
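The difference-in-differences and NNT figures follow directly from the four rates reported above. The sketch below reproduces the arithmetic; the function name is illustrative, and the NNT here is simply the reciprocal of the absolute risk reduction implied by the difference-in-differences estimate.

```python
def diff_in_diff(treat_before, treat_after, ctrl_before, ctrl_after):
    """Change in the intervention group minus change in the control group."""
    return (treat_after - treat_before) - (ctrl_after - ctrl_before)

# Rates in percent, taken from the study results
did_pp = diff_in_diff(treat_before=14.4, treat_after=12.8,
                      ctrl_before=14.0, ctrl_after=14.6)

# Number needed to treat = 1 / absolute risk reduction (as a proportion)
nnt = 1 / (abs(did_pp) / 100)
```

The intervention group fell 1.6 points while the control group rose 0.6, giving -2.2 percentage points and an NNT of about 45.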

HIPAA Audit Trail

Every data access in Maya's workflow is logged by the Governance Service:

HIPAA Audit Log -- Dr. Maya Chen (user: mchen@pinnaclehealth.org)
================================================================

2025-11-15 06:00:02  PIPELINE_ACCESS   readmission_feature_pipeline
  Tables: encounters, diagnosis_codes, lab_results, claims
  PHI columns: patient_id, encounter_id (tokenized)
  Purpose: readmission_risk_model_training
  IRB: IRB-2025-0142
  Records accessed: 187,432

2025-11-15 09:14:33  QUERY_EXECUTE     readmission cohort query
  Tables: clinical_unified.encounters, claims_data.claims
  PHI columns: patient_id (masked for analyst role)
  Records returned: 26,615

2025-11-15 14:22:18  MODEL_TRAINING    pinnacle_readmission_rf_v2
  Training data: features.readmission_clinical_v2
  De-identification: Safe Harbor method applied
  Records used: 149,945 (80% train split)

Related Walkthroughs

ML Engineer Journey -- Jordan Park builds the clinical trial matching engine
BI Lead Journey -- Aisha Williams creates the operations command center
Executive Leadership Journey -- Dr. Robert Kim uses AI for clinical strategy
Healthcare Overview -- Pinnacle Health datasets, KPIs, and compliance framework
