Overview
- Researchers evaluated four standard deep-learning models on 28,732 whole-slide images from 14,456 patients across 20 cancer types and eight datasets.
- Performance disparities across self-reported race, gender, and age groups appeared in about 29% of diagnostic tasks; the models performed worse at lung cancer subtyping in African American and male patients and at breast cancer subtyping in younger patients.
- The team traced bias to uneven training representation, differences in disease incidence across groups, and models learning molecular signals that act as demographic proxies.
- A contrastive-learning framework dubbed FAIR-Path reduced measured diagnostic disparities by roughly 88%, with external validation showing a 91.1% reduction across 15 independent cohorts.
- The work, supported in part by federal funding, is now expanding through international collaborations to test generalizability, extend FAIR-Path to low-sample settings, and study links between AI bias and clinical outcomes.
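
To make the "disparity" and "percent reduction" figures above concrete, here is a minimal sketch of one way such numbers could be computed. It is not the study's method: the summary does not say which metric was used, so the sketch assumes a per-group accuracy gap (best group minus worst group). The function names, the groups "A" and "B", and all values are hypothetical toy data, not results from the study.

```python
from collections import defaultdict


def group_accuracy(labels, preds, groups):
    """Return accuracy per demographic group (hypothetical helper)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}


def disparity_gap(acc_by_group):
    """Assumed disparity metric: best-group accuracy minus worst-group accuracy."""
    values = list(acc_by_group.values())
    return max(values) - min(values)


def relative_reduction(gap_before, gap_after):
    """Fraction of the original gap that was removed (0.0 if there was no gap)."""
    if gap_before == 0:
        return 0.0
    return (gap_before - gap_after) / gap_before


# Toy data: eight slides, two hypothetical groups "A" and "B".
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
baseline_preds = [1, 1, 0, 0, 1, 0, 0, 1]   # group B gets 2 of 4 right
mitigated_preds = [1, 1, 0, 0, 1, 1, 0, 1]  # group B gets 3 of 4 right

gap_before = disparity_gap(group_accuracy(labels, baseline_preds, groups))
gap_after = disparity_gap(group_accuracy(labels, mitigated_preds, groups))
reduction = relative_reduction(gap_before, gap_after)

print(f"gap before: {gap_before:.2f}, gap after: {gap_after:.2f}, "
      f"relative reduction: {reduction:.0%}")
```

Under this assumed metric, a figure like "roughly 88%" would mean that the between-group gap shrank to about 12% of its original size. It would not mean that overall accuracy rose by 88%.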