Overview
- The research published in BMC Medical Informatics and Decision Making is the first to quantify gender bias in LLM-generated care notes using 29,616 paired summaries from 617 real adult social care records.
- Google’s Gemma model systematically used terms indicating severity (“disabled,” “unable,” “complex”) more often for male cases than for otherwise identical female cases, while Meta’s Llama 3 showed no such disparities; a minimal sketch of this paired-summary comparison appears after this list.
- In response to the peer-reviewed findings, Google said it would examine the results and noted that Gemma has since advanced to newer generations that were not evaluated in the study.
- It remains unclear which specific AI models are deployed by the more than half of England’s local councils using such tools, or how their summaries influence social workers’ decisions.
- The LSE paper recommends that regulators mandate bias measurement for LLMs in long-term care to ensure algorithmic fairness and equal access to services.
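The core comparison behind these findings can be illustrated with a short sketch: for each record, the same case note is summarized twice, once with the subject described as male and once as female, and the two summaries are compared for how often severity-related language appears. The snippet below is only a rough illustration of that idea, not the paper’s code; the severity lexicon, function names, and input format are assumptions, and the study’s actual analysis is considerably more extensive.

```python
"""Illustrative sketch of a paired-summary severity comparison.

Assumes each record has already been summarized twice by the same model --
once describing the subject as male, once as female, all other details
identical. The severity term list here is a placeholder, not the paper's.
"""
import re
from typing import Iterable, Tuple

# Hypothetical severity lexicon for illustration only.
SEVERITY_TERMS = {"disabled", "unable", "complex"}

def severity_count(summary: str) -> int:
    """Count occurrences of severity-lexicon words in one summary."""
    words = re.findall(r"[a-z]+", summary.lower())
    return sum(1 for w in words if w in SEVERITY_TERMS)

def bias_gap(pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean difference in severity-term counts (male summary minus
    female summary) across paired summaries of identical records."""
    diffs = [severity_count(male) - severity_count(female)
             for male, female in pairs]
    return sum(diffs) / len(diffs) if diffs else 0.0

if __name__ == "__main__":
    # Toy pair: same underlying case note, gender swapped before summarization.
    pairs = [
        ("Mr Smith is disabled and unable to manage complex medication routines.",
         "Mrs Smith has some difficulty managing her medication routines."),
    ]
    print(f"Mean severity-term gap (male - female): {bias_gap(pairs):.2f}")
```

A positive gap would indicate severity terms appearing more often in the male-version summaries; in practice, a study like this would also apply statistical testing across thousands of pairs rather than a raw mean alone.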