2 Sources
[1]
AI summaries can downplay medical issues for female patients, UK research finds
The latest example of bias permeating artificial intelligence comes from the medical field. A new study surveyed real case notes from 617 adult social care workers in the UK and found that when large language models summarized the notes, they were more likely to omit language such as "disabled," "unable" or "complex" when the patient was tagged as female, which could lead to women receiving insufficient or inaccurate medical care. Research led by the London School of Economics and Political Science ran the same case notes through two LLMs (Meta's Llama 3 and Google's Gemma) and swapped the patient's gender, and the AI tools often provided two very different patient snapshots. While Llama 3 showed no gender-based differences across the surveyed metrics, Gemma had significant examples of this bias. Google's AI summaries produced disparities as drastic as "Mr Smith is an 84-year-old man who lives alone and has a complex medical history, no care package and poor mobility" for a male patient, while the same case notes credited to a female patient provided: "Mrs Smith is an 84-year-old living alone. Despite her limitations, she is independent and able to maintain her personal care." Recent research has uncovered biases against women in the medical sector, both in and in . The stats also trend worse for and for the . It's the latest stark reminder that LLMs are only as good as the information they are trained on and the . The particularly concerning takeaway from this research was that UK authorities have been using LLMs in care practices, but without always detailing which models are being introduced or in what capacity. "We know these models are being used very widely and what's concerning is that we found very meaningful differences between measures of bias in different models," lead author Dr. Sam Rickman said, noting that the Google model was particularly likely to dismiss mental and physical health issues for women.
"Because the amount of care you get is determined on the basis of perceived need, this could result in women receiving less care if biased models are used in practice. But we don't actually know which models are being used at the moment."
[2]
AI tools used by English councils downplay women's health issues, study finds
Exclusive: LSE research finds risk of gender bias in care decisions made based on AI summaries of case notes
Artificial intelligence tools used by more than half of England's councils are downplaying women's physical and mental health issues and risk creating gender bias in care decisions, research has found. The study found that when using Google's AI tool "Gemma" to generate and summarise the same case notes, language such as "disabled", "unable" and "complex" appeared significantly more often in descriptions of men than women. The study, by the London School of Economics and Political Science (LSE), also found that similar care needs in women were more likely to be omitted or described in less serious terms. Dr Sam Rickman, the lead author of the report and a researcher in LSE's Care Policy and Evaluation Centre, said AI could result in "unequal care provision for women". "We know these models are being used very widely and what's concerning is that we found very meaningful differences between measures of bias in different models," he said. "Google's model, in particular, downplays women's physical and mental health needs in comparison to men's. "And because the amount of care you get is determined on the basis of perceived need, this could result in women receiving less care if biased models are used in practice. But we don't actually know which models are being used at the moment." AI tools are increasingly being used by local authorities to ease the workload of overstretched social workers, although there is little information about which specific AI models are being used, how frequently and what impact this has on decision-making. The LSE research used real case notes from 617 adult social care users, which were inputted into different large language models (LLMs) multiple times, with only the gender swapped. Researchers then analysed 29,616 pairs of summaries to see how male and female cases were treated differently by the AI models.
In one example, the Gemma model summarised a set of case notes as: "Mr Smith is an 84-year-old man who lives alone and has a complex medical history, no care package and poor mobility." The same case notes inputted into the same model, with the gender swapped, summarised the case as: "Mrs Smith is an 84-year-old living alone. Despite her limitations, she is independent and able to maintain her personal care." In another example, the case summary said Mr Smith was "unable to access the community", but Mrs Smith was "able to manage her daily activities". Among the AI models tested, Google's Gemma created more pronounced gender-based disparities than others. Meta's Llama 3 model did not use different language based on gender, the research found. Rickman said the tools were "already being used in the public sector, but their use must not come at the expense of fairness". "While my research highlights issues with one model, more are being deployed all the time, making it essential that all AI systems are transparent, rigorously tested for bias and subject to robust legal oversight," he said. The paper concludes that regulators "should mandate the measurement of bias in LLMs used in long-term care" in order to prioritise "algorithmic fairness". There have long been concerns about racial and gender biases in AI tools, as machine learning techniques have been found to absorb biases in human language. One US study analysed 133 AI systems across different industries and found that about 44% showed gender bias and 25% exhibited gender and racial bias. According to Google, its teams will examine the findings of the report. Its researchers tested the first generation of the Gemma model, which is now in its third generation and is expected to perform better, although it has never been stated the model should be used for medical purposes.
A study by the London School of Economics finds that AI tools used in social care can downplay health issues for female patients, potentially leading to inadequate medical care.
A recent study conducted by the London School of Economics and Political Science (LSE) has uncovered a concerning trend in artificial intelligence (AI) tools used to summarize social care case notes. The research found that some of these systems, notably Google's Gemma model, tend to downplay health issues for female patients, potentially leading to inadequate medical care [1].
The LSE research team, led by Dr. Sam Rickman, analyzed real case notes from 617 adult social care users in the UK. These notes were processed through different large language models (LLMs), including Meta's Llama 3 and Google's Gemma, with only the patient's gender swapped. The study examined 29,616 pairs of summaries to identify how male and female cases were treated differently by the AI models [2].

The research revealed that when using Google's Gemma model, language such as "disabled," "unable," and "complex" appeared significantly more often in descriptions of men than women. For instance, the same case notes summarized for a male patient as "Mr Smith is an 84-year-old man who lives alone and has a complex medical history, no care package and poor mobility" were described for a female patient as "Mrs Smith is an 84-year-old living alone. Despite her limitations, she is independent and able to maintain her personal care" [1][2]. Meta's Llama 3, by contrast, did not use different language based on gender [2].

Dr. Rickman expressed concern about these findings, stating, "Because the amount of care you get is determined on the basis of perceived need, this could result in women receiving less care if biased models are used in practice" [1]. This bias in AI summaries could lead to unequal care provision for women, as their health needs may be underestimated or overlooked [2].

The study highlights that AI tools are being used by more than half of England's councils to ease the workload of overstretched social workers. However, there is little information about which specific AI models are being used, how frequently, and what impact this has on decision-making [2].
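The gender-swap comparison described above can be illustrated with a simple term tally. The sketch below is not the study's method (the sources do not describe its statistical analysis); it only counts how often the three reported terms appear in the male and female versions of paired summaries. The function names `term_counts` and `compare_pairs` are hypothetical, and the example pair reuses the two summaries quoted in the articles.

```python
import re
from collections import Counter

# Terms the articles report as appearing more often in summaries of men.
TERMS = ("disabled", "unable", "complex")


def term_counts(summary, terms=TERMS):
    """Count whole-word, case-insensitive occurrences of each term."""
    words = Counter(re.findall(r"[a-z]+", summary.lower()))
    return Counter({term: words[term] for term in terms})


def compare_pairs(pairs, terms=TERMS):
    """Tally terms over (male_summary, female_summary) pairs.

    Returns {term: (count_in_male_summaries, count_in_female_summaries)}.
    """
    male, female = Counter(), Counter()
    for male_summary, female_summary in pairs:
        male.update(term_counts(male_summary, terms))
        female.update(term_counts(female_summary, terms))
    return {term: (male[term], female[term]) for term in terms}


# The one summary pair quoted in the articles (illustrative input only).
pairs = [
    (
        "Mr Smith is an 84-year-old man who lives alone and has a complex "
        "medical history, no care package and poor mobility.",
        "Mrs Smith is an 84-year-old living alone. Despite her limitations, "
        "she is independent and able to maintain her personal care.",
    ),
]

print(compare_pairs(pairs))
```

Matching whole words matters here: a substring search would count "able" in the female summary as a hit for "unable". A real analysis over thousands of pairs would also need a significance test, which this sketch leaves out.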
Researchers emphasize the need for transparency and rigorous testing of AI systems used in healthcare. Dr. Rickman stated, "While my research highlights issues with one model, more are being deployed all the time, making it essential that all AI systems are transparent, rigorously tested for bias and subject to robust legal oversight" [2].

This study adds to the growing body of evidence showing biases in AI systems across various industries. A US study analyzing 133 AI systems found that about 44% showed gender bias and 25% exhibited both gender and racial bias [2]. These findings underscore the importance of addressing biases in AI, particularly in critical sectors like healthcare.

The paper concludes by recommending that regulators "should mandate the measurement of bias in LLMs used in long-term care" to prioritize "algorithmic fairness" [2]. As AI continues to play an increasingly significant role in healthcare and social services, ensuring unbiased and equitable treatment for all patients remains a crucial challenge for developers, policymakers, and healthcare providers alike.

Summarized by Navi