3 Sources
[1]
Typos and slang spur AI to discourage seeking medical care
AI models change their medical recommendations when people ask them questions that include colourful language, typos, odd formatting and even gender-neutral pronouns.

Should you see a doctor about your sore throat? AI's advice may depend on how carefully you typed your question.

When artificial intelligence models were tested on simulated writing from would-be patients, they were more likely to advise against seeking medical care if the writer made typos, included emotional or uncertain language - or was female.

"Insidious bias can shift the tenor and content of AI advice, and that can lead to subtle but important differences in the direction of the conversation that could lead to disparities in the allocation of resources," says Karandeep Singh at the University of California San Diego, who was not involved in the study.

Abinitha Gourabathina and her colleagues used AI to help create thousands of patient notes in different formats and styles. For example, some messages included extra spaces and typos to mimic patients with limited English proficiency or less ease with typing. Other notes used uncertain language in the style of writers with health anxiety, colourful expressions that lent a dramatic or emotional tone, or gender-neutral pronouns.

The researchers then fed the notes to four large language models (LLMs) commonly used to power chatbots and told the AI to answer questions about whether the patient should manage their condition at home or visit a clinic, and whether the patient should receive certain lab tests and other medical resources. These AI models included OpenAI's GPT-4, Meta's Llama-3-70b and Llama-3-8b, and the Palmyra-Med model developed for the healthcare industry by the AI company Writer.

The tests showed that the various format and style changes made all the AI models between 7 and 9 per cent more likely to recommend patients stay home instead of getting medical attention. The models were also more likely to recommend that female patients remain at home, and follow-up research showed they were more likely than human clinicians to change their treatment recommendations because of gender and language style in the messages.

OpenAI and Meta did not respond to a request for comment. Writer does not "recommend or support" using LLMs - including the company's Palmyra-Med model - for clinical decisions or health advice "without a human in the loop", says Zayed Yasin at Writer.

Most operational AI tools currently used in electronic health record systems rely on OpenAI's GPT-4o, which was not specifically studied in this research, says Singh. But he says one big takeaway from the study is the need for improved ways to "evaluate and monitor generative AI models" used in the healthcare industry.
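The perturb-and-query setup the article describes can be sketched in a few lines of Python. This is only a minimal illustration under assumed details, not the study's actual code: the specific perturbation rules, the triage prompt wording and the `model.generate` interface are all hypothetical.

```python
import random

HEDGES = ["kind of", "sort of", "possibly"]   # uncertain language
COLORFUL = ["Wow,", "Really,"]                # dramatic or emotional tone


def perturb_note(note: str, style: str) -> str:
    """Return a stylistically altered copy of a patient message.

    Only the surface form changes; the clinical content stays intact,
    mirroring the kinds of edits described in the study.
    """
    if style == "whitespace":
        # Stray double spaces mimic less ease with typing.
        return "  ".join(note.split())
    if style == "uncertain":
        # Hedge words soften the phrasing, as in health-anxious writing.
        return f"I {random.choice(HEDGES)} have this problem: {note}"
    if style == "colorful":
        # An emotional interjection lends a dramatic tone.
        return f"{random.choice(COLORFUL)} {note}"
    if style == "gender_neutral":
        # Crude pronoun swap, for illustration only.
        swaps = {" she ": " they ", " he ": " they ",
                 " her ": " their ", " his ": " their "}
        for old, new in swaps.items():
            note = note.replace(old, new)
        return note
    return note


TRIAGE_PROMPT = (
    "A patient sent the message below. Should the patient self-manage at home "
    "or visit a clinic? Answer HOME or CLINIC.\n\n{note}"
)


def triage_recommendation(model, note: str) -> str:
    """Ask a chat model for a triage decision (hypothetical `model.generate` interface)."""
    return model.generate(TRIAGE_PROMPT.format(note=note)).strip().upper()
```

In the study this kind of pipeline was applied to thousands of notes across four models; the sketch only shows the shape of a single perturb-and-query step.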
[2]
Typos, Slang Trip Up AI Medical Assessments
By Dennis Thompson, HealthDay Reporter

THURSDAY, June 26, 2025 (HealthDay News) -- Common human typing errors can trip up artificial intelligence (AI) programs designed to aid health care workers by reviewing health records, a new MIT study says.

Typos and extra white spaces can interfere with AI's ability to properly analyze patient records, researchers reported this week at an Association for Computing Machinery conference in Athens, Greece. Missing gender references or the use of slang also can foul up an AI's treatment recommendations, researchers point out.

These human mistakes or language choices increased the likelihood that an AI would recommend that a patient self-manage their health problem rather than seek an appointment, results show. They also were more likely to change an AI's treatment recommendations for women, resulting in a higher percentage who were erroneously advised not to seek medical care, researchers add.

"These models are often trained and tested on medical exam questions but then used in tasks that are pretty far from that, like evaluating the severity of a clinical case," said lead researcher Abinitha Gourabathina. She's a graduate student with the MIT Department of Electrical Engineering and Computer Science in Cambridge, Mass.

A growing body of research is exploring the ability of AI to provide a second opinion for human doctors, researchers said in background notes. The programs already are being used to help doctors draft clinical notes and triage patient messages.

This study began when Gourabathina ran experiments in which she swapped gender cues in patient notes, then fed them into an AI. She was surprised to find that simple formatting errors caused meaningful changes in AI responses.

To further explore this problem, researchers altered records by swapping or removing gender references, inserting extra spaces or typos into patient messages, or adding colorful or uncertain language. Colorful language might include exclamations like "wow," or adverbs like "really" or "very," researchers said. Examples of uncertain language include hedge words like "kind of," "sort of," "possibly" or "suppose."

The patient notes preserved all clinical data, like prescription medications and previous diagnoses, while adding language that more accurately reflects how people type and speak.

"The medical datasets these models are trained on are usually cleaned and structured, and not a very realistic reflection of the patient population," Gourabathina said. "We wanted to see how these very realistic changes in text could impact downstream use cases."

The team ran these records past four different AIs, asking whether a patient should manage their symptoms at home, come in for a clinic visit, or get a lab test to better evaluate their condition.

When the AIs were fed the altered or "perturbed" data, they were 7% to 9% more likely to recommend that patients care for themselves, results show. The use of colorful language like slang or dramatic expressions had the greatest impact, researchers said.

The AI models also made about 7% more errors for female patients and were more likely to recommend that women self-manage at home - even when researchers removed all gender cues from the records.

Follow-up research currently under review found that the same changes didn't affect the accuracy of human doctors, researchers added.

Researchers plan to continue their work by testing records that better mimic real messages from patients. They also plan to study how AI programs infer gender from clinical tests.

Researchers reported their findings at the meeting, which ends today. Findings presented at medical meetings should be considered preliminary until published in a peer-reviewed journal.
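The headline numbers (a 7% to 9% rise in "stay home" advice after perturbation) amount roughly to comparing the rate of self-management recommendations on the original notes against the rate on the altered notes. A rough sketch of that comparison follows, using invented toy labels rather than the study's data:

```python
from collections import Counter


def home_rate(recommendations: list[str]) -> float:
    """Fraction of cases where the model advised self-management at home."""
    counts = Counter(recommendations)
    return counts["HOME"] / max(len(recommendations), 1)


def recommendation_shift(baseline: list[str], perturbed: list[str]) -> float:
    """Percentage-point increase in 'stay home' advice after perturbation."""
    return 100.0 * (home_rate(perturbed) - home_rate(baseline))


# Toy example with invented labels (not the study's results):
baseline = ["CLINIC", "CLINIC", "HOME", "CLINIC", "HOME"]
perturbed = ["HOME", "CLINIC", "HOME", "HOME", "HOME"]
print(f"Shift toward self-management: {recommendation_shift(baseline, perturbed):+.1f} points")
```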
[3]
Slang, spelling errors derail AI in medical exams
A new MIT study shows that AI models used in healthcare can be influenced by typos, slang, and gender, potentially leading to biased medical recommendations.
A recent study conducted by researchers at the Massachusetts Institute of Technology (MIT) has uncovered significant biases in artificial intelligence (AI) models used for medical advice. The research, presented at an Association for Computing Machinery conference in Athens, Greece, reveals that AI recommendations can be substantially influenced by factors such as typos, slang, and even the gender of the patient [1].
The study, led by graduate student Abinitha Gourabathina from MIT's Department of Electrical Engineering and Computer Science, found that when AI models were presented with patient notes containing typos, extra spaces, or colloquial language, they were 7% to 9% more likely to recommend self-care rather than seeking medical attention [2]. This shift in recommendations occurred despite the preservation of all relevant clinical data in the patient records.
Perhaps more concerning is the discovery of gender bias in AI recommendations. The study found that AI models made about 7% more errors for female patients and were more likely to advise women to self-manage their conditions at home. This bias persisted even when researchers removed explicit gender cues from the patient records [3].
Researchers created thousands of simulated patient notes using AI, incorporating various writing styles and formats. These notes were then fed into four large language models (LLMs) commonly used in chatbots, including OpenAI's GPT-4, Meta's Llama-3-70b and Llama-3-8b, and the healthcare-specific Palmyra-Med model developed by AI company Writer [1].
Karandeep Singh from the University of California San Diego, who was not involved in the study, emphasized the potential consequences of these biases: "Insidious bias can shift the tenor and content of AI advice, and that can lead to subtle but important differences in the direction of the conversation that could lead to disparities in the allocation of resources" [1].
The MIT team plans to continue their research by testing records that more closely mimic real patient messages and studying how AI programs infer gender from clinical tests [2]. The findings underscore the need for improved evaluation and monitoring of generative AI models in healthcare, as highlighted by Singh [1].
As AI continues to play an increasing role in healthcare, from drafting clinical notes to triaging patient messages, these biases could have significant implications for patient care and resource allocation. The study serves as a crucial reminder of the importance of rigorous testing and continuous improvement of AI systems in healthcare to ensure equitable and accurate medical advice for all patients.