Curated by THEOUTPOST
On Sat, 26 Oct, 8:01 AM UTC
24 Sources
[1]
AI Model Used By Hospitals Caught Making Up Details About Patients, Inventing Nonexistent Medications and Sexual Acts
In a new investigation from The Associated Press, dozens of experts have found that Whisper, an AI-powered transcription tool made by OpenAI, is plagued with frequent hallucinations and inaccuracies, with the AI model often inventing completely unrelated text. What's even more concerning, though, is who's relying on the tech, according to the AP: despite OpenAI warning that its model shouldn't be used in "high-risk domains," over 30,000 medical workers and 40 health systems are using Nabla, a tool built on Whisper, to transcribe and summarize patient interactions -- almost certainly with inaccurate results. In a medical environment, this could have "really grave consequences," Alondra Nelson, a professor at the Institute for Advanced Study, told the AP. "Nobody wants a misdiagnosis," Nelson said. "There should be a higher bar." Nabla chief technology officer Martin Raison told the AP that the tool was fine-tuned on medical language. Even so, it can't escape the inherent unreliability of its underlying model. One machine learning engineer who spoke to the AP said he discovered hallucinations in half of the over 100 hours of Whisper transcriptions he looked at. Another who examined 26,000 transcripts said he found hallucinations in almost all of them. Whisper performed poorly even with well-recorded, short audio samples, according to a recent study cited by the AP. Over millions of recordings, there could be tens of thousands of hallucinations, researchers warned. Another team of researchers revealed just how egregious these errors can be. Whisper would inexplicably add racial commentary, they found, such as making up a person's race without instruction, and also invent nonexistent medications. In other cases, the AI would describe violent and sexual acts that had no basis in the original speech. They even found baffling instances of YouTuber lingo, such as a "like and subscribe," being dropped into the transcript. Overall, nearly 40 percent of these errors were harmful or concerning, the team concluded, because they could easily misrepresent what the speaker had actually said. The scope of the damage could be immense. According to Nabla, its tool has been used to transcribe an estimated seven million medical visits, the paperwork for all of which could now have pernicious inaccuracies somewhere in the mix. And worryingly, there's no way to verify if the AI transcriptions are accurate, because the tool deletes the original audio recordings "for data safety reasons," according to Raison. Unless the medical workers themselves kept a copy of the recording, any hallucinations will stand as part of the official record. "You can't catch errors if you take away the ground truth," William Saunders, a research engineer who quit OpenAI in protest, told the AP. Nabla officials said they are aware that Whisper can hallucinate and are addressing the problem, per the AP. Being "aware" of the problem, however, seemingly didn't stop the company from pushing experimental and as yet extremely unreliable tech onto the medical industry in the first place.
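To make Saunders's "ground truth" point concrete: when the original audio (or a human-verified transcript made from it) is retained, checking an AI transcript is a routine computation rather than guesswork. Below is a minimal sketch using the open-source jiwer library; the file names and the 10 percent threshold are illustrative assumptions, and nothing here describes Nabla's or OpenAI's actual tooling.

```python
# Minimal sketch: measuring how far an AI transcript drifts from a
# human-verified reference. Requires `pip install jiwer`.
# File names and the 10% threshold are illustrative assumptions.
from jiwer import wer


def check_transcript(reference_path: str, hypothesis_path: str, max_wer: float = 0.10) -> bool:
    """Return True if the AI transcript stays within an acceptable word error rate."""
    with open(reference_path, encoding="utf-8") as f:
        reference = f.read()
    with open(hypothesis_path, encoding="utf-8") as f:
        hypothesis = f.read()

    # Fraction of words substituted, inserted, or deleted relative to the reference.
    error_rate = wer(reference, hypothesis)
    print(f"Word error rate: {error_rate:.1%}")
    return error_rate <= max_wer


if __name__ == "__main__":
    # Without the original audio (or a reference transcript made from it),
    # there is no reference.txt to compare against -- which is Saunders's point.
    ok = check_transcript("reference.txt", "whisper_output.txt")
    print("Transcript acceptable" if ok else "Transcript needs human review")
```

This kind of check is only possible while the recording, or a trusted transcript of it, still exists; once the audio is deleted, hallucinated text is indistinguishable from what was actually said.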
[2]
OpenAI's Transcription Tool Hallucinates. Hospitals Are Using It Anyway
In health care settings, it's important to be precise. That's why the widespread use of OpenAI's Whisper transcription tool among medical workers has experts alarmed. On Saturday, an Associated Press investigation revealed that OpenAI's Whisper transcription tool creates fabricated text in medical and business settings despite warnings against such use. The AP interviewed more than 12 software engineers, developers, and researchers who found the model regularly invents text that speakers never said, a phenomenon often called a "confabulation" or "hallucination" in the AI field. Upon its release in 2022, OpenAI claimed that Whisper approached "human level robustness" in audio transcription accuracy. However, a University of Michigan researcher told the AP that Whisper created false text in 80 percent of public meeting transcripts examined. Another developer, unnamed in the AP report, claimed to have found invented content in almost all of his 26,000 test transcriptions. The fabrications pose particular risks in health care settings. Despite OpenAI's warnings against using Whisper for "high-risk domains," over 30,000 medical workers now use Whisper-based tools to transcribe patient visits, according to the AP report. The Mankato Clinic in Minnesota and Children's Hospital Los Angeles are among 40 health systems using a Whisper-powered AI copilot service from medical tech company Nabla that is fine-tuned on medical terminology. Nabla acknowledges that Whisper can confabulate, but it also reportedly erases original audio recordings "for data safety reasons." This could cause additional issues, since doctors cannot verify accuracy against the source material. And deaf patients may be highly impacted by mistaken transcripts since they would have no way to know if medical transcript audio is accurate or not. The potential problems with Whisper extend beyond health care. Researchers from Cornell University and the University of Virginia studied thousands of audio samples and found Whisper adding nonexistent violent content and racial commentary to neutral speech. They found that 1 percent of samples included "entire hallucinated phrases or sentences which did not exist in any form in the underlying audio" and that 38 percent of those included "explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority." In one case from the study cited by AP, when a speaker described "two other girls and one lady," Whisper added fictional text specifying that they "were Black." In another, the audio said, "He, the boy, was going to, I'm not sure exactly, take the umbrella." Whisper transcribed it to, "He took a big piece of a cross, a teeny, small piece ... I'm sure he didn't have a terror knife so he killed a number of people." An OpenAI spokesperson told the AP that the company appreciates the researchers' findings and that it actively studies how to reduce fabrications and incorporates feedback in updates to the model. The key to Whisper's unsuitability in high-risk domains comes from its propensity to sometimes confabulate, or plausibly make up, inaccurate outputs. The AP report says, "Researchers aren't certain why Whisper and similar tools hallucinate," but that isn't true. We know exactly why Transformer-based AI models like Whisper behave this way. Whisper is based on technology that is designed to predict the next most likely token (chunk of data) that should appear after a sequence of tokens provided by a user. 
In the case of ChatGPT, the input tokens come in the form of a text prompt. In the case of Whisper, the input is tokenized audio data.
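That design is visible in the openly released Whisper package itself: the model returns its best-guess text along with per-segment decoding statistics such as the average log-probability of the predicted tokens and a no-speech probability. Below is a minimal sketch using the open-source whisper Python package; the input file name and the thresholds used to flag suspect segments are illustrative assumptions, not OpenAI guidance and not how Nabla's product works.

```python
# Minimal sketch: transcribing audio with the open-source Whisper package
# and flagging segments whose decoder statistics look unreliable.
# Requires `pip install openai-whisper` (and ffmpeg). Thresholds below are
# illustrative assumptions, not recommended or validated values.
import whisper

model = whisper.load_model("base")             # downloads model weights on first run
result = model.transcribe("visit_audio.wav")   # hypothetical input file

print(result["text"])

for seg in result["segments"]:
    # avg_logprob: mean log-probability of the predicted tokens (lower = less confident)
    # no_speech_prob: model's estimate that the segment contains no speech at all
    # compression_ratio: highly repetitive output (a common hallucination pattern) scores high
    suspicious = (
        seg["avg_logprob"] < -1.0
        or seg["no_speech_prob"] > 0.6
        or seg["compression_ratio"] > 2.4
    )
    if suspicious:
        print(f"[review {seg['start']:.1f}-{seg['end']:.1f}s] {seg['text']}")
```

Note that the transcribe call itself raises no warning when it invents text; these statistics are at best weak signals, which is why the researchers cited by the AP judged hallucinations by comparing the output against the original audio.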
[3]
OpenAI transcription tool widely used by doctors and hospitals raises concerns over hallucinations
Facepalm: It is no secret that generative AI is prone to hallucinations, but as these tools make their way into medical settings, alarm bells are ringing. Even OpenAI warns against using its transcription tool in high-risk settings. OpenAI's AI-powered transcription tool, Whisper, has come under fire for a significant flaw: its tendency to generate fabricated text, known as hallucinations. Despite the company's claims of "human level robustness and accuracy," experts interviewed by the Associated Press have identified numerous instances where Whisper invents entire sentences or adds non-existent content to transcriptions. The issue is particularly concerning given Whisper's widespread use across various industries. The tool is employed for translating and transcribing interviews, generating text for consumer technologies, and creating video subtitles. Perhaps most alarming is the rush by medical centers to implement Whisper-based tools for transcribing patient consultations, even though OpenAI has given explicit warnings against using the tool in "high-risk domains." The medical sector has embraced Whisper-based tools regardless. Nabla, a company with offices in France and the US, has developed a Whisper-based tool used by over 30,000 clinicians and 40 health systems, including the Mankato Clinic in Minnesota and Children's Hospital Los Angeles. Martin Raison, Nabla's chief technology officer, said their tool has been fine-tuned on medical language to transcribe and summarize patient interactions. However, the company erases the original audio for "data safety reasons," making it impossible to compare the AI-generated transcript to the original recording. So far, the tool has been used to transcribe an estimated 7 million medical visits, according to Nabla. Using AI transcription tools in medical settings has also raised privacy concerns. California state lawmaker Rebecca Bauer-Kahan shared her experience refusing to sign a form allowing her child's doctor to share consultation audio with vendors, including Microsoft Azure. "The release was very specific that for-profit companies would have the right to have this," she told the Associated Press. "I was like 'absolutely not.'" The extent of Whisper's hallucination issue is not fully known, but researchers and engineers have reported numerous instances of the problem in their work. One University of Michigan researcher observed hallucinations in 80 percent of the public meeting transcriptions he examined. A machine learning engineer encountered hallucinations in approximately half of over 100 hours of Whisper transcriptions analyzed, while another developer found them in nearly all 26,000 transcripts created using the tool. A study conducted by Professor Allison Koenecke of Cornell University and Assistant Professor Mona Sloane of the University of Virginia examined thousands of short audio snippets, discovering that nearly 40 percent of the hallucinations were deemed harmful or concerning due to potential misinterpretation or misrepresentation of speakers. Examples of these hallucinations include adding violent content where none existed in the original audio, inventing racial commentary not present in the original speech, and creating non-existent medical treatments. In one instance, Whisper transformed a simple statement about a boy taking an umbrella into a violent scenario involving a cross and a knife. In another case, the tool added racial descriptors to a neutral statement about people. 
Whisper also fabricated a fictional medication called "hyperactivated antibiotics" in one of its transcriptions. Such mistakes could have "really grave consequences," especially in hospital settings, said Alondra Nelson, who led the White House Office of Science and Technology Policy for the Biden administration until last year. "Nobody wants a misdiagnosis," said Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. "There should be a higher bar." Whisper's influence extends far beyond OpenAI. The tool is integrated into some versions of ChatGPT and is offered as a built-in service on Oracle and Microsoft's cloud computing platforms. In just one month, a recent version of Whisper was downloaded over 4.2 million times from the open-source AI platform HuggingFace. Critics say that OpenAI needs to address this flaw immediately. "This seems solvable if the company is willing to prioritize it," William Saunders, a former OpenAI engineer who left the company in February over concerns about its direction, said. "It's problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems."
[4]
Hospitals adopt error-prone AI transcription tools despite warnings
OpenAI's Whisper tool may add fake text to medical transcripts, investigation finds. On Saturday, an Associated Press investigation revealed that OpenAI's Whisper transcription tool creates fabricated text in medical and business settings despite warnings against such use. The AP interviewed more than 12 software engineers, developers, and researchers who found the model regularly invents text that speakers never said, a phenomenon often called a "confabulation" or "hallucination" in the AI field. Upon its release in 2022, OpenAI claimed that Whisper approached "human level robustness" in audio transcription accuracy. However, a University of Michigan researcher told the AP that Whisper created false text in 80 percent of public meeting transcripts examined. Another developer, unnamed in the AP report, claimed to have found invented content in almost all of his 26,000 test transcriptions. The fabrications pose particular risks in health care settings. Despite OpenAI's warnings against using Whisper for "high-risk domains," over 30,000 medical workers now use Whisper-based tools to transcribe patient visits, according to the AP report. The Mankato Clinic in Minnesota and Children's Hospital Los Angeles count among 40 health systems using a Whisper-powered AI copilot service from medical tech company Nabla that is fine-tuned on medical terminology. Nabla acknowledges that Whisper can confabulate, but it also reportedly erases original audio recordings "for data safety reasons." This could cause additional issues since doctors cannot verify accuracy against the source material. And deaf patients may be highly impacted by mistaken transcripts since they would have no way to know if medical transcript audio is accurate or not. The potential problems with Whisper extend beyond health care. Researchers from Cornell University and the University of Virginia studied thousands of audio samples and found Whisper adding non-existent violent content and racial commentary to neutral speech. They found that 1 percent of samples included "entire hallucinated phrases or sentences which did not exist in any form in the underlying audio" and 38 percent of those included "explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority."
[5]
OpenAI's Whisper invents parts of transcriptions -- a lot
Imagine going to the doctor, telling them exactly how you're feeling, and then having a transcription later add false information and alter your story. That could be the case in medical centers that use Whisper, OpenAI's transcription tool. Over a dozen developers, software engineers and academic researchers have found evidence that Whisper creates hallucinations -- invented text -- that includes made-up medications, racial commentary and violent remarks, ABC News reports. Yet, in the last month, open-source AI platform HuggingFace saw 4.2 million downloads of Whisper's latest version. The tool is also built into Oracle and Microsoft's cloud computing platforms, along with some versions of ChatGPT. The evidence is quite extensive, with experts finding significant faults with Whisper across the board. Take a University of Michigan researcher who found invented text in eight out of ten audio transcriptions of public meetings. In another study, computer scientists found 187 hallucinations while analyzing over 13,000 audio recordings. The trend continues: A machine learning engineer found them in about half of more than 100 hours' worth of transcriptions, while a developer spotted hallucinations in almost all of the 26,000 transcriptions he had Whisper create. The potential danger becomes even clearer when looking at specific examples of these hallucinations. Two professors, Allison Koenecke and Mona Sloane of Cornell University and the University of Virginia, respectively, looked at clips from a research repository called TalkBank. The pair found that nearly 40 percent of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented. In one case, Whisper invented that three people being discussed were Black. In another, Whisper changed "He, the boy, was going to, I'm not sure exactly, take the umbrella." to "He took a big piece of a cross, a teeny, small piece ... I'm sure he didn't have a terror knife so he killed a number of people." Whisper's hallucinations also have risky medical implications. A company called Nabla utilizes Whisper for its medical transcription tool, used by over 30,000 clinicians and 40 health systems -- so far transcribing an estimated seven million visits. Though the company is aware of the issue and claims to be addressing it, there is currently no way to check the validity of the transcripts. The tool erases all audio for "data safety reasons," according to Nabla's chief technology officer Martin Raison. The company also claims that providers must quickly edit and approve the transcriptions (with all the extra time doctors have?), but that this system may change. Meanwhile, no one else can confirm the transcriptions are accurate because of privacy laws.
[6]
OpenAI's Whisper model is reportedly 'hallucinating' in high-risk situations
The tool has widespread adoption in US hospitals, but researchers are warning of potential harms. Researchers have found that OpenAI's AI-powered audio transcription tool, Whisper, is inventing things that were never said, with potentially dangerous consequences, according to a new report. As per APNews, the AI model is inventing text (commonly referred to as a 'hallucination'), where the large language model spots patterns that don't exist in its own training material, thus creating nonsensical outputs. US researchers have found that Whisper's mistakes can include racial commentary, violence and fantasised medical treatments. Whisper is integrated with some versions of ChatGPT, and is a built-in offering in Microsoft and Oracle's cloud computing platforms. Microsoft has stated that the tool is not intended for high-risk use, though healthcare providers are starting to adopt the tool to transcribe patient consultations with doctors. Whisper is claimed to have "near human level robustness and accuracy" by its maker, and has supposedly been adopted by over 30,000 US clinicians across 40 health systems. However, researchers are warning against the adoption, with problems found in different studies. In a study of public meetings, a University of Michigan researcher found Whisper hallucinating in eight of every 10 audio transcriptions inspected. Meanwhile, a machine learning engineer discovered hallucinations in about half of over 100 hours of transcriptions and a third developer found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper. In the past month, Whisper was downloaded over 4.2 million times from the open-source AI platform HuggingFace, with the tool being the most popular speech recognition model on the website. Analysing material from TalkBank, a repository hosted at Carnegie Mellon University, researchers determined that 40% of the hallucinations Whisper was producing could be harmful as the speaker was "misinterpreted or misrepresented". In AP examples of such snippets, one speaker described "two other girls and one lady", and Whisper invented commentary on race, noting "two other girls and one lady, um, which were Black". In another example, the tool created a fictional medication known as "hyperactivated antibiotics". Mistakes like those found could have "really grave consequences," especially within healthcare settings, Princeton Professor Alondra Nelson told AP, as "nobody wants a misdiagnosis". There are calls for OpenAI to address the issue, as ex-employee William Saunders told AP that "it's problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems". While many users expect AI tools to make some mistakes or misspell words, researchers told AP they had not seen other transcription tools hallucinate as much as Whisper. Still, Whisper is hardly the only AI product to be caught making things up. Google's AI Overviews was met with criticism earlier this year when it suggested using non-toxic glue to keep cheese from falling off pizza, citing a sarcastic Reddit comment as a source. Apple CEO Tim Cook admitted in an interview that AI hallucinations could be an issue in future products, including the Apple Intelligence suite. Cook told the Washington Post that he was not 100% confident the tools wouldn't hallucinate. "I think we have done everything that we know to do, including thinking very deeply about the readiness of the technology in the areas that we're using it in," Cook said. 
Despite this, companies are furthering the development of AI tools and programs, with hallucinations, much like Whisper's inventions, continuing to be a prevalent issue. As for OpenAI's response to hallucinations, it has recommended against using Whisper in "decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes".
[7]
Doctors Are Using AI to Transcribe Conversations With Patients. But Researchers Say the Tool Is Hallucinating 'Entire' Sentences.
ChatGPT-maker OpenAI introduced Whisper two years ago as an AI tool that transcribes speech to text. Now, the tool is used by AI healthcare company Nabla and its 45,000 clinicians to help transcribe medical conversations across over 85 organizations, like the University of Iowa Health Care. However, new research shows that Whisper has been "hallucinating," or adding statements that no one has said, into transcripts of conversations, raising the question of how quickly medical facilities should adopt AI if it yields errors. According to the Associated Press, a University of Michigan researcher found hallucinations in 80% of Whisper transcriptions. An unnamed developer found hallucinations in half of more than 100 hours of transcriptions. Another engineer found inaccuracies in almost all of the 26,000 transcripts they generated with Whisper. Faulty transcriptions of conversations between doctors and patients could have "really grave consequences," Alondra Nelson, professor at the Institute for Advanced Study in Princeton, NJ, told AP. "Nobody wants a misdiagnosis," Nelson stated. Earlier this year, researchers at Cornell University, New York University, the University of Washington, and the University of Virginia published a study that tracked how many times OpenAI's Whisper speech-to-text service hallucinated when it had to transcribe 13,140 audio segments with an average length of 10 seconds. The audio was sourced from TalkBank's AphasiaBank, a database featuring the voices of people with aphasia, a language disorder that makes it difficult to communicate. The researchers found 312 instances of "entire hallucinated phrases or sentences, which did not exist in any form in the underlying audio" when they ran the experiment in the spring of 2023. Among the hallucinated transcripts, 38% contained harmful language, like violence or stereotypes, that did not match the context of the conversation. "Our work demonstrates that there are serious concerns regarding Whisper's inaccuracy due to unpredictable hallucinations," the researchers wrote. The researchers say that the study could also mean a hallucination bias in Whisper, or a tendency for it to insert inaccuracies more often for a particular group -- and not just for people with aphasia. "Based on our findings, we suggest that this kind of hallucination bias could also arise for any demographic group with speech impairments yielding more disfluencies (such as speakers with other speech impairments like dysphonia [disorders of the voice], the very elderly, or non-native language speakers)," the researchers stated. Whisper has transcribed seven million medical conversations through Nabla, per The Verge.
[8]
OpenAI's Whisper Experiencing 'AI Hallucinations' Despite High-Risk Applications
OpenAI's AI audio transcription tool Whisper is having frequent "AI hallucinations", despite its rapid adoption in "high-risk industries" like healthcare, AP News reports. AI hallucination is where a large language model (LLM) spots patterns that don't exist, creating outputs that can be nonsensical or downright ridiculous. Whisper allegedly has invented text that includes "racial commentary, violent rhetoric and even imagined medical treatments," according to the experts who spoke to AP News. Though it is widely accepted that AI transcription tools will make at least some typos, the engineers and researchers said they had never seen another AI-powered transcription tool hallucinate to the same extent as Whisper. A University of Michigan researcher claimed he found hallucinations in eight out of every 10 audio transcriptions he studied. Microsoft has publicly stated that the tool is not intended for high-risk use cases, but the reports come as many healthcare providers have begun adopting Whisper for transcription. AP News reports that over 30,000 clinicians and 40 health systems, such as the Mankato Clinic in Minnesota and Children's Hospital Los Angeles, have started using a Whisper-based tool for transcription. Alondra Nelson, a professor at the Institute for Advanced Study in Princeton, told AP that these types of mistakes could have "really grave consequences" in a medical setting. "Nobody wants a misdiagnosis," she told the publication. "There should be a higher bar." William Saunders, a research engineer and former OpenAI employee, said: "It's problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems." But OpenAI certainly isn't the only tech giant whose products have been accused of creating hallucinations. Google's AI Overviews, a feature that adds AI-generated summaries to search results, was caught advising one X user to add non-toxic glue to their pizza to help the ingredients stick together. Apple has also acknowledged the possibility of AI hallucinations being an issue with its future products.
[9]
A transcription AI developed by OpenAI and used in hospitals lies more than it speaks
A group of researchers has detected an unusually high number of "hallucinations" produced by Whisper. Whisper, the transcription tool developed by OpenAI and based on artificial intelligence, is lying a lot. Too much. According to several experts in the technological field, this tool is prone to "hallucinate," meaning it invents text fragments or entire sentences, which can have dire consequences in sectors such as healthcare. A group of researchers and developers consulted by Associated Press explained that Whisper, used in multiple fields, tends to generate erroneous transcriptions that include fabricated content and even racist comments. This becomes even more problematic when its use extends to high-risk areas, such as transcription of medical consultations, where a failure in accuracy could directly affect patients' health. "No one wants a misdiagnosis," stated Alondra Nelson, former director of the White House Office of Science and Technology Policy, who believes that greater reliability should be required for this type of technology. The scope of the problem is considerable. A recent study detected 187 hallucinations in more than 13,000 clear audio fragments, and in another analysis, a researcher from the University of Michigan found hallucinations in 8 out of 10 inspected transcripts. Furthermore, the errors are not limited to long or low-quality fragments; they also occur in short, well-recorded audio clips. A machine learning engineer claimed to have found errors in half of the more than 100 hours of transcriptions he analyzed. These failures have led experts and advocates for the ethical use of artificial intelligence to call for the U.S. government to regulate the use of these tools. Christian Vogler, a deaf academic from Gallaudet University, points out that Whisper is also used to caption videos for deaf people, who are especially vulnerable to transcription errors. According to experts, resolving these bugs should be a priority for OpenAI before Whisper is implemented in systems and applications that require high precision in their operation.
[10]
Concerns about medical note-taking tool raised after researcher discovers it invents things no one said -- Nabla is powered by OpenAI's Whisper
A good transcription app shouldn't indulge in creative writing. Researchers and engineers using OpenAI's Whisper audio transcription tool have said that it often includes hallucinations in its output, commonly manifested as chunks of text that don't accurately reflect the original recording. According to the Associated Press, a University of Michigan researcher said that he found made-up text in 80% of the AI tool's transcriptions he inspected, which led him to try to improve the model. AI hallucination isn't a new phenomenon, and researchers have been trying to fix this using different tools like semantic entropy. However, what's troubling is that the Whisper AI audio transcription tool is widely used in medical settings, where mistakes could have deadly consequences. For example, one speaker said, "He, the boy, was going to, I'm not sure exactly, take the umbrella," but Whisper transcribed, "He took a big piece of a cross, a teeny, small piece ... I'm sure he didn't have a terror knife so he killed a number of people." Another recording said, "two other girls and one lady," and the AI tool transcribed this as "two other girls and one lady, um, which were Black." Lastly, one medical-related example showed Whisper writing down "hyperactivated antibiotics" in its output, which do not exist. Despite the above news, Nabla, an ambient AI assistant that helps clinicians transcribe patient-doctor interactions and create notes or reports after the visit, still uses Whisper. The company claims that over 45,000 clinicians across 85+ health organizations use the tool, including Children's Hospital Los Angeles and Mankato Clinic in Minnesota. Even though Nabla is based on OpenAI's Whisper, the company's Chief Technology Officer, Martin Raison, says that its tool is fine-tuned on medical language to transcribe and summarize interactions. However, OpenAI recommends against using Whisper for crucial transcriptions, even warning against using it in "decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes." The company behind Nabla says that it's aware of Whisper's tendency to hallucinate and that it's already addressing the problem. However, Raison also said that the company cannot compare the AI-generated transcript with the original audio recording, as its tool automatically deletes the original audio for data privacy and safety reasons. Fortunately, there's no recorded complaint yet against a medical provider over hallucinations by AI note-taking tools. Even so, William Saunders, a former OpenAI engineer, said that removing the original recording could be problematic, as the healthcare provider wouldn't be able to verify whether the text is correct. "You can't catch errors if you take away the ground truth," he told the Associated Press. Nevertheless, Nabla requires its users to edit and approve transcribed notes. If the tool delivers its report while the patient is still in the room, the healthcare provider has a chance to check the results against recent memory and even confirm details with the patient if anything in the AI transcription seems inaccurate. This shows that AI isn't an infallible machine that gets everything right -- instead, we can think of it as a quick assistant whose output needs to be double-checked every time. AI is certainly a useful tool in many situations, but we can't let it do the thinking for us, at least for now.
[11]
OpenAI's Whisper Transcription Tool Might Be Hallucinating
Eight out of 10 Whisper transcriptions reportedly contain hallucinations. OpenAI released an artificial intelligence (AI) tool dubbed Whisper in 2022, which can transcribe speech to text. However, a report claimed that the AI tool is prone to hallucinations and adds imaginary text to transcriptions. This is concerning as the tool is said to be used in several high-risk industries such as medicine and accessibility. A particular concern reportedly comes from the use of the tool in doctor-patient consultations, where hallucinations can add potentially harmful information and put patients' lives at risk. The Associated Press reported that OpenAI's automatic speech recognition (ASR) system Whisper has a high potential to generate hallucinated text. Citing interviews with multiple software engineers, developers, and academic researchers, the publication claimed that the imaginary text includes racial descriptions, violence, and invented medical treatments and medications. Hallucination, in AI parlance, is a major issue that causes AI systems to generate responses that are incorrect or misleading. In the case of Whisper, the AI is said to be inventing text that was never spoken by anyone. In an example verified by the publication, the speaker's sentence, "He, the boy, was going to, I'm not sure exactly, take the umbrella." was changed to "He took a big piece of a cross, a teeny, small piece ... I'm sure he didn't have a terror knife so he killed a number of people." In another instance, Whisper reportedly added racial information without any mention of it. While hallucination is not a new problem in the AI space, this particular tool's issue is more impactful because the open-source technology underpins several tools deployed in high-risk industries. Paris-based Nabla, for instance, has created a Whisper-based tool that is reportedly used by more than 30,000 clinicians and 40 health systems. Nabla's tool has been used to transcribe more than seven million medical visits. To maintain data security, the company also deletes the original recording from its servers. This means that if any hallucinated text was generated in those seven million transcriptions, it is impossible to verify and correct it. The technology is also used to build accessibility tools for the deaf and hard-of-hearing community, where, again, verifying the accuracy of the output is significantly difficult. Most of the hallucinations are said to be triggered by background noise, abrupt pauses, and other environmental sounds. The extent of the issue is also concerning. Citing a researcher, the publication claimed that eight out of every 10 audio transcriptions were found to contain hallucinated text. A developer told the publication that hallucinations occurred in nearly every one of the 26,000 transcripts he created with Whisper. Notably, at the launch of Whisper, OpenAI said that Whisper offers human-level robustness to accents, background noise, and technical language. A company spokesperson told the publication that the AI firm continuously studies ways to reduce hallucinations and has promised to incorporate the feedback in future model updates.
[12]
OpenAI's Transcriptions Make Up Things You Never Said to Your Doctor
Let's hope your physician isn't transcribing your conversations with Whisper. It's no surprise to those of us following generative artificial intelligence news that AI is imperfect. In fact, generative AI so often spits out untrue, fabricated, and otherwise incorrect outputs that we have a name for it: hallucinations. That's part of the problem with outsourcing so much of our work and tasks to AI in this moment. AI can be used for good, but blindly trusting it to handle important tasks without oversight or fact-checking runs a real risk. We're now seeing the consequences of that play out in concerning ways. The latest high-profile hallucination case concerns Whisper, an AI-powered transcription tool from ChatGPT-maker OpenAI. Whisper is popular: Transcription services frequently tap into the platform to power their tools, which, in turn, are used by many users and customers to make transcribing conversations quicker and easier. On the surface, that's a good thing: Whisper, and the services it enables, has had a positive reputation among users, and the platform is growing in use across industries. However, hallucination is getting in the way. As reported by AP News, researchers and experts are sounding the alarm about Whisper, claiming that not only is it inaccurate, it often makes things up entirely. While all AI is prone to hallucinating, researchers warn that Whisper will report that things were said that absolutely were not, including "racial commentary, violent rhetoric and even imagined medical treatments." That's bad enough for those of us who use Whisper for personal use. But the larger concern here is that Whisper has a large base of users in professional industries: Subtitles you see when watching a video online may be generated by Whisper, which could distort what the video conveys to users who are deaf or hard of hearing. Important interviews may be transcribed using Whisper-powered tools, which may leave incorrect records of what was actually said. However, the situation garnering the most attention right now is Whisper's use within hospitals and medical centers. Researchers are concerned by the number of doctors and medical professionals who have turned to Whisper tools to transcribe their conversations with patients. Your discussion about your health with your doctor may be recorded, then analyzed by Whisper, only to be transcribed with totally false statements that were never a part of the conversation. This isn't hypothetical, either: Different researchers have each reached similar conclusions by studying the transcriptions of Whisper-powered tools. AP News rounded up some of these results: A University of Michigan researcher discovered hallucinations in eight out of 10 transcriptions made by Whisper; a machine learning engineer found issues with 50% of the transcriptions he investigated; and one researcher found hallucinations in almost all of the 26,000 Whisper transcriptions they produced. A study even found consistent hallucinations when the audio recordings were short and clear. But it's the work of Cornell University professor Allison Koenecke and University of Virginia professor Mona Sloane that offers the most visceral look at the situation: They found that nearly 40% of the hallucinations in transcripts taken from the Carnegie Mellon research repository TalkBank were "harmful or concerning," as the speaker could be "misinterpreted or misrepresented." In one example, the speaker said, "He, the boy, was going to, I'm not sure exactly, take the umbrella." 
The AI added the following to the transcription: "He took a big piece of a cross, a teeny, small piece...I'm sure he didn't have a terror knife so he killed a number of people." In another example, the speaker said, "two other girls and one lady," while the AI turned it into, "two other girls and one lady, um, which were Black." When you take all this into consideration, it seems concerning that over 30,000 clinicians and 40 health systems are currently using Whisper via a tool developed by Nabla. What's worse, you cannot check the transcriptions against the original recordings to identify whether Nabla's tool hallucinated part of the report, as Nabla designed the tool to delete the audio for "data safety reasons." According to the company, around seven million medical visits have used the tool to transcribe conversations. Generative AI as a technology isn't new, but ChatGPT really kicked off its general adoption in late 2022. Since then, companies have raced to build and add AI into their platforms and services. Why wouldn't they? It seemed like the public really liked AI, and, well, generative AI seemed like it could do just about anything. Why not embrace it, and use the "magic" of AI to superpower tasks like transcriptions? We're seeing why at this moment. AI has a lot of potential, but also plenty of downsides. Hallucinations aren't just an occasional annoyance: They're a byproduct of the technology, a flaw built into the fabric of neural networks. We don't totally understand why AI models hallucinate, and that's part of the problem. We're trusting technology with flaws we don't fully understand to handle important work for us, so much so we're deleting the data that could be used to double-check AI's outputs in the name of safety. Personally, I don't feel safe knowing my medical records could contain outright falsehoods, just because my doctor's office decided to employ Nabla's tools in their system.
[13]
Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said
SAN FRANCISCO (AP) -- Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near "human level robustness and accuracy." But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text -- known in the industry as hallucinations -- can include racial commentary, violent rhetoric and even imagined medical treatments. Experts said that such fabrications are problematic because Whisper is being used in a slew of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos. More concerning, they said, is a rush by medical centers to utilize Whisper-based tools to transcribe patients' consultations with doctors, despite OpenAI' s warnings that the tool should not be used in "high-risk domains." The full extent of the problem is difficult to discern, but researchers and engineers said they frequently have come across Whisper's hallucinations in their work. A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in eight out of every 10 audio transcriptions he inspected, before he started trying to improve the model. A machine learning engineer said he initially discovered hallucinations in about half of the over 100 hours of Whisper transcriptions he analyzed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper. The problems persist even in well-recorded, short audio samples. A recent study by computer scientists uncovered 187 hallucinations in over 13,000 clear audio snippets they examined. That trend would lead to tens of thousands of faulty transcriptions over millions of recordings, researchers said. Such mistakes could have "really grave consequences," particularly in hospital settings, said Alondra Nelson, who led the White House Office of Science and Technology Policy for the Biden administration until last year. "Nobody wants a misdiagnosis," said Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. "There should be a higher bar." Whisper also is used to create closed captioning for the Deaf and hard of hearing -- a population at particular risk for faulty transcriptions. That's because the Deaf and hard of hearing have no way of identifying fabrications are "hidden amongst all this other text," said Christian Vogler, who is deaf and directs Gallaudet University's Technology Access Program. The prevalence of such hallucinations has led experts, advocates and former OpenAI employees to call for the federal government to consider AI regulations. At minimum, they said, OpenAI needs to address the flaw. "This seems solvable if the company is willing to prioritize it," said William Saunders, a San Francisco-based research engineer who quit OpenAI in February over concerns with the company's direction. "It's problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems." An OpenAI spokesperson said the company continually studies how to reduce hallucinations and appreciated the researchers' findings, adding that OpenAI incorporates feedback in model updates. 
While most developers assume that transcription tools misspell words or make other errors, engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper. The tool is integrated into some versions of OpenAI's flagship chatbot ChatGPT, and is a built-in offering in Oracle and Microsoft's cloud computing platforms, which service thousands of companies worldwide. It is also used to transcribe and translate text into multiple languages. In the last month alone, one recent version of Whisper was downloaded over 4.2 million times from open-source AI platform HuggingFace. Sanchit Gandhi, a machine-learning engineer there, said Whisper is the most popular open-source speech recognition model and is built into everything from call centers to voice assistants. Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short snippets they obtained from TalkBank, a research repository hosted at Carnegie Mellon University. They determined that nearly 40% of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented. In an example they uncovered, a speaker said, "He, the boy, was going to, I'm not sure exactly, take the umbrella." But the transcription software added: "He took a big piece of a cross, a teeny, small piece ... I'm sure he didn't have a terror knife so he killed a number of people." A speaker in another recording described "two other girls and one lady." Whisper invented extra commentary on race, adding "two other girls and one lady, um, which were Black." In a third transcription, Whisper invented a non-existent medication called "hyperactivated antibiotics." Researchers aren't certain why Whisper and similar tools hallucinate, but software developers said the fabrications tend to occur amid pauses, background sounds or music playing. OpenAI recommended in its online disclosures against using Whisper in "decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes." That warning hasn't stopped hospitals or medical centers from using speech-to-text models, including Whisper, to transcribe what's said during doctor's visits to free up medical providers to spend less time on note-taking or report writing. Over 30,000 clinicians and 40 health systems, including the Mankato Clinic in Minnesota and Children's Hospital Los Angeles, have started using a Whisper-based tool built by Nabla, which has offices in France and the U.S. That tool was fine tuned on medical language to transcribe and summarize patients' interactions, said Nabla's chief technology officer Martin Raison. Company officials said they are aware that Whisper can hallucinate and are mitigating the problem. It's impossible to compare Nabla's AI-generated transcript to the original recording because Nabla's tool erases the original audio for "data safety reasons," Raison said. Nabla said the tool has been used to transcribe an estimated 7 million medical visits. Saunders, the former OpenAI engineer, said erasing the original audio could be worrisome if transcripts aren't double checked or clinicians can't access the recording to verify they are correct. "You can't catch errors if you take away the ground truth," he said. Nabla said that no model is perfect, and that theirs currently requires medical providers to quickly edit and approve transcribed notes, but that could change. 
Because patient meetings with their doctors are confidential, it is hard to know how AI-generated transcripts are affecting them. A California state lawmaker, Rebecca Bauer-Kahan, said she took one of her children to the doctor earlier this year, and refused to sign a form the health network provided that sought her permission to share the consultation audio with vendors that included Microsoft Azure, the cloud computing system run by OpenAI's largest investor. Bauer-Kahan didn't want such intimate medical conversations being shared with tech companies, she said. "The release was very specific that for-profit companies would have the right to have this," said Bauer-Kahan, a Democrat who represents part of the San Francisco suburbs in the state Assembly. "I was like 'absolutely not.'" John Muir Health spokesman Ben Drew said the health system complies with state and federal privacy laws. Schellmann reported from New York. This story was produced in partnership with the Pulitzer Center's AI Accountability Network, which also partially supported the academic Whisper study. The Associated Press receives financial assistance from the Omidyar Network to support coverage of artificial intelligence and its impact on society. AP is solely responsible for all content. Find AP's standards for working with philanthropies, a list of supporters and funded coverage areas at AP.org. The Associated Press and OpenAI have a licensing and technology agreement allowing OpenAI access to part of the AP's text archives.
[14]
Researchers say an AI-powered transcription tool used in hospitals...
Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near "human level robustness and accuracy." But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text -- known in the industry as hallucinations -- can include racial commentary, violent rhetoric and even imagined medical treatments. Experts said that such fabrications are problematic because Whisper is being used in a slew of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos. More concerning, they said, is a rush by medical centers to utilize Whisper-based tools to transcribe patients' consultations with doctors, despite OpenAI' s warnings that the tool should not be used in "high-risk domains." The full extent of the problem is difficult to discern, but researchers and engineers said they frequently have come across Whisper's hallucinations in their work. A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in 8 out of every 10 audio transcriptions he inspected, before he started trying to improve the model. A machine learning engineer said he initially discovered hallucinations in about half of the over 100 hours of Whisper transcriptions he analyzed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper. The problems persist even in well-recorded, short audio samples. A recent study by computer scientists uncovered 187 hallucinations in more than 13,000 clear audio snippets they examined. That trend would lead to tens of thousands of faulty transcriptions over millions of recordings, researchers said. Such mistakes could have "really grave consequences," particularly in hospital settings, said Alondra Nelson, who led the White House Office of Science and Technology Policy for the Biden administration until last year. "Nobody wants a misdiagnosis," said Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. "There should be a higher bar." Whisper also is used to create closed captioning for the Deaf and hard of hearing -- a population at particular risk for faulty transcriptions. That's because the Deaf and hard of hearing have no way of identifying fabrications are "hidden amongst all this other text," said Christian Vogler, who is deaf and directs Gallaudet University's Technology Access Program. The prevalence of such hallucinations has led experts, advocates and former OpenAI employees to call for the federal government to consider AI regulations. At minimum, they said, OpenAI needs to address the flaw. "This seems solvable if the company is willing to prioritize it," said William Saunders, a San Francisco-based research engineer who quit OpenAI in February over concerns with the company's direction. "It's problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems." An OpenAI spokesperson said the company continually studies how to reduce hallucinations and appreciated the researchers' findings, adding that OpenAI incorporates feedback in model updates. 
While most developers assume that transcription tools misspell words or make other errors, engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper. The tool is integrated into some versions of OpenAI's flagship chatbot ChatGPT, and is a built-in offering in Oracle and Microsoft's cloud computing platforms, which service thousands of companies worldwide. It is also used to transcribe and translate text into multiple languages. In the last month alone, one recent version of Whisper was downloaded over 4.2 million times from open-source AI platform HuggingFace. Sanchit Gandhi, a machine-learning engineer there, said Whisper is the most popular open-source speech recognition model and is built into everything from call centers to voice assistants. Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short snippets they obtained from TalkBank, a research repository hosted at Carnegie Mellon University. They determined that nearly 40% of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented. In an example they uncovered, a speaker said, "He, the boy, was going to, I'm not sure exactly, take the umbrella." But the transcription software added: "He took a big piece of a cross, a teeny, small piece ... I'm sure he didn't have a terror knife so he killed a number of people." A speaker in another recording described "two other girls and one lady." Whisper invented extra commentary on race, adding "two other girls and one lady, um, which were Black." In a third transcription, Whisper invented a non-existent medication called "hyperactivated antibiotics." Researchers aren't certain why Whisper and similar tools hallucinate, but software developers said the fabrications tend to occur amid pauses, background sounds or music playing. OpenAI recommended in its online disclosures against using Whisper in "decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes." That warning hasn't stopped hospitals or medical centers from using speech-to-text models, including Whisper, to transcribe what's said during doctor's visits to free up medical providers to spend less time on note-taking or report writing. Over 30,000 clinicians and 40 health systems, including the Mankato Clinic in Minnesota and Children's Hospital Los Angeles, have started using a Whisper-based tool built by Nabla, which has offices in France and the U.S. That tool was fine tuned on medical language to transcribe and summarize patients' interactions, said Nabla's chief technology officer Martin Raison. Company officials said they are aware that Whisper can hallucinate and are mitigating the problem. It's impossible to compare Nabla's AI-generated transcript to the original recording because Nabla's tool erases the original audio for "data safety reasons," Raison said. Nabla said the tool has been used to transcribe an estimated 7 million medical visits. Saunders, the former OpenAI engineer, said erasing the original audio could be worrisome if transcripts aren't double checked or clinicians can't access the recording to verify they are correct. "You can't catch errors if you take away the ground truth," he said. Nabla said that no model is perfect, and that theirs currently requires medical providers to quickly edit and approve transcribed notes, but that could change. 
Because patient meetings with their doctors are confidential, it is hard to know how AI-generated transcripts are affecting them. A California state lawmaker, Rebecca Bauer-Kahan, said she took one of her children to the doctor earlier this year, and refused to sign a form the health network provided that sought her permission to share the consultation audio with vendors that included Microsoft Azure, the cloud computing system run by OpenAI's largest investor. Bauer-Kahan didn't want such intimate medical conversations being shared with tech companies, she said. "The release was very specific that for-profit companies would have the right to have this," said Bauer-Kahan, a Democrat who represents part of the San Francisco suburbs in the state Assembly. "I was like 'absolutely not.'" John Muir Health spokesman Ben Drew said the health system complies with state and federal privacy laws.
[15]
Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said
[16]
Researchers Say an AI-Powered Transcription Tool Used in Hospitals Invents Things No One Ever Said
[17]
Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said
[18]
OpenAI's transcription tool hallucinates more than any other, experts say -- but hospitals keep using it
[19]
Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said
[20]
OpenAI's AI transcription tool hallucinates excessively - here's a better alternative
A researcher 'found hallucinations in eight out of every 10 audio transcriptions he inspected' from Whisper.

OpenAI's Whisper, an artificial intelligence (AI) speech recognition and transcription tool launched in 2022, has been found to hallucinate, or make things up, so often that experts worry it could cause serious damage in the wrong context.

Last week, the AP reported that a researcher at the University of Michigan "found hallucinations in eight out of every 10 audio transcriptions he inspected" produced by Whisper during a study of public meetings. That data point is one of many: separately, an engineer who reviewed 100 hours of Whisper transcriptions told the AP that he found hallucinations in roughly 50% of them, while another developer discovered hallucinations in almost every one of the 26,000 transcripts he generated with Whisper.

While users can always expect AI transcribers to get a word or spelling wrong here and there, researchers noted that they "had never seen another AI-powered transcription tool hallucinate as much as Whisper."

OpenAI says Whisper, an open-source neural net, "approaches human level robustness and accuracy on English speech recognition." It is integrated widely across several industries for common speech recognition tasks, including transcribing and translating interviews and creating video subtitles. That level of ubiquity could quickly spread fabricated text, misattributed and invented quotes, and other misinformation across several mediums, with consequences that vary with the nature of the original material. According to the AP, Whisper is incorporated into some versions of ChatGPT, built into call centers, voice assistants, and cloud platforms from Oracle and Microsoft, and was downloaded more than 4.2 million times last month from HuggingFace.

What's even more concerning, experts told the AP, is that medical professionals are increasingly using "Whisper-based tools" to transcribe patient-doctor consultations. The AP interviewed more than 12 engineers, researchers, and developers who confirmed that Whisper fabricated phrases and full sentences in transcription text, some of which "can include racial commentary, violent rhetoric and even imagined medical treatments." "Nobody wants a misdiagnosis," said Alondra Nelson, a professor at the Institute for Advanced Study.

OpenAI may not have advocated for medical use cases (the company advises "against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes"), but putting the tool on the market and touting its accuracy means it is likely to be picked up by industries trying to expedite work and create efficiencies wherever possible, regardless of the risks.

The issue doesn't seem to depend on long or poorly recorded audio, either. According to the AP, computer scientists recently found hallucinations even in short, clear audio samples, a trend researchers said would lead to tens of thousands of faulty transcriptions over millions of recordings. "The full extent of the problem is difficult to discern, but researchers and engineers said they frequently have come across Whisper's hallucinations in their work," the AP reports.

Besides, as Christian Vogler, who directs Gallaudet University's Technology Access Program and is deaf, pointed out, people who are deaf or hard of hearing can't catch hallucinations "hidden amongst all this other text."

The researchers' findings point to a broader problem in the AI industry: tools are brought to market too quickly in pursuit of profit, especially while the US still lacks meaningful AI regulation. That concern is all the more relevant given OpenAI's ongoing debate over its for-profit versus nonprofit structure and recent predictions from its leadership that don't account for AI risks. "An OpenAI spokesperson said the company continually studies how to reduce hallucinations and appreciated the researchers' findings, adding that OpenAI incorporates feedback in model updates," the AP wrote.

While you're waiting for OpenAI to resolve the issue, we recommend trying Otter.ai, a journalist-trusted AI transcription tool that just added six new languages. Last month, one longtime Otter.ai user noted that a new AI summary feature in the platform hallucinated a statistic, but that error wasn't in the transcription itself. It may be wise not to rely on that feature, especially since risks can grow when AI is asked to summarize larger contexts. Otter.ai's own guidance for transcription doesn't mention hallucinations, only that "accuracy can vary based on factors such as background noise, speaker accents, and the complexity of the conversation," and advises users to "review and edit the transcriptions to ensure complete accuracy, especially for critical tasks or important conversations."

If you have an iPhone, the new iOS 18.1 with Apple Intelligence now allows AI call recording and transcription, but ZDNET's editor-in-chief Jason Hiner says it's "still a work in progress." Meanwhile, OpenAI just announced plans to give its 250 million ChatGPT Plus users more tools.
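The advice above to review and edit transcripts is easier to follow when suspect passages are surfaced automatically. None of the vendors named in this reporting disclose their internal checks, but the open-source Whisper package already returns per-segment metadata (an average log-probability, a no-speech probability and a compression ratio) that can serve as a rough review filter. The sketch below is an illustrative heuristic only, with arbitrary cutoff values and a placeholder file name, and it presumes the original audio is retained so a reviewer can actually re-listen to flagged segments.

import whisper  # pip install openai-whisper; requires ffmpeg on PATH

model = whisper.load_model("small")
result = model.transcribe("clip.wav")  # clip.wav is a placeholder recording

# Flag segments whose metadata suggests the text may not match the audio.
for seg in result["segments"]:
    suspicious = (
        seg["avg_logprob"] < -1.0          # model had low confidence in its own output
        or seg["no_speech_prob"] > 0.5     # likely silence or background noise, a known hallucination trigger
        or seg["compression_ratio"] > 2.4  # highly repetitive text, another common failure sign
    )
    if suspicious:
        print(f"[review] {seg['start']:6.1f}s-{seg['end']:6.1f}s {seg['text'].strip()}")

A filter like this only narrows where humans look; as Saunders notes earlier in this coverage, nothing substitutes for keeping the ground-truth recording.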
[21]
Research: AI transcription tool in hospitals fabricates speech
[22]
An AI-powered transcription tool invents things no one ever said
SAN FRANCISCO (AP) -- Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near "human level robustness and accuracy." But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text -- known in the industry as hallucinations -- can include racial commentary, violent rhetoric and even imagined medical treatments. Experts said that such fabrications are problematic because Whisper is being used in a slew of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos. More concerning, they said, is a rush by medical centers to utilise Whisper-based tools to transcribe patients' consultations with doctors, despite OpenAI' s warnings that the tool should not be used in "high-risk domains." The full extent of the problem is difficult to discern, but researchers and engineers said they frequently have come across Whisper's hallucinations in their work. A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in eight out of every 10 audio transcriptions he inspected, before he started trying to improve the model. A machine learning engineer said he initially discovered hallucinations in about half of the over 100 hours of Whisper transcriptions he analysed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper. The problems persist even in well-recorded, short audio samples. A recent study by computer scientists uncovered 187 hallucinations in more than 13,000 clear audio snippets they examined. That trend would lead to tens of thousands of faulty transcriptions over millions of recordings, researchers said. Such mistakes could have "really grave consequences," particularly in hospital settings, said Alondra Nelson, who led the White House Office of Science and Technology Policy for the Biden administration until last year. "Nobody wants a misdiagnosis," said Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. "There should be a higher bar." Whisper also is used to create closed captioning for the Deaf and hard of hearing -- a population at particular risk for faulty transcriptions. That's because the Deaf and hard of hearing have no way of identifying fabrications "hidden amongst all this other text," said Christian Vogler, who is deaf and directs Gallaudet University's Technology Access Program. OpenAI urged to address problem The prevalence of such hallucinations has led experts, advocates and former OpenAI employees to call for the federal government to consider AI regulations. At minimum, they said, OpenAI needs to address the flaw. "This seems solvable if the company is willing to prioritise it," said William Saunders, a San Francisco-based research engineer who quit OpenAI in February over concerns with the company's direction. "It's problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems." An OpenAI spokesperson said the company continually studies how to reduce hallucinations and appreciated the researchers' findings, adding that OpenAI incorporates feedback in model updates. 
While most developers assume that transcription tools misspell words or make other errors, engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper.

Whisper hallucinations

The tool is integrated into some versions of OpenAI's flagship chatbot ChatGPT, and is a built-in offering in Oracle and Microsoft's cloud computing platforms, which service thousands of companies worldwide. It is also used to transcribe and translate text into multiple languages. In the last month alone, one recent version of Whisper was downloaded over 4.2 million times from open-source AI platform HuggingFace. Sanchit Gandhi, a machine-learning engineer there, said Whisper is the most popular open-source speech recognition model and is built into everything from call centers to voice assistants.

Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short snippets they obtained from TalkBank, a research repository hosted at Carnegie Mellon University. They determined that nearly 40 percent of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented.

In an example they uncovered, a speaker said, "He, the boy, was going to, I'm not sure exactly, take the umbrella." But the transcription software added: "He took a big piece of a cross, a teeny, small piece ... I'm sure he didn't have a terror knife so he killed a number of people." A speaker in another recording described "two other girls and one lady." Whisper invented extra commentary on race, adding "two other girls and one lady, um, which were Black." In a third transcription, Whisper invented a non-existent medication called "hyperactivated antibiotics."

Researchers aren't certain why Whisper and similar tools hallucinate, but software developers said the fabrications tend to occur amid pauses, background sounds or music playing. OpenAI recommended in its online disclosures against using Whisper in "decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes."

Transcribing doctor appointments

That warning hasn't stopped hospitals or medical centers from using speech-to-text models, including Whisper, to transcribe what's said during doctor's visits to free up medical providers to spend less time on note-taking or report writing. Over 30,000 clinicians and 40 health systems, including the Mankato Clinic in Minnesota and Children's Hospital Los Angeles, have started using a Whisper-based tool built by Nabla, which has offices in France and the US. That tool was fine-tuned on medical language to transcribe and summarize patients' interactions, said Nabla's chief technology officer Martin Raison. Company officials said they are aware that Whisper can hallucinate and are addressing the problem.

It's impossible to compare Nabla's AI-generated transcript to the original recording because Nabla's tool erases the original audio for "data safety reasons," Raison said. Nabla said the tool has been used to transcribe an estimated 7 million medical visits.

Saunders, the former OpenAI engineer, said erasing the original audio could be worrisome if transcripts aren't double checked or clinicians can't access the recording to verify they are correct. "You can't catch errors if you take away the ground truth," he said.

Nabla said that no model is perfect, and that theirs currently requires medical providers to quickly edit and approve transcribed notes, but that could change.
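For readers who want to see how the open-source model is typically wired into such products, the sketch below is a minimal illustration using the openai-whisper Python package, not Nabla's or any other vendor's actual pipeline. It transcribes a file and flags segments whose built-in confidence signals suggest silence or low-probability output, the conditions under which developers say fabrications tend to appear. The file name and thresholds are assumptions chosen for illustration.

```python
# Minimal sketch using the open-source "openai-whisper" package
# (pip install openai-whisper). This is not the pipeline used by Nabla
# or any other vendor mentioned in the reporting.
import whisper

model = whisper.load_model("base")             # smallest practical checkpoint
result = model.transcribe("visit_audio.wav")   # hypothetical file name

# Whisper returns per-segment metadata alongside the text. Segments the
# model itself scores as likely silence (high no_speech_prob) or as
# low-probability text (low avg_logprob) overlap with the conditions,
# pauses and background noise, where hallucinations are reported to
# cluster. The thresholds below are arbitrary, for illustration only.
for seg in result["segments"]:
    risky = seg["no_speech_prob"] > 0.5 or seg["avg_logprob"] < -1.0
    flag = " [REVIEW]" if risky else ""
    print(f"[{seg['start']:7.2f}-{seg['end']:7.2f}] {seg['text'].strip()}{flag}")
```

A heuristic like this cannot catch hallucinations outright; it only points a human reviewer at the stretches of audio most worth re-checking, which is also why retaining the original recording matters.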
Privacy concerns

Because patient meetings with their doctors are confidential, it is hard to know how AI-generated transcripts are affecting them. A California state lawmaker, Rebecca Bauer-Kahan, said she took one of her children to the doctor earlier this year, and refused to sign a form the health network provided that sought her permission to share the consultation audio with vendors that included Microsoft Azure, the cloud computing system run by OpenAI's largest investor. Bauer-Kahan didn't want such intimate medical conversations being shared with tech companies, she said.

"The release was very specific that for-profit companies would have the right to have this," said Bauer-Kahan, a Democrat who represents part of the San Francisco suburbs in the state Assembly. "I was like 'absolutely not.'"

John Muir Health spokesman Ben Drew said the health system complies with state and federal privacy laws.
[23]
Transcription tool powered by AI found to 'hallucinate' extra sentences | BreakingNews.ie
Tech giant OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near "human level robustness and accuracy". But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text - known in the industry as hallucinations - can include racial commentary, violent rhetoric and even imagined medical treatments.

Experts said that such fabrications are problematic because Whisper is being used in a slew of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos. More concerning, they said, is a rush by medical centres to utilise Whisper-based tools to transcribe patients' consultations with doctors, despite OpenAI's warnings that the tool should not be used in "high-risk domains".

The full extent of the problem is difficult to discern, but researchers and engineers said they frequently have come across Whisper's hallucinations in their work. A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in eight out of every 10 audio transcriptions he inspected, before he started trying to improve the model. A machine learning engineer said he initially discovered hallucinations in about half of the over 100 hours of Whisper transcriptions he analysed. A third developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with Whisper.

The problems persist even in well-recorded, short audio samples. A recent study by computer scientists uncovered 187 hallucinations in over 13,000 clear audio snippets they examined. That trend would lead to tens of thousands of faulty transcriptions over millions of recordings, researchers said. Such mistakes could have "really grave consequences," particularly in hospital settings, said Alondra Nelson, who led the White House Office of Science and Technology Policy for the Biden administration until last year.

The prevalence of such hallucinations has led experts, advocates and former OpenAI employees to call for the federal government to consider AI regulations. At minimum, they said, OpenAI needs to address the flaw. An OpenAI spokesperson said the company continually studies how to reduce hallucinations and appreciated the researchers' findings, adding that OpenAI incorporates feedback in model updates.

While most developers assume that transcription tools misspell words or make other errors, engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper. Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short snippets they obtained from TalkBank, a research repository hosted at Carnegie Mellon University. They determined that nearly 40% of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented.

In an example they uncovered, a speaker said, "He, the boy, was going to, I'm not sure exactly, take the umbrella." But the transcription software added: "He took a big piece of a cross, a teeny, small piece ... I'm sure he didn't have a terror knife so he killed a number of people." A speaker in another recording described "two other girls and one lady."
Whisper invented extra commentary on race, adding "two other girls and one lady, um, which were black."
[24]
AI Speech Tools Stumble as Hallucination Problems Persist | PYMNTS.com
In a troubling discovery for businesses racing to automate customer service, OpenAI's popular Whisper transcription software has been caught adding fabricated text to conversations, including potentially harmful content that speakers never uttered.

The findings underscore a growing challenge for companies betting billions on AI to handle sensitive customer interactions, from call centers to medical offices. Even advanced language models continue to "hallucinate," or generate false information that could damage customer relationships or create legal liability.

"If a chatbot is providing shoppers with wrong or misleading answers and advice, it can be a big problem and might hurt the organization's reputation," Iris Zarecki, director of product marketing at AI company K2view, told PYMNTS. "A couple of recent examples of this include Air Canada's chatbot giving incorrect information to a traveler and arguing the chatbot is 'responsible for its own actions,' and an AI-powered chatbot created by New York City to help small business owners giving advice that misstates local policies and advised companies to violate the law."

The Associated Press reported that OpenAI's Whisper stumbles on a basic task: accurately writing down what people say. Researchers found the AI transcription tool frequently invents text out of thin air, including disturbing content about violence and race. One study found hallucinations in 8 out of 10 transcripts. The glitch raises alarm bells as hospitals and clinics rush to adopt Whisper-based tools for patient consultations despite OpenAI's explicit warnings against using it for high-risk scenarios.

Hallucinations in AI are defined as confident but inaccurate responses that are not justified by the training data used for the AI model, Chris Sheehan, senior vice president and general manager of strategic accounts at Applause, a company that does generative AI testing among other services, told PYMNTS. He said that AI hallucinations occur when generative AI models create outputs that are nonsensical or inaccurate. Applause's 2024 Generative AI survey found that 37% of respondents had experienced hallucinations.

"Presenting inaccurate or offensive information to customers is annoying and time-consuming, but more deeply, it erodes trust in the brand that it could deploy technology that was obviously not properly tested and vetted," Sheehan said.

While both agentic AI and chatbots can hallucinate, it's important to distinguish between them, Dan Balaceanu, chief product officer at Druid AI, told PYMNTS. Chatbots enable natural interaction via text, voice, or both but are typically rule-based and only respond to specific keywords or phrases. AI agents, on the other hand, can handle more complex conversations and provide personalized solutions to customers. These agents leverage advanced AI techniques like natural language processing and machine learning to provide customized answers to queries.

"AI hallucinations are a form of software application defect or error," he added. "These are more problematic, though, because you can't acknowledge them easily. You don't get a blue screen or an error message in front of you. Instead, you get a nice and apparently correct response, which can be incredibly misleading."

However, Cordial's Chief Product and Engineering Officer Matt Howland sees an upside to hallucinations. He told PYMNTS that AI's creative departures from reality can be both a challenge and an opportunity.
These quirks show up when AI models work with limited data or unclear instructions, sometimes leading to content ranging from mildly inaccurate to completely fantastical. "What fascinates me is how marketers and brands can actually turn this seeming limitation into an advantage," he added. "While we absolutely need accuracy for technical documentation and customer communications, these unexpected AI behaviors can spark creative breakthroughs in marketing campaigns and content creation. The key lies in knowing when to rein in AI's imagination and when to let it roam free. With this, everyone should treat AI as a creative collaborator rather than a production tool."
OpenAI's Whisper, an AI-powered transcription tool, has been found to generate hallucinations and inaccuracies, raising alarm because it is widely used in medical settings despite warnings against its use in high-risk domains.
OpenAI's Whisper, an AI-powered transcription tool, has come under scrutiny for its tendency to generate fabricated text, known as hallucinations. Despite OpenAI's claims of "human level robustness and accuracy," experts have identified numerous instances where Whisper invents entire sentences or adds non-existent content to transcriptions [1][2].
Despite OpenAI's explicit warnings against using Whisper in "high-risk domains," the medical sector has widely adopted Whisper-based tools. Nabla, a medical tech company, has developed a Whisper-based tool used by over 30,000 clinicians and 40 health systems, including the Mankato Clinic in Minnesota and Children's Hospital Los Angeles [3][4].
Researchers and engineers have reported numerous instances of hallucinations in their work with Whisper: a University of Michigan researcher found them in eight out of every 10 public meeting transcriptions he inspected, a machine learning engineer found them in about half of more than 100 hours of transcriptions he analyzed, another developer found them in nearly all of his 26,000 transcripts, and a recent study uncovered 187 hallucinations in more than 13,000 short, clear audio snippets.
The use of Whisper in medical settings raises significant concerns: Nabla's tool erases the original audio recordings "for data safety reasons," so clinicians cannot verify transcripts against the source material, an estimated seven million medical visits have already been transcribed this way, and Deaf and hard-of-hearing patients have no way of spotting fabrications hidden in the text.
A study conducted by researchers from Cornell University and the University of Virginia revealed alarming types of hallucinations: invented violent content, fabricated racial commentary, and non-existent medications such as "hyperactivated antibiotics," with nearly 40 percent of the hallucinations judged harmful or concerning because the speaker could be misinterpreted or misrepresented.
Experts warn of potential grave consequences, especially in hospital settings. Alondra Nelson, a professor at the Institute for Advanced Study, emphasized the need for a higher bar in medical contexts [1][3].
An OpenAI spokesperson stated that the company appreciates the researchers' findings and is actively studying how to reduce fabrications. They incorporate feedback in updates to the model [2].
Whisper's influence extends beyond OpenAI, with integration into some versions of ChatGPT and availability on Oracle and Microsoft's cloud computing platforms. In just one month, a recent version of Whisper was downloaded over 4.2 million times from the open-source AI platform HuggingFace [3][5].
As AI tools like Whisper continue to be adopted in critical sectors, the need for improved accuracy and safeguards becomes increasingly apparent. The medical community, in particular, must carefully weigh the benefits of AI-powered transcription against the potential risks of misinformation in patient records.