3 Sources
[1]
Your doctor's AI notetaker may be making things up, Ontario audit finds
In recent years, many overworked doctors have turned to so-called AI medical scribes to help automatically summarize patient conversations, diagnoses, and care decisions into structured notes for health record logging. But a recent audit by the auditor general of Ontario found that AI scribes recommended by the provincial government regularly generated incorrect, incomplete, and hallucinated information that could "potentially result in inadequate or harmful treatment plans that may potentially impact patient health outcomes."

In a recent report on Use of Artificial Intelligence in the Ontario Government, the auditor general reviewed transcription tests of two simulated patient-doctor conversations performed across 20 AI scribe vendors that were approved and pre-qualified by the provincial government for purchase by healthcare providers. All 20 of those vendors showed some issue with accuracy or completeness in at least one of these simple tests, including nine that hallucinated patient information, 12 that recorded information incorrectly, and 17 that missed key details about discussed mental health issues.

In the report, the auditor general points out multiple concerning examples of mistakes in those summaries that could have a direct and negative impact on a patient's subsequent care. That includes situations where an AI scribe hallucinated nonexistent referrals for blood tests or therapy, incorrectly transcribed the names of prescription medications, or missed "key details" of mental health issues discussed in the simulated conversations.

Across all approved vendors, the average tested AI scribe scored only 12 out of 20 on the "accuracy of medical notes generated" section of Supply Ontario's evaluation rubric. But that seemingly key "accuracy" metric was responsible for only about 4 percent of a vendor's overall score, making it easy to meet the minimum threshold for approval even if an AI scribe scored a "zero" on the accuracy metric (a separate metric measuring "domestic presence in Ontario" was worth 30 percent of the overall scoring). All these factors contributed to the auditor general's overall finding that these AI scribes "were not evaluated adequately."

In a display of restraint and understatement, the report notes that "it is important that AI scribe systems are tested to provide assurances as to the quality of their generated notes and to minimize inaccuracies." It also recommends that IT departments using these scribes force doctors to "confirm their review of the notes produced" before committing them to patient logs.

Public sector health services in Ontario are not required to use these AI scribe systems in their work and may purchase scribes from non-approved vendors if they wish. Still, the fact that the Ontario government recommended AI summary systems with such obvious and potentially patient-harming flaws should give pause to any doctors (or their patients) making use of them.
[2]
Sick and wrong: Ontario auditors find doctors' AI note takers routinely blow basic facts
The AI systems approved for Ontario healthcare providers routinely missed critical details, inserted incorrect information, and hallucinated content that neither patients nor clinicians mentioned, according to a provincial audit of 20 approved vendors' systems.

The findings come from the Office of the Auditor General of Ontario, Canada, and are included in a larger report about the state of AI usage by public services in the province. They specifically address the AI Scribe program, which the Ontario Ministry of Health initiated for physicians, nurse practitioners, and other healthcare professionals across the broader health sector.

As part of the procurement process, officials conducted evaluations using simulated doctor-patient recordings. Medical professionals then reviewed the original recordings alongside the AI-generated notes to evaluate their accuracy. What they found was, frankly, shocking for anyone concerned about the accuracy of AI in critical situations.

Nine out of 20 AI systems reportedly "fabricated information and made suggestions to patients' treatment plans" that weren't discussed in the recordings. According to the report, evaluators spotted potentially devastating incorrect information in the sample notes, such as claims that no masses were found or that patients were anxious, even though these things were never discussed in the recordings. Twelve of the 20 systems evaluated inserted incorrect drug information into patient notes, while 17 of the systems "missed key details about the patients' mental health issues" that were discussed in the recordings. Six of the systems "missed the patients' mental health issues fully or partially or were missing key details," per the report.

OntarioMD, a group that offers support for physicians in adopting new technologies and was involved in the AI Scribe procurement process, has recommended that doctors manually review their AI notes for accuracy, but the report notes there's no mandatory attestation feature in any of the approved systems.

AI systems making mistakes isn't exactly shocking. As we've reported previously, consumer-focused AI has a tendency to provide bad medical information to users, and some studies have found large language models failed to produce appropriate differential diagnoses in roughly 80 percent of tested cases. But the tools evaluated here are for doctors, not consumers, and such poor performance demands an explanation.

A good portion of the report blames how the systems were evaluated. According to the report, the weighting given to various categories of AI Scribe performance was wonky. While 30 percent of a platform's evaluation score depended solely on whether the vendor had a domestic presence in Ontario, the accuracy of medical notes contributed only 4 percent to the total score. Bias controls accounted for only 2 percent of the total evaluation score; threat, risk, and privacy assessments counted for another 2 percent; and SOC 2 Type 2 compliance contributed an additional 4 percentage points. In other words, criteria tied to accuracy, bias controls, and key security and privacy safeguards made up only a small portion of the total evaluation score for the AI Scribe systems.

"Inaccurate weightings could result in the selection of vendors whose AI tools may produce inaccurate or biased medical records or lack adequate protection to safeguard sensitive personal health information," the report said of the scoring regime.
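To make the arithmetic concrete, here is a minimal sketch of how such a weighted rubric behaves. The 30/4/2/2/4 percent weights come from the audit as described above; the catch-all "other criteria" bucket, its weight, the per-category scores, and any passing threshold are illustrative assumptions, not details from the report.

```python
# Illustrative sketch of a weighted procurement rubric, NOT Supply Ontario's
# actual scoring model. The 30/4/2/2/4 percent weights are from the audit
# report; "other_criteria" and its 58 percent weight are assumptions standing
# in for the unreported remainder (cost, features, support, and so on).

WEIGHTS = {
    "domestic_presence":   0.30,
    "note_accuracy":       0.04,
    "bias_controls":       0.02,
    "threat_risk_privacy": 0.02,
    "soc2_type2":          0.04,
    "other_criteria":      0.58,  # assumed remainder
}

def weighted_score(scores: dict) -> float:
    """Combine per-category scores (each 0.0 to 1.0) into a 0-100 total."""
    return 100 * sum(WEIGHTS[cat] * scores.get(cat, 0.0) for cat in WEIGHTS)

# A hypothetical vendor that scores zero on note accuracy but decently
# everywhere else still posts a high overall number:
vendor = {
    "domestic_presence":   1.0,  # based in Ontario
    "note_accuracy":       0.0,  # completely unreliable notes
    "bias_controls":       0.5,
    "threat_risk_privacy": 0.5,
    "soc2_type2":          1.0,
    "other_criteria":      0.8,
}

print(f"overall score: {weighted_score(vendor):.1f}/100")  # -> 82.4
```

Under these assumed weights, note accuracy can swing the total by at most 4 points out of 100, which is how a vendor producing wholly unreliable notes could still clear a plausible approval threshold.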
The Register reached out to the Ontario Health Ministry for its take on the report, and to ask whether it plans to adopt the report's recommendations for the AI Scribe program, but we didn't immediately hear back. A spokesperson for the Ministry told the CBC on Wednesday that more than 5,000 physicians in Ontario are participating in the AI Scribe program and that there have been no known reports of patient harm associated with the technology. ®
[3]
Doctors' AI Systems Are Hallucinating Nonexistent Medical Issues During Appointments With Patients
If you've been to a medical appointment in the past two or three years, chances are high that your doctor was using an AI scribe: software that listens in on the conversation, transcribing it and structuring it into the format of medical notes. In theory it's a cool idea, but pain points abound.

Earlier this week, Ontario's auditor general, an accountability officer acting under the Legislative Assembly of Ontario, released a special report warning that AI medical scribes were "not evaluated adequately" and may present "fabricated information" to medical professionals. First reported by Global News, the audit took a look at 20 AI scribe platforms and found that "all AI scribe systems from the 20 [government] approved vendors showed one or more inaccuracies at the procurement testing phase," such as "hallucinations (fabrication), incorrect information, or missing or incomplete information." "Inaccuracies in medical notes generated by AI Scribe systems could potentially result in inadequate or harmful treatment plans that may potentially impact patient health outcomes," the report declared.

Muddying the waters, Ontario's Minister of Public and Business Service Delivery and Procurement, Stephen Crawford, noted that the hallucinations were observed during testing by provincial evaluators and had not been recorded during actual medical visits. "Let's be very clear about that, that's not actually in operational use with doctors, that's in the optional stage where we're reviewing the various scribes," Crawford told Global News. Still, the auditor general, Shelley Spence, noted that the various scribes are nonetheless in use by around 5,000 doctors across Ontario. Talking to reporters, Spence said she went so far as to ask her physician to "please look at the transcript when you're done" with her own visit.

That news comes as another AI system for clinicians, OpenEvidence, faces growing scrutiny in the US over hallucinations and incomplete answers. As several doctors told NBC News, for example, OpenEvidence can occasionally draw overly strong conclusions from medical studies with relatively small sample sizes. While many physicians express appreciation for these new tools, it remains to be seen how they fare under real-world conditions, and how the medical world will judge them once the AI hype wears off.
An Ontario audit found all 20 government-approved AI medical scribes generated incorrect or fabricated information during testing. Nine systems hallucinated nonexistent patient details, 12 recorded wrong medication names, and 17 missed mental health issues—raising serious concerns about patient safety despite 5,000 doctors already using these tools.
The Office of the Auditor General of Ontario has uncovered troubling flaws in AI medical scribes used by thousands of physicians across the province. According to a recent report on artificial intelligence use in Ontario's government, all 20 AI note-taking systems approved by provincial authorities demonstrated significant problems with accuracy during standardized testing [1]. The findings raise urgent questions about patient safety, as these systems handle sensitive medical information and help shape treatment decisions.

During procurement evaluations, medical professionals reviewed simulated doctor-patient recordings alongside AI-generated notes to assess their reliability. What they discovered was alarming: nine out of 20 systems fabricated information and made suggestions to patients' treatment plans that were never discussed in the recordings [2]. These AI hallucinations included nonexistent referrals for blood tests or therapy, and false claims that no masses were found during examinations [1].

The audit revealed that 12 of the 20 evaluated systems inserted incorrect drug information into patient notes, potentially leading to dangerous prescription errors [2]. Perhaps most concerning, 17 systems missed key details about patients' mental health issues that were clearly discussed in the recordings, with six missing these issues fully or partially [2]. The auditor general warned that such inaccuracies in medical notes could result in inadequate or harmful treatment plans that may affect patient health outcomes [1].

Despite these serious accuracy issues, the AI Scribe program continues to operate. More than 5,000 physicians in Ontario are currently participating in the initiative, which the Ministry of Health launched for doctors, nurse practitioners, and other healthcare professionals [2]. A Ministry spokesperson told the CBC that there have been no known reports of patient harm associated with the technology, though auditor general Shelley Spence has advised patients to ask their doctors to review AI-generated transcripts carefully [3].

The report identified fundamental problems with how the provincial government assessed these AI systems. Medical note accuracy contributed only 4 percent to a vendor's total evaluation score, while having a domestic presence in Ontario accounted for 30 percent [1]. Bias controls represented just 2 percent of the score, and security and privacy assessments added another 2 percent [2]. This skewed weighting meant vendors could win approval even with a zero score on the accuracy metric. Across all approved vendors, the average AI scribe scored only 12 out of 20 on the accuracy portion of Supply Ontario's evaluation rubric [1]. The auditor general concluded that these systems were not evaluated adequately and recommended that IT departments require doctors to confirm their review of AI-produced notes before committing them to patient logs.

While Minister Stephen Crawford clarified that the documented hallucinations occurred during testing rather than in operational use, the fact remains that systems prone to inaccuracy and fabrication are now handling real patient data [3]. OntarioMD, which supports physicians in adopting new technologies and was involved in the procurement process, has recommended manual review of AI notes, but no mandatory attestation feature exists in any approved system [2].

The Ontario findings align with broader concerns about AI in healthcare. In the United States, OpenEvidence faces scrutiny for drawing overly strong conclusions from medical studies with small sample sizes, according to physicians interviewed by NBC News [3]. Many overworked doctors have embraced these tools to reduce administrative work, but their reliability remains questionable. Healthcare providers are not required to use government-approved systems and may purchase from non-approved vendors, creating additional oversight challenges [1]. As adoption grows, medical professionals and patients alike must remain vigilant about verifying AI-generated documentation to protect patient safety.