娇色导航


Patients may suffer from hallucinations of AI medical transcription tools

News
Oct 29, 2024 | 4 mins

One transcription product that relies on an AI model deletes the original audio, leaving doctors no way to check the transcriptions.

Credit: PeopleImages.com - Yuri A / Shutterstock


An AI-powered transcription tool widely used in the medical field has been found to hallucinate text, posing potential risks to patient safety, according to a recent academic study.

That tool is also used in a commercial medical transcription product that, worryingly, deletes the underlying audio from which transcriptions are generated, leaving medical staff no way to verify their accuracy, the Associated Press (AP) reported on Saturday.

OpenAI’s Whisper, the underlying AI tool, is integrated into medical transcription services from Nabla, which the company says are used by over 30,000 clinicians at more than 70 organizations. Nabla told AP its product had been used to transcribe around 7 million medical visits.

Whisper is also embedded in Microsoft’s and Oracle’s cloud computing platforms and integrated with certain versions of ChatGPT. Despite its wide adoption, researchers are now raising serious concerns about its accuracy.

A study conducted by researchers from Cornell University, the University of Washington, and others found that Whisper “hallucinated” in about 1.4% of its transcriptions, sometimes inventing entire sentences, nonsensical phrases, or even dangerous content, including violent and racially charged remarks.

The study, “Careless Whisper: Speech-to-Text Hallucination Harms,” found that Whisper often inserted phrases during moments of silence in medical conversations, particularly when transcribing patients with aphasia, a condition that affects language and speech patterns.

In these cases, the AI sometimes fabricated unrelated phrases, such as “Thank you for watching!” — likely due to its training on a large dataset of YouTube videos. In more concerning instances, it invented fictional medications like “hyperactivated antibiotics” and even injected racial commentary into transcripts, AP reported.

For example, Whisper correctly transcribed a speaker’s reference to “two other girls and one lady” but added “which were Black,” despite no such racial context in the original conversation.

Whisper is not the only AI model that generates such errors. In a separate study, researchers found that other AI models were also prone to hallucinations.

Harmful hallucinations

Whisper’s errors are a result of the AI model creating patterns based on its training data that do not exist in the samples, leading to nonsensical or fabricated outputs. This phenomenon, known as hallucination, has been documented across various AI models. According to the researchers, 40% of Whisper’s hallucinations could have harmful consequences, as the AI misinterpreted or misrepresented the speaker’s intent in several cases.

Although Whisper’s creators have claimed that the tool approaches human-level robustness and accuracy, multiple studies have shown otherwise.

In one study of public meetings cited by AP, a researcher from the University of Michigan found hallucinations in eight of every 10 audio transcriptions. A machine learning engineer reported hallucinations in about half of over 100 hours of transcriptions inspected. A third study identified hallucinations in nearly every one of 26,000 transcripts generated using Whisper, AP said.

Microsoft, which offers Whisper as part of its cloud computing services, advises customers to “obtain appropriate legal advice to review your solution, particularly if you will use it in sensitive or high-risk applications.”

Despite this, many healthcare providers are already adopting it for transcribing patient consultations.

Nabla, the company integrating Whisper into its medical transcription tools, has acknowledged the hallucination issue and is reportedly working to address it, the AP report said.

With over 4.2 million downloads on the open-source AI platform Hugging Face in the past month, Whisper has become one of the most popular speech recognition models. However, as its usage spreads, researchers are warning against its adoption in critical sectors like healthcare due to the serious implications of its errors.

While other AI transcription tools also make mistakes, the frequency and potential harm of Whisper’s hallucinations are raising red flags. Other AI products, such as Google’s AI Overviews, have faced criticism for comparably outlandish outputs, including a recommendation to use non-toxic glue to keep cheese from falling off pizza.

As the healthcare industry increasingly integrates AI solutions, the risks posed by such hallucinations demand immediate attention to avoid harmful consequences for patients.