Businesses looking to use AI models to transcribe audio, specifically human speech, from executives, employees, and customers, may be wary of the idea of an AI program listening to and recording sensitive information.
However, the Israeli audio AI startup aiOla has a new model that addresses this very concern. Built atop OpenAI's industry-standard open source model Whisper, the new Whisper-NER from aiOla is itself fully open source and available now on Hugging Face and GitHub for enterprises, organizations, and individuals to take, use, adapt, modify and deploy.
It integrates automatic speech recognition (ASR) with named entity recognition (NER). This innovation aims to enhance privacy by automatically identifying and masking sensitive information such as names, phone numbers, and addresses during the transcription process.
A demo is also available on Hugging Face, allowing users to record snippets of speech and have the model mask words they specify in the resulting transcript. In my brief test, the model successfully masked the word "VentureBeat," which is both a proper noun and jargon.
Whisper-NER addresses a significant challenge in the transcription of spoken content: ensuring privacy and compliance with data protection regulations. The model processes audio files and simultaneously applies NER to tag or mask specific types of sensitive information directly within the transcription pipeline. Unlike traditional multi-step systems, which leave data exposed during intermediary processing stages, Whisper-NER eliminates the need for separate ASR and NER tools, reducing vulnerability to breaches.
"We designed this as an open-source tool to advance privacy in AI," said Gill Hetz, Vice President of Research at aiOla, in a recent video call interview with VentureBeat. "It helps users mask sensitive data without needing additional software steps."
Previously, aiOla was noted for releasing Whisper variants that could accurately and reliably recognize industry-specific jargon and transcribe it, as well as a much faster speech-to-text and speech recognition model.
Fully Open Source for Community and Commercial Use
Whisper-NER is fully open source and available under the MIT License, allowing users to adopt, modify, and deploy it freely, including for commercial applications.
The model can be accessed on GitHub and Hugging Face, ensuring its advanced capabilities are broadly available. A demo is also provided to help users explore its functionality and adaptability.
The open-source release aligns with aiOla's philosophy of fostering collaboration and innovation.
"AI moves forward when people collaborate," Hetz said. "That's why we've made this model open source -- to encourage adoption and improvement by the community."
Innovation in Speech and Data Privacy
Built on OpenAI's Whisper framework, Whisper-NER was trained on a synthetic dataset that combines synthesized speech with text-based NER datasets. This training approach allowed the model to handle transcription and entity recognition simultaneously, offering superior accuracy.
"Instead of separating ASR transcription and NLP [natural language processing] entity extraction, we solved both in one block," said Hetz. "When extracting text, the model simultaneously identifies specified entities."
This integrated approach, described in a research paper published on the open-access, non-peer-reviewed site arXiv.org, not only simplifies workflows but also significantly enhances data security.
Additionally, Whisper-NER supports zero-shot learning, enabling it to recognize and mask entity types that were not explicitly included during training.
The flexibility of Whisper-NER makes it suitable for a variety of use cases, including compliance monitoring, inventory management, quality assurance, and more.
For applications that do not require masking, the model can be configured to simply tag sensitive entities, providing organizations with customizable options to suit their needs.
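To make the difference between tagging and masking concrete, here is a minimal Python sketch that post-processes an already-tagged transcript. It is not aiOla's code and does not call the model. The inline `<type>...</type>` tag format, the `render_transcript` function, and its `mode` and `entity_types` parameters are all assumptions made for illustration, and the model's actual output format may differ.

```python
import re

# Hypothetical inline tag format: <type>text</type>.
# This is an assumption for illustration, not the model's documented output.
TAG_PATTERN = re.compile(r"<(?P<type>[a-z_]+)>(?P<text>.*?)</(?P=type)>")


def render_transcript(tagged, mode="mask", entity_types=None):
    """Return the transcript with entities masked or left tagged.

    mode="mask" replaces each selected entity with a placeholder like [NAME].
    mode="tag" keeps the inline tags around each selected entity.
    entity_types, if given, limits which entity types are treated as
    sensitive; all other entities are emitted as plain text.
    """
    if mode not in ("mask", "tag"):
        raise ValueError("mode must be 'mask' or 'tag'")

    def _replace(match):
        entity_type = match.group("type")
        if entity_types is not None and entity_type not in entity_types:
            return match.group("text")  # not selected: strip the tag only
        if mode == "mask":
            return f"[{entity_type.upper()}]"
        return match.group(0)  # tag mode: keep the tagged span unchanged

    return TAG_PATTERN.sub(_replace, tagged)


if __name__ == "__main__":
    sample = "Call <name>Dana Levi</name> at <phone>555-0100</phone> tomorrow."
    print(render_transcript(sample, mode="mask"))
    print(render_transcript(sample, mode="tag"))
    print(render_transcript(sample, mode="mask", entity_types={"phone"}))
```

In this sketch, a compliance team might store only the masked output while a quality assurance team keeps the tagged form for review. Note that the single-pass design described in the article is meant to avoid exposing an unmasked transcript at all, so a post-processing step like this one only illustrates the two output styles, not the model's internal mechanism.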
"Highly regulated industries like healthcare and law benefit most from our privacy-first approach, but even companies with limited sensitive data can use this technology," said Hetz.
Ethical AI and Adaptability
Whisper-NER represents a step forward in ethical AI development by enabling secure, privacy-focused transcription. Its open-source availability ensures that developers, researchers, and organizations can freely incorporate the model into their operations. By reducing risks associated with data breaches, it aligns with the growing demand for secure, AI-powered solutions in industries like healthcare, legal, and customer service.
"This version, built on Whisper, is best for English but supports multiple languages. Open-source contributors can adapt it further for diverse languages and jargon," Hetz explained. aiOla encourages global contributions to extend the model's reach and functionality.
With Whisper-NER now available to the public, aiOla reinforces its commitment to creating responsible AI tools that prioritize user privacy and security while fostering collaboration and innovation through open access.