AI startup aiOla has introduced a new model called Whisper-NER, designed to address the privacy concerns businesses face when using artificial intelligence for audio transcription. The model is built on OpenAI's open-source Whisper model and combines automatic speech recognition (ASR) with named entity recognition (NER). During transcription, Whisper-NER automatically identifies and masks sensitive information such as names, phone numbers, and addresses, which helps organizations protect privacy and comply with data protection regulations when processing speech.
The model is fully open source and is available on Hugging Face and GitHub for enterprises, organizations, and individuals to use, modify, and deploy. A demo on Hugging Face lets users record short audio snippets and see designated terms masked automatically in the resulting transcript. Tests showed that the model masks the specified terms effectively, including proper nouns and jargon.
Gill Hetz, aiOla's Vice President of Research, said the open-source tool was developed to advance privacy protection in the AI field. Because Whisper-NER reduces the need for additional software steps, users can mask sensitive data without adding complexity. Unlike traditional multi-stage systems, the model does not expose unmasked data during intermediate processing stages, which reduces the likelihood of data breaches.
Whisper-NER's source code is released under the MIT License, which permits free use and modification for both community and commercial purposes.
Whisper-NER is trained on synthetic speech and text-based NER datasets, so it learns to perform transcription and entity recognition simultaneously rather than as separate steps, which enhances accuracy. The model is designed for zero-shot use, meaning it can identify and mask entity types that were not explicitly included during training.
Where masking is not required, Whisper-NER can be configured to tag sensitive entities instead, so organizations can tailor its behavior to their needs. Hetz noted that highly regulated industries such as healthcare and law would benefit most from this privacy-focused approach, although companies handling less sensitive data can also use the technology.
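The difference between masking and tagging can be illustrated with a minimal Python sketch. It does not use Whisper-NER or reproduce its actual output format; the `EntitySpan` type, the `render` function, the `mode` parameter, and the bracket and angle-tag notations are all hypothetical, chosen only to show how the same detected entities could be either hidden or merely labeled in a transcript.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EntitySpan:
    """A detected entity (hypothetical structure, not Whisper-NER's output)."""
    start: int  # inclusive character offset into the transcript
    end: int    # exclusive character offset into the transcript
    label: str  # entity type, e.g. "name" or "phone"


def render(transcript: str, spans: List[EntitySpan], mode: str = "mask") -> str:
    """Return the transcript with entities masked or tagged.

    mode="mask" replaces each entity with a placeholder such as [NAME].
    mode="tag" keeps the original text and wraps it in a label.
    """
    if mode not in ("mask", "tag"):
        raise ValueError(f"unknown mode: {mode!r}")

    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            raise ValueError("entity spans must not overlap")
        # Copy the untouched text between the previous entity and this one.
        pieces.append(transcript[cursor:span.start])
        original = transcript[span.start:span.end]
        if mode == "mask":
            pieces.append(f"[{span.label.upper()}]")
        else:
            pieces.append(f"<{span.label}>{original}</{span.label}>")
        cursor = span.end
    # Copy whatever follows the last entity.
    pieces.append(transcript[cursor:])
    return "".join(pieces)


text = "Call Jane Doe at 555-0100."
spans = [EntitySpan(5, 13, "name"), EntitySpan(17, 25, "phone")]

print(render(text, spans, mode="mask"))  # Call [NAME] at [PHONE].
print(render(text, spans, mode="tag"))   # Call <name>Jane Doe</name> at <phone>555-0100</phone>.
```

In this sketch the masked output contains no trace of the original values, while the tagged output keeps them and only marks where they are, which is the trade-off an organization would weigh when choosing between the two modes.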