ElevenLabs Launches Standalone Speech-to-Text Model Scribe

2025-02-27

ElevenLabs, an artificial intelligence startup best known for its audio generation technology and which has raised $180 million, has launched Scribe, its first standalone speech-to-text model. The release marks a new direction in the company's technological development.

ElevenLabs is now valued at $3.3 billion and has so far powered text-to-speech services for numerous companies with its extensive voice library. With Scribe, the company is entering the speech recognition market, where it will compete with firms such as Gladia, Speechmatics, AssemblyAI, and Deepgram, as well as with OpenAI's Whisper model.

At launch, Scribe supports more than 99 languages. ElevenLabs rates its recognition accuracy as "excellent" for 25 of them, meaning a word error rate below 5%. These include English (with a claimed accuracy of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese. The remaining languages are grouped by word error rate into three further tiers: "high" (5% to 10%), "good" (10% to 25%), and "moderate" (25% to 50%).
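For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance between a reference transcript and the model's output, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not ElevenLabs' own evaluation code; the function name, the lowercase-and-split normalization, and the sample sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Normalization here (lowercasing, whitespace split) is an assumption;
    real benchmarks usually apply more careful text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one substituted word out of nine.
wer = word_error_rate(
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumped over the lazy dog",
)
print(f"{wer:.1%}")  # 11.1%
```

If accuracy is read as one minus the word error rate, the 97% figure claimed for English corresponds to roughly three errors per 100 reference words, which sits inside the "excellent" tier.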

According to ElevenLabs, the Scribe model outperforms Google’s Gemini 2.0 Flash and Whisper Large V3 across various languages in the FLEURS and Common Voice benchmarks.

Last year, ElevenLabs developed a speech-to-text component for its AI conversational agent platform. Scribe, however, is the first speech recognition model the company has released as a standalone product.

Scribe also features smart speaker identification, which distinguishes who is speaking. It provides word-level timestamps for precise captioning and automatically tags sound events such as audience laughter. ElevenLabs also offers its studio clients a direct way to transcribe video content into subtitles or captions.
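To show why word-level timestamps matter for captioning, here is a small sketch that groups timed words into SubRip (SRT) caption cues. It is only an illustration under stated assumptions: the `text`, `start`, and `end` field names and the sample words are hypothetical and do not represent Scribe's actual response format, and the fixed seven-word cue length is an arbitrary choice.

```python
def to_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words each.

    Each word is a dict with hypothetical keys "text", "start", "end"
    (times in seconds). A cue spans from its first word's start to its
    last word's end.
    """
    if not words:
        return ""
    cues = []
    for number, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        start = to_srt_time(chunk[0]["start"])
        end = to_srt_time(chunk[-1]["end"])
        text = " ".join(word["text"] for word in chunk)
        cues.append(f"{number}\n{start} --> {end}\n{text}")
    return "\n\n".join(cues) + "\n"


# Hypothetical transcript fragment with per-word timings.
sample = [
    {"text": "Hello", "start": 0.0, "end": 0.4},
    {"text": "world", "start": 0.5, "end": 1.25},
]
print(words_to_srt(sample))
```

A production captioner would more likely break cues at pauses, punctuation, or speaker changes rather than at a fixed word count; per-word timings are what make any of those strategies possible.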

Currently, Scribe supports only pre-recorded audio, which means it is not yet suitable for real-time uses such as meeting transcription or voice notes. The company states that a low-latency real-time version of the model will be introduced soon.

ElevenLabs prices Scribe at $0.40 per hour of transcribed audio. The price is competitive, although some rivals currently offer lower rates for audio transcription, with varying feature sets.
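As a rough guide to what that rate means in practice, the sketch below estimates the cost of a transcription job from its audio duration. It assumes billing is strictly proportional to duration, with no minimum charge, rounding increment, or volume discount; the article states only the hourly rate, so those details and the function name are assumptions.

```python
from decimal import Decimal

SCRIBE_USD_PER_HOUR = Decimal("0.40")  # hourly rate quoted in the article


def transcription_cost(audio_seconds: int,
                       usd_per_hour: Decimal = SCRIBE_USD_PER_HOUR) -> Decimal:
    """Estimate cost in USD, assuming purely linear per-duration billing."""
    hours = Decimal(audio_seconds) / Decimal(3600)
    # Round to whole cents for display.
    return (hours * usd_per_hour).quantize(Decimal("0.01"))


print(transcription_cost(90 * 60))      # 90-minute recording -> 0.60
print(transcription_cost(100 * 3600))   # 100 hours of audio  -> 40.00
```

Passing a different `usd_per_hour` makes it easy to compare the same workload against another provider's published rate.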