DeepL Launches Real-Time Voice Translation Feature, Expands into Audio Domain

2024-11-14

DeepL, a prominent German AI translation provider, has built its reputation on online text translation that is touted as more nuanced and precise than that of competitors such as Google. The company recently reached a valuation of $2 billion and now serves over 100,000 paying customers. As the AI services boom continues, DeepL has added an audio mode to its platform, called DeepL Voice. The feature automatically translates speech into another language in real time as users listen.

DeepL Voice currently supports English, German, Japanese, Korean, Swedish, Dutch, French, Turkish, Polish, Portuguese, Russian, Spanish, and Italian. It does not deliver translations as audio or video files; instead, users see translated text during live conversations and video meetings. In face-to-face conversations, the translation can be displayed as mirrored text on a smartphone so that both parties can read it, or shared with the other party as a transcript. In video conferencing services, translations appear as subtitles.

The founder and CEO of DeepL has hinted that additional output formats may follow. DeepL Voice is the company's first voice product, but it is unlikely to be the last. He emphasized that voice translation will be a major focus for the translation industry over the next year.

Other technology companies are entering this space as well. Google, one of DeepL's main competitors, has added real-time translated subtitles to its Meet video conferencing service. Several AI startups are also developing voice translation services, including AI voice specialist ElevenLabs and Panjaya, which uses "deepfake" voice and video for translation. Notably, Panjaya uses ElevenLabs' API, and ElevenLabs in turn relies on DeepL's technology for its translation services.

DeepL Voice does not currently offer an API and is aimed primarily at the B2B market, with DeepL working directly with clients and partners. Among video calling services, only Microsoft Teams currently supports DeepL's subtitles. There is no definitive information yet on whether Zoom or Google Meet will integrate DeepL Voice.

Since DeepL's founding in 2017, real-time voice translation has been one of the features users request most. However, DeepL has taken a methodical approach to product development, building its services from the ground up rather than adapting large language models from other companies. In July 2024, for instance, DeepL released a new large language model optimized specifically for translation, which reportedly outperforms GPT-4 as well as Google's and Microsoft's products. DeepL also continues to improve the quality and vocabulary of its written translations.

A key selling point of DeepL Voice is that it works in real time. Many "AI translation" services on the market struggle here: they suffer from delays that make them difficult to use in live situations. DeepL said its earlier focus on text translation was also driven by technical considerations: computing and generating a text translation is extremely fast, whereas audio and video processing, and the AI architecture behind it, still need improvement.

Beyond video conferencing and meetings, DeepL envisions the feature being used in the service industry, for example to help frontline restaurant staff communicate more easily with customers.