Resemble AI, an artificial intelligence startup, has launched "Chatterbox Turbo," an open-source text-to-speech model that can clone a voice from just five seconds of audio.
The company claims the model outperforms both ElevenLabs and Cartesia in voice quality while delivering its first audio output in under 150 milliseconds. That speed makes it well suited to real-time applications such as customer service bots, support systems, gaming avatars, virtual characters, and social platforms. Enterprises in regulated industries may also benefit from the built-in "PerTh" watermarking feature, which helps verify whether speech was generated by AI.
Resemble AI has released Chatterbox Turbo under the MIT license, which permits use, modification, and redistribution at no cost, including for commercial purposes, as long as the copyright and license notice are retained. The model is available for testing on Hugging Face, RunPod, Modal, Replicate, and Fal, with full source code accessible on GitHub. Resemble AI also offers hosted solutions and plans to release a low-latency version soon.