On Monday, ElevenLabs, a startup offering AI voice cloning and text-to-speech APIs, officially launched tools for building conversational AI bots.
The company announced that users can now build complete conversational agents on its developer platform and customize variables such as tone and response length.
Until now, ElevenLabs has focused mainly on providing a range of voices and AI tools for text-to-speech. Sam Sklar, the company's growth director, told TechCrunch that many customers were already using the platform to create conversational AI agents, but that integrating knowledge bases and handling customer interruptions remained the hardest parts. The company therefore decided to build a complete pipeline for conversational bots.
After logging in to their ElevenLabs account, users can start building a conversational agent by selecting a template or creating a new project. They can choose the agent's primary language, its first message, and a system prompt that defines the agent's personality. Developers also need to select a large language model (such as Gemini, GPT, or Claude), set the response "temperature" (which determines how creative the responses are), and set a token usage limit.
Users can also adjust other parameters, including voice, latency, stability, authentication criteria, and the maximum length of a conversation with the AI agent.
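To make the settings described above concrete, here is a minimal Python sketch of how such an agent configuration might be modeled and checked before use. It is not ElevenLabs' SDK: the `AgentConfig` class, its field names, the default values, and the assumed 0 to 1 temperature range are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class AgentConfig:
    """Hypothetical container for the agent settings named in the article."""
    language: str                      # primary language of the agent
    first_message: str                 # what the agent says first
    system_prompt: str                 # defines the agent's personality
    llm: str                           # e.g. "gemini", "gpt", "claude"
    temperature: float = 0.5           # assumed range 0.0-1.0; higher is more creative
    max_tokens: int = 512              # token usage limit (illustrative default)
    voice: str = "default"             # placeholder voice name
    max_duration_seconds: int = 600    # maximum conversation length (illustrative)

    def validate(self) -> None:
        """Reject settings that fall outside the ranges assumed in this sketch."""
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if self.max_duration_seconds <= 0:
            raise ValueError("max_duration_seconds must be positive")


config = AgentConfig(
    language="en",
    first_message="Hi, how can I help you today?",
    system_prompt="You are a friendly, concise support agent.",
    llm="gemini",
    temperature=0.3,
)
config.validate()

# A plain dictionary like this could then be serialized or passed to a client.
payload = asdict(config)
```

Keeping validation next to the settings means an out-of-range value fails early, before any request is sent to a real service.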
Users can add their own knowledge base to the chatbot in the form of documents, URLs, or blocks of text. They can also integrate a custom large language model (LLM) with the bot. ElevenLabs' SDK is compatible with Python, JavaScript, React, and Swift, and the company also offers a WebSocket API for further customization.
Businesses can define criteria for collecting specific data points, such as the names and email addresses of customers who talk to the agent, as well as evaluation criteria written in natural language that determine whether an interaction counts as a success or a failure.
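The following Python sketch illustrates the idea of pairing data-collection fields with natural-language evaluation criteria. Everything in it is hypothetical: the `DataField` and `Criterion` classes, the `review_conversation` function, and the toy `extract` and `judge` helpers are stand-ins. In a real product a language model would presumably do the extraction and judging; here a regular expression and a keyword check take its place so the example can run on its own.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DataField:
    name: str          # e.g. "email"
    description: str   # natural-language description of what to collect


@dataclass
class Criterion:
    name: str          # short identifier for the criterion
    prompt: str        # natural-language definition of success


def review_conversation(
    transcript: str,
    fields: List[DataField],
    criteria: List[Criterion],
    extract: Callable[[str, DataField], Optional[str]],
    judge: Callable[[str, Criterion], bool],
) -> Dict[str, object]:
    """Collect the requested data points and evaluate each criterion."""
    data = {f.name: extract(transcript, f) for f in fields}
    evaluation = {c.name: judge(transcript, c) for c in criteria}
    return {"data": data, "evaluation": evaluation,
            "success": all(evaluation.values())}


def toy_extract(transcript: str, field: DataField) -> Optional[str]:
    # Stand-in extractor: only knows how to find an email address.
    if field.name == "email":
        match = re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", transcript)
        return match.group(0) if match else None
    return None


def toy_judge(transcript: str, criterion: Criterion) -> bool:
    # Stand-in judge: treats any expression of thanks as a success signal.
    return "thank" in transcript.lower()


transcript = (
    "Agent: How can I help?\n"
    "Customer: Please email me at ana@example.com. Thanks!"
)
report = review_conversation(
    transcript,
    fields=[DataField("email", "The customer's email address")],
    criteria=[Criterion("resolved", "The customer left satisfied")],
    extract=toy_extract,
    judge=toy_judge,
)
```

Passing `extract` and `judge` in as functions keeps the bookkeeping separate from whatever model or rule set actually interprets the conversation.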
ElevenLabs is building on its existing text-to-speech pipeline, but it also had to develop speech-to-text functionality for its new conversational AI product. The company does not currently offer speech-to-text as a standalone API, but it may do so in the future, which would put it in competition with the speech-to-text APIs from Google, Microsoft, and Amazon, as well as specialized offerings such as OpenAI's Whisper, AssemblyAI, Deepgram, Speechmatics, and Gladia.
ElevenLabs is seeking a new round of funding at a valuation of more than $3 billion. The company is also competing with other voice AI startups, such as Vapi and Retell, that are building conversational agents. Most notably, it will compete with OpenAI's real-time conversational API. ElevenLabs believes, however, that its customization features and its ability to switch between models will give it an edge over OpenAI.