On August 29, OpenAI officially moved its Realtime API out of beta testing and launched it into production.
This API is primarily aimed at businesses and developers, enabling them to build voice assistants for real-world applications such as customer service, education, and personal productivity. Its core component, the “gpt-realtime” model, employs an end-to-end Speech-to-Speech architecture, allowing direct generation and processing of audio without the need for intermediate text conversion. According to OpenAI, this model delivers faster response times, more natural-sounding voices, and improved handling of complex commands compared to its predecessor.
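To illustrate what an integration looks like, the sketch below opens a WebSocket session against the Realtime API and asks the model for a spoken greeting. It is a minimal example only: the endpoint URL and event names follow the publicly documented WebSocket interface from the beta period and may differ slightly in the production release, so the current API reference should be treated as authoritative.

```python
import asyncio
import json
import os

import websockets  # pip install websockets

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime"

async def main() -> None:
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    # "additional_headers" is the keyword in recent websockets releases;
    # older versions call it "extra_headers".
    async with websockets.connect(REALTIME_URL, additional_headers=headers) as ws:
        # Ask the model to produce an audio response directly, no text pipeline.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"instructions": "Greet the caller and offer help."},
        }))
        async for raw in ws:
            event = json.loads(raw)
            print(event["type"])  # e.g. audio deltas, transcripts, lifecycle events
            if event["type"] == "response.done":
                break

asyncio.run(main())
```

In a real deployment the audio delta events would be decoded from base64 and streamed back to the caller; the loop above only logs event types to show the shape of the exchange.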
OpenAI highlighted that the gpt-realtime model can detect non-speech signals like laughter, supports mid-conversation language switching, and allows customization of voice tone—for instance, a friendly tone with a French accent or a fast-paced professional delivery. Additionally, two new voices, “Cedar” and “Marin,” have been introduced, along with enhancements to eight existing voice options.
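In practice, voice and tone are set through the session configuration. The snippet below sketches such an update; the field names follow the beta-era session.update event, and the production session object may nest audio settings differently, so it should be read as illustrative rather than definitive.

```python
import json

# A sketch of a session.update event selecting one of the new voices and
# steering delivery style through natural-language instructions. Field names
# follow the beta-era schema and may have moved in the GA session object.
session_update = {
    "type": "session.update",
    "session": {
        "voice": "marin",
        "instructions": (
            "Speak in a warm, friendly tone with a light French accent. "
            "Keep answers short and conversational."
        ),
    },
}

payload = json.dumps(session_update)  # sent over the same WebSocket connection shown earlier
```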
In benchmark testing, the model showed significant gains over its predecessor: accuracy rose from 65.6% to 82.8% on Big Bench Audio, from 20.6% to 30.5% on MultiChallenge, and from 49.7% to 66.5% on ComplexFuncBench.
The API update also streamlines tool integration. The model now selects appropriate tools more reliably, triggers them at the right moments, and fills in their parameters accurately, improving the dependability of function calls. Developers can connect the API to phone systems via the Session Initiation Protocol (SIP) and to external tools and services via remote Model Context Protocol (MCP) servers. Reusable prompts let developers save configurations and tool settings for different use cases, further improving development efficiency.
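A typical tool declaration looks like the sketch below, which registers a hypothetical lookup_order function in the session so the model can call it mid-conversation. The event shape follows the beta-era function-calling flow, and the function name and parameters are invented for illustration.

```python
# Declares a hypothetical "lookup_order" tool in the session. When the model
# decides to call it, it emits a function_call item with JSON arguments; the
# application runs the lookup and replies with a function_call_output item.
tool_update = {
    "type": "session.update",
    "session": {
        "tools": [
            {
                "type": "function",
                "name": "lookup_order",  # hypothetical backend function
                "description": "Fetch the current status of a customer order.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            }
        ],
        "tool_choice": "auto",
    },
}
```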
Image input support is now available in the API. During conversations, users can send screenshots or photos, and the model can interpret the visual content—such as reading text within an image or answering questions related to the image. Developers have control over which images the model can access.
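Sending an image mid-conversation amounts to adding it as a conversation item before requesting the next response. The sketch below assumes a base64 data URL and a content type modeled on OpenAI's Responses API conventions ("input_image"); the exact field names in the Realtime API may differ, so treat this as an assumption to verify against the reference.

```python
import base64
import json

# Attach a screenshot to the conversation so the model can answer questions
# about it. The "input_image" content type mirrors the Responses API
# convention and is an assumption here.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

image_item = {
    "type": "conversation.item.create",
    "item": {
        "type": "message",
        "role": "user",
        "content": [
            {"type": "input_image", "image_url": f"data:image/png;base64,{image_b64}"},
            {"type": "input_text", "text": "What error message is shown on this screen?"},
        ],
    },
}
payload = json.dumps(image_item)  # sent over the Realtime WebSocket
```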
Additionally, two new features have been introduced: developers can set token usage limits and condense long conversation histories. These features help manage costs more effectively during extended interactions. Pricing for the gpt-realtime model has been reduced by 20%, with current rates at $32 per million input audio tokens, $64 per million output audio tokens, and $0.40 per million cached input tokens.
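At those rates, per-conversation cost is straightforward to estimate. The helper below plugs token counts into the published prices; the usage figures are hypothetical and chosen only to show the arithmetic.

```python
# Prices from the announcement, quoted in dollars per 1M gpt-realtime audio tokens.
INPUT_PER_M = 32.00        # audio input tokens
OUTPUT_PER_M = 64.00       # audio output tokens
CACHED_INPUT_PER_M = 0.40  # cached input tokens

def call_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one conversation in dollars."""
    return (input_tokens * INPUT_PER_M
            + output_tokens * OUTPUT_PER_M
            + cached_input_tokens * CACHED_INPUT_PER_M) / 1_000_000

# Hypothetical support call: 25k audio tokens in, 40k out, 10k served from cache.
print(f"${call_cost(25_000, 40_000, 10_000):.2f}")  # ≈ $3.36
```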
OpenAI noted that the API can detect harmful content and automatically terminate conversations that violate platform policies. However, as with earlier language-model safety systems, built-in detection should not be the only safeguard; developers are still encouraged to implement their own safety protocols.
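One such additional layer, for example, is to screen transcripts with OpenAI's standalone moderation endpoint before they reach downstream systems. The sketch below shows that pattern using the official Python SDK; it is one possible safeguard rather than part of the Realtime API itself, and whether it suffices depends on the application.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transcript_is_safe(transcript: str) -> bool:
    """Screen a user transcript with the moderation endpoint before
    passing it to downstream business logic."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=transcript,
    )
    return not result.results[0].flagged
```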
For users in the European Union, the API offers data localization options and customized privacy policies for enterprise clients, ensuring compliance with regional data protection regulations.