OpenAI Plans to Launch New Audio Model in Q1

2026-01-04

OpenAI is reportedly developing a new artificial intelligence model optimized for audio generation.

The model is expected to launch by the end of March and to produce more natural-sounding speech than OpenAI’s current offerings. It is also said to handle real-time conversation with users significantly better.

The upcoming AI system will reportedly be built on a new architecture. OpenAI’s existing real-time audio model, GPT-realtime, relies on the widely adopted transformer architecture. It remains unclear whether the company is moving to an entirely different architecture or refining its implementation of the transformer design.

Some transformer-based models process speech directly, while others, such as OpenAI’s 2022 Whisper model, first convert audio into a spectrogram, a representation of how the sound’s frequency content changes over time, and analyze that instead. Both Whisper and subsequent audio models from OpenAI have been released in multiple versions with varying output quality, suggesting that the forthcoming release could also come in several versions.
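For readers unfamiliar with the term, the spectrogram step can be illustrated with a short sketch. This is not OpenAI's code: Whisper uses a log-Mel spectrogram, whereas the example below computes a plain magnitude spectrogram with a naive Fourier transform, and the function name, frame size, hop length, and test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Split a signal into overlapping frames and return, for each frame,
    the magnitude of frequency bins 0..frame_size // 2.

    Illustrative only: a naive O(n^2) DFT, not an optimized FFT.
    """
    # Hann window to soften the edges of each frame
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Hypothetical test signal: a 200 Hz tone sampled at 1,600 Hz for 0.25 seconds
sample_rate = 1600
tone_hz = 200
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate)
          for t in range(sample_rate // 4)]

spec = spectrogram(signal)

# Each bin spans sample_rate / frame_size = 25 Hz, so the tone lands in bin 8
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64

print(len(spec), len(spec[0]))  # frames x frequency bins
print(peak_bin, peak_hz)
```

Each row of the result describes one short slice of time and each column one frequency band, which is why a spectrogram can be handed to a model much like an image.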

To accelerate development, OpenAI has consolidated several engineering, product, and research teams working on audio AI. The project is reportedly led by Kundan Kumar, a researcher formerly with Character.AI Inc., a venture-backed AI startup. Many of that startup’s employees joined Google LLC in 2024 through a $2.7 billion reverse acquihire.

The new model may extend beyond voice synthesis to AI-generated music, a rapidly growing market. According to The Wall Street Journal, music generation startup Suno Inc. already generates over $200 million in annual revenue. Entering this market could strengthen OpenAI’s consumer-facing AI services.

The audio model is part of OpenAI’s broader strategy to enter the consumer electronics market. According to The Information, the company plans to release an “audio-first personal device” in about a year, with longer-term ambitions to roll out a family of hardware products that includes smart speakers and augmented reality glasses.

In May of last year, OpenAI acquired product design startup io Products Inc. to bolster its hardware ambitions. That acquisition valued the Jony Ive-founded firm at $6.5 billion. As reported by the Financial Times in October, Ive has been working on a smartphone-sized device designed for desktop use.

To support its push into consumer hardware, OpenAI may develop a lightweight, on-device audio model. Processing audio locally reduces reliance on cloud infrastructure, which cuts costs and latency. Google takes a similar approach in its Pixel smartphones, which use the on-device Gemini Nano model to power select AI features.