release

Introducing next-generation audio models in the API

Builders can now create more lifelike and adaptable voice interactions with less effort, opening up new possibilities for personalized user experiences in customer service, virtual assistants, and accessibility tools.

OpenAI Blog·March 20, 2025·1 min read

openai.com

What happened

OpenAI has released next-generation audio models in its API, with enhanced text-to-speech capabilities. Developers can now give the model a plain-language instruction for voice style, such as 'talk like a sympathetic customer service agent,' to control tone and delivery. The update adds to OpenAI's existing speech and audio offerings, which include speech recognition and voice synthesis. For developers building AI-powered voice agents or interactive voice response systems, this can reduce the need for audio post-processing or separate voice modulation tools, making it simpler to produce natural, context-appropriate speech. The models are available through the API, so builders can generate customized speech output programmatically. The release reflects growing demand for expressive voice AI in virtual assistants, customer support bots, and content creation.
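To make the style instruction concrete, here is a minimal Python sketch of what such a request might look like. The article does not name an endpoint, model, voice, or parameter, so the URL, the "gpt-4o-mini-tts" model, the "alloy" voice, and the "instructions" field below are assumptions based on OpenAI's public speech API and should be checked against the current API reference. The helper names build_speech_request and synthesize are hypothetical.

```python
import json
import os
import urllib.request

# Assumptions not stated in the article: the endpoint URL, the model and
# voice names, and the "instructions" field. Verify against the API reference.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, style, model="gpt-4o-mini-tts", voice="alloy"):
    """Build the JSON payload for a styled text-to-speech request."""
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {"model": model, "voice": voice, "input": text}
    if style:
        # Plain-language description of how the voice should sound.
        payload["instructions"] = style
    return payload


def synthesize(payload, api_key, out_path="speech.mp3"):
    """POST the payload and write the returned audio bytes to out_path."""
    request = urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path


if __name__ == "__main__":
    payload = build_speech_request(
        "I'm sorry your order arrived late. Let me fix that for you.",
        style="Talk like a sympathetic customer service agent.",
    )
    api_key = os.environ.get("OPENAI_API_KEY")
    if api_key:
        # Only contacts the API when a key is configured.
        print("wrote", synthesize(payload, api_key))
    else:
        # Without a key, just show the request body that would be sent.
        print(json.dumps(payload, indent=2))
```

Keeping the payload builder separate from the network call means the style text can be inspected or varied per conversation turn without touching the HTTP code. The output filename assumes the service returns MP3 audio by default.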

Key takeaways

OpenAI released new audio models in its API with advanced text-to-speech capabilities.
Developers can now instruct the model to speak in a specific style, e.g., 'like a sympathetic customer service agent.'
The feature enables customization of tone and delivery without additional audio processing.
The models are accessible via API for integration into voice agents and other audio applications.
This update addresses demand for more natural and controllable voice AI in various domains.