
OpenAI has introduced three new audio models through its API aimed at supporting developers building voice-based applications with live reasoning, translation and transcription capabilities.
The models are designed to support more natural voice interactions and real-time task execution across applications such as customer support, travel assistance and multilingual communication.
The new models include GPT Realtime 2, a voice model with GPT-5 class reasoning capabilities; GPT Realtime Translate, a live speech translation model; and GPT Realtime Whisper, a streaming speech-to-text transcription model.
According to the company, GPT Realtime 2 is designed for live voice interactions in which the system can process requests, manage interruptions, call tools and continue conversations while handling more complex tasks. The model supports audible preambles such as “let me check that,” parallel tool calls, a context window of up to 128K tokens and adjustable reasoning levels.
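As a rough illustration of how a developer might configure such a session, the sketch below builds a `session.update` event of the kind the existing Realtime API accepts over its WebSocket connection. The model identifier, the `reasoning_effort` field and the `get_weather` tool are assumptions for illustration, not published parameters.

```python
import json

# Hypothetical model identifier -- the exact string is an assumption.
MODEL = "gpt-realtime-2"

def build_session_update(reasoning_effort: str, tools: list) -> dict:
    """Build a Realtime API `session.update` event enabling tool calls.

    The `reasoning_effort` field name is an assumption based on the
    article's mention of adjustable reasoning levels.
    """
    return {
        "type": "session.update",
        "session": {
            "model": MODEL,
            "instructions": "Answer briefly; say a short preamble such as "
                            "'let me check that' before a slow tool call.",
            "reasoning_effort": reasoning_effort,  # assumed parameter name
            "tools": tools,
        },
    }

# A single function tool, in the JSON-schema shape tool definitions use.
weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

event = build_session_update("low", [weather_tool])
payload = json.dumps(event)  # this string would be sent over the WebSocket
```

In a real client, the serialized event would be sent once at the start of the session, after which audio and tool-result events flow over the same connection.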
GPT Realtime Translate supports more than 70 input languages and 13 output languages. It is intended for use cases including customer support, education, live events and multilingual business communication.
GPT Realtime Whisper is a low-latency speech-to-text model that transcribes speech as users speak. It can be used for captions, meeting notes, customer support workflows and live voice assistants.
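Streaming transcription of this kind typically means sending audio to the API in small chunks as it is captured. The sketch below wraps raw PCM audio in `input_audio_buffer.append` events, the shape the Realtime API already uses for streaming audio in; the chunk size and the simulated silent audio are illustrative assumptions.

```python
import base64

def audio_chunk_event(pcm_bytes: bytes) -> dict:
    """Wrap a chunk of 16-bit PCM audio in an `input_audio_buffer.append`
    event, the form the Realtime API uses for incoming streamed audio."""
    return {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    }

# Simulate a microphone stream: 100 ms of silence at 16 kHz, 16-bit mono
# (1600 samples x 2 bytes), split into 800-byte chunks and wrapped.
captured = bytes(3200)
events = [audio_chunk_event(captured[i:i + 800])
          for i in range(0, len(captured), 800)]
```

A client would serialize each event as JSON and send it over the WebSocket as the audio arrives, receiving partial transcripts back on the same connection.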
The company also outlined safety measures built into the Realtime API, including active classifiers that can halt conversations that violate its harmful-content policies. Developers can add further safeguards through the company’s Agents SDK, and must disclose when users are interacting with an AI system unless it is already clear from context.
The Realtime API supports EU data residency requirements for Europe-based applications, according to the company.
The models are now available through the company’s Realtime API.