OpenAI's New Models Listen, Translate & Act in Real Time

May 8, 2026

TL;DR

OpenAI introduces three new audio models for its developer platform: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.
These models let developers build conversational voice agents that listen, reason, and act in real time.
GPT-Realtime-2 offers GPT-5-class reasoning for natural conversation flow and complex requests.
GPT-Realtime-Translate provides live translation from more than 70 input languages into 13 output languages, useful for customer support and education.
GPT-Realtime-Whisper is a streaming speech-to-text model for live transcription, suited to captions and meeting notes.
Priceline and Deutsche Telekom are among the companies exploring these new voice AI capabilities.
Safeguards against misuse, including harmful content classifiers and usage policies, are integrated into the Realtime API.
Developers must make sure end users know when they are interacting with AI.
