OpenAI launches three voice AI models that translate 70+ languages and reason through live conversations

Reviewed by Nidhi Govil

4 Sources

OpenAI has introduced three audio models through its Realtime API that enable developers to build voice-powered applications with advanced conversational capabilities. GPT-Realtime-2 brings GPT-5-class reasoning to live interactions, GPT-Realtime-Translate handles real-time language translation across 70+ input languages, and GPT-Realtime-Whisper delivers streaming transcription. Companies like Zillow, Priceline, and Deutsche Telekom are already testing these models to build voice assistants that can reason through requests and take action during conversations.

OpenAI Unveils Three Audio Models for Advanced Voice Interactions

OpenAI has launched three new audio models through its Realtime API that mark a significant shift in how voice AI systems operate. The models (GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper) enable developers to build voice-powered applications that move beyond simple call-and-response patterns toward systems capable of listening, reasoning, and taking action during live conversations [1]. Together, these audio models address three critical requirements for effective voice systems: the ability to understand context, execute multi-step tasks, and maintain natural conversation flow without interruption [3].

Source: Inc.

GPT-Realtime-2 Brings GPT-5-Class Reasoning to Live Conversations

GPT-Realtime-2 represents OpenAI's first voice model with GPT-5-class reasoning designed specifically for live conversational use cases [3]. The model can handle complex requests without losing the thread of conversation, call multiple tools simultaneously, and even narrate its actions with phrases like "checking your calendar" or "let me look into that" [1]. OpenAI expanded the context window from 32K to 128K tokens, allowing for longer and more detailed conversations that maintain coherence across extended sessions [4]. Developers can adjust the reasoning effort based on task complexity, choosing between faster responses for simple queries or deeper thinking for more demanding requests [1]. The model also gives developers granular control, including the ability to specify phrases the voice agent should use frequently and direct it to call multiple tools at once, enabling simultaneous searches across different data sources [2]. Zillow is currently testing GPT-Realtime-2 to build an assistant that helps prospective homebuyers identify locations and autonomously schedule home tours from a single spoken request [1].
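To make the "multiple tools at once" idea concrete, here is a minimal sketch of how an application might run several requested tool calls concurrently. It does not use the Realtime API: the tool names (`search_listings`, `check_calendar`), the shape of the `requested` list, and the return values are all hypothetical, chosen only to illustrate the concurrency pattern.

```python
import asyncio

# Hypothetical tools: names, arguments, and return shapes are illustrative only
# and do not come from the Realtime API.
async def search_listings(city: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for a network call
    return {"tool": "search_listings", "city": city, "matches": 3}

async def check_calendar(day: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for a network call
    return {"tool": "check_calendar", "day": day, "free_slots": ["10:00", "14:30"]}

TOOLS = {"search_listings": search_listings, "check_calendar": check_calendar}

async def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run every requested tool concurrently; results come back in request order."""
    tasks = [TOOLS[call["name"]](**call["arguments"]) for call in calls]
    return await asyncio.gather(*tasks)

# Assumed shape of a batch of tool calls requested in a single turn.
requested = [
    {"name": "search_listings", "arguments": {"city": "Austin"}},
    {"name": "check_calendar", "arguments": {"day": "Saturday"}},
]
results = asyncio.run(run_tool_calls(requested))
```

Because both calls are awaited together, the total wait is roughly that of the slowest tool rather than the sum of both, which matters when a voice agent has to answer without a long pause.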

Source: FoneArena

Real-Time Language Translation Across 70+ Languages Arrives

GPT-Realtime-Translate delivers what resembles a Universal Translator from science fiction, supporting live speech translation across more than 70 input languages and 13 output languages [1]. The model maintains accuracy under natural speech conditions, including interruptions, accent variations, and context switching [3]. During demonstrations, the system kept translating when new speakers joined and spoke different languages, rendering both speakers' speech into English in real time without missing context [1]. This multilingual real-time translation capability enables instant cross-language communication while preserving meaning and pacing throughout conversations [3]. Deutsche Telekom is testing the model for multilingual voice interactions where users speaking different languages can communicate with low latency [3]. Priceline is working toward full trip management through voice, incorporating translation capabilities alongside flight search, hotel changes, delay handling, and TSA updates [3].

Streaming Transcription Eliminates Waiting for Speech-to-Text

GPT-Realtime-Whisper introduces a low-latency streaming transcription model that converts speech to text as the speaker talks, rather than waiting for them to finish [1]. This speech-to-text transcription capability supports continuous transcription, making voice data immediately usable in workflows [3]. Teams can power live captions for meetings, classrooms, broadcasts, and events, generate notes and summaries while conversations are still in progress, and build voice agents that need to understand users continuously [4]. Vimeo is already using the model for real-time transcription in its platform [1]. The streaming approach proves particularly valuable for customer support, healthcare, sales, recruiting, and other high-volume spoken interactions where waiting for complete transcription would slow workflows [4].
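The difference between streaming and wait-until-finished transcription can be sketched in a few lines. The example below is not tied to GPT-Realtime-Whisper or any real API: the `fragments` list simulates partial text arriving over time, and `live_captions` is a hypothetical helper showing how a caption display could update on each fragment.

```python
from typing import Iterable, Iterator

def live_captions(deltas: Iterable[str]) -> Iterator[str]:
    """Yield the running caption after each partial-text fragment arrives."""
    caption = ""
    for delta in deltas:
        caption += delta
        yield caption  # the display can refresh here, before speech has ended

# Simulated fragments; a real stream would deliver these over a network connection.
fragments = ["Thanks ", "for ", "calling, ", "how can I help?"]
captions = list(live_captions(fragments))
```

A batch transcriber would only produce the final string; the streaming version exposes every intermediate state, which is what live captions and in-progress note-taking depend on.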

Business Applications and Developer Access

Companies including Zillow, Priceline, Deutsche Telekom, Vimeo, and Glean are already using these models to build advanced travel agents, multilingual customer support assistants, and more capable voice assistants that can reason through requests and take action in real time [2]. The models enable three key patterns shaping voice-powered applications: agentic voice where users describe tasks and the system executes them using reasoning and tools, contextual voice guidance that turns real-time context into spoken guidance, and multilingual voice enabling real-time conversations across users and contexts [3]. All three models are available through the Realtime API with pricing starting at $0.017 per minute for GPT-Realtime-Whisper, $0.034 per minute for GPT-Realtime-Translate, and $32 per 1 million audio input tokens for GPT-Realtime-2 [1]. Developers can test the models in the OpenAI Playground or integrate them into applications immediately [3]. The Realtime API includes multiple layers of safety and compliance protections to ensure responsible deployment across business applications [3].
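As a rough illustration of what the quoted prices mean in practice, the sketch below turns them into simple cost estimates. The rates are the ones reported above; the function names are hypothetical, and the calculation covers only the listed charges (per-minute audio for the two per-minute models, audio input tokens for GPT-Realtime-2). Any other charges, such as output tokens, are not covered by the figures in this article and are left out.

```python
from decimal import Decimal

# Rates as reported in the article (USD).
PER_MINUTE_RATES = {
    "gpt-realtime-whisper": Decimal("0.017"),
    "gpt-realtime-translate": Decimal("0.034"),
}
REALTIME_2_PER_MILLION_AUDIO_INPUT_TOKENS = Decimal("32")

def per_minute_cost(model: str, minutes: int) -> Decimal:
    """Estimated cost for a per-minute-priced model."""
    return PER_MINUTE_RATES[model] * minutes

def audio_input_token_cost(tokens: int) -> Decimal:
    """Estimated GPT-Realtime-2 cost for audio input tokens only."""
    return REALTIME_2_PER_MILLION_AUDIO_INPUT_TOKENS * tokens / 1_000_000

hour_of_transcription = per_minute_cost("gpt-realtime-whisper", 60)
hour_of_translation = per_minute_cost("gpt-realtime-translate", 60)
quarter_million_tokens = audio_input_token_cost(250_000)
```

At these rates, an hour of streaming transcription comes to $1.02 and an hour of translation to $2.04, while 250,000 audio input tokens for GPT-Realtime-2 cost $8.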

Source: Digit

© 2026 TheOutpost.AI All rights reserved