OpenAI launches voice AI models with GPT-5 class reasoning for real-time multilingual conversations

Reviewed byNidhi Govil

5 Sources

Share

OpenAI introduced three specialized voice AI models that bring advanced reasoning to live conversations. GPT-Realtime-2 features GPT-5 class reasoning with a 128K token context window, GPT-Realtime-Translate handles real-time multilingual translation across 70+ languages, and GPT-Realtime-Whisper delivers streaming speech-to-text transcription. Companies like Zillow and Priceline are already building voice agents that can complete complex tasks during ongoing conversations.

OpenAI Introduces Three Specialized Voice AI Models for Developers

OpenAI has launched three new voice AI models that fundamentally change how developers can build voice-powered applications. The company introduced GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper through its Realtime API, separating conversational reasoning, translation, and transcription into specialized components rather than bundling them in a single voice product

1

. This modular approach allows enterprises to assign each task to the appropriate model rather than routing everything through a single, all-encompassing voice system.

Source: Inc.

Source: Inc.

GPT-Realtime-2 Brings GPT-5 Class Reasoning to Live Conversations

GPT-Realtime-2 represents OpenAI's first voice model with GPT-5 class reasoning designed specifically for live conversational use cases

1

. The model can handle difficult requests and keep conversations flowing naturally while processing complex agentic tasks. OpenAI expanded the context window from 32K to 128K tokens, enabling longer and more detailed customer interaction without requiring session resets or state compression layers that previously plagued enterprise deployments

5

.

Developers gain granular control over the model's behavior, including the ability to specify phrases the voice agents should use, adjust reasoning effort based on task complexity, and call multiple tools simultaneously

3

. The model can even narrate its actions with phrases like "checking your calendar" or "let me look into that," making interactions feel more natural and transparent

2

.

Source: FoneArena

Source: FoneArena

Real-Time Multilingual Translation Across 70+ Languages

GPT-Realtime-Translate functions as a universal translator, supporting real-time multilingual translation across more than 70 input languages and 13 output languages

2

. The model translates speech instantly while preserving meaning and pacing, maintaining accuracy even with interruptions, accent variations, or context switching

4

. Deutsche Telekom is testing the technology for scenarios where users speak different languages while the system translates conversations instantly with low latency

4

.

Streaming Speech-to-Text with GPT-Realtime-Whisper

GPT-Realtime-Whisper delivers low-latency speech-to-text transcription by converting spoken audio into text as the speaker talks, unlike traditional models that wait for the speaker to finish

2

. This streaming speech-to-text capability enables real-time understanding for live captions, meeting notes, and voice-powered workflows where immediate transcription is essential

5

.

Business Applications Already in Development

Companies including Zillow, Priceline, Deutsche Telekom, Vimeo, and Glean are already building business applications with these real-time voice AI models

3

. Zillow is developing a voice assistant that can search homes and schedule tours from a single spoken request

2

. Priceline is working toward full trip management through voice, enabling users to check flights and hotels, cancel them, book new ones, handle delays, and receive TSA updates—all through conversational reasoning

4

.

Source: VentureBeat

Source: VentureBeat

Pricing and Availability Through Realtime API

All three models are now available through OpenAI's Realtime API. GPT-Realtime-2 costs $32 per 1 million audio input tokens and $64 per 1 million audio output tokens

5

. GPT-Realtime-Translate is priced at $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute

2

. Developers can test the models in the OpenAI Playground or integrate them into applications immediately

4

.

Organizations evaluating these models will need to consider their orchestration architecture and whether their stack can route discrete voice tasks to specialized models and manage state across the expanded context window

1

. The new models compete against alternatives like Mistral's Voxtral models, which also separate transcription and target enterprise use cases

1

.

Today's Top Stories

TheOutpost.ai

Don’t drown in AI news. We cut through the noise - filtering, ranking and summarizing the most important AI news, breakthroughs and research daily. Spend less time searching for the latest in AI and get straight to action.

Instagram logo
LinkedIn logo
Youtube logo
© 2026 TheOutpost.AI All rights reserved