OpenAI Voice AI Models: GPT-Realtime-2 Gets GPT-5 Reasoning

OpenAI Introduces Three Specialized Voice AI Models for Developers

OpenAI has launched three new voice AI models that fundamentally change how developers can build voice-powered applications. The company introduced GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper through its Realtime API, separating conversational reasoning, translation, and transcription into specialized components rather than bundling them in a single voice product 1

. This modular approach allows enterprises to assign each task to the appropriate model rather than routing everything through a single, all-encompassing voice system.

Source: Inc.

GPT-Realtime-2 Brings GPT-5 Class Reasoning to Live Conversations

GPT-Realtime-2 represents OpenAI's first voice model with GPT-5 class reasoning designed specifically for live conversational use cases 1

. The model can handle difficult requests and keep conversations flowing naturally while processing complex agentic tasks. OpenAI expanded the context window from 32K to 128K tokens, enabling longer and more detailed customer interaction without requiring session resets or state compression layers that previously plagued enterprise deployments 5

Developers gain granular control over the model's behavior, including the ability to specify phrases the voice agents should use, adjust reasoning effort based on task complexity, and call multiple tools simultaneously 3

. The model can even narrate its actions with phrases like "checking your calendar" or "let me look into that," making interactions feel more natural and transparent 2

Source: FoneArena

Real-Time Multilingual Translation Across 70+ Languages

GPT-Realtime-Translate functions as a universal translator, supporting real-time multilingual translation across more than 70 input languages and 13 output languages 2

. The model translates speech instantly while preserving meaning and pacing, maintaining accuracy even with interruptions, accent variations, or context switching 4

. Deutsche Telekom is testing the technology for scenarios where users speak different languages while the system translates conversations instantly with low latency 4

Streaming Speech-to-Text with GPT-Realtime-Whisper

GPT-Realtime-Whisper delivers low-latency speech-to-text transcription by converting spoken audio into text as the speaker talks, unlike traditional models that wait for the speaker to finish 2

. This streaming speech-to-text capability enables real-time understanding for live captions, meeting notes, and voice-powered workflows where immediate transcription is essential 5

Business Applications Already in Development

Companies including Zillow, Priceline, Deutsche Telekom, Vimeo, and Glean are already building business applications with these real-time voice AI models 3

. Zillow is developing a voice assistant that can search homes and schedule tours from a single spoken request 2

. Priceline is working toward full trip management through voice, enabling users to check flights and hotels, cancel them, book new ones, handle delays, and receive TSA updates—all through conversational reasoning 4

Source: VentureBeat

Pricing and Availability Through Realtime API

All three models are now available through OpenAI's Realtime API. GPT-Realtime-2 costs $32 per 1 million audio input tokens and $64 per 1 million audio output tokens 5

. GPT-Realtime-Translate is priced at $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute 2

. Developers can test the models in the OpenAI Playground or integrate them into applications immediately 4

Organizations evaluating these models will need to consider their orchestration architecture and whether their stack can route discrete voice tasks to specialized models and manage state across the expanded context window 1

. The new models compete against alternatives like Mistral's Voxtral models, which also separate transcription and target enterprise use cases 1

OpenAI launches voice AI models with GPT-5 class reasoning for real-time multilingual conversations

OpenAI Introduces Three Specialized Voice AI Models for Developers

GPT-Realtime-2 Brings GPT-5 Class Reasoning to Live Conversations

Real-Time Multilingual Translation Across 70+ Languages

Streaming Speech-to-Text with GPT-Realtime-Whisper

Business Applications Already in Development

Pricing and Availability Through Realtime API

References

OpenAI voice models get GPT-5-class reasoning

OpenAI's new voice AI can listen, think, and talk back in 70+ languages

OpenAI's Brand New Voice AI Is Here. It Could Change How Companies Talk to Their Customers

OpenAI rolls out GPT-Realtime-2, Translate and Whisper audio models for voice AI

OpenAI launches 3 advanced realtime voice AI models: Here is what they can do

Related Stories

OpenAI Unveils GPT-Realtime: A Game-Changer for Enterprise Voice AI

OpenAI Unveils Advanced AI Audio Models for Transcription and Voice Generation

OpenAI's Realtime API: A Game-Changer for Smart Speakers and Voice Assistants

Recent Highlights

Pope Leo XIV releases first AI encyclical calling for disarmament from monopolistic control

AI passes the Turing Test as GPT-4.5 appears more human than actual people in landmark study

Google AI Search officially replaces traditional web search with Gemini-powered conversations

Recent Highlights

Today's Top Stories

Anthropic releases Claude Opus 4.8, making honesty a defining feature in AI models

Google Search adds Preferred Sources to AI Overviews, introduces article carousels

Oura Ring 5 debuts 40% smaller design with AI health coach and proactive health monitoring

Meta Launches Paid Subscription Plans for Facebook, Instagram, and WhatsApp with AI Tiers