OpenAI Unveils Advanced AI Audio Models for Transcription and Voice Generation


OpenAI introduces new speech-to-text and text-to-speech models, offering improved accuracy, more customization, and new options for building AI agents with voice capabilities.


OpenAI Introduces Next-Generation Audio AI Models

OpenAI has unveiled a new suite of AI models for speech-to-text and text-to-speech. The models, available through OpenAI's API, promise better accuracy, more customization, and a foundation for building more sophisticated AI agents that interact by voice [1][2].

Advanced Transcription Models

The company has introduced two new speech-to-text models: gpt-4o-transcribe and gpt-4o-mini-transcribe. These models are set to replace OpenAI's previous Whisper model, offering significant improvements in transcription accuracy [1].

Key points about the new transcription models include:

  • Improved performance in challenging environments with diverse accents and speech patterns
  • Reduced hallucination, addressing a known issue with the Whisper model
  • A remaining limitation: word error rates approaching 30% for Indic and Dravidian languages, meaning roughly three in every ten words may differ from a human transcription [1][3]
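Word error rate (WER) counts the substitutions, deletions and insertions needed to turn a model's transcript into a human reference transcript, divided by the number of words in the reference. The sketch below shows the standard word-level edit-distance calculation; it is a generic illustration, not OpenAI's benchmark code, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Made-up example: two substitutions (fox->box, the->a) and one deletion (today)
ref = "the quick brown fox jumps over the lazy dog today"
hyp = "the quick brown box jumps over a lazy dog"
print(f"WER = {word_error_rate(ref, hyp):.0%}")  # prints "WER = 30%"
```

Three errors against a ten-word reference give a WER of 30%. Because insertions also count, WER can exceed 100% when a model adds many extra words.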

Jeff Harris, a member of OpenAI's product staff, emphasized the importance of accuracy: "Making sure the models are accurate is completely essential to getting a reliable voice experience" [1].

Innovative Text-to-Speech Model

The new text-to-speech model, gpt-4o-mini-tts, introduces enhanced "steerability" and customization options [1][2]. Developers can now:

  • Instruct the model to adopt specific speaking styles or emotions
  • Tailor voice experiences for different contexts, such as customer support scenarios
  • Control both the content and manner of spoken outputs [1][4]
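Steerability means the request carries two separate pieces of text: what to say and how to say it. The minimal sketch below only builds a request body and sends nothing. It assumes the field names documented for OpenAI's `/v1/audio/speech` endpoint at the time of writing (`model`, `voice`, `input`, `instructions`), which should be checked against the current API reference. The helper name `build_speech_request` and the default voice are illustrative assumptions.

```python
import json


def build_speech_request(text: str, style: str, voice: str = "coral") -> dict:
    """Hypothetical helper: JSON body for a text-to-speech request.

    Field names assume OpenAI's /v1/audio/speech endpoint; no network call is made.
    """
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,          # WHAT the model should say
        "instructions": style,  # HOW to say it: tone, emotion, pacing
    }


payload = build_speech_request(
    "We're sorry for the delay. Your replacement ships today.",
    "Speak like a calm, apologetic customer support agent.",
)
print(json.dumps(payload, indent=2))
```

Keeping the style in its own field lets an application reuse the same script with different deliveries, for example a cheerful tone for onboarding and a calm tone for support.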

Integration with AI Agents

These audio models align with OpenAI's broader vision of creating "agentic" AI systems capable of independently accomplishing tasks [1]. The company recently released an Agents SDK, allowing developers to incorporate voice interactions into existing text-based applications with minimal code changes [2][5].
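One common way to add voice to a text-only agent is a chained pipeline: transcribe the incoming audio, pass the text to the unchanged agent, then synthesize the reply. The sketch below illustrates that pattern with plain functions. It is not the Agents SDK's API. All function names are hypothetical, and the two stubs only pass text through where a real application would call speech-to-text and text-to-speech models.

```python
from typing import Callable


# Hypothetical stand-ins for real STT/TTS model calls.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


# Existing text-only agent logic; it does not change when voice is added.
def text_agent(user_text: str) -> str:
    if "order" in user_text.lower():
        return "Your order shipped yesterday."
    return "How can I help you today?"


def voice_wrapper(
    agent: Callable[[str], str],
    transcribe: Callable[[bytes], str],
    synthesize: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Wrap a text agent so it takes audio in and returns audio out."""
    def handle(audio: bytes) -> bytes:
        text_in = transcribe(audio)  # speech-to-text
        text_out = agent(text_in)    # unchanged text logic
        return synthesize(text_out)  # text-to-speech
    return handle


voice_agent = voice_wrapper(text_agent, fake_transcribe, fake_synthesize)
print(voice_agent(b"Where is my order?"))  # b'Your order shipped yesterday.'
```

Because the wrapper only composes three functions, the speech models can be swapped without touching the agent.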

Pricing and Availability

The new models are available through OpenAI's API with the following pricing structure:

  • GPT-4o-based audio model: $40 per million input tokens, $80 per million output tokens
  • GPT-4o mini-based audio models: $10 per million input tokens, $20 per million output tokens [5]
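Input and output tokens are billed at different rates, so the cost of a workload is (input tokens × input rate + output tokens × output rate) / 1,000,000. The small sketch below applies the rates listed above. The dictionary keys are illustrative labels, not official model IDs, and the token counts are made up. The sketch does not model how the API converts audio into tokens.

```python
# USD per million tokens, taken from the pricing list above.
# Keys are illustrative labels, not official model IDs.
PRICES = {
    "gpt-4o-audio": {"input": 40.0, "output": 80.0},
    "gpt-4o-mini-audio": {"input": 10.0, "output": 20.0},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one workload at per-million-token rates."""
    rate = PRICES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


for model in PRICES:
    cost = estimate_cost(model, input_tokens=250_000, output_tokens=100_000)
    print(f"{model}: ${cost:.2f}")
# gpt-4o-audio: $18.00
# gpt-4o-mini-audio: $4.50
```

For this hypothetical workload, the mini-based tier costs one quarter as much, matching the 4× gap between the listed rates.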

Industry Impact and Competition

These advancements come at a time of increasing competition in the AI transcription and speech space. Companies like ElevenLabs and Hume AI are offering their own specialized models with unique features such as diarization and word-level customization [2].

Departure from Open-Source Approach

Unlike its predecessor Whisper, OpenAI has chosen not to make these new transcription models openly available. The company cites the models' increased size and complexity as reasons for this decision, stating that they are not suitable for local execution on personal devices [1][3].

As AI continues to evolve, OpenAI's latest audio models represent a significant step forward in creating more natural and versatile voice interactions, potentially transforming various industries from customer service to creative storytelling.

TheOutpost.ai
© 2025 Triveous Technologies Private Limited