OpenAI Unveils Advanced AI Audio Models for Transcription and Voice Generation

OpenAI introduces new AI models for speech-to-text and text-to-speech, offering improved accuracy, customization, and potential for building AI agents with voice capabilities.

OpenAI Introduces Next-Generation Audio AI Models

OpenAI has unveiled a new suite of AI models for speech-to-text and text-to-speech. Available through OpenAI's API, the models promise higher accuracy, more customization, and the potential to build more sophisticated AI agents with voice interactions 1, 2.

Advanced Transcription Models

The company has introduced two new speech-to-text models: gpt-4o-transcribe and gpt-4o-mini-transcribe. These models are set to replace OpenAI's previous Whisper model, offering significant improvements in transcription accuracy 1.

Key features of the new transcription models include:

  • Improved performance in challenging environments with diverse accents and speech patterns
  • Reduced hallucination, addressing a known issue with the Whisper model
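  • A remaining limitation: a word error rate approaching 30% for Indic and Dravidian languages, meaning roughly three in ten transcribed words may differ from a human transcript 1, 3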
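
Word error rate (WER), the metric cited above, is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below illustrates that standard definition; it is not OpenAI's benchmark code, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: (substitutions + deletions + insertions) / reference words.

    Illustrative helper, not part of any OpenAI library.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr_row.append(
                min(
                    prev_row[j] + 1,                      # deletion
                    curr_row[j - 1] + 1,                  # insertion
                    prev_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev_row = curr_row
    return prev_row[-1] / len(ref_words)


# Made-up example: 3 of 10 reference words are transcribed incorrectly.
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown box jumps over a lazy dog tonight"
wer = word_error_rate(reference, hypothesis)
```

Because insertions count as errors, WER can exceed 100% when the output contains many extra words.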

Jeff Harris, a member of OpenAI's product staff, emphasized the importance of accuracy: "Making sure the models are accurate is completely essential to getting a reliable voice experience" 1.

Innovative Text-to-Speech Model

The new text-to-speech model, gpt-4o-mini-tts, introduces enhanced "steerability" and customization options 1, 2. Developers can now:

  • Instruct the model to adopt specific speaking styles or emotions
  • Tailor voice experiences for different contexts, such as customer support scenarios
  • Control both the content and manner of spoken outputs 1, 4
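
To make the steerability idea concrete, here is a minimal sketch of what a request body for the speech endpoint might look like, with the spoken text and the style instructions supplied as separate fields. It only builds the JSON payload and sends nothing. The endpoint URL, the `instructions` field, and the `coral` voice name reflect the public API as best understood at the time of writing and should be treated as assumptions to check against the current API reference; the helper name is hypothetical.

```python
import json

# Assumed endpoint for speech generation; verify against the API reference.
SPEECH_ENDPOINT = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text: str, instructions: str, voice: str = "coral") -> dict:
    """Build a JSON-serializable body for a text-to-speech request.

    `text` is what gets spoken; `instructions` describes how it should be spoken.
    Hypothetical helper for illustration only.
    """
    if not text.strip():
        raise ValueError("text to speak must not be empty")
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,                # assumed voice name
        "input": text,                 # the content
        "instructions": instructions,  # the manner (tone, emotion, pacing)
    }


request_body = build_speech_request(
    text="Your replacement card is on its way and should arrive within five days.",
    instructions="Speak like a calm, sympathetic customer support agent.",
)
payload = json.dumps(request_body, indent=2)
```

Keeping content and manner in separate fields means the same text can be re-rendered in a different style by changing only `instructions`.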

Integration with AI Agents

These audio models align with OpenAI's broader vision of creating "agentic" AI systems capable of independently accomplishing tasks 1. The company recently released an Agents SDK, allowing developers to incorporate voice interactions into existing text-based applications with minimal code changes 2, 5.

Pricing and Availability

The new models are available through OpenAI's API with the following pricing structure:

  • GPT-4o-based audio model: $40 per million input tokens, $80 per million output tokens
  • GPT-4o mini-based audio models: $10 per million input tokens, $20 per million output tokens 5
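
As a rough illustration of how these per-token prices translate into a bill, the sketch below multiplies token counts by the rates listed above. The token counts are invented for the example, the tier labels are shorthand rather than API model names, and the prices are simply those quoted in this article, which may not match OpenAI's current price list.

```python
# (input price, output price) in US dollars per million tokens, as quoted above.
PRICES_PER_MILLION = {
    "gpt-4o": (40.0, 80.0),
    "gpt-4o-mini": (10.0, 20.0),
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in dollars for a given number of input and output tokens."""
    input_price, output_price = PRICES_PER_MILLION[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Invented workload: 250,000 input tokens and 100,000 output tokens.
full_cost = estimate_cost("gpt-4o", 250_000, 100_000)       # 10.00 + 8.00
mini_cost = estimate_cost("gpt-4o-mini", 250_000, 100_000)  # 2.50 + 2.00
```

At the quoted rates, the mini tier costs a quarter as much as the full-size tier for the same token volume.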

Industry Impact and Competition

These advancements come at a time of increasing competition in the AI transcription and speech space. Companies like ElevenLabs and Hume AI are offering their own specialized models with unique features such as diarization and word-level customization 2.

Departure from Open-Source Approach

Unlike Whisper, which was released openly, the new transcription models will not be made openly available. OpenAI cites the models' increased size and complexity as reasons for this decision, stating that they are not suitable for local execution on personal devices 1, 3.

As AI continues to evolve, OpenAI's latest audio models represent a significant step forward in creating more natural and versatile voice interactions, potentially transforming various industries from customer service to creative storytelling.

TheOutpost.ai
© 2025 Triveous Technologies Private Limited