Curated by THEOUTPOST
On Thu, 31 Oct, 8:04 AM UTC
2 Sources
[1]
OpenAI Just Made an Important Service 80 Percent Cheaper
At a developers' event in London, AI market leader OpenAI announced new ways to save money while using its Realtime API. For developers creating voice-to-voice chatbots, it could be a huge money saver. At the event, OpenAI announced that it would soon add the ability for its Realtime API to automatically cache audio and text inputs, which could reduce the cost of long conversations by as much as 80 percent. (APIs enable different software applications to communicate and share data with each other.)
The Realtime API is designed for building applications and software that feature voice assistants and AI agents, and is currently being used by companies including Healthify, Speak, and Twilio. The API enables developers to create bots that people can engage with through either voice or text, and that can take action, like ordering a pizza or setting an appointment.
While the API, which was released at the start of October, was welcomed by developers, some complained that the pricing was too expensive for many use cases. OpenAI's APIs charge developers according to how many tokens (fragments of data) are processed as inputs and generated as outputs. According to OpenAI, text input is priced at $5 per one million tokens and text output at $20 per one million tokens. Audio input is priced at $100 per one million tokens and audio output at $200 per one million tokens.
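To make the per-token pricing concrete, here is a minimal Python sketch that turns token counts into an estimated bill at the list prices quoted above. The `Usage` class, the `estimate_cost` function, and the sample token counts are hypothetical names and numbers chosen for illustration; they are not part of any OpenAI library.

```python
from dataclasses import dataclass

# USD per one million tokens, as quoted in the excerpt above.
PRICE_PER_MILLION = {
    "text_input": 5.00,
    "text_output": 20.00,
    "audio_input": 100.00,
    "audio_output": 200.00,
}


@dataclass
class Usage:
    """Hypothetical container for the token counts of one session."""
    text_input: int = 0
    text_output: int = 0
    audio_input: int = 0
    audio_output: int = 0


def estimate_cost(usage: Usage) -> float:
    """Return the estimated cost in USD for the given token counts."""
    total = 0.0
    for kind, price in PRICE_PER_MILLION.items():
        # Each field name on Usage matches a key in the price table.
        total += getattr(usage, kind) * price / 1_000_000
    return total


# Illustrative session; the token counts are made up.
session = Usage(text_input=2_000, text_output=500,
                audio_input=30_000, audio_output=20_000)
print(f"Estimated cost: ${estimate_cost(session):.2f}")
```

Even in this small made-up session, the audio tokens account for nearly all of the total, which is consistent with the complaint that voice use cases were expensive at launch.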
[2]
OpenAI expands Realtime API with new voices and cuts prices for developers
OpenAI today updated its Realtime API, which is currently in beta. The update adds new voices for speech-to-speech applications to its platform and cuts costs through prompt caching. Beta users of the Realtime API will now have five new voices they can use to build their applications. OpenAI showcased three of the new voices, Ash, Verse and the British-sounding Ballad, in a post on X. The company said in its API documentation that the native speech-to-speech feature "skip[s] an intermediate text format means low latency and nuanced output," while the voices are easier to steer and more expressive than its previous voices.
However, OpenAI warns that it cannot offer client-side authentication for the API for now, as it is still in beta. It also said that there may be issues with processing real-time audio. "Network conditions heavily affect real-time audio, and delivering audio reliably from a client to a server at scale is challenging when network conditions are unpredictable," the company shared.
OpenAI's history with AI-powered speech and voices has been controversial. In March, it released Voice Engine, a voice cloning platform to rival ElevenLabs, but it limited access to only a few researchers. In May, after the company demoed its GPT-4o and Voice Mode, it paused using one of the voices, Sky, after the actress Scarlett Johansson spoke out about its similarity to her voice. The company rolled out ChatGPT Advanced Voice Mode for paying subscribers (those using ChatGPT Plus, Enterprise, Teams and Edu) in the U.S. in September.
Speech-to-speech AI would ideally let enterprises build more real-time voice responses. Suppose a customer calls a company's customer service platform. In that case, the speech-to-speech capability can take the person's voice, understand what they are asking, and respond using an AI-generated voice with lower latency. Speech-to-speech also lets users generate voice-overs, with a user speaking their lines while the voice output is not theirs. Platforms that offer this include Replica and, of course, ElevenLabs. OpenAI released the Realtime API this month during its Dev Day. The API aims to speed up the building of voice assistants.
Lowering costs
Using speech-to-speech features, though, could get expensive. When the Realtime API launched, pricing was $0.06 per minute of audio input and $0.24 per minute of audio output, which is not cheap. However, the company plans to lower Realtime API prices with prompt caching. The price of cached text inputs will drop by 50%, and cached audio inputs will be discounted by 80%. OpenAI also announced Prompt Caching during Dev Day; it keeps frequently requested contexts and prompts in the model's memory, which reduces the number of tokens the model needs to process to generate responses. Lowering input prices could encourage more interested developers to connect to the API.
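The per-minute prices in this excerpt and the per-token prices quoted in source [1] can be related with simple arithmetic. The sketch below assumes that both sets of figures describe the same launch pricing, which neither excerpt states outright, and derives the token rate per minute of audio that this would imply. It also applies the announced 50% and 80% discounts to the list input prices. All function and variable names are hypothetical.

```python
# Figures quoted in the two source excerpts. Treating them as two views of
# the same launch pricing is an assumption made for this sketch.
AUDIO_PER_MILLION_USD = {"input": 100.00, "output": 200.00}
AUDIO_PER_MINUTE_USD = {"input": 0.06, "output": 0.24}

LIST_INPUT_PER_MILLION_USD = {"text": 5.00, "audio": 100.00}
CACHED_INPUT_DISCOUNT = {"text": 0.50, "audio": 0.80}


def implied_tokens_per_minute(direction: str) -> float:
    """Tokens per minute of audio implied by the two quoted price points."""
    usd_per_token = AUDIO_PER_MILLION_USD[direction] / 1_000_000
    return AUDIO_PER_MINUTE_USD[direction] / usd_per_token


def cached_input_price(kind: str) -> float:
    """USD per one million cached input tokens after the announced discount."""
    return LIST_INPUT_PER_MILLION_USD[kind] * (1 - CACHED_INPUT_DISCOUNT[kind])


for direction in ("input", "output"):
    rate = round(implied_tokens_per_minute(direction))
    print(f"audio {direction}: about {rate} tokens per minute")

for kind in ("text", "audio"):
    price = round(cached_input_price(kind), 2)
    print(f"cached {kind} input: ${price} per one million tokens")
```

Under these assumptions, the quoted figures work out to roughly 600 tokens per minute of audio input and 1,200 per minute of audio output, with cached input priced at about $2.50 (text) and $20 (audio) per one million tokens. The discounts apply only to cached inputs; the excerpt does not mention any change to output prices.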
OpenAI announces significant cost reductions for its Realtime API and introduces new voice options, potentially revolutionizing AI-powered voice assistants and chatbots.
OpenAI, the leading AI company, has unveiled significant cost reductions and new features for its Realtime API at a developers' event in London. The company plans to implement automatic caching of audio and text inputs, which could slash the cost of long conversations by up to 80 percent [1].
The new pricing structure aims to make the Realtime API more accessible to developers: cached text inputs will be discounted by 50 percent and cached audio inputs by 80 percent [2].
This move addresses concerns from developers who previously found the API pricing prohibitively expensive for many use cases [2].
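The "up to 80 percent" figure for long conversations can be illustrated with a rough simulation. This Python sketch makes several assumptions that the sources do not confirm: every turn re-bills all earlier audio input as input tokens, earlier input is billed at the cached rate once caching is on, each turn adds a hypothetical 600 new audio input tokens, and output tokens are ignored. The function names are made up for this example.

```python
AUDIO_INPUT_PER_MILLION_USD = 100.00  # list price quoted in source [1]
CACHED_AUDIO_DISCOUNT = 0.80          # discount quoted in source [2]


def conversation_input_cost(turns: int, tokens_per_turn: int, cached: bool) -> float:
    """Audio input cost in USD for a conversation, under the stated assumptions."""
    rate = AUDIO_INPUT_PER_MILLION_USD / 1_000_000
    # Assumption: earlier turns are re-sent each turn and billed at the
    # discounted rate only when caching is enabled.
    history_rate = rate * (1 - CACHED_AUDIO_DISCOUNT) if cached else rate
    total = 0.0
    history_tokens = 0
    for _ in range(turns):
        total += history_tokens * history_rate + tokens_per_turn * rate
        history_tokens += tokens_per_turn
    return total


def caching_savings(turns: int, tokens_per_turn: int) -> float:
    """Fraction of audio input cost saved by caching (0.0 to 1.0)."""
    without = conversation_input_cost(turns, tokens_per_turn, cached=False)
    with_cache = conversation_input_cost(turns, tokens_per_turn, cached=True)
    return 1 - with_cache / without


for turns in (1, 5, 20, 200):
    print(f"{turns:>3} turns: {caching_savings(turns, 600):.1%} saved")
```

In this model a single-turn exchange saves nothing, because nothing has been cached yet, while savings climb toward (but never reach) 80 percent as the conversation grows and the re-billed history dominates the cost.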
In addition to cost reductions, OpenAI has introduced five new voices for speech-to-speech applications on its platform. The company showcased three of these voices (Ash, Verse, and the British-sounding Ballad) in a post on X [2]. These new voices are designed to be more expressive and easier to control than previous iterations.
The Realtime API, released in early October, is designed for creating applications featuring voice assistants and AI agents. It's already being utilized by companies such as Healthify, Speak, and Twilio [1]. The API enables developers to build bots that can interact through voice or text and perform actions like ordering food or scheduling appointments.
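With the new pricing structure and enhanced voice capabilities, OpenAI is positioning the Realtime API for uses such as customer service calls, where an AI-generated voice can respond to callers with lower latency, and voice-over generation [2].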
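While these advancements are promising, OpenAI acknowledges some challenges: the API cannot yet offer client-side authentication while it is in beta, and real-time audio is heavily affected by unpredictable network conditions, which makes reliable delivery at scale difficult [2].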
As OpenAI continues to innovate in the realm of AI-powered speech and text interactions, these latest developments in the Realtime API represent a significant step forward in making advanced AI capabilities more accessible and affordable for developers and businesses alike.