2 Sources
[1]
OpenAI Just Made an Important Service 80 Percent Cheaper
At a developers' event in London, AI market leader OpenAI announced new ways to save money while using its Realtime API. For developers creating voice-to-voice chatbots, it could be a huge money saver. At the event, OpenAI announced that it would soon add the ability for its Realtime API to automatically cache audio and text inputs, which could reduce the cost of long conversations by as much as 80 percent. (APIs enable different software applications to communicate and share data with each other.) The Realtime API is designed to create applications and software that feature voice assistants and AI agents, and is currently being used by companies including Healthify, Speak, and Twilio. The API enables developers to create bots that people can engage with either through voice or text, and that can take action, like ordering a pizza or setting an appointment. While the API, which was released at the start of October, was welcomed by developers, some complained that the pricing was too expensive for many use cases. OpenAI's APIs charge developers depending on how many tokens (fragments of data) are processed as inputs and generated as outputs. According to OpenAI, text input is priced at $5 per one million tokens and output is priced at $20 per one million tokens. Audio input is priced at $100 per one million tokens and output is $200 per one million tokens.
[2]
OpenAI expands Realtime API with new voices and cuts prices for developers
OpenAI updated its Realtime API today, which is currently in beta. This update adds new voices for speech-to-speech applications to its platform and cuts costs associated with caching prompts. Beta users of the Realtime API will now have five new voices they can use to build their applications. OpenAI showcased three of the new voices, Ash, Verse and the British-sounding Ballad, in a post on X. The company said in its API documentation that the native speech-to-speech feature "skip[s] an intermediate text format means low latency and nuanced output," while the voices are easier to steer and more expressive than its previous voices. However, OpenAI warns it cannot offer client-side authentication for the API now as it's still in beta. It also said that there may be issues with processing real-time audio. "Network conditions heavily affect real-time audio, and delivering audio reliably from a client to a server at scale is challenging when network conditions are unpredictable," the company shared. OpenAI's history with AI-powered speech and voices has been controversial. In March, it released Voice Engine, a voice cloning platform to rival ElevenLabs, but it limited access to only a few researchers. In May, after the company demoed its GPT-4o and Voice Mode, it paused using one of the voices, Sky, after the actress Scarlett Johansson spoke out about its similarity to her voice. The company rolled out ChatGPT Advanced Voice Mode for paying subscribers (those using ChatGPT Plus, Enterprise, Teams and Edu) in the U.S. in September. Speech-to-speech AI would ideally let enterprises build more real-time responses using a voice. Suppose a customer calls a company's customer service platform. In that case, the speech-to-speech capability can take the person's voice, understand what they are asking, and respond using an AI-generated voice with lower latency. Speech-to-speech also lets users generate voice-overs, with a user speaking their lines, but the voice output is not theirs. Platforms that offer this include Replica and, of course, ElevenLabs. OpenAI released the Realtime API this month during its Dev Day. The API aims to speed up the building of voice assistants.
Lowering costs
Using speech-to-speech features, though, could get expensive. When the Realtime API launched, the pricing structure was at $0.06 per minute of audio input and $0.24 per minute of audio output, which is not cheap. However, the company plans to lower Realtime API prices with prompt caching. Cached text inputs will drop by 50%, and cached audio inputs will be discounted by 80%. OpenAI also announced Prompt Caching during Dev Day, which keeps frequently requested contexts and prompts in the model's memory. This will drop the number of tokens it needs to create to generate responses. Lowering input prices could encourage more interested developers to connect to the API.
OpenAI announces significant cost reductions for its Realtime API and introduces new voice options, potentially revolutionizing AI-powered voice assistants and chatbots.
OpenAI, the leading AI company, has unveiled significant cost reductions and new features for its Realtime API at a developers' event in London. The company plans to implement automatic caching of audio and text inputs, which could slash the cost of long conversations by up to 80 percent [1].
The new pricing structure aims to make the Realtime API more accessible to developers: cached text inputs will be discounted by 50 percent and cached audio inputs by 80 percent [2].
This move addresses concerns from developers who previously found the API pricing prohibitively expensive for many use cases [1].
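To make the effect of the discounts concrete, here is a minimal sketch of a cost estimate. It assumes the per-token list prices quoted in source [1] and the cached-input discounts reported in source [2]; the `Usage` class, the `estimate_cost` function, and the token counts in the example are hypothetical and are not part of any OpenAI SDK. Actual billing may differ.

```python
from dataclasses import dataclass

# USD per one million tokens, as quoted in source [1].
PRICE_PER_MILLION = {
    "text_in": 5.0,
    "text_out": 20.0,
    "audio_in": 100.0,
    "audio_out": 200.0,
}

# Discounts on cached inputs, as reported in source [2].
CACHE_DISCOUNT = {"text_in": 0.50, "audio_in": 0.80}


@dataclass
class Usage:
    """Hypothetical token counts for one conversation."""
    text_in: int = 0
    text_out: int = 0
    audio_in: int = 0
    audio_out: int = 0
    cached_text_in: int = 0   # portion of text_in served from cache
    cached_audio_in: int = 0  # portion of audio_in served from cache


def estimate_cost(u: Usage) -> float:
    """Return the estimated cost in USD for the given usage."""
    total = 0.0
    # Inputs: uncached tokens pay full price, cached tokens pay the discounted price.
    for kind in ("text_in", "audio_in"):
        tokens = getattr(u, kind)
        cached = getattr(u, f"cached_{kind}")
        if cached > tokens:
            raise ValueError(f"cached_{kind} exceeds {kind}")
        rate = PRICE_PER_MILLION[kind] / 1_000_000
        total += (tokens - cached) * rate
        total += cached * rate * (1 - CACHE_DISCOUNT[kind])
    # Outputs are never cached, so they always pay full price.
    for kind in ("text_out", "audio_out"):
        total += getattr(u, kind) * PRICE_PER_MILLION[kind] / 1_000_000
    return total


# Made-up example: a long voice conversation where most audio input is cached.
with_cache = Usage(audio_in=500_000, cached_audio_in=400_000, audio_out=100_000)
without_cache = Usage(audio_in=500_000, audio_out=100_000)
print(f"with caching:    ${estimate_cost(with_cache):.2f}")
print(f"without caching: ${estimate_cost(without_cache):.2f}")
```

Because the discount applies only to cached inputs, the overall saving depends on how much of a conversation is repeated context; output tokens are billed at the full rate in this sketch.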
In addition to cost reductions, OpenAI has introduced five new voices for speech-to-speech applications on its platform. The company showcased three of these voices (Ash, Verse, and the British-sounding Ballad) in a post on X [2]. These new voices are designed to be more expressive and easier to control than previous iterations.
The Realtime API, released in early October, is designed for creating applications featuring voice assistants and AI agents. It's already being used by companies such as Healthify, Speak, and Twilio [1]. The API enables developers to build bots that can interact through voice or text and perform actions like ordering food or scheduling appointments.
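With the new pricing structure and enhanced voice capabilities, OpenAI is positioning the API for uses such as customer service, where a speech-to-speech system can understand a caller's request and respond in an AI-generated voice with lower latency, and voice-over generation [2].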
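While these advancements are promising, OpenAI acknowledges some challenges: the API is still in beta and does not yet offer client-side authentication, and unpredictable network conditions make it hard to deliver real-time audio reliably at scale [2].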
As OpenAI continues to innovate in the realm of AI-powered speech and text interactions, these latest developments in the Realtime API represent a significant step forward in making advanced AI capabilities more accessible and affordable for developers and businesses alike.
Summarized by
Navi