2 Sources
[1]
OpenAI Just Made an Important Service 80 Percent Cheaper
At a developers' event in London, AI market leader OpenAI announced new ways to save money while using its RealTime API. For developers creating voice-to-voice chatbots, it could be a huge money saver. At the event, OpenAI announced that it would soon add the ability for its RealTime API to automatically cache audio and text inputs, which could reduce the cost of long conversations by as much as 80 percent. (APIs enable different software applications to communicate and share data with each other.) The RealTime API is designed to create applications and software that feature voice assistants and AI agents, and is currently being used by companies including Healthify, Speak, and Twilio. The API enables developers to create bots that people can engage with either through voice or text, and that can take action, like ordering a pizza or setting an appointment. While the API, which was released at the start of October, was welcomed by developers, some complained that the pricing was too expensive for many use cases. OpenAI's APIs charge developers depending on how many tokens (fragments of data) are processed as inputs and generated as outputs. According to OpenAI, text input is priced at $5 per one million tokens and output is priced at $20 per one million tokens. Audio input is priced at $100 per one million tokens and output is $200 per one million tokens.
[2]
OpenAI expands Realtime API with new voices and cuts prices for developers
OpenAI updated its Realtime API today, which is currently in beta. This update adds new voices for speech-to-speech applications to its platform and cuts costs associated with caching prompts. Beta users of the Realtime API will now have five new voices they can use to build their applications. OpenAI showcased three of the new voices, Ash, Verse and the British-sounding Ballad, in a post on X. The company said in its API documentation that the native speech-to-speech feature "skip[s] an intermediate text format means low latency and nuanced output," while the voices are easier to steer and more expressive than its previous voices. However, OpenAI warns it cannot offer client-side authentication for the API now as it's still in beta. It also said that there may be issues with processing real-time audio. "Network conditions heavily affect real-time audio, and delivering audio reliably from a client to a server at scale is challenging when network conditions are unpredictable," the company shared. OpenAI's history with AI-powered speech and voices has been controversial. In March, it released Voice Engine, a voice cloning platform to rival ElevenLabs, but it limited access to only a few researchers. In May, after the company demoed its GPT-4o and Voice Mode, it paused using one of the voices, Sky, after the actress Scarlett Johansson spoke out about its similarity to her voice. The company rolled out ChatGPT Advanced Voice Mode for paying subscribers (those using ChatGPT Plus, Enterprise, Teams and Edu) in the U.S. in September. Speech-to-speech AI would ideally let enterprises build more real-time responses using a voice. Suppose a customer calls a company's customer service platform. In that case, the speech-to-speech capability can take the person's voice, understand what they are asking, and respond using an AI-generated voice with lower latency. Speech-to-speech also lets users generate voice-overs, with a user speaking their lines, but the voice output is not theirs. One platform that offers this is Replica and, of course, ElevenLabs. OpenAI released the Realtime API this month during its Dev Day. The API aims to speed up the building of voice assistants.
Lowering costs
Using speech-to-speech features, though, could get expensive. When the Realtime API launched, the pricing structure was $0.06 per minute of audio input and $0.24 per minute of audio output, which is not cheap. However, the company plans to lower Realtime API prices with prompt caching. Cached text inputs will drop by 50%, and cached audio inputs will be discounted by 80%. OpenAI also announced Prompt Caching during Dev Day, which keeps frequently requested contexts and prompts in the model's memory. This will drop the number of tokens it needs to create to generate responses. Lowering input prices could encourage more interested developers to connect to the API.
OpenAI announces significant cost reductions for its Realtime API and introduces new voice options, potentially revolutionizing AI-powered voice assistants and chatbots.
OpenAI, the leading AI company, has unveiled significant cost reductions and new features for its Realtime API at a developers' event in London. The company plans to implement automatic caching of audio and text inputs, which could cut the cost of long conversations by up to 80 percent [1].
The new pricing structure aims to make the Realtime API more accessible to developers: cached text inputs will be discounted by 50 percent and cached audio inputs by 80 percent [2].
This move addresses concerns from developers who previously found the API pricing too expensive for many use cases [1].
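To make the arithmetic concrete, here is a minimal Python sketch of how the cached-input discounts could affect input costs. It uses the per-million-token input prices quoted in source [1] and the discount percentages quoted in source [2]. The function `input_cost` and its `cached_fraction` parameter are hypothetical names introduced for illustration, and the sketch assumes the discount applies uniformly to every cached input token; OpenAI's actual billing rules may differ.

```python
# Per-million-token input prices in USD, as reported in source [1].
TEXT_INPUT_PER_M = 5.00
AUDIO_INPUT_PER_M = 100.00

# Cached-input discounts, as reported in source [2].
CACHED_TEXT_DISCOUNT = 0.50
CACHED_AUDIO_DISCOUNT = 0.80


def input_cost(tokens: int, price_per_million: float,
               cached_fraction: float = 0.0, discount: float = 0.0) -> float:
    """Estimate the USD cost of `tokens` input tokens.

    `cached_fraction` is a hypothetical share of tokens served from cache.
    Cached tokens are assumed to be billed at (1 - discount) of full price.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = tokens * cached_fraction
    uncached = tokens - cached
    # Express cached tokens as an equivalent number of full-price tokens.
    billable = uncached + cached * (1.0 - discount)
    return billable * price_per_million / 1_000_000


if __name__ == "__main__":
    full = input_cost(1_000_000, AUDIO_INPUT_PER_M)
    mostly_cached = input_cost(1_000_000, AUDIO_INPUT_PER_M,
                               cached_fraction=0.9,
                               discount=CACHED_AUDIO_DISCOUNT)
    print(f"audio input, no caching:  ${full:.2f}")
    print(f"audio input, 90% cached:  ${mostly_cached:.2f}")
```

Under these assumptions, one million audio input tokens would cost about $28 instead of $100 when 90 percent of them are cached. The full 80 percent saving is reached only if every input token is cached, which is consistent with the "as much as 80 percent" wording in the announcement.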
In addition to cost reductions, OpenAI has introduced five new voices for speech-to-speech applications on its platform. The company showcased three of these voices (Ash, Verse, and the British-sounding Ballad) in a post on X [2]. According to the company, the new voices are more expressive and easier to steer than its previous voices.
The Realtime API, released in early October, is designed for creating applications featuring voice assistants and AI agents. It is already being used by companies such as Healthify, Speak, and Twilio [1]. The API enables developers to build bots that can interact through voice or text and perform actions like ordering food or scheduling appointments.
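With the new pricing structure and enhanced voice capabilities, OpenAI is positioning the Realtime API for lower-latency voice applications. In customer service, for example, speech-to-speech can take a caller's voice, understand the request, and respond in an AI-generated voice. It can also generate voice-overs in which a user speaks the lines but the output voice is not their own [2].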
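While these advancements are promising, OpenAI acknowledges some challenges. The API is still in beta and cannot yet offer client-side authentication, and the company warns that network conditions heavily affect real-time audio, which makes delivering audio reliably from a client to a server at scale difficult [2].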
As OpenAI continues to innovate in the realm of AI-powered speech and text interactions, these latest developments in the Realtime API represent a significant step forward in making advanced AI capabilities more accessible and affordable for developers and businesses alike.
Summarized by
Navi