Curated by THEOUTPOST
On Wed, 2 Oct, 4:04 PM UTC
5 Sources
[1]
OpenAI Unveils Realtime API and Other Improvements for Developers
OpenAI is also making the process of model distillation easier

OpenAI hosted its annual DevDay conference in San Francisco on Tuesday and announced several new upgrades to the application programming interface (API) version of ChatGPT, which can be remodelled and fine-tuned to power other applications and software. Among the major introductions are the Realtime API, prompt caching, and vision fine-tuning with GPT-4o. The company is also making the process of model distillation easier for developers. During the event, OpenAI also announced the completion of its funding round, stating it raised $6.6 billion (roughly Rs. 55 thousand crore).

In several blog posts, the AI firm highlighted the new features and tools for developers. The first is the Realtime API, which will be available to paid subscribers of the ChatGPT API. This new capability offers a low-latency multimodal experience, allowing speech-to-speech conversations similar to ChatGPT's Advanced Voice Mode. Developers can also make use of the six preset voices that were earlier added to the API.

Another new introduction is the prompt caching capability in the API. OpenAI is introducing this feature as a way for developers to save costs on prompts that are used frequently. The company noticed that developers usually keep sending the same input prompts when editing a codebase or having a multi-turn conversation with the chatbot. With prompt caching, they can now reuse recently used input prompts at a discounted rate, and processing will also be faster. The new rates are listed on OpenAI's pricing page.

The GPT-4o model can also be fine-tuned for vision-related tasks. Developers can customise the large language model (LLM) by training it on a fixed set of visual data, improving its output efficiency. As per the blog post, the performance of GPT-4o on vision tasks can be improved with as few as 100 images.

Finally, the company is also making the process of model distillation easier for developers. Model distillation is the process of building smaller, fine-tuned AI models from a larger language model. Earlier, the process was convoluted and required a multi-step approach. Now, OpenAI is offering new tools such as Stored Completions (to easily generate distillation datasets), Evals (to run custom evaluations and measure performance), and Fine-Tuning (to fine-tune the smaller models directly after running an Eval). Notably, all of these features are currently in beta and will be available to all developers using the paid version of the API at a later date. Further, the company said it will be taking steps to further reduce the costs of input and output tokens.
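For developers wondering what the Stored Completions step looks like in practice, the following is a minimal sketch of how a GPT-4o call might be flagged for storage using OpenAI's Python SDK. The `store` and `metadata` parameters follow OpenAI's description of the feature, and the prompt and tag values are purely illustrative assumptions.

```python
# Sketch: capture a GPT-4o input/output pair as a Stored Completion so it can
# later serve as distillation training data. Assumes the openai Python SDK and
# an OPENAI_API_KEY in the environment; treat parameter names as assumptions.
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "How do I reset my account password?"},
    ],
    store=True,  # persist the request/response pair for the distillation dataset
    metadata={"use_case": "support_faq"},  # hypothetical tag for filtering later
)

print(completion.choices[0].message.content)
```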
[2]
OpenAI Just Announced 4 New AI Features and They're Available Now
OpenAI announced a slew of updates to its API services at a developer day event today in San Francisco. These updates will enable developers to further customize models, develop new speech-based applications, reduce prices for repetitive prompts, and get better performance out of smaller models. OpenAI announced four major API updates during the event: Model Distillation, Prompt Caching, Vision Fine-Tuning, and the introduction of a new API service called Realtime. For the uninitiated, an API (application programming interface) enables software developers to integrate features from an external application into their own product.

Model Distillation

The company introduced a new way to enhance the capabilities of smaller models like GPT-4o mini by fine-tuning them with the outputs of larger models, called Model Distillation. In a blog post, the company said that "until now, distillation has been a multi-step, error-prone process, which required developers to manually orchestrate multiple operations across disconnected tools, from generating datasets to fine-tuning models and measuring performance improvements." To make the process more efficient, OpenAI built a Model Distillation suite within its API platform. The platform enables developers to build their own datasets by using advanced models like GPT-4o and o1-preview to generate high-quality responses, fine-tune a smaller model to follow those responses, and then create and run custom evaluations to measure how the model performs at specific tasks.
[3]
OpenAI DevDay 2024 - Everything You Need To Know
OpenAI's DevDay 2024 introduced several significant updates aimed at enhancing developer capabilities. The key announcements include a real-time API for voice interactions, a vision fine-tuning API, prompt caching APIs, and model distillation techniques. These updates are designed to improve the efficiency and functionality of applications using OpenAI's technology.

OpenAI's DevDay 2024 unveiled several pivotal updates designed to enhance developer capabilities and unlock new possibilities for creating intelligent applications. The key announcements included the Realtime API, vision fine-tuning, prompt caching, and model distillation. These updates aim to significantly boost the efficiency and functionality of applications using OpenAI's language models and AI technology. By providing developers with more powerful and versatile tools, OpenAI is empowering the creation of a new generation of AI-powered applications that can understand and interact through voice, images, and text.

The real-time API is a groundbreaking tool that enables direct audio input and output, allowing developers to seamlessly integrate voice-based interactions with the advanced language capabilities of GPT-4o. This API supports function calling, allowing sophisticated voice-controlled tasks like ordering a pizza or booking a flight. In the future, this API will expand to include real-time image and video support as well, greatly broadening the scope of multimodal applications that developers can build. The real-time API is currently available in public beta for paid developers, with pricing set at $100 per million audio tokens in and $200 per million audio tokens out.

Another major update is the vision fine-tuning API, which allows developers to fine-tune GPT-4o with images, greatly enhancing its ability to perform visual question answering, image captioning, and other image understanding tasks. This API opens up exciting possibilities for applications in areas like robotic process automation (RPA), web design, and augmented reality. For instance, developers can create tools that automatically generate web page layouts or UI designs based on hand-drawn sketches or wireframe images. The pricing for the vision fine-tuning API is set at $25 per million tokens for training and $15 per million output tokens.

The new prompt caching APIs introduce an innovative way to optimize prompts and reduce token usage, similar to techniques pioneered by Google and Anthropic. This capability aims to substantially reduce costs for applications that require long, detailed prompts, making it much more economical to provide extensive context to language models. This is particularly beneficial for applications like customer service chatbots, knowledge management systems, or data analysis tools that need to process lengthy inputs and maintain conversational context over many turns.

Finally, OpenAI introduced model distillation, a technique that allows developers to create smaller, faster versions of large language models that are optimized for specific tasks. This is incredibly useful for fine-tuning models to target particular use cases and deploying them efficiently in resource-constrained environments like mobile devices or web browsers. To help developers get started with model distillation, OpenAI is offering free fine-tuning of up to a million tokens per day until the end of the month.
They have also released tools for easily storing completions and evaluations to streamline the model optimization process. These transformative updates from OpenAI DevDay 2024 are poised to usher in a new era of intelligent application development. By putting more efficient and versatile tools in the hands of developers, these APIs and model optimization techniques will unlock new frontiers in voice interfaces, computer vision, natural language processing, and more. Whether you are building a voice-controlled smart home system, an AI-powered design tool, a highly personalized recommendation engine, or an optimized chatbot for your business, these new capabilities offer endless possibilities to take your applications to the next level. As OpenAI continues to push the boundaries of what's possible with AI, it's an exciting time to be a developer and harness these innovative technologies to build amazing things.
[4]
OpenAI lets developers build real-time voice apps - at a substantial premium
OpenAI's annual developer day took place Tuesday in San Francisco, with a raft of product and feature announcements. The event's centerpiece was the company's introduction of its real-time application programming interface (API). The feature makes it possible for developers to send and receive spoken-language inputs and outputs during inference, that is, while making predictions with a production large language model (LLM). It is hoped this type of interaction can enable a more fluid, real-time conversation between a person and a language model.

This capability also comes at a hefty premium. OpenAI currently prices the GPT-4o large language model, which forms the basis for the real-time API, at $2.50 per million tokens of input text and $10 per million output tokens. Real-time usage costs at least twice that rate, and it is billed on both text and audio tokens, since the real-time API handles both kinds of input and output. Input and output text tokens for GPT-4o cost $5 and $20, respectively, per million tokens when used through the real-time API. For voice, the cost is a whopping $100 per million audio input tokens and $200 per million audio output tokens. OpenAI notes that with standard statistics for voice conversations, the pricing of audio tokens "equates to approximately $0.06 per minute of audio input and $0.24 per minute of audio output." OpenAI gives examples of how real-time voice can be used in generative AI, including an automated health coach giving a person advice, and a language tutor that can engage in conversations with a student to practice a new language.

During the developer conference, OpenAI also offered a way to reduce the total cost to developers: prompt caching, which is re-using tokens on inputs that have been previously submitted to the model. That approach cuts the price of GPT-4o input text tokens in half.

Also introduced was LLM "distillation", which lets developers use the data from larger models to train smaller models. A developer captures the input and output of one of OpenAI's more capable language models, such as GPT-4o, using the technique known as "stored completions". Those stored completions then become the training data to fine-tune a smaller model, such as GPT-4o mini. OpenAI bills the distillation service as a way to eliminate a lot of iterative work required of developers to train smaller models from larger models. "Until now, distillation has been a multi-step, error-prone process," says the company's blog on the matter, "which required developers to manually orchestrate multiple operations across disconnected tools, from generating datasets to fine-tuning models and measuring performance improvements."

Distillation comes in addition to OpenAI's existing fine-tuning service, the difference being that developers can use the larger model's input-output pairs as the fine-tuning data. To the fine-tuning service, the company also added image fine-tuning. A developer submits a data set of images, just as they would with text, to make an existing model, such as GPT-4o, more specific to a task or a domain of knowledge. An example in practice is work by food delivery service Grab.
The company uses real-world images of street signs to have GPT-4o perform mapping of the company's delivery routes. "Grab was able to improve lane count accuracy by 20% and speed limit sign localization by 13% over a base GPT-4o model, enabling them to better automate their mapping operations from a previously manual process," states OpenAI. Pricing is based on chopping up each image a developer submits into tokens, which are then priced at $3.75 per million input tokens and $15 per million output tokens, the same as standard fine-tuning. For training image models, the cost is $25 per million tokens.
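To make the image fine-tuning data format concrete, here is a rough sketch of how one training example might be written to a JSONL file. It mirrors the standard chat format with an `image_url` content part; the exact schema, file name, and the sign-reading example are assumptions for illustration.

```python
# Sketch: append one chat-formatted training example (text + image) to a JSONL
# file for image fine-tuning. The field layout is an assumption based on the
# standard vision message format.
import json

example = {
    "messages": [
        {"role": "system", "content": "Read road signs from street-level photos."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What speed limit is posted in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/sign_001.jpg"}},
            ],
        },
        {"role": "assistant", "content": "The posted speed limit is 60 km/h."},
    ]
}

with open("vision_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```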
[5]
OpenAI DevDay 2024 - What No One is Talking About
OpenAI's highly anticipated DevDay 2024 event has taken the AI community by storm, introducing a suite of transformative updates to its already powerful API. These enhancements promise to transform the way developers and businesses harness the potential of artificial intelligence across various sectors. From real-time audio processing to advanced vision fine-tuning, the unveiled features aim to boost efficiency, expand capabilities, and unlock new possibilities for AI-driven solutions. Prompt Engineering covers the event in more detail, explaining a little more about what no one else seems to be talking about.

One of the most significant announcements at OpenAI DevDay 2024 was the introduction of the Realtime API. This groundbreaking update enables seamless integration of audio input and output within a single API, opening up a world of opportunities for applications that require real-time audio processing. Whether it's developing interactive voice response systems, virtual assistants, or real-time translation services, the Realtime API simplifies the process and enhances the user experience. Moreover, the API's support for function calling allows developers to automate actions based on audio inputs, streamlining the creation of sophisticated automated communication tools.

Another highlight of DevDay 2024 was the announcement of Vision Fine-Tuning. This powerful update empowers developers to fine-tune models using both images and text, significantly improving the accuracy and precision of computer vision tasks. Industries such as manufacturing, healthcare, and retail can greatly benefit from this capability, as it enables the development of highly accurate quality control systems, automated inspection tools, and personalized product recommendations. OpenAI's offer of free training tokens for vision fine-tuning until October 31, 2024, presents an excellent opportunity for businesses and researchers to experiment with these advanced models and push the boundaries of computer vision applications.

Efficiency and optimization were also key themes at DevDay 2024, with the introduction of Prompt Caching. This feature is designed to enhance the handling of long prompts, automatically applying optimization techniques to prompts exceeding 1,024 tokens. By reducing costs and improving processing speed, Prompt Caching addresses a common challenge faced by developers working with extensive prompts. What sets OpenAI's approach apart from similar implementations by competitors such as Google and Anthropic is that the caching is applied automatically, without requiring developers to mark which parts of a prompt should be cached.

OpenAI also showcased advancements in Model Distillation, a technique that involves fine-tuning smaller, cost-efficient models using outputs from larger, more resource-intensive models. This approach allows developers to deploy more efficient models without sacrificing accuracy, making it particularly beneficial for applications where computational resources are limited but high performance is still required. By mirroring the strategies employed by industry giants like Google and Meta, OpenAI demonstrates its commitment to providing accessible and efficient AI solutions.

Retrieval systems, a crucial component of many AI applications, also received significant upgrades at DevDay 2024.
OpenAI introduced several improvements, including chunking, reranking, query expansion, and tool usage, all aimed at enhancing retrieval accuracy and ensuring the most relevant information is retrieved quickly and efficiently. By emphasizing evaluation-driven development and encouraging the setting and measuring of performance targets, OpenAI empowers developers to continuously refine and optimize their retrieval systems.

Finally, OpenAI showcased advancements in Structured JSON Output, addressing the need for consistent and precise data formatting. The process involves token masking to ensure the correct format, along with the creation of a grammar and parser to maintain format integrity. While the initial setup for structured output generation may be more time-consuming, subsequent outputs benefit from faster processing thanks to pre-built artifacts. This feature is invaluable for applications that rely on structured data, such as database management, data analysis, and API integrations.

OpenAI's DevDay 2024 has set the stage for a new era of AI development, empowering businesses and developers with innovative tools and techniques. By using these advancements, organizations can create more efficient, accurate, and powerful AI applications that drive innovation and deliver tangible results. As the AI landscape continues to evolve, OpenAI remains at the forefront, providing the resources and guidance needed to unlock the full potential of artificial intelligence.
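As a concrete illustration of structured output, here is a minimal sketch that asks the model to answer against a JSON Schema via the `response_format` parameter of the Python SDK. The schema, prompt, and model snapshot are illustrative assumptions.

```python
# Sketch: request schema-constrained JSON using a json_schema response_format.
# Assumes the openai Python SDK; the schema and model snapshot are illustrative.
from openai import OpenAI

client = OpenAI()

ticket_schema = {
    "name": "support_ticket",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "category": {"type": "string"},
            "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        },
        "required": ["category", "priority"],
        "additionalProperties": False,
    },
}

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "My invoice is wrong and I need it fixed today."}],
    response_format={"type": "json_schema", "json_schema": ticket_schema},
)

print(completion.choices[0].message.content)  # JSON text matching the schema
```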
OpenAI's DevDay 2024 unveiled groundbreaking updates to its API services, including real-time voice interactions, vision fine-tuning, prompt caching, and model distillation techniques. These advancements aim to enhance developer capabilities and unlock new possibilities in AI-powered applications.
OpenAI introduced the groundbreaking Realtime API, enabling seamless integration of audio input and output within a single API [1][3]. The API also supports function calling, allowing developers to create sophisticated voice-controlled applications [3]. It is currently available in public beta for paid developers, with audio priced at $100 per million input tokens and $200 per million output tokens [4].
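For a sense of what working with the Realtime API involves, here is a minimal sketch that opens a WebSocket session and requests a spoken reply. The endpoint, headers, event names, and model identifier reflect the beta documentation as described at launch and should be treated as assumptions to verify against OpenAI's current reference.

```python
# Sketch: open a Realtime API WebSocket session and ask for an audio+text reply.
# Endpoint, headers, and event names are assumptions based on the beta docs.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main() -> None:
    # additional_headers is named extra_headers in older websockets releases.
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Ask the model to produce both audio and a text transcript.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the caller and ask how you can help.",
            },
        }))
        # Stream server events until the response finishes.
        async for raw in ws:
            event = json.loads(raw)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())
```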
The Vision Fine-Tuning API allows developers to fine-tune GPT-4o with images, significantly improving its ability to perform visual question answering, image captioning, and other image understanding tasks [2][3]. This opens up exciting possibilities for applications in areas like robotic process automation, web design, and augmented reality [3]. The pricing for vision fine-tuning is set at $25 per million tokens for training and $15 per million output tokens [3].
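Continuing the data-format sketch shown earlier, starting a vision fine-tuning job might look roughly like this with the Python SDK; the training file name and the base model snapshot are assumptions.

```python
# Sketch: upload an image-annotated JSONL dataset and start a fine-tuning job
# against a GPT-4o snapshot. File name and snapshot are illustrative.
from openai import OpenAI

client = OpenAI()

training_file = client.files.create(
    file=open("vision_train.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    model="gpt-4o-2024-08-06",       # vision fine-tuning targets a GPT-4o snapshot
    training_file=training_file.id,
)

print(job.id, job.status)
```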
OpenAI introduced Prompt Caching as a way to optimize prompts and reduce token usage [2][3]. Caching is applied automatically to prompts exceeding 1,024 tokens, cutting the price of cached GPT-4o input text tokens in half and speeding up processing [4][5]. This is particularly beneficial for applications that require long, detailed prompts, making it more economical to provide extensive context to language models [3]. The new rates for prompt caching can be checked on OpenAI's website [1].
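Because the caching is automatic, the main lever for developers is request structure: keep the large, unchanging context at the front of the prompt so repeated calls share a cacheable prefix. A minimal sketch follows; the cached-token usage field name is an assumption based on OpenAI's description.

```python
# Sketch: place static context first so repeated calls share a cached prefix,
# then inspect how many prompt tokens were reported as served from cache.
from openai import OpenAI

client = OpenAI()

# A long, unchanging preamble (needs to exceed the ~1,024-token threshold to cache).
LONG_SYSTEM_PROMPT = "You are a support agent for Acme Corp. " + "Policy detail. " * 400

def ask(question: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": LONG_SYSTEM_PROMPT},  # static prefix first
            {"role": "user", "content": question},              # variable part last
        ],
    )
    details = completion.usage.prompt_tokens_details  # assumed usage field
    print("cached prompt tokens:", details.cached_tokens)
    return completion.choices[0].message.content

ask("What is your refund policy?")
ask("Do you ship internationally?")  # second call should hit the cached prefix
```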
Model Distillation is a technique that allows developers to create smaller, faster versions of large language models optimized for specific tasks [2][3]. OpenAI has simplified this process by introducing a Model Distillation suite within its API platform [2]. To encourage adoption, OpenAI is offering free fine-tuning up to a million tokens per day until the end of the month [3].
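The fine-tuning half of that workflow might look roughly like the sketch below, which trains a GPT-4o mini snapshot on a JSONL dataset exported from Stored Completions; the file name, snapshot identifier, and suffix are assumptions.

```python
# Sketch: distillation's final step - fine-tune a smaller student model on
# prompt/response pairs previously captured from GPT-4o.
from openai import OpenAI

client = OpenAI()

dataset = client.files.create(
    file=open("distillation_dataset.jsonl", "rb"),  # exported Stored Completions
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",   # smaller, cheaper student model
    training_file=dataset.id,
    suffix="distilled-support-bot",   # hypothetical name tag for the result
)

print(job.id, job.status)
```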
While these new features offer significant advancements, they come at a premium. The Realtime API, for instance, is priced at least twice the rate of standard GPT-4o usage [4]. However, OpenAI has introduced ways to reduce costs, such as prompt caching, which cuts the price of GPT-4o input text tokens in half [4].
These updates from OpenAI DevDay 2024 are set to usher in a new era of intelligent application development [3]. By providing more efficient and versatile tools, these APIs and model optimization techniques will unlock new frontiers in voice interfaces, computer vision, natural language processing, and more [3]. Developers can now create more sophisticated AI-powered applications, from voice-controlled smart home systems to AI-powered design tools and highly personalized recommendation engines [3].
As OpenAI continues to push the boundaries of what's possible with AI, the developer community can expect even more innovations in the future [3]. The company's commitment to enhancing AI capabilities while also focusing on efficiency and accessibility demonstrates its dedication to driving the field forward [5]. With these new tools at their disposal, developers are well-positioned to create the next generation of AI-powered applications that will shape the future of technology.
Reference
[1] OpenAI Unveils Realtime API and Other Improvements for Developers
[2] OpenAI Just Announced 4 New AI Features and They're Available Now
[3] OpenAI DevDay 2024 - Everything You Need To Know
[4] OpenAI lets developers build real-time voice apps - at a substantial premium
[5] OpenAI DevDay 2024 - What No One is Talking About