2 Sources
[1]
OpenAI unveils sCM, a new model that generates video media 50 times faster than current diffusion models
Two researchers at OpenAI have developed a new kind of continuous-time consistency model (sCM) that they claim can generate video media 50 times faster than models currently in use. Cheng Lu and Yang Song have published a paper describing their new model on the arXiv preprint server, along with an introductory post on the company's website.
In machine learning, diffusion models, sometimes called diffusion probabilistic models or score-based generative models, are a type of latent-variable generative model. Such models typically have three major components: a forward process, a reverse process, and a sampling procedure. They are the basis for generating visual products such as video and still images, and they have also been applied elsewhere, such as in audio generation. Like other machine-learning models, diffusion models are trained on large amounts of data.
Most such models execute hundreds of steps to generate an end product, which is why most of them take a few moments to carry out their tasks. In sharp contrast, Lu and Song have developed a model that carries out all its work in just two steps. That reduction in steps, they note, drastically cuts the time their model takes to generate a video, without any loss in quality. The new model uses more than 1.5 billion parameters and can produce a sample video in a fraction of a second on a machine with a single A100 GPU, approximately 50 times faster than models currently in use.
The researchers note that their new model also requires far less computational power than other models, an ongoing concern for AI applications in general as their use skyrockets. They add that the new approach has already been benchmarked against other models, both those in current use and those under development by other teams.
They suggest their model should allow for real-time generative AI applications in the near future.
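The step-count difference described above can be illustrated with a toy sketch (a minimal illustration, not OpenAI's code; all function names here are hypothetical): a conventional diffusion sampler calls the denoising network once per timestep, hundreds of times, while a consistency-style sampler needs only one or two evaluations.

```python
import numpy as np

rng = np.random.default_rng(0)
calls = {"n": 0}

def denoiser(x, t):
    # Stand-in for one forward pass of a learned denoising network;
    # a real model would be a neural net conditioned on timestep t.
    calls["n"] += 1
    return x * (1.0 - 1.0 / (t + 1))

def sample(shape, num_steps):
    # Start from pure noise and apply the denoiser once per step.
    x = rng.standard_normal(shape)
    for t in range(num_steps, 0, -1):
        x = denoiser(x, t)
    return x

# Conventional diffusion sampling: hundreds of network evaluations.
calls["n"] = 0
sample((4, 4), num_steps=100)
diffusion_calls = calls["n"]

# Consistency-style sampling: the model is trained so that one
# evaluation (optionally refined once) maps noise to a sample.
calls["n"] = 0
sample((4, 4), num_steps=2)
consistency_calls = calls["n"]
```

Since each network evaluation dominates wall-clock time, cutting the evaluation count from hundreds to two is where the reported speedup comes from.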
[2]
OpenAI researchers develop new model that speeds up media generation by 50X
A pair of researchers at OpenAI has published a paper describing a new type of model, specifically a new type of continuous-time consistency model (sCM), that increases the speed at which AI can generate multimedia, including images, video, and audio, by 50 times compared to traditional diffusion models, producing images in roughly a tenth of a second versus more than 5 seconds for regular diffusion. With the introduction of sCM, OpenAI has achieved comparable sample quality with only two sampling steps, accelerating the generative process without compromising on quality.
Described in a preprint published on arXiv.org and a blog post released today, both authored by Cheng Lu and Yang Song, the innovation enables these models to generate high-quality samples in just two steps, significantly faster than previous diffusion-based models that require hundreds of steps. Song was also a lead author on a 2023 paper from OpenAI researchers, including former chief scientist Ilya Sutskever, that introduced the idea of "consistency models," in which "points on the same trajectory map to the same initial point."
While diffusion models have delivered outstanding results in producing realistic images, 3D models, audio, and video, their inefficiency in sampling, often requiring dozens to hundreds of sequential steps, has made them less suitable for real-time applications. Theoretically, the technology could provide the basis for a near-real-time AI image generation model from OpenAI. As fellow VentureBeat reporter Sean Michael Kerner mused in our internal Slack channels, "can DALL-E 4 be far behind?"
Faster sampling while retaining high quality
In traditional diffusion models, a large number of denoising steps are needed to create a sample, which contributes to their slow speed.
In contrast, sCM converts noise into high-quality samples directly within one or two steps, cutting down on computational cost and time. OpenAI's largest sCM model, which boasts 1.5 billion parameters, can generate a sample in just 0.11 seconds on a single A100 GPU. This results in a 50x speed-up in wall-clock time compared to diffusion models, making real-time generative AI applications much more feasible.
Reaching diffusion-model quality with far less computational resources
The team behind sCM trained a continuous-time consistency model on ImageNet 512×512, scaling up to 1.5 billion parameters. Even at this scale, the model maintains a sample quality that rivals the best diffusion models, achieving a Fréchet Inception Distance (FID) score of 1.88 on ImageNet 512×512. This brings the sample quality within 10% of diffusion models, which require significantly more computational effort to achieve similar results.
Benchmarks reveal strong performance
OpenAI's new approach has undergone extensive benchmarking against other state-of-the-art generative models. By measuring both sample quality (via FID scores) and effective sampling compute, the research demonstrates that sCM provides top-tier results with significantly less computational overhead. While previous fast-sampling methods have struggled with reduced sample quality or complex training setups, sCM overcomes these challenges, offering both speed and high fidelity.
The success of sCM is also attributed to its ability to scale proportionally with the teacher diffusion model from which it distills knowledge. As both the sCM and the teacher diffusion model grow in size, the gap in sample quality narrows, and increasing the number of sampling steps in sCM reduces the quality difference even more.
Applications and future uses
The fast sampling and scalability of sCM models open new possibilities for real-time generative AI across multiple domains.
From image generation to audio and video synthesis, sCM provides a practical solution for applications that demand rapid, high-quality output. Additionally, OpenAI's research hints at the potential for further system optimization that could accelerate performance even more, tailoring these models to the specific needs of various industries.
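The reported figures are internally consistent, which a quick back-of-envelope check makes clear: a 0.11-second sample time and a 50x wall-clock speedup imply a diffusion baseline of about 5.5 seconds, matching the "more than 5 seconds" quoted above.

```python
# Back-of-envelope check on the figures reported in the article.
scm_time = 0.11                    # seconds per sample on one A100 (reported)
speedup = 50                       # reported wall-clock speedup vs diffusion
implied_baseline = scm_time * speedup   # implied diffusion sample time (s)
samples_per_second = 1 / scm_time       # sCM throughput on a single GPU
```

At roughly nine samples per second on a single GPU, interactive use cases become plausible, which is the basis for the real-time claims.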
OpenAI researchers have developed a new continuous-time consistency model (sCM) that can generate high-quality video, images, and audio 50 times faster than current diffusion models, potentially revolutionizing real-time AI applications.
OpenAI researchers Cheng Lu and Yang Song have introduced a groundbreaking continuous-time consistency model (sCM) that promises to revolutionize AI-generated media. This new model can produce high-quality video, images, and audio up to 50 times faster than current diffusion models, marking a significant leap in generative AI technology [1][2].
Traditional diffusion models, which are the backbone of many AI-generated visual and audio products, typically require hundreds of steps to create an end product. In contrast, sCM accomplishes the same task in just two steps, dramatically reducing processing time without compromising on quality [1].
The sCM model utilizes over 1.5 billion parameters and can generate a sample video in a fraction of a second when run on a machine with a single A100 GPU. This represents a 50-fold increase in speed compared to models currently in use [1].
OpenAI's largest sCM model has demonstrated impressive performance metrics:
- 1.5 billion parameters, the largest sCM trained to date
- Sample generation in 0.11 seconds on a single A100 GPU
- A roughly 50x wall-clock speed-up over comparable diffusion models
- An FID score of 1.88 on ImageNet 512×512, within 10% of the best diffusion models [2]
The researchers have conducted extensive benchmarking to compare sCM with other state-of-the-art generative models. By measuring both sample quality using FID scores and effective sampling compute, they have shown that sCM provides top-tier results with substantially reduced computational overhead [2].
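FID, the quality metric cited throughout, fits a Gaussian to feature vectors of real and generated images (in practice, Inception-v3 activations) and measures the Fréchet distance between the two fits: ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2(S_r S_g)^(1/2)). A lower score means the generated distribution is closer to the real one. A minimal NumPy-only sketch of the formula (for illustration; production FID implementations use a pretrained Inception network to extract the features):

```python
import numpy as np

def psd_sqrt(m):
    # Matrix square root of a symmetric positive semidefinite matrix
    # via eigendecomposition; eigenvalues clipped at 0 for stability.
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def fid(feats_real, feats_gen):
    # Frechet distance between Gaussian fits to two feature sets:
    # ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)).
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s_r = np.cov(feats_real, rowvar=False)
    s_g = np.cov(feats_gen, rowvar=False)
    sr_half = psd_sqrt(s_r)
    # Tr((S_r S_g)^(1/2)) computed via the equivalent symmetric form
    # (S_r^(1/2) S_g S_r^(1/2))^(1/2) for numerical stability.
    cross = np.trace(psd_sqrt(sr_half @ s_g @ sr_half))
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(s_r) + np.trace(s_g) - 2.0 * cross)

rng = np.random.default_rng(0)
real = rng.standard_normal((500, 8))
score_same = fid(real, real)         # identical feature sets: distance near 0
score_shift = fid(real, real + 1.0)  # unit mean shift in 8 dims: distance near 8
```

The sanity checks at the bottom show the metric behaves as expected: identical distributions score near zero, and a pure mean shift contributes exactly the squared distance between the means.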
Key advantages of the sCM model include:
- One- or two-step sampling instead of hundreds of denoising steps
- Sample quality rivaling the best diffusion models
- Substantially lower computational cost per sample
- Quality that scales alongside the teacher diffusion model it distills from
The development of sCM opens up new possibilities for real-time generative AI across multiple domains. Its fast sampling and scalability make it particularly suitable for applications demanding rapid, high-quality output, spanning image generation, video synthesis, and audio generation.
The researchers suggest that their model could enable real-time generative AI applications in the near future, potentially transforming industries that rely on quick, high-quality media production [1][2].
This breakthrough in generative AI technology could have far-reaching implications for the field.
As the AI community continues to push the boundaries of what's possible, OpenAI's sCM model represents a significant step forward in making generative AI more efficient and accessible. The technology's potential to provide near-real-time AI image generation has led to speculation about future developments, such as the possibility of a DALL-E 4 model [2].
With its combination of speed, quality, and efficiency, sCM is poised to play a crucial role in shaping the future of AI-generated media and its applications across diverse sectors.