Curated by THEOUTPOST
On Thu, 25 Jul, 12:04 AM UTC
3 Sources
[1]
Stability AI introduces Stable Video 4D, its new AI model for 3D video generation - SiliconANGLE
Open generative artificial intelligence startup Stability AI Ltd., best known for its image generation tool Stable Diffusion, is working hard on developing AI models for 3D video. Its newest model, announced today, can take a single video of an object from one angle and reproduce it from multiple angles.

Stable Video 4D can transform a video of a 3D object into views of the same object from eight different perspectives. That means it can interpret what the object looks like, including its movements, from the sides it cannot see, allowing it to reproduce the object's movement and appearance from different angles.

The new model builds on the foundation of Stability AI's Stable Video Diffusion model, which the company released in November. The Stable Video model can take a still image and convert it into a photorealistic video, including motion.

"The Stable Video 4D model takes a video as input and generates multiple novel-view videos from different perspectives," the company said in the announcement. "This advancement represents a leap in our capabilities, moving from image-based video generation to full 3D dynamic video synthesis."

This isn't the first time Stability AI has worked on 3D video. In March the company introduced Stable Video 3D, which can take images of objects and produce rotating 3D videos of those objects based on the image. The new Stable Video 4D extends SV3D's capabilities so it can also handle an object's motion.

Similar to the SV3D model, SV4D needs to interpret the parts of the object it cannot see to produce the necessary additional perspectives. It must also reproduce motion it cannot see, such as movement blocked from view, by understanding the object and its components.

"The key aspects that enabled Stable Video 4D are that we combined the strengths of our previously-released Stable Video Diffusion and Stable Video 3D models, and fine-tuned it with a carefully curated dynamic 3D object dataset," Varun Jampani, team lead of 3D Research at Stability AI, told VentureBeat in an interview.

According to the researchers, SV4D is currently capable of generating five-frame videos across eight perspectives in about 40 seconds, with the entire optimization process taking around 20 to 25 minutes. (A shape-level sketch of this input/output contract appears at the end of this article.) The research team said that by building on its previous work with a new approach to multiview diffusion, it has produced a model that can faithfully reproduce 3D video across both frames and perspectives.

Although the model is still in the research stage, Stability AI said SV4D will be a significant innovation for movie production, augmented reality, virtual reality, gaming and other industries where dynamic views of moving objects are needed. The model is currently available for developers and researchers to view and use on Hugging Face. It's the company's first video-to-video generation model, though it's still under development as Stability AI continues to refine it with better optimization to handle a wider range of real-world videos beyond the synthetic datasets it was trained on.
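To make that input/output contract concrete, here is a minimal, shape-level sketch in Python of what the article describes: one single-view object video goes in, and eight novel-view videos of the same length come out. The tensor shapes, the 576x576 resolution, and the sv4d_like_model stand-in are illustrative assumptions, not Stability AI's actual code.

```python
# Shape-level sketch of the SV4D contract described above.
# All names and shapes here are illustrative assumptions, not the real model.
import torch

T, H, W = 5, 576, 576   # five frames per view (per the article); resolution assumed
V = 8                   # eight novel camera perspectives (per the article)

input_video = torch.rand(T, 3, H, W)   # a single-view video of one object
camera_poses = torch.rand(V, 4, 4)     # hypothetical per-view camera extrinsics

def sv4d_like_model(video: torch.Tensor, poses: torch.Tensor) -> torch.Tensor:
    """Stand-in for the real model: one video in, V novel-view videos out."""
    return torch.rand(poses.shape[0], *video.shape)

novel_views = sv4d_like_model(input_video, camera_poses)
assert novel_views.shape == (V, T, 3, H, W)  # (views, frames, channels, H, W)
```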
[2]
Stability AI steps into a new gen AI dimension with Stable Video 4D
Stability AI is expanding its growing roster of generative AI models, quite literally adding a new dimension with the debut of Stable Video 4D.

While there is a growing set of gen AI tools for video generation, including OpenAI's Sora, Runway, Haiper and Luma AI among others, Stable Video 4D is something a bit different. It builds on the foundation of Stability AI's existing Stable Video Diffusion model, which converts images into videos. The new model takes this concept further by accepting video input and generating multiple novel-view videos from eight different perspectives.

"We see Stable Video 4D being used in movie production, gaming, AR/VR, and other use cases where there is a need to view dynamically moving 3D objects from arbitrary camera angles," Varun Jampani, team lead of 3D Research at Stability AI, told VentureBeat.

Stable Video 4D is different from just 3D for gen AI

This isn't Stability AI's first foray beyond the flat world of 2D space. In March, Stable Video 3D was announced, enabling users to generate short 3D video from an image or text prompt. Stable Video 4D goes a significant step further. While the concept of 3D, that is, three dimensions, is commonly understood as a type of image or video with depth, 4D isn't perhaps as universally understood. Jampani explained that the four dimensions include width (x), height (y), depth (z) and time (t). That means Stable Video 4D is able to view a moving 3D object from various camera angles as well as at different timestamps.

"The key aspects that enabled Stable Video 4D are that we combined the strengths of our previously-released Stable Video Diffusion and Stable Video 3D models, and fine-tuned it with a carefully curated dynamic 3D object dataset," Jampani explained.

Jampani noted that Stable Video 4D is a first-of-its-kind network in which a single network does both novel view synthesis and video generation; existing works leverage separate video generation and novel view synthesis networks for this task. He also explained that Stable Video 4D differs from Stable Video Diffusion and Stable Video 3D in how its attention mechanisms work.

"We carefully design attention mechanisms in the diffusion network which allow generation of each video frame to attend to its neighbors at different camera views or timestamps, thus resulting in better 3D coherence and temporal smoothness in the output videos," Jampani said. (A schematic sketch of this view-and-time attention pattern follows the article below.)

How Stable Video 4D works differently from gen AI infill

With gen AI tools for 2D image generation, the concept of infill and outfill, filling in gaps, is well established. That approach, however, is not how Stable Video 4D works. Jampani explained that it differs from generative infill/outfill, where the networks typically complete partially given information; that is, the output is already partially filled by the explicit transfer of information from the input image.

"Stable Video 4D completely synthesizes the 8 novel view videos from scratch by using the original input video as guidance," he said. "There is no explicit transfer of pixel information from input to output, all of this information transfer is done implicitly by the network."

Stable Video 4D is currently available for research evaluation on Hugging Face. Stability AI has not yet announced what commercial options will be available for it in the future.
"Stable Video 4D can already process single-object videos of several seconds with a plain background," Jampani said. "We plan to generalize it to longer videos and also to more complex scenes."
[3]
Meet Stability AI's Stable Video 4D, a nuanced take on AI video generation
With its first video-to-video AI model, Stability AI pushes the boundaries for AI video generation even further.

The popularity of AI image generation has led us to the next frontier: video generation. Nearly every major player in the AI space, including Google and OpenAI, is developing text-to-video generators. Stability AI just released its latest AI video model, but its take is a bit different from the rest.

On Wednesday, Stability AI unveiled Stable Video 4D, an AI model that can transform a single object video into multiple novel-view videos of the object from different angles and perspectives. This allows users to generate realistic, multi-angle videos from just one video.

When using the model, a user starts by uploading a single video and delineating which 3D camera poses they'd like to see rendered. Then, in about 40 seconds, Stable Video 4D generates five frames across eight views, with the entire 4D optimization taking 20 to 25 minutes, according to the blog post. (A hedged sketch of this workflow appears after the article.)

Stability AI sees Stable Video 4D as a tool capable of helping working professionals across various industries in the near future. "Our team envisions future applications in game development, video editing, and virtual reality," said Stability AI. "Professionals in these fields can significantly benefit from the ability to visualize objects from multiple perspectives, enhancing the realism and immersion of their products."

Even though Stability AI has released other AI video generators, such as Stable Video Diffusion, which converts images into videos, and Stable Video 3D, which generates 3D objects from single images, this is the company's first video-to-video generator, which the company calls an "exciting milestone."

Alongside the AI model, the company also released a technical report covering the model's development, including its methodologies, challenges, and more. Stable Video 4D is currently available on Hugging Face, and Stability AI says it is actively working on improving the model.
Stability AI introduces Stable Video 4D, a groundbreaking AI model capable of generating multi-angle 3D videos from a single input video. This innovation marks a significant advancement in the field of AI-generated content, offering new possibilities for creators and industries.
Stability AI, a leading player in the artificial intelligence arena, has unveiled its latest innovation: Stable Video 4D. This cutting-edge AI model represents a significant leap forward in the realm of AI-generated content, specifically in the domain of 3D video creation [1].
Stable Video 4D stands out for its ability to generate novel-view videos from a single input video, a feat that pushes the boundaries of what's possible in AI-driven content creation. The model can produce five-frame videos across eight perspectives in about 40 seconds, with the full 4D optimization taking around 20 to 25 minutes [1]. What sets it apart is its capacity to create 3D-aware content, allowing the camera to move and rotate around a captured object.
At its core, Stable Video 4D combines the strengths of Stability AI's previously released Stable Video Diffusion and Stable Video 3D models, fine-tuned on a carefully curated dynamic 3D object dataset. A single network performs both novel view synthesis and video generation, with attention mechanisms in the diffusion network that allow each generated frame to attend to its neighbors at different camera views or timestamps [2].
The introduction of Stable Video 4D opens up a wide array of possibilities across various industries. In the entertainment sector, it could streamline special effects creation and virtual production. For e-commerce, the technology could enable the generation of 360-degree product views from a single video. Educators might use it to create immersive learning experiences, while architects and designers could quickly visualize moving 3D concepts [2].
Despite its impressive capabilities, Stable Video 4D is not without limitations. The current version handles single-object videos of several seconds with plain backgrounds, and it has yet to be generalized to longer videos, more complex scenes, and real-world footage beyond the synthetic datasets it was trained on [1]. Additionally, as with other AI-generated content, there are concerns about potential misuse, such as the creation of deepfakes or misleading content [3].
Stability AI has emphasized that Stable Video 4D is still in its early stages. The company plans to continue refining the model, improving its capabilities, and addressing its current limitations. As the technology evolves, it's expected to have an increasingly significant impact on various industries and creative processes [1].
The unveiling of Stable Video 4D has generated considerable excitement within the tech and creative communities. Experts see it as a major step forward in the democratization of 3D content creation, potentially lowering the barriers to entry for complex video production [2]. However, some also caution about the need for responsible development and use of such powerful AI tools [3].