Curated by THEOUTPOST
On Tue, 22 Oct, 4:06 PM UTC
4 Sources
[1]
Genmo launches Mochi 1 Text-to-Video Generation Model, But Server Crashes Within Hours
On October 22, Genmo, an AI-based video generation platform, released Mochi 1, a new state-of-the-art open-source text-to-video generation model that produces high-quality videos from simple text prompts. Alongside the model, the company unveiled a hosted playground where users can try Mochi 1 for free, and the weights and architecture are openly available on Hugging Face. The launch drew so much traffic that Genmo's website crashed within hours; the company acknowledged the outage on X: "We're seeing extremely high load. Some users are seeing errors, please bear with us as we scale up capacity."
Genmo claims Mochi 1 excels at realistic motion dynamics and closely follows text instructions. Built as a 10-billion-parameter diffusion model, Mochi 1 generates video at 30 frames per second, and Genmo has released it under the permissive Apache 2.0 license. Mochi 1 competes with proprietary models including Runway's Gen-3 Alpha, Luma AI's Dream Machine, Kuaishou's Kling, and Minimax's Hailuo, among others. "At Genmo, our mission is to unlock the right brain of artificial general intelligence. Mochi 1 is the first step toward building world simulators that can imagine anything, whether possible or impossible," says the Genmo team.
The Mochi 1 research preview opens up possibilities across different fields. It advances video generation techniques and fosters exploration of novel methodologies in research and development; supports innovative applications in entertainment, advertising, and education; and enables artists and creators to express their visions through AI-generated videos. In robotics, it can generate synthetic data for training AI models for autonomous vehicles and virtual environments. Genmo has also partnered with platforms to make Mochi 1 easily accessible to developers, who can integrate it into applications through partner APIs.
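Since the article notes that the weights are openly available on Hugging Face, a minimal sketch of fetching them with the huggingface_hub client is shown below. The repository ID and local directory are illustrative assumptions, not details confirmed in the coverage.

```python
# Minimal sketch: download the open Mochi 1 weights from Hugging Face.
# The repo_id below is an assumption; check Genmo's Hugging Face page for the actual repository.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="genmo/mochi-1-preview",   # assumed repository name
    local_dir="./mochi-1-weights",     # where to place the model files
)
print(f"Model files downloaded to {local_path}")
```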
[2]
Genmo introduces Mochi 1, an open-source text-to-video generation model - SiliconANGLE
The company said Mochi 1 represents a dramatic improvement in state-of-the-art motion quality as well as in adherence to the text prompts users write. It's not uncommon for AI models to "daydream" even when given specific instructions, so Genmo said its model has been trained to follow instructions closely. In addition to the new model release, Genmo unveiled a new hosted playground where users can try out Mochi 1 for free. The weights are also available on the AI model hosting site Hugging Face.
Alongside the news, Genmo shared that it has raised $28.4 million in Series A funding led by NEA with participation from The House Fund, Gold House Ventures, WndrCo, Eastlink Capital Partners and Essence VC. The company said it would use the funding to help unlock what it calls the "right brain of artificial general intelligence." Mochi 1 represents what the company says is the first step toward building that right brain, which is commonly associated with creativity, whereas the left brain is associated with analytical and logical thinking. Significant investment and effort have gone into video generation since the launch of highly featured AI video generators such as Runway AI Inc.'s model and OpenAI's Sora.
The company said the new model sets a high bar for realistic motion dynamics by understanding physics such as fluid movement, fur and hair simulation and, most importantly, human motion. The model can generate smooth videos at 30 frames per second for durations of up to 5.4 seconds, currently the industry standard for most models on the market. When users are clear and concise about what they want it to display, it sticks very closely to the prompt. This ensures it delivers accurate videos that reflect user instructions, the company said, giving users detailed control over characters, scenes and other elements.
To build Mochi 1, Genmo used a 10 billion-parameter diffusion model, the parameter count being the number of learned variables that determine the model's behavior. Under the hood, the company used its own Asymmetric Diffusion Transformer, or AsymmDiT, architecture, which it said can efficiently process user prompts and compressed video tokens by streamlining text processing to focus on visuals. AsymmDiT jointly builds video using text and visual tokens, similar to Stable Diffusion 3, but the company said its visual stream has nearly four times as many parameters as the text stream thanks to a larger hidden dimension. The asymmetric design lowers memory use for deployment.
The Mochi 1 research preview offers a base model that can generate 480p video, with the full version slated for release before the end of the year. The full model will include Mochi 1 HD, which will support 720p video generation and enhanced fidelity for smoother motion. Genmo said it trained Mochi 1 entirely from scratch and that, at 10 billion parameters, it is the largest video generation model ever released as open source. The company's existing closed-source image and video generation models already have more than 2 million users. Released under the Apache 2.0 open-source license, Mochi 1's model weights and source code are available for developers and researchers to work with and can be found on GitHub and Hugging Face.
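To make the asymmetric idea above concrete, here is a toy sketch of a two-stream transformer block in which the visual stream carries roughly four times the width of the text stream and the two streams mix through joint self-attention. The dimensions, layer choices, and class name are illustrative assumptions, not Genmo's actual AsymmDiT implementation.

```python
# Toy sketch of an asymmetric two-stream block: the visual stream uses a much larger
# hidden dimension than the text stream, and the two streams interact via joint attention.
# All sizes here are assumptions chosen only to illustrate the 4:1 width ratio.
import torch
import torch.nn as nn

class AsymmetricJointBlock(nn.Module):
    def __init__(self, text_dim: int = 256, visual_dim: int = 1024, n_heads: int = 8):
        super().__init__()
        joint_dim = visual_dim  # project both streams into a shared width for joint attention
        self.text_in = nn.Linear(text_dim, joint_dim)
        self.visual_in = nn.Linear(visual_dim, joint_dim)
        self.attn = nn.MultiheadAttention(joint_dim, n_heads, batch_first=True)
        self.text_out = nn.Linear(joint_dim, text_dim)
        self.visual_out = nn.Linear(joint_dim, visual_dim)

    def forward(self, text_tokens: torch.Tensor, visual_tokens: torch.Tensor):
        # text_tokens: (B, T_text, text_dim); visual_tokens: (B, T_vis, visual_dim)
        joint = torch.cat([self.text_in(text_tokens), self.visual_in(visual_tokens)], dim=1)
        mixed, _ = self.attn(joint, joint, joint)  # joint self-attention over both token sets
        t_len = text_tokens.shape[1]
        text_mixed, visual_mixed = mixed[:, :t_len], mixed[:, t_len:]
        # Residual connections keep each stream at its own (asymmetric) width.
        return text_tokens + self.text_out(text_mixed), visual_tokens + self.visual_out(visual_mixed)

block = AsymmetricJointBlock()
text = torch.randn(1, 77, 256)      # e.g. a short prompt's text tokens
video = torch.randn(1, 1024, 1024)  # e.g. compressed video tokens from a VAE
new_text, new_video = block(text, video)
print(new_text.shape, new_video.shape)
```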
[3]
Meet Mochi-1 -- the latest free and open-source AI video model
The generative AI wars are building to a crescendo as more and more companies release their own models. Generative video seems to be the biggest current battleground, and Genmo is taking a different approach. The company is releasing its Mochi-1 model as a 'research preview', but the new video generation model falls under an Apache 2.0 license, which makes it open source and able to be taken apart and put back together again. That also means Mochi-1 is free to use, and you can try it for yourself on Genmo's site. The beauty of it being open source is that it will be available on all the usual generative AI platforms in the future, and one day it could run on a good gaming PC.
It is launching into a very competitive market, with different services offering a range of capabilities, including templates from Haiper, realism from Kling or Hailuo and fun effects from Pika Labs and Dream Machine. Genmo says its focus is bringing state-of-the-art video generation to open source. So why use Genmo's model over any of the others on offer right now? It all comes down to motion. We spoke to Genmo's CEO Paras Jain, who explained that motion is a key metric when benchmarking models. "I think fundamentally for a very long time, the only uninteresting video is one which doesn't move. And I felt like a lot of AI video kind of suffered this 'Live Photo effect'," he explains. "I think our historical models had this, that was how the technology had to evolve. But video is about motion, and that was the most important thing we invested in, above all else."
This initial release is a surprisingly small 10 billion parameter transformer diffusion model that uses a new asymmetric approach to pack more punch into a small package. Jain said they trained Mochi-1 exclusively on video, rather than the more traditional mix of video, image and text, which gave it a better understanding of physics. The team then worked on ensuring the model could properly understand what people wanted it to make. He told us: "We've invested really, really heavily in prompt adherence as well, just following what you say." Genmo hopes Mochi-1 can offer 'best-in-class' open-source video generation, but at present videos are limited to 480p as part of the research preview launching today. As Jain mentions, a big focus has been placed on prompt adherence and recognition, too. Genmo benchmarks this with a vision language model as a judge, following the approach used for OpenAI's DALL-E 3.
Will you be testing Mochi-1? Let us know. It's certainly entering a crowded landscape, but its open-source nature could see it extend further than some of its rivals. It isn't even the only open-source AI video model to launch this week: AI company Rhymes dropped Allegro, "a small and efficient open-source text-to-video model". It is also available with an Apache license, although it runs at 15 frames per second and 720p, versus Mochi-1's 30 frames per second at 480p. Neither model will run on your laptop yet, but as Jain told us, the beauty of open source is that one day someone will fine-tune it to run on lower-powered hardware and we'll be making videos offline.
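The prompt-adherence benchmarking mentioned above (a vision language model acting as a judge, in the style of OpenAI's DALL-E 3 evaluation) could be sketched roughly as follows. The judge function, scoring scale, and prompt set are placeholder assumptions rather than Genmo's actual harness.

```python
# Rough sketch of a "VLM as judge" prompt-adherence benchmark.
# vlm_judge_score is a stub standing in for a real vision-language model call.
from statistics import mean

def vlm_judge_score(video_frames, prompt: str) -> float:
    """Placeholder: ask a VLM how well the frames match the prompt, on a 0-1 scale.
    A real harness would send frames and the prompt to an actual judge model."""
    return 0.5  # dummy score so the sketch runs end to end

def prompt_adherence(generated, prompts) -> float:
    # Average the judge's score over the benchmark prompt set.
    return mean(vlm_judge_score(frames, p) for frames, p in zip(generated, prompts))

prompts = ["a corgi surfing a wave at sunset", "rain falling on a neon-lit street"]
generated = [[f"frame_{i}" for i in range(8)] for _ in prompts]  # stand-in for decoded frames
print(f"prompt adherence: {prompt_adherence(generated, prompts):.2f}")
```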
[4]
AI video startup Genmo launches Mochi 1, an open source rival to Runway, Kling, and others
Genmo, an AI company focused on video generation, has announced the release of a research preview for Mochi 1, a groundbreaking open-source model for generating high-quality videos from text prompts -- and claims performance comparable to, or exceeding, leading closed-source, proprietary rivals such as Runway's Gen-3 Alpha, Luma AI's Dream Machine, Kuaishou's Kling, Minimax's Hailuo, and many others. Available under the permissive Apache 2.0 license, Mochi 1 offers users free access to cutting-edge video generation capabilities, whereas pricing for other models starts at limited free tiers but goes as high as $94.99 per month (for the Hailuo Unlimited tier). In addition to the model release, Genmo is also making available a hosted playground, allowing users to experiment with Mochi 1's features firsthand. The 480p model is available for use today, and a higher-definition version, Mochi 1 HD, is expected to launch later this year. Initial videos shared with VentureBeat show impressively realistic scenery and motion, particularly with human subjects, as seen in a sample video of an elderly woman.
Advancing the state-of-the-art
Mochi 1 brings several significant advancements to the field of video generation, including high-fidelity motion and strong prompt adherence. According to Genmo, Mochi 1 excels at following detailed user instructions, allowing for precise control over characters, settings, and actions in generated videos. Genmo has positioned Mochi 1 as a solution that narrows the gap between open and closed video generation models. "We're 1% of the way to the generative video future. The real challenge is to create long, high-quality, fluid video. We're focusing heavily on improving motion quality," said Paras Jain, CEO and co-founder of Genmo, in an interview with VentureBeat. Jain and his co-founder started Genmo with a mission to make AI technology accessible to everyone. "When it came to video, the next frontier for generative AI, we just thought it was so important to get this into the hands of real people," Jain emphasized. He added, "We fundamentally believe it's really important to democratize this technology and put it in the hands of as many people as possible. That's one reason we're open sourcing it." Already, Genmo claims that in internal tests, Mochi 1 bests most other video AI models -- including proprietary competitors Runway and Luma -- at prompt adherence and motion quality.
Series A funding to the tune of $28.4M
In tandem with the Mochi 1 preview, Genmo also announced it has raised a $28.4 million Series A funding round, led by NEA, with additional participation from The House Fund, Gold House Ventures, WndrCo, Eastlink Capital Partners, and Essence VC. Several angel investors, including Abhay Parasnis (CEO of Typeface) and Amjad Masad (CEO of Replit), are also backing the company's vision for advanced video generation. Jain's perspective on the role of video in AI goes beyond entertainment or content creation. "Video is the ultimate form of communication -- 30 to 50% of our brain's cortex is devoted to visual signal processing. It's how humans operate," he said. Genmo's long-term vision extends to building tools that can power the future of robotics and autonomous systems.
"The long-term vision is that if we nail video generation, we'll build the world's best simulators, which could help solve embodied AI, robotics, and self-driving," Jain explained. Open for collaboration -- but training data is still close to the vest Mochi 1 is built on Genmo's novel Asymmetric Diffusion Transformer (AsymmDiT) architecture. At 10 billion parameters, it's the largest open source video generation model ever released. The architecture focuses on visual reasoning, with four times the parameters dedicated to processing video data as compared to text. Efficiency is a key aspect of the model's design. Mochi 1 leverages a video VAE (Variational Autoencoder) that compresses video data to a fraction of its original size, reducing the memory requirements for end-user devices. This makes it more accessible for the developer community, who can download the model weights from HuggingFace or integrate it via API. Jain believes that the open-source nature of Mochi 1 is key to driving innovation. "Open models are like crude oil. They need to be refined and fine-tuned. That's what we want to enable for the community -- so they can build incredible new things on top of it," he said. However, when asked about the model's training dataset -- among the most controversial aspects of AI creative tools, as evidence has shown many to have trained on vast swaths of human creative work online without express permission or compensation, and some of it copyrighted works -- Jain was coy. "Generally, we use publicly available data and sometimes work with a variety of data partners," he told VentureBeat, declining to go into specifics due to competitive reasons. "It's really important to have diverse data, and that's critical for us." Limitations and roadmap As a preview, Mochi 1 still has some limitations. The current version supports only 480p resolution, and minor visual distortions can occur in edge cases involving complex motion. Additionally, while the model excels in photorealistic styles, it struggles with animated content. However, Genmo plans to release Mochi 1 HD later this year, which will support 720p resolution and offer even greater motion fidelity. "The only uninteresting video is one that doesn't move -- motion is the heart of video. That's why we've invested heavily in motion quality compared to other models," said Jain. Looking ahead, Genmo is developing image-to-video synthesis capabilities and plans to improve model controllability, giving users even more precise control over video outputs. Expanding use cases via open source video AI Mochi 1's release opens up possibilities for various industries. Researchers can push the boundaries of video generation technologies, while developers and product teams may find new applications in entertainment, advertising, and education. Mochi 1 can also be used to generate synthetic data for training AI models in robotics and autonomous systems. Reflecting on the potential impact of democratizing this technology, Jain said, "In five years, I see a world where a poor kid in Mumbai can pull out their phone, have a great idea, and win an Academy Award -- that's the kind of democratization we're aiming for." Genmo invites users to try the preview version of Mochi 1 via their hosted playground at genmo.ai/play, where the model can be tested with personalized prompts. A call for talent As it continues to push the frontier of open-source AI, Genmo is actively hiring researchers and engineers to join its team. 
"We're a research lab working to build frontier models for video generation. This is an insanely exciting area -- the next phase for AI -- unlocking the right brain of artificial intelligence," Jain said. The company is focused on advancing the state of video generation and further developing its vision for the future of artificial general intelligence.
Genmo releases Mochi 1, an open-source text-to-video AI model, offering high-quality video generation capabilities comparable to proprietary models. The launch is accompanied by a $28.4 million Series A funding round.
Genmo, an AI-based video generation platform, has launched Mochi 1, a state-of-the-art open-source text-to-video generation model. Released on October 22, Mochi 1 represents a significant advancement in AI-powered video creation, challenging proprietary models with its high-quality output and open-source nature [1].
Mochi 1 boasts impressive capabilities, including high-fidelity motion, strong prompt adherence, and smooth video at 30 frames per second for durations of up to 5.4 seconds. The model excels in understanding physics, including fluid movement, fur and hair simulation, and human motion [2].
Mochi 1 is built on a 10 billion parameter diffusion model, making it the largest open-source video generation model to date. It uses Genmo's novel Asymmetric Diffusion Transformer (AsymmDiT) architecture, which efficiently processes user prompts and compressed video tokens [2].
Unlike its proprietary competitors, Mochi 1 is released under the Apache 2.0 license, making it freely accessible to developers and researchers. This open-source approach aims to democratize AI video generation technology and foster innovation in the field [3].
Coinciding with the Mochi 1 launch, Genmo announced a $28.4 million Series A funding round led by NEA, with participation from several other investors. The company aims to "unlock the right brain of artificial general intelligence" and views Mochi 1 as a step toward building advanced world simulators [4].
Mochi 1's release opens up possibilities across various fields: researchers can push the boundaries of video generation, developers can build new applications in entertainment, advertising, and education, creators gain a new medium for AI-generated video, and the model can produce synthetic data for training AI systems in robotics and autonomous vehicles [4].
While Mochi 1 represents a significant advancement, it still faces some limitations: the current preview supports only 480p resolution, minor visual distortions can occur in edge cases involving complex motion, and the model struggles with animated content compared to photorealistic styles [4].
Genmo plans to address these issues with the upcoming release of Mochi 1 HD, which will support 720p resolution and offer enhanced motion fidelity [4].
Reference
[1] Analytics India Magazine | Genmo launches Mochi 1 Text-to-Video Generation Model, But Server Crashes Within Hours
[2] SiliconANGLE | Genmo introduces Mochi 1, an open-source text-to-video generation model
[3] Meet Mochi-1 -- the latest free and open-source AI video model
[4] VentureBeat | AI video startup Genmo launches Mochi 1, an open source rival to Runway, Kling, and others