Curated by THEOUTPOST
On Wed, 26 Feb, 12:04 AM UTC
8 Sources
[1]
Alibaba Releases New Open-Source Suite of AI Video Generation Models
- Alibaba's Wan 2.1 supports Chinese and English text prompts
- It can generate videos using both text and image inputs
- The team used a new 3D causal VAE architecture for the models

Alibaba released a suite of artificial intelligence (AI) video generation models on Wednesday. Dubbed Wan 2.1, these are open-source models that can be used for both academic and commercial purposes. The Chinese e-commerce giant released the models in several parameter-based variants. Developed by the company's Wan team, the models were first introduced in January, and the company claims that Wan 2.1 can generate highly realistic videos. The models are currently hosted on the AI and machine learning (ML) hub Hugging Face.

The new Alibaba video AI models are hosted on the Wan team's Hugging Face page, where the model pages also detail the full Wan 2.1 suite. There are four models in total -- T2V-1.3B, T2V-14B, I2V-14B-720P, and I2V-14B-480P -- where T2V is short for text-to-video and I2V stands for image-to-video. The researchers claim that the smallest variant, Wan 2.1 T2V-1.3B, can run on a consumer-grade GPU with as little as 8.19 GB of VRAM. As per the post, the model can generate a five-second 480p video on an Nvidia RTX 4090 in about four minutes.

While the Wan 2.1 suite is aimed at video generation, the models are also designed for other functions such as image generation, video-to-audio generation, and video editing; however, the currently open-sourced models do not expose these advanced capabilities. For video generation, the suite accepts text prompts in Chinese and English as well as image inputs.

Coming to the architecture, the researchers revealed that the Wan 2.1 models are built on a diffusion transformer architecture, with innovations to the base design including new variational autoencoders (VAEs) and training strategies. Most notably, the models use a new 3D causal VAE architecture dubbed Wan-VAE, which improves spatiotemporal compression and reduces memory usage. The autoencoder can encode and decode unlimited-length 1080p videos without losing historical temporal information, enabling consistent video generation.

Based on internal testing, the company claims that the Wan 2.1 models outperform OpenAI's Sora in consistency, scene generation quality, single-object accuracy, and spatial positioning. The models are available under the Apache 2.0 licence, which permits both academic and commercial use.
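For readers who want to experiment, the checkpoints can be pulled straight from Hugging Face. Below is a minimal sketch using huggingface_hub's snapshot_download; the repository id is an assumption based on the Wan team's Hugging Face page described above, so check the actual model page before running it.

```python
# Minimal sketch: fetch the smallest Wan 2.1 checkpoint from Hugging Face.
# Assumption: the repo id "Wan-AI/Wan2.1-T2V-1.3B" matches the Wan team's
# Hugging Face page described in the article -- verify it before use.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Wan-AI/Wan2.1-T2V-1.3B")
print(f"Wan 2.1 T2V-1.3B files downloaded to: {local_dir}")
```

Note that even this smallest checkpoint is a multi-gigabyte download.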
[2]
Alibaba Makes AI Video Generator Wan 2.1 Free to Use
Alibaba, the Chinese tech giant known for its e-commerce operations, is giving people free access to its generative AI model capable of creating realistic videos and images from text and image prompts. On Wednesday, Alibaba announced that it would make its video and image generation AI model, Wan 2.1, publicly available. According to a report by Reuters, four models that are part of the Wan 2.1 series are now open source and can be downloaded and modified by users. Wan 2.1 can generate images and video from text and image inputs. The models will be available via Alibaba Cloud's ModelScope and Hugging Face, a huge repository of AI models, and will be accessible to academics, researchers, and commercial institutions globally.

The news that Wan 2.1 is now publicly available -- or open source -- will undoubtedly further escalate competition with established rivals like OpenAI. Last month, the Chinese AI company DeepSeek unveiled an open-source AI image generator that it claims can outperform OpenAI's DALL-E 3 -- a move that sent shockwaves through the U.S. stock market. DeepSeek says its AI model, Janus-Pro-7B, is better than Stability AI's Stable Diffusion and DALL-E 3. The model is a major upgrade on its previous image generator, Janus, which launched in late 2024. The multimodal AI can analyze images as well as generate them. DeepSeek made Janus open source and available to download from Hugging Face.

Open-source models differ from proprietary ones like those created by OpenAI in that they do not generate direct revenue for companies. Alibaba's announcement comes amid ongoing debate over whether AI models will eventually become commoditized. OpenAI CEO Sam Altman has made detailed plans to restructure the start-up away from its original non-profit status and convert it into a for-profit company. However, earlier this month, Elon Musk made a $97.4 billion bid to buy the nonprofit that controls OpenAI to disrupt Altman's plans. "It's time for OpenAI to return to the open-source, safety-focused force for good it once was," Musk said in a statement provided by his attorney Marc Toberoff. "We will make sure that happens."
[3]
Alibaba to release open-source version of video generating AI model
BEIJING (Reuters) - Alibaba will release an open-source version of its video- and image-generating artificial intelligence model, Wan 2.1, the Chinese tech giant said in a post on X on Tuesday. The company will give full details in a recorded video at 11 p.m. (1500 GMT), a spokesperson said.

The release of the model comes as competition intensifies in China's AI market, following DeepSeek's launch of its latest open-source models last month. Those models have shaken the global AI industry by delivering performance comparable to leading products from companies like OpenAI, which has shifted towards closed-source offerings.

Alibaba initially introduced the latest version of its video- and image-generating AI model in January, later renaming it Wan from Wanx, touting the model's ability to generate highly realistic visuals. In a previous statement, Alibaba highlighted its top ranking on VBench, a leaderboard for video generative models, where it leads in key dimensions such as multi-object interactions.

On Tuesday, Alibaba released a preview version of its reasoning model, QwQ-Max, which it plans to make open source upon the official release of the full version, expected soon. Earlier this week, Alibaba announced plans to invest at least 380 billion yuan ($52 billion) over the next three years to bolster its cloud computing and AI infrastructure. (Reporting by Beijing Newsroom; Editing by Emelia Sithole-Matarise)
[4]
Alibaba Releases Open-Source Video Generation Model Wan 2.1, Outperforms OpenAI's Sora
The company has launched multiple models optimised for video generation, offering capabilities in text-to-video, image-to-video, video editing, text-to-image, and video-to-audio. Chinese tech giant Alibaba has released Wan 2.1, its open-source video foundation model, along with the code and weights. The model can generate videos with complex motions that accurately simulate real-world physics. "Wan2.1 consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks," the company said in a blog post.

The suite includes three main models: Wan2.1-I2V-14B, Wan2.1-T2V-14B, and Wan2.1-T2V-1.3B. The I2V-14B model generates videos at 480P and 720P resolutions, producing complex visual scenes and motion patterns. The T2V-14B model supports similar resolutions and is "the only video model capable of producing both Chinese and English text." The T2V-1.3B model is designed for consumer-grade GPUs, requiring 8.19 GB of VRAM to generate a five-second 480P video in four minutes on an RTX 4090 GPU.

The model outperforms OpenAI's Sora on the VBench Leaderboard, which evaluates video generation quality across 16 dimensions, including subject identity consistency, motion smoothness, temporal flickering, and spatial relationships.

According to the company, the technical advancements in Wan2.1 are based on a new spatio-temporal variational autoencoder (VAE), scalable pre-training strategies, large-scale data construction, and automated evaluation metrics. "We propose a novel 3D causal VAE architecture specifically designed for video generation," the company said. The model implements a feature cache mechanism, reducing memory usage and preserving temporal causality. Performance tests indicate that Wan2.1's VAE reconstructs video at 2.5 times the speed of HunyuanVideo on an A800 GPU. "This speed advantage will be further demonstrated at higher resolutions due to the small size design of our VAE model and the feature cache mechanism," the company explained.

Wan2.1 employs the Flow Matching framework within the Diffusion Transformer (DiT) paradigm. It integrates the T5 encoder to process multi-language text inputs with cross-attention mechanisms. "Our experimental findings reveal a significant performance improvement with this approach at the same parameter scale," the company said. Wan2.1's data pipeline involved curating and deduplicating 1.5 billion videos and 10 billion images.

Alibaba recently released QwQ-Max-Preview, a new reasoning model in its Qwen AI family. The company plans to invest over $52 billion in cloud computing and artificial intelligence over the next three years.
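To make the "3D causal" idea concrete, the toy PyTorch module below pads a 3D convolution only on the past side of its time axis, so every output frame depends only on the current and earlier frames. That one-sided dependency is what lets an encoder process arbitrarily long videos chunk by chunk while caching past features. This is a sketch of the general technique, not Alibaba's Wan-VAE code, and the layer sizes are arbitrary.

```python
# Toy temporally-causal 3D convolution: pad the time axis on the past side
# only, so output frame t never sees frames later than t. Illustrative only;
# not Alibaba's Wan-VAE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv3d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel: int = 3):
        super().__init__()
        self.time_pad = kernel - 1      # all temporal padding goes before frame 0
        self.spatial_pad = kernel // 2  # ordinary symmetric padding in H and W
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=kernel)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, time, height, width).
        # F.pad order: (W_left, W_right, H_top, H_bottom, T_front, T_back).
        x = F.pad(x, (self.spatial_pad,) * 4 + (self.time_pad, 0))
        return self.conv(x)

video = torch.randn(1, 3, 16, 64, 64)  # 16 frames of 64x64 RGB
out = CausalConv3d(3, 8)(video)
print(out.shape)                        # torch.Size([1, 8, 16, 64, 64])
```

Because nothing to the right of frame t is ever read, activations for frames already processed can be cached and reused, which is the role the feature cache mechanism described above plays.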
[5]
Alibaba vs OpenAI: The Battle for Open Source AI Supremacy
These AI models are currently available globally on Alibaba Cloud's ModelScope and Hugging Face platforms for academic, research, and commercial use. Their global availability will likely trigger widespread use and subsequent innovation in AI-driven image and video creation.

Alibaba initially released the newest version of its video- and image-generating AI model back in January under the name 'Wanx', later renaming it 'Wan'. Alibaba has marketed the model as capable of creating highly realistic visuals and has positioned it as a visionary solution within AI-generated content. Since its release, Wan has been recognized for its superior capabilities, topping VBench, a highly regarded leaderboard for video generative models. Alibaba has particularly highlighted the model's excellence in intricate functionalities, like multi-object interaction, distinguishing it from others.

Earlier on Tuesday, Alibaba released a glimpse of its reasoning model, QwQ-Max, aimed at further improving how its AI processes and evaluates information.
[6]
Alibaba offers free access to its AI model that can generate realistic video and images
The company is open sourcing its latest generative AI models. Alibaba is giving people free access to its generative artificial intelligence models that can produce highly realistic videos and images from both text and image input. The company has announced that four variants of its Wan 2.1 series, the latest version of its generative AI technology, are now open source and can be downloaded and modified by users. Researchers, academics, and commercial entities can all get them from Alibaba Cloud's ModelScope and Hugging Face platforms, both of which give people access to open-source AI models. As Reuters reported, the models Alibaba has open sourced are called T2V-1.3B, T2V-14B, I2V-14B-720P, and I2V-14B-480P, with 14B indicating that the model has 14 billion parameters.

Last month, Chinese company DeepSeek made its R1 reasoning model free to download and use, creating a clamor for more open-source AI technologies. DeepSeek has since expanded its commitment to the open-source community and is in the process of releasing five code repositories behind its service.

Alibaba was one of the companies that joined the fray to develop generative AI tech following the launch of OpenAI's ChatGPT two years ago. Just recently, Alibaba Group's chairman, Joe Tsai, said that the company's generative AI technology will power artificial intelligence features for iPhones meant for sale in the Chinese market. Apple couldn't use the same AI tech for phones released in China due to strict regulations surrounding AI products, so it had to look for local partners, Alibaba being one of them.
[7]
Alibaba to release open-source version of video-generating AI model
BEIJING, Feb 25 (Reuters) - Alibaba (9988.HK) will release an open-source version of its video- and image-generating AI model, Wan 2.1, the Chinese tech giant said in a post on X on Tuesday. The company will hold a live press conference for the official release at 11:00 pm Beijing Time (1500 GMT), the post showed. (Reporting by Beijing Newsroom)
[8]
Alibaba makes AI video generation model free to use globally
Open-source AI tech has been thrown into the spotlight since Chinese firm DeepSeek rattled global markets in January, after claiming its artificial intelligence model was trained at a fraction of the cost of leading AI players and on less-advanced Nvidia chips. DeepSeek's model is open source, like Alibaba's, meaning it can be downloaded and modified by others. Open-source models differ from proprietary models such as those created by OpenAI in that they do not directly produce revenue for companies. Open sourcing a technology serves a number of purposes, including driving innovation and building a community around a product.

A debate is currently swirling about whether AI models will become commoditized. Chinese firms in particular have been pushing forward with open-source models, and Alibaba's and DeepSeek's are now among the most popular used globally. Alibaba published its first open-source model in August 2023, while Meta is leading the open-source charge in the U.S. with its Llama models.

Alibaba's stock has been on a tear this year, with the Hong Kong listing up 66% in 2025 to date due to factors including the company's improved financial performance, its perception as one of the key AI players in China, and recent signals of further support from Chinese President Xi Jinping for the domestic private sector.
Alibaba has released Wan 2.1, a suite of open-source AI video generation models, claiming superior performance to OpenAI's Sora. The models support text-to-video and image-to-video generation in multiple languages and resolutions.
In a significant move that could reshape the landscape of AI-generated content, Chinese tech giant Alibaba has released Wan 2.1, a suite of open-source artificial intelligence video generation models. This release marks a notable advancement in the field and positions Alibaba as a formidable competitor to established players like OpenAI [1][2].
Wan 2.1 comprises four main models: T2V-1.3B, T2V-14B, I2V-14B-720P, and I2V-14B-480P. These models offer a range of capabilities, including:
- Text-to-video generation from prompts in both Chinese and English
- Image-to-video generation at 480p and 720p resolutions
- Related functions such as text-to-image, video editing, and video-to-audio generation [1][4]
The models utilize a diffusion transformer architecture with a novel 3D causal Variational Autoencoder (VAE) dubbed Wan-VAE. This innovation improves spatiotemporal compression and reduces memory usage, enabling consistent video generation [1][3].
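Source [4] above notes that Wan 2.1 trains its diffusion transformer with the Flow Matching framework. The toy PyTorch training step below shows that objective in its simplest form, assuming linear interpolation paths between noise and data: the network learns to predict the constant velocity that carries a noise sample to a data sample. The tiny stand-in network and random "latents" are purely illustrative, not Alibaba's implementation.

```python
# Toy flow-matching training step with linear (rectified-flow style) paths.
# Illustrative only: real video models predict velocities over VAE latents
# with a large diffusion transformer, not an MLP over random vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVelocityNet(nn.Module):
    """Stand-in for the diffusion transformer: predicts a velocity field."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim)
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Condition on the interpolation time t by simple concatenation.
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

dim, batch = 64, 32
model = TinyVelocityNet(dim)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x1 = torch.randn(batch, dim)  # "data" latents (random stand-ins here)
x0 = torch.randn(batch, dim)  # pure-noise samples
t = torch.rand(batch)         # interpolation times drawn uniformly from [0, 1]

# Linear path x_t = (1 - t) * x0 + t * x1; its velocity d(x_t)/dt is x1 - x0.
x_t = (1 - t[:, None]) * x0 + t[:, None] * x1
loss = F.mse_loss(model(x_t, t), x1 - x0)

loss.backward()
opt.step()
print(f"flow-matching loss: {loss.item():.4f}")
```

Sampling then amounts to integrating the learned velocity field from noise at t = 0 to a sample at t = 1.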
Alibaba claims that Wan 2.1 outperforms OpenAI's Sora model in several key areas, including consistency, scene generation quality, single object accuracy, and spatial positioning. The company's internal testing and rankings on the VBench Leaderboard support these assertions [1][4].
The models are designed for accessibility:
- The smallest variant, T2V-1.3B, runs on consumer-grade GPUs with as little as 8.19 GB of VRAM
- On an Nvidia RTX 4090, it can generate a five-second 480p video in about four minutes
- All four models can be downloaded from Alibaba Cloud's ModelScope and Hugging Face
Alibaba's decision to make Wan 2.1 open-source under the Apache 2.0 license is significant. The license permits usage in academic, research, and commercial contexts [1][2]. The open-source nature of Wan 2.1 contrasts with the proprietary approach of companies like OpenAI, potentially accelerating innovation in the field [2].
The release of Wan 2.1 comes amid intensifying competition in the AI market:
- DeepSeek's recent open-source releases have shaken the global AI industry by delivering performance comparable to leading closed-source products [3]
- Alibaba has also previewed its reasoning model, QwQ-Max, which it plans to open source upon the full version's official release [3]
- The company plans to invest at least 380 billion yuan (about $52 billion) over the next three years in cloud computing and AI infrastructure [3][4]
As Wan 2.1 becomes available on platforms like Alibaba Cloud's ModelScope and Hugging Face, it is expected to trigger widespread use and innovation in AI-driven image and video creation [5]. The global accessibility of these models could potentially democratize advanced AI capabilities, leading to new applications and advancements in various industries.
Reference
[1] Alibaba Releases New Open-Source Suite of AI Video Generation Models
[2] Alibaba Makes AI Video Generator Wan 2.1 Free to Use
[3] Reuters | Alibaba to release open-source version of video generating AI model
[4] Analytics India Magazine | Alibaba Releases Open-Source Video Generation Model Wan 2.1, Outperforms OpenAI's Sora
[5] Alibaba vs OpenAI: The Battle for Open Source AI Supremacy