2 Sources
[1]
How Cosmos 3 Helps Physical AI Think Before It Acts
The new, open NVIDIA world foundation model brings vision reasoning, multimodal generation and action prediction together to help robots, autonomous vehicles and vision AI agents think before acting in the real world. The real world is always in motion. To operate autonomously, physical AI systems -- including robots, autonomous vehicles (AVs) and smart spaces -- need to understand not just what they see and what caused that to happen, but what's likely to happen next. In a warehouse, a robot may encounter object configurations it's never seen before. On the road, an AV may need to respond when a pedestrian steps out from between parked cars. And in a factory, a safety system must predict where a forklift is heading, not just detect that it's there. Capturing and recreating those scenarios in the real world is slow, expensive and often impossible to repeat at scale. NVIDIA Cosmos 3 is built for that loop. The new world foundation model -- announced today at NVIDIA GTC Taipei at COMPUTEX -- combines vision reasoning and multimodal generation across text, video, images, ambient sound and action in a single model to help developers create world data with physical context.
[2]
NVIDIA Launches Cosmos 3, the Open Frontier Foundation Model for Physical AI
* NVIDIA Cosmos 3 is a new leaderboard-topping open physical AI foundation model, built on a breakthrough mixture-of-transformers architecture for physical AI reasoning, world simulation and action generation.
* Cosmos 3 is the world's first fully open omnimodel with native vision reasoning and multimodal generation across text, image, video, ambient sound and action for state-of-the-art synthetic data generation and physical AI policy model development.
* NVIDIA launches the NVIDIA Cosmos Coalition with leading AI labs and robotics leaders -- including Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI -- to advance the next generation of open world models.
NVIDIA GTC Taipei -- NVIDIA today launched NVIDIA Cosmos™ 3, an open world foundation model for physical AI built on a breakthrough mixture-of-transformers architecture that combines vision reasoning, world generation and action prediction in a single system. Cosmos 3 is the world's first fully open omnimodel that can natively understand and generate text, images, video, ambient sound and actions with leading physics accuracy, reducing physical AI training and evaluation cycles from months to days. NVIDIA also launched the NVIDIA Cosmos Coalition, a global collaboration between world model builders and AI developers -- including Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI -- working together to advance next-generation world models.
"The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models," said Jensen Huang, founder and CEO of NVIDIA. "The Cosmos 3 family of open, frontier omnimodels gives developers a generational leap in ability to build robots, autonomous vehicles and vision AI that perceive, reason, plan and act in the physical world."
A New Architecture for Physical AI
Cosmos 3 tackles a fundamental challenge in physical AI: enabling robots, autonomous vehicles (AVs) or vision agents to generalize in the real world with limited training data and fragmented simulation stacks. The model's mixture-of-transformers architecture pairs a reasoning transformer with an expert generation transformer, enabling Cosmos 3 to understand object interactions, motion and spatial-temporal relationships before generating video and action trajectories. Trained on one of the largest multimodal physical AI datasets -- including billions of samples across text, image, video, sound and action trajectories -- the model gives developers a powerful pretrained foundation for building physical AI systems with less data and lower training costs.
Developers can use Cosmos 3 as:
* A vision language model that understands and reasons across modalities.
* A world model or video foundation model that simulates physical environments and predicts future world states for training and evaluation.
* The backbone for world action models that help train robots to perform specific tasks.
Cosmos 3 models deliver leading results on physical AI benchmarks. Among open models, it ranks first across Artificial Analysis, Physics-IQ, PAI-Bench and R-Bench for world generation accuracy, RoboLab and RoboArena for action policy, and the VANTAGE-Bench and TAR leaderboards for vision understanding.
The Cosmos 3 lineup gives developers options for different stages of physical AI development:
* Cosmos 3 Super for post-training robotics and AV models that need the highest physics accuracy and generation quality.
* Cosmos 3 Nano for high-quality video and action reasoning in fractions of a second.
* Cosmos 3 Edge, coming soon, for real-time inference at the edge.
Cosmos Coalition Accelerates Open World Model Development
The Cosmos Coalition is a global collaboration between world model builders, AI developers and physical AI leaders to advance open world models across industries, enabling members to contribute models, research and evaluation techniques while using Cosmos 3 technologies, training tools and NVIDIA DGX™ Cloud infrastructure for large-scale training. Founding coalition members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI. By building in the open and contributing across a shared ecosystem, the coalition aims to enable faster innovation, broader interoperability and more rapid advances in physical AI.
Developers Build on Cosmos
The Cosmos platform powers NVIDIA's physical AI stack to accelerate training and evaluation workflows across industries. The platform now includes new datasets for robotics, physics, human motion, autonomous driving, warehouse safety and spatial reasoning, as well as new physical AI agent skills for neural scene reconstruction, defect-image generation and video augmentation. Physical AI developers are building on the Cosmos platform across industries -- Agile Robots, Doosan Robotics, LG Electronics, Samsung and Skild AI for robotics, Li Auto for AVs, and Centific, Fogsphere, Linker Vision, Milestone Systems and Yuan for vision AI agents to power industrial AI and smart spaces applications.
Availability
Cosmos 3 Super and Cosmos 3 Nano are available now, with Cosmos 3 Edge coming soon for real-time inference. Developers can try Cosmos 3 on build.nvidia.com, download open models from Hugging Face, customize models and generate synthetic data with Hugging Face Diffusers and resources on GitHub, and deploy the models as NVIDIA NIM™ microservices.
Model builders and software providers can accelerate access, customization and deployment of Cosmos for key reasoning and synthetic data generation workloads using physical AI agent skills on GitHub through inference services and cloud infrastructure partners including Baseten, CoreWeave, Microsoft Azure, Nebius, Deep Infra and Classmethod.
NVIDIA launched Cosmos 3, an open world foundation model that combines vision reasoning, multimodal generation, and action prediction to help robots and autonomous vehicles understand and respond to real-world scenarios. Built on a breakthrough mixture-of-transformers architecture, it tops leaderboards for physical AI benchmarks and reduces training cycles from months to days.
NVIDIA announced Cosmos 3 at GTC Taipei during COMPUTEX, introducing what the company calls the world's first fully open omnimodel for physical AI [1]. This open world foundation model addresses a critical challenge facing developers: enabling robots, autonomous vehicles, and smart spaces to operate autonomously in unpredictable scenarios where capturing real-world training data is slow, expensive, and often impossible to repeat at scale [1].
The foundation model combines vision reasoning and multimodal generation across text, video, images, ambient sound, and action in a single system [1]. According to Jensen Huang, NVIDIA's founder and CEO, "The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models" [2]. He said the Cosmos 3 family gives developers "a generational leap in ability to build robots, autonomous vehicles and vision AI that perceive, reason, plan and act in the physical world" [2].
Cosmos 3 targets a fundamental challenge in physical AI, generalizing in the real world with limited training data and fragmented simulation stacks, through a mixture-of-transformers architecture that pairs a reasoning transformer with an expert generation transformer [2]. This design lets the model understand object interactions, motion, and spatial-temporal relationships before it generates video and action trajectories, helping systems anticipate causal relationships in dynamic environments [2].
The model was trained on one of the largest multimodal physical AI datasets, comprising billions of samples across text, image, video, sound, and action trajectories [2]. NVIDIA says this gives developers a pretrained foundation for building physical AI systems with less data and lower training costs, cutting physical AI training and evaluation cycles from months to days [2].
Cosmos 3 models deliver leading results on physical AI benchmarks, ranking first among open models across Artificial Analysis, Physics-IQ, PAI-Bench, and R-Bench for world generation accuracy [2]. Among open models, Cosmos 3 also tops RoboLab and RoboArena for action policy, and the VANTAGE-Bench and TAR leaderboards for vision understanding [2].
The lineup comprises three models: Cosmos 3 Super for post-training robotics and AV models that need the highest physics accuracy and generation quality, Cosmos 3 Nano for high-quality video and action reasoning in fractions of a second, and Cosmos 3 Edge, coming soon, for real-time inference at the edge [2]. The scenarios NVIDIA cites include a warehouse robot encountering object configurations it has never seen, an autonomous vehicle responding when a pedestrian steps out from between parked cars, and a factory safety system predicting where a forklift is heading [1].
NVIDIA launched the Cosmos Coalition alongside the model release, establishing a global collaboration between world model builders and AI developers [2]. Founding members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI, which will contribute models, research, and evaluation techniques while using Cosmos 3 technologies, training tools, and NVIDIA DGX Cloud infrastructure for large-scale training [2].
Physical AI developers across industries are already building on the Cosmos platform, including Agile Robots, Doosan Robotics, LG Electronics, Samsung, and Skild AI for robotics, Li Auto for autonomous vehicles, and Centific, Fogsphere, Linker Vision, Milestone Systems, and Yuan for vision AI agents [2]. The platform now includes new datasets for robotics, physics, human motion, autonomous driving, warehouse safety, and spatial reasoning, plus new physical AI agent skills for neural scene reconstruction, defect-image generation, and video augmentation [2].
Summarized by Navi