NVIDIA Cosmos 3 Brings Vision Reasoning to Physical AI for Robots and Autonomous Vehicles

NVIDIA launched Cosmos 3, an open world foundation model that combines vision reasoning, multimodal generation, and action prediction to help robots and autonomous vehicles understand and respond to real-world scenarios. Built on a mixture-of-transformers architecture, it ranks first among open models on several physical AI benchmarks, and NVIDIA says it cuts training and evaluation workflows from months to days.

NVIDIA Cosmos 3 Transforms Physical AI Development

NVIDIA announced NVIDIA Cosmos 3 at GTC Taipei during COMPUTEX, introducing what the company calls the world's first fully open omnimodel for physical AI [1]. This open world foundation model addresses a critical challenge facing developers: enabling robots, autonomous vehicles, and smart spaces to operate autonomously in unpredictable scenarios where capturing real-world training data proves slow, expensive, and often impossible to replicate at scale [1].

Source: NVIDIA

The foundation model combines vision reasoning and multimodal generation across text, video, images, ambient sound, and action prediction in a single system [1]. According to Jensen Huang, NVIDIA's founder and CEO, "The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models" [2]. He emphasized that Cosmos 3 gives developers a generational leap in building AI for robots and autonomous vehicles that can perceive, reason, plan, and act in the physical world [2].

Breakthrough Mixture-of-Transformers Architecture Enables World Simulation

Cosmos 3 tackles fundamental obstacles in physical AI through its novel mixture-of-transformers architecture that pairs a reasoning transformer with an expert generation transformer [2]. This design enables the model to understand object interactions, motion, and spatial-temporal relationships before generating video and action trajectories, helping systems anticipate causal relationships in dynamic environments [2].
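
The reason-then-generate ordering described above can be illustrated with a toy sketch. Nothing here is Cosmos 3 code or its API: `reason_about_motion` and `generate_trajectory` are hypothetical names, and a simple constant-velocity estimate stands in for both transformers. The only point is the data flow, in which a motion estimate is produced first and the trajectory rollout is conditioned on it.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class MotionEstimate:
    """Output of the hypothetical reasoning stage."""
    last_position: Point
    velocity: Point  # displacement per time step


def reason_about_motion(observed: List[Point]) -> MotionEstimate:
    """Toy 'reasoning' stage: average per-step displacement of one object."""
    if len(observed) < 2:
        raise ValueError("need at least two observations to estimate motion")
    steps = len(observed) - 1
    vx = (observed[-1][0] - observed[0][0]) / steps
    vy = (observed[-1][1] - observed[0][1]) / steps
    return MotionEstimate(last_position=observed[-1], velocity=(vx, vy))


def generate_trajectory(estimate: MotionEstimate, horizon: int) -> List[Point]:
    """Toy 'generation' stage: roll the estimate forward for `horizon` steps."""
    x, y = estimate.last_position
    vx, vy = estimate.velocity
    return [(x + vx * k, y + vy * k) for k in range(1, horizon + 1)]


if __name__ == "__main__":
    estimate = reason_about_motion([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)])
    print(generate_trajectory(estimate, horizon=3))
```

A real system would replace both functions with learned models; the sketch only shows why the generation step can be kept consistent with whatever the reasoning step inferred about motion.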

The model was trained on one of the largest multimodal physical AI datasets, comprising billions of samples across text, image, video, sound, and action trajectories [2]. This extensive training gives developers a powerful pretrained foundation for building physical AI systems with less data and lower training costs, reducing training and evaluation workflows from months to days [2].

Leaderboard Performance and Practical Applications for Robotics

Cosmos 3 models deliver leading results on physical AI benchmarks, ranking first among open models across Artificial Analysis, Physics-IQ, PAI-Bench, and R-Bench for world generation accuracy [2]. The model also tops RoboLab and RoboArena for action policy, plus VANTAGE-Bench and TAR leaderboards for vision understanding [2].

Developers can deploy Cosmos 3 in three configurations: Cosmos 3 Super for post-training robotics and AV models that require the highest physics accuracy, Cosmos 3 Nano for high-quality video and action reasoning in fractions of a second, and Cosmos 3 Edge, coming soon, for real-time inference at the edge [2]. In warehouses, robots can now handle object configurations they have never encountered, while autonomous vehicles gain the ability to make safety predictions when pedestrians step out from between parked cars [1].
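
As a rough sketch of how a team might encode the choice among the three configurations, the snippet below maps two requirements to a variant. The `pick_variant` function, its parameters, and the priority order (edge deployment first, then post-training, otherwise Nano) are assumptions made for illustration; they are not part of any NVIDIA tooling.

```python
from enum import Enum


class Variant(Enum):
    SUPER = "Cosmos 3 Super"  # post-training robotics and AV models
    NANO = "Cosmos 3 Nano"    # fast video and action reasoning
    EDGE = "Cosmos 3 Edge"    # real-time edge inference, listed as coming soon


def pick_variant(post_training: bool, on_device: bool) -> Variant:
    """Hypothetical decision rule based on the article's descriptions."""
    if on_device:
        # Real-time inference on the edge device itself.
        return Variant.EDGE
    if post_training:
        # Highest physics accuracy for post-training workloads.
        return Variant.SUPER
    # Default: sub-second video and action reasoning.
    return Variant.NANO


if __name__ == "__main__":
    print(pick_variant(post_training=True, on_device=False).value)
```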

Cosmos Coalition Advances Open Collaboration

NVIDIA launched the Cosmos Coalition alongside the model release, establishing a global collaboration between world model builders and AI developers [2]. Founding members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI, who will contribute models, research, and evaluation techniques while accessing Cosmos 3 technologies and NVIDIA DGX Cloud infrastructure for large-scale training [2].

Physical AI developers across industries are already building on the Cosmos platform, including Agile Robots, Doosan Robotics, LG Electronics, Samsung, and Skild AI for robotics applications, Li Auto for autonomous vehicles, and companies like Centific, Fogsphere, Linker Vision, and Mile for various AI agent skills [2]. The platform now includes new datasets for robotics, physics, human motion, autonomous driving, warehouse safety, and spatial reasoning, plus neural scene reconstruction, defect-image generation, and video augmentation capabilities [2].
