NVIDIA Cosmos 3 Physical AI Foundation Model Launch

NVIDIA Introduces Cosmos 3 at GTC Taipei

NVIDIA unveiled Cosmos 3 at GTC Taipei during COMPUTEX, marking a significant expansion in the company's push beyond chips into physical AI systems [1]. The new open world foundation model addresses a fundamental challenge: robots and autonomous systems must operate in real-world environments, where capturing and recreating scenarios is slow, expensive, and often impossible to repeat at scale [1]. "The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models," said Jensen Huang, founder and CEO of NVIDIA [4].

Foundation Model Trained on 20 Trillion Tokens

NVIDIA trained Cosmos 3 on 20 trillion tokens of multimodal data, including nearly a billion images, 400 million real and synthetic videos, ambient audio, text and action data from humans and robots [3]. This dataset gives developers a pretrained foundation for building physical AI systems with less data and lower training costs [4]. The action data distinguishes Cosmos from regular video generators: it is designed to model how machines move, not just how scenes look, according to Ming-Yu Liu, VP of NVIDIA's Cosmos Lab [3]. The model can generate rare or dangerous scenarios, such as robot collisions or unusual road events, that are difficult, expensive or unsafe to capture repeatedly [3].

Mixture-of-Transformers Architecture Powers Vision Reasoning

The foundation model is built on a mixture-of-transformers architecture that pairs a reasoning transformer with an expert generation transformer [4]. This dual-tower design enables vision reasoning and multimodal generation across text, images, video, ambient sound and action in a single system [1]. The architecture allows Cosmos 3 to understand object interactions, motion and spatial-temporal relationships before generating video and action trajectories [4]. Developers can use the model as a vision language model, as a world model for simulating physical environments, or as the backbone for world action models that help train robotics systems to perform specific tasks [4].
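The "reason first, then generate" flow described above can be illustrated with a toy sketch. This is not NVIDIA's implementation or API: the class names `ReasoningTower` and `GenerationTower`, the `ScenePlan` type, and the string-matching logic are all hypothetical stand-ins chosen only to show how the data flows from one tower to the other.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScenePlan:
    """Hypothetical intermediate result handed from reasoning to generation."""
    objects: List[str]
    intent: str


class ReasoningTower:
    """Stand-in for the reasoning transformer (assumption, not the real model)."""

    def plan(self, prompt: str, visible_objects: List[str]) -> ScenePlan:
        # Toy "reasoning": keep only the objects that the prompt mentions.
        mentioned = [obj for obj in visible_objects if obj in prompt]
        return ScenePlan(objects=mentioned, intent=prompt)


class GenerationTower:
    """Stand-in for the generation transformer (assumption, not the real model)."""

    def generate(self, plan: ScenePlan, steps: int) -> List[str]:
        # Toy "generation": one placeholder description per time step.
        involved = ", ".join(plan.objects) or "nothing"
        return [f"step {t}: {plan.intent} involving {involved}" for t in range(steps)]


def run_dual_tower(
    prompt: str, visible_objects: List[str], steps: int = 3
) -> Tuple[ScenePlan, List[str]]:
    """Run reasoning first, then condition generation on its plan."""
    plan = ReasoningTower().plan(prompt, visible_objects)
    return plan, GenerationTower().generate(plan, steps)


if __name__ == "__main__":
    plan, frames = run_dual_tower("pick up the box", ["box", "pallet"], steps=2)
    print(plan)
    for frame in frames:
        print(frame)
```

The point of the sketch is only the ordering: the generation step never sees the raw prompt on its own, only the plan produced by the reasoning step.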

Action Prediction Enables Real-World Applications

Cosmos 3 is designed to generate action data, such as robot joint angles, gripper positions and trajectories, that can help train machines to navigate and manipulate the physical world [3]. In a warehouse, a robot may encounter object configurations it has never seen before; on the road, an autonomous vehicle may need to respond when a pedestrian steps out from between parked cars [1]. The model ranks first among open models across Artificial Analysis, Physics-IQ, PAI-Bench and R-Bench for world generation accuracy, RoboLab and RoboArena for action policy, and the VANTAGE-Bench and TAR leaderboards for vision understanding [4].
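To make "action data" more concrete, here is a minimal sketch of how joint angles, gripper positions and a trajectory might be represented and sanity-checked before being used for training. The `ActionStep` fields, their units, and the `max_joint_speed` helper are assumptions made for illustration; the article does not describe the format Cosmos 3 actually emits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActionStep:
    """One hypothetical time step of robot action data."""
    t: float                   # timestamp in seconds
    joint_angles: List[float]  # radians, one entry per joint
    gripper: float             # 0.0 = closed, 1.0 = fully open


def max_joint_speed(trajectory: List[ActionStep]) -> float:
    """Return the largest absolute joint velocity (rad/s) between consecutive steps."""
    fastest = 0.0
    for prev, cur in zip(trajectory, trajectory[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        for a, b in zip(prev.joint_angles, cur.joint_angles):
            fastest = max(fastest, abs(b - a) / dt)
    return fastest


if __name__ == "__main__":
    trajectory = [
        ActionStep(t=0.0, joint_angles=[0.0, 0.0], gripper=1.0),
        ActionStep(t=0.5, joint_angles=[0.5, -0.25], gripper=0.5),
    ]
    print(max_joint_speed(trajectory))
```

A check like this is one simple way to reject generated trajectories that would demand physically implausible joint speeds.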

Scalable Versions for Different Use Cases

NVIDIA is releasing two versions immediately: Cosmos 3 Super, a 32-billion-parameter model for tasks requiring high physics accuracy, such as training robots and autonomous vehicles, and Cosmos 3 Nano, with 8 billion parameters per tower, for faster inference that can generate results in fractions of a second [3]. An edge model that can run locally for real-time, on-device processing is coming soon [3]. The model reduces physical AI training and evaluation cycles from months to days [4].

Cosmos Coalition and Isaac GR00T Reference Humanoid Robot

NVIDIA launched the Cosmos Coalition, a global collaboration between world model builders and AI developers, including Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI, to advance next-generation world models in AI [4]. The company also introduced the Isaac GR00T Reference Humanoid Robot, an open reference design combining a Unitree H2 Plus humanoid robot, Sharpa dexterous hands, Jetson Thor onboard computing, and the Isaac GR00T software stack [2]. Research organizations including Ai2, ETH Zurich, Stanford Robotics Center, and UC San Diego plan to use the platform [2].

Expanding Into Semiconductor Manufacturing and Industrial Automation

NVIDIA is bringing AI for semiconductor manufacturing deeper into production through its collaboration with TSMC [2]. TSMC is using NVIDIA CUDA-X libraries and AI models for computational lithography, transistor simulation, process control, wafer inspection, and fab scheduling to improve computational efficiency, and is using NVIDIA Metropolis and TAO Toolkit to improve detection of nanometer-scale defects [2]. The announcements highlight NVIDIA's strategy to build a full-stack ecosystem for physical AI, covering everything from synthetic data generation and simulation to real-world deployment in industrial automation [2]. Physical AI developers across industries are building on the Cosmos platform, including Li Auto for autonomous vehicles and Samsung for robotics applications [4]. NVIDIA's bet is that the next wave of AI won't just answer questions or generate images but will need to predict, simulate and act in the physical world, with AI agents capable of understanding causal relationships and executing complex tasks [3].

References

[1] How Cosmos 3 Helps Physical AI Think Before It Acts

[2] NVIDIA launches Cosmos 3, chip-fab tools and humanoid robot platform

[3] Nvidia's new world model helps robots navigate the world

[4] NVIDIA Launches Cosmos 3, the Open Frontier Foundation Model for Physical AI

[5] Why NVIDIA's Cosmos 3 is a Massive Leap for Multimodal AI