Curated by THEOUTPOST
On Tue, 7 Jan, 8:02 AM UTC
15 Sources
[1]
Nvidia brings GenAI to the physical world with Cosmos
In what was undoubtedly one of the most anticipated and highly attended CES keynotes of all time, Nvidia CEO Jensen Huang unveiled an impressively wide-ranging set of announcements spanning many of the hottest topics in tech, including AI, robotics, autonomous vehicles, and more. Clad in a Las Vegas-glitz version of his trademark black leather jacket, the tech industry leader worked through the company's latest GeForce RTX 50 series graphics cards, new Nemotron AI foundation model families, and AI blueprints for AI-powered agents. He also highlighted extensions to the company's Omniverse digital twin and simulation platform, which extends AI into the physical world, as well as new safety certifications for its autonomous driving platform. Additionally, he introduced a mini desktop-sized AI supercomputer called Project Digits, powered by the Grace Blackwell GPU. Needless to say, it was a lot to take in.

One of the most intriguing - though likely least understood - announcements was a set of foundation models and platform capabilities dubbed Cosmos. Defined as a suite of world foundation models, advanced tokenizers, safety guardrails, and an advanced video processing pipeline, Cosmos is designed to bring the training capabilities and advanced outcomes of generative AI from the digital realm into the physical world. In other words, instead of using generative AI to create new digital outputs based on training across billions of documents, images, and other digital content, Cosmos can generate new physical actions - let's call them analog outputs - by leveraging data it has been trained on from digitally simulated environments.

While the concept is complex, the real-world implications are both simple and profound. For applications like robotics, autonomous vehicles, and other mechanical systems, Cosmos enables these systems to react to physical stimuli in more accurate, safe, and helpful ways. For instance, humanoid robots can be trained to physically replicate the most effective or safest way to perform a task, whether it's flipping an omelet or handling parts on a production line. Similarly, an autonomous car can dynamically adapt to varying situations and environments.

Much of this type of training currently relies on manual efforts, such as filming humans performing the same action hundreds of times or having autonomous cars drive millions of miles. Even then, thousands of people must spend significant time hand-labeling and tagging those videos. With Cosmos, these training methods can be automated, dramatically reducing costs, saving time, and expanding the range of data available for the training process.

Nvidia Cosmos is a world foundation model development platform that incorporates generative models, a data curator, tokenizers, and a framework to accelerate physical AI development. Cosmos works as an extension of Nvidia's Omniverse digital simulation environment: it translates the digital physics of models and systems created in Omniverse into physical actions in the real world. While this distinction may seem subtle, it is critically important because it enables Cosmos to produce GenAI-powered physical outputs. At the core of Cosmos are world foundation models, built from millions of hours of video content, which possess an understanding of the physical world.
Cosmos takes the digital models of physical objects and environments created in Omniverse, integrates them into these world foundation models, and generates photorealistic video outputs of how the models are predicted to behave in real-world scenarios. These videos then serve as synthetic data sources, which can be used to train models running in robotic systems, autonomous cars, and other GPU-powered mechanical systems. The result is systems that can respond more effectively across diverse environments.

Nvidia CEO Jensen Huang was clearly trying to tell us something as he outlined the evolution of AI technologies - from perception AI to generative AI, agentic AI, and the rise of physical AI - during his keynote at CES 2025.

Another noteworthy aspect is that Nvidia is making its Cosmos world foundation models available for free to encourage advancements in robotics and autonomous vehicles, as well as to foster further experimentation. In the short term, the immediate impact of Cosmos will be limited, as it primarily targets a niche audience developing advanced robotics and autonomous vehicle applications. In the long term, however, its influence could be profound, potentially speeding up the development of these product categories and improving the accuracy and safety of these systems. More importantly, it demonstrates Nvidia's ability to anticipate and prepare for emerging tech trends such as robotics. It also underscores the often-overlooked but ongoing transformation of Nvidia into a software company building platforms for these new applications. For those curious about where the company is headed and how it plans to sustain its impressive growth, these developments offer intriguing and important insights.

Bob O'Donnell is the founder and chief analyst of TECHnalysis Research, LLC, a technology consulting firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on Twitter @bobodtech
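To make the Omniverse-to-Cosmos workflow described in this piece concrete, here is a minimal sketch of the data flow: a physics-grounded 3D scene goes in, many photoreal video variations come out, and those clips become labeled training data. Every name here (Scene, generate_synthetic_clips) is an illustrative placeholder, not an actual NVIDIA API.

```python
# Illustrative sketch only: models the Omniverse -> Cosmos -> training-data
# flow described above. No real NVIDIA API is used; all names are invented.

from dataclasses import dataclass

@dataclass
class Scene:
    """A physics-grounded 3D scene, e.g. authored in Omniverse and saved as USD."""
    usd_path: str

def generate_synthetic_clips(scene: Scene, prompt: str, n_variations: int) -> list[str]:
    """Stand-in for a world-foundation-model call: given a 3D scene and a text
    prompt, return paths to photoreal, physics-aware video variations."""
    return [f"{scene.usd_path}.variation_{i:03d}.mp4" for i in range(n_variations)]

# One authored warehouse scene becomes hundreds of labeled training clips.
scene = Scene(usd_path="warehouse.usd")
clips = generate_synthetic_clips(scene, "forklift crossing a busy aisle", n_variations=200)
training_set = [(clip, "forklift_crossing") for clip in clips]
print(f"{len(training_set)} synthetic examples ready for a perception model")
```

The point of the sketch is the economics the article describes: one hand-built scene amortizes into arbitrarily many labeled examples, replacing hundreds of filmed and hand-tagged takes.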
[2]
It's 2025: Is Nvidia's Cosmos The Missing Piece For Widespread Robot Adoption?
NVIDIA's announcement of a foundation model platform to support development of robots and autonomous vehicles aligns well with one of our automation predictions for 2025: that one quarter of robotics projects will work to combine cognitive and physical automation. Many of the examples NVIDIA showed featured humanoid robots, but Cosmos is equally relevant to autonomous vehicles and other forms of physical robots. That's just as well, because another of our predictions for 2025 makes clear that fewer than 5% of robots entering factories in 2025 will walk.

We first started writing about the integration of physical and cognitive automation in 2023, based on expanding orchestration capabilities combined with AI's potential to add flexibility to physical robotics. The question being debated at Forrester is whether the January 6 launch of NVIDIA's Cosmos World Foundation Model is a turning point, or just another high-value tech company jumping into the LLM playing field. We think the former is more likely. Developers now have an "open" model designed to address physical automation use cases, meaning autonomous vehicles and robots. It's the first LLM trained to understand the physical world. It is optimized for NVIDIA chips running in the cloud, on developers' desktops, and out at the edge inside cars, trucks, and robots, and plugs into expansive NVIDIA tools and frameworks. The ChatGPT moment may have arrived for our robot friends.

Yet two things have stalled the advance of robots in the physical world so far: solid use cases and the cost of infusing agility into robots. GenAI, combined with rich training data (video and otherwise), goes some way toward solving the agility problem, but the use case problem has proven harder to solve. In 2023, we published an adoption model that showed six phases physical automation must traverse to reach the "acceptable" sweet spot. For example, janitorial robots were pushed to acceptability by the pandemic, while security robots still struggle to achieve similar acceptance.

The field of physical automation has, unfortunately, succumbed to the allure of media spectacle. Remember Boston Dynamics' Spot performing backflips? This impressive feat, while captivating audiences in a 60 Minutes infomercial, ultimately demonstrated limited practical applications. Nvidia should be congratulated: it has introduced the first full developer capability that can take physical automation to the next level, but it now needs to show equal leadership in demonstrating how robots can interact with humans in ways that are both productive and non-threatening.

Forrester analysts continue to research physical and cognitive automation, both together and separately. One piece of research later this year will specifically look at physical or embodied AI in the smart manufacturing and mobility context, and all of the interesting things that happen when an AI system must observe and interact with the physical world around it. If you have perspectives to share, please do get in touch.
[3]
Nvidia Launches Cosmos Platform That Can Train and Develop AI Robots
Nvidia also introduced the Nemotron family of AI models at CES 2025

Nvidia launched Cosmos, a new artificial intelligence platform that contains multiple generative world foundation models (WFMs), on Monday at the Consumer Electronics Show (CES) 2025. The platform contains not only these AI models but also advanced tokenisers, an accelerated video processing pipeline, and guardrails that enable it to support the development of physical AI systems such as autonomous vehicles and robots. Additionally, the company open-sourced the WFMs and made them available for academic and research purposes. Nvidia also introduced the Llama Nemotron family of AI models at CES 2025.

In a newsroom post, the tech giant detailed its new Cosmos platform. The platform hosts WFMs along with several components that allow for the training and development of physical AI systems. Notably, physical AI systems are machines that have mechanical parts and the ability to interact with, and take actions in, the real world.

Nvidia highlighted that training and developing physical AI systems, including robots and autonomous vehicles, is a costly venture due to the requirement of vast amounts of real-world data and a diverse range of testing environments. The Cosmos platform's WFMs address both problems. The tech giant claimed that these world AI models can generate massive amounts of photoreal, physics-based synthetic data that can be used to train physical AI systems. The data can also be used to assess existing robots by subjecting them to testing environments. Additionally, Nvidia Cosmos allows developers to build custom models by fine-tuning the WFMs.

Nvidia's Cosmos world AI models come with video search and understanding, allowing developers to find specific training videos in a large database. The models can draw on the Nvidia Omniverse platform to generate physics-based, controlled 3D scenarios. The platform also offers simulation-based training for physical AI. These AI models are available under an open model licence and can be previewed by developers via the Nvidia application programming interface (API) catalogue or from Hugging Face.

The tech giant revealed that several companies focused on robotics and physical AI have already adopted Cosmos. These include 1X, Agile Robots, Agility, Figure AI, Foretellix, Fourier, Galbot, Hillbot, IntBot, Neura Robotics, Skild AI, Virtual Incision, Waabi, and XPENG, as well as Uber.
[4]
Nvidia launches Cosmos World Foundation Model platform to accelerate physical AI
In a keynote speech at CES 2025 by Nvidia CEO Jensen Huang, the company said the platform includes state-of-the-art generative world foundation models, advanced tokenizers, guardrails and an accelerated video processing pipeline built to advance the development of physical AI systems such as autonomous vehicles (AVs) and robots.

Physical AI models are costly to develop, and require vast amounts of real-world data and testing. Cosmos world foundation models, or WFMs, offer developers an easy way to generate massive amounts of photoreal, physics-based synthetic data to train and evaluate their existing models. Developers can also build custom models by fine-tuning Cosmos WFMs. Cosmos models will be available under an open model license to accelerate the work of the robotics and AV community. Developers can preview the first models on the Nvidia API catalog, or download the family of models and fine-tuning framework from the Nvidia NGC™ catalog or Hugging Face.

"It is trained on 20 million hours of video," Huang said. "Nvidia Cosmos. It's about teaching the AI to understand the physical world."

Leading robotics and automotive companies, including 1X, Agile Robots, Agility, Figure AI, Foretellix, Fourier, Galbot, Hillbot, IntBot, Neura Robotics, Skild AI, Virtual Incision, Waabi, and XPENG, along with ridesharing giant Uber, are among the first to adopt Cosmos.

"The ChatGPT moment for robotics is coming. Like large language models, world foundation models are fundamental to advancing robot and AV development, yet not all developers have the expertise and resources to train their own," said Jensen Huang, founder and CEO of Nvidia, in a statement. "We created Cosmos to democratize physical AI and put general robotics in reach of every developer."

Open World Foundation Models to Accelerate the Next Wave of AI

Nvidia Cosmos' suite of open models means developers can customize the WFMs with datasets, such as video recordings of AV trips or robots navigating a warehouse, according to the needs of their target application. Cosmos WFMs are purpose-built for physical AI research and development, and can generate physics-based videos from a combination of inputs, like text, image and video, as well as robot sensor or motion data. The models are built for physically based interactions, object permanence, and high-quality generation of simulated industrial environments -- like warehouses or factories -- and of driving environments, including various road conditions. In his opening keynote at CES, Huang showcased ways physical AI developers can use Cosmos models, including video search and understanding, synthetic data generation, and reinforcement learning.

Advanced World Model Development Tools

Building physical AI models requires petabytes of video data and tens of thousands of compute hours to process, curate and label that data. To help save enormous costs in data curation, training and model customization, Cosmos features an accelerated video processing and curation pipeline, visual tokenizers, guardrail models and a framework for model customization.

"Data-scarcity and variability are key challenges to successful learning in robot environments," said Pras Velagapudi, chief technology officer at Agility, in a statement. "Cosmos' text-, image- and video-to-world capabilities allow us to generate and augment photorealistic scenarios in a variety of tasks that we can use to train models without needing as much expensive, real-world data capture."

Transportation leaders are also using Cosmos to build physical AI for AVs.
Waabi, a company pioneering generative AI for the physical world, will use Cosmos for the search and curation of video data for AV software development and simulation. Wayve, which is developing AI foundation models for autonomous driving, is evaluating Cosmos as a tool to search for edge and corner case driving scenarios used for safety and validation. AV toolchain provider Foretellix will use Cosmos, alongside Nvidia Omniverse Sensor RTX APIs, to evaluate and generate high-fidelity testing scenarios and training data at scale. Global ridesharing giant Uber is partnering with Nvidia to accelerate autonomous mobility. Rich driving datasets from Uber, combined with the features of the Cosmos platform and Nvidia DGX Cloud, will help AV partners build stronger AI models even more efficiently.

"Generative AI will power the future of mobility, requiring both rich data and very powerful compute," said Dara Khosrowshahi, CEO of Uber. "By working with Nvidia, we are confident that we can help supercharge the timeline for safe and scalable autonomous driving solutions for the industry."

Developing Open, Safe and Responsible AI

Nvidia Cosmos was developed in line with Nvidia's trustworthy AI principles, which prioritize privacy, safety, security, transparency and reducing unwanted bias. Trustworthy AI is essential for fostering innovation within the developer community and maintaining user trust. Nvidia is committed to safe and trustworthy AI, in line with the White House's voluntary AI commitments and other global AI safety initiatives. The open Cosmos platform includes guardrails designed to mitigate harmful text and images, and features a tool to enhance text prompts for accuracy. Videos generated with Cosmos autoregressive and diffusion models on the Nvidia API catalog include invisible watermarks to identify AI-generated content, helping reduce the chances of misinformation and misattribution. Nvidia encourages developers to adopt trustworthy AI practices and further enhance guardrail and watermarking solutions for their applications.

Availability

Cosmos WFMs are now available under Nvidia's open model license on Hugging Face and the Nvidia NGC catalog. Cosmos models will soon be available as fully optimized Nvidia NIM microservices. Developers can access Nvidia NeMo Curator for accelerated video processing and customize their own world models with Nvidia NeMo. Nvidia DGX Cloud offers a fast and easy way to deploy these models, with enterprise support available through the Nvidia AI Enterprise software platform.

Nvidia also announced new Nvidia Llama Nemotron large language models and Nvidia Cosmos Nemotron vision language models that developers can use for enterprise AI use cases in healthcare, financial services, manufacturing and more.
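The text-, image- and video-to-world generation described above amounts to a multimodal request: some combination of text, frames and sensor data in, physics-based video out. The request shape below is a hedged illustration; the field names and the submit function are assumptions, not Nvidia's actual API.

```python
# Hedged illustration of a text/image/video-to-world request. The schema and
# submit() are invented for clarity; consult NVIDIA's docs for the real API.

def submit(request: dict) -> str:
    """Placeholder for an API call; returns a path to the generated video."""
    return "generated/scenario_0001.mp4"

request = {
    "model": "cosmos-wfm",  # hypothetical model identifier
    "inputs": {
        "text": "rain-slicked four-way intersection at dusk, cyclist crossing",
        "image": None,        # optional: a conditioning frame
        "video": None,        # optional: a clip to extend forward in time
        "sensor": None,       # optional: robot motion or sensor traces
    },
    "output": {"format": "mp4", "duration_s": 8},
}

video_path = submit(request)
print(f"physics-based scenario written to {video_path}")
```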
[5]
NVIDIA CEO Jensen Huang Declares 2025 the Year of Robotics at CES Keynote
NVIDIA CEO Jensen Huang's keynote at CES 2025 presented a bold and forward-thinking roadmap for artificial intelligence (AI), robotics, and industrial automation. At the heart of his presentation was the unveiling of NVIDIA Cosmos, a world foundation model designed to transform how machines perceive and interact with the physical world. Huang emphasized that 2025 represents a pivotal moment where the convergence of AI, simulation, and robotics will drive innovation across industries and everyday life. This vision underscores NVIDIA's commitment to advancing technology and shaping the future of human-machine collaboration.

Imagine a world where robots not only understand their surroundings but also seamlessly adapt to them, performing tasks with the precision and intuition of a human. With new announcements like NVIDIA Cosmos, an innovative AI model trained to grasp the complexities of the physical world, and predictions of 2025 being the "year of robotics," the stage is set for a technological transformation that could redefine industries and reshape everyday life. But what does this mean for you? NVIDIA's innovations promise to make these advancements more accessible and impactful than ever. From robots that can navigate human environments without specialized setups to AI systems that simulate real-world physics with uncanny accuracy, the possibilities are as exciting as they are vast. In this overview by Wes Roth, learn more about the highlights of NVIDIA CEO Jensen Huang's keynote and unpack how these innovations could change the way we work, live, and interact with technology.

A central highlight of the keynote was the introduction of NVIDIA Cosmos, a world foundation model trained on an extensive dataset of 20 million hours of video. This model is designed to understand the dynamics of the physical world, excelling in areas such as spatial relationships, physical interactions, and cause-and-effect scenarios. Its versatility makes it a powerful tool for a wide range of applications, from video search and synthetic data generation to simulation-based training. Cosmos is open-licensed and accessible on GitHub, allowing developers to integrate its capabilities into diverse projects. By combining real-world physics with advanced AI, Cosmos establishes a new benchmark for foundational AI systems, offering unparalleled realism and functionality. This innovation is set to empower industries and researchers alike, enabling breakthroughs in robotics, automation, and beyond.

Jensen Huang declared 2025 as the "year of robotics," emphasizing the rapid advancements in general-purpose humanoid robots, autonomous vehicles, and agentic AI systems. These technologies are being developed to operate seamlessly in human environments without requiring specialized infrastructure, marking a significant leap forward in robotics integration. Synthetic data plays a critical role in these advancements. By simulating real-world scenarios, developers can train robots more effectively, allowing them to perform complex tasks with accuracy. This approach not only accelerates the development process but also reduces costs, making robotics more accessible across industries. As these technologies mature, they are poised to transform daily life, bringing us closer to a future where robots are integral to human environments.

To support the rapid evolution of robotics, NVIDIA has introduced a three-computer system that integrates AI training, deployment, and simulation.
This system is designed to streamline the development and implementation of robotics solutions, offering a comprehensive framework for innovation. The system includes NVIDIA DGX systems for training AI models in the data center, NVIDIA Omniverse running on OVX systems for simulation and synthetic data generation, and NVIDIA AGX onboard computers for real-time deployment. This interconnected system enables you to develop, test, and deploy robotics solutions with greater efficiency, reducing both time and cost compared to traditional methods. By using this framework, industries can accelerate the adoption of robotics and AI, driving innovation and productivity.

The global manufacturing sector, valued at $50 trillion, stands to benefit significantly from NVIDIA's advancements in AI and robotics. Through strategic partnerships with companies like KION and Accenture, NVIDIA is spearheading the digital transformation of manufacturing and warehouse operations. These collaborations aim to enhance efficiency, reduce waste, and automate processes to meet the demands of modern industry. Warehouse automation is a key area of focus. By integrating NVIDIA's AI and robotics technologies, you can streamline operations, improve accuracy, and minimize labor-intensive tasks. This transformation is reshaping how goods are produced, stored, and distributed, paving the way for a more efficient and sustainable industrial ecosystem. These innovations highlight the potential of AI-driven systems to address the challenges of modern manufacturing and logistics.

Another significant announcement during the keynote was NVIDIA Omniverse, a physics-grounded simulation platform designed to complement Cosmos. Together, these technologies enable the creation of realistic, physically accurate AI models, making them particularly valuable for robotics and industrial AI applications. Omniverse allows you to simulate complex environments for training robots, making sure they can navigate and interact with their surroundings effectively. This capability accelerates development cycles while enhancing the reliability and safety of AI-driven systems. By bridging the gap between simulation and reality, Omniverse enables developers to create solutions that are both innovative and practical, driving progress across industries.

Jensen Huang concluded his keynote with a compelling vision for the future of AI and robotics. As advancements in AI, simulation, and data processing continue to accelerate, robotics is poised to become one of the most transformative technology industries globally. From manufacturing and transportation to healthcare and beyond, these innovations are set to redefine how industries operate and how you interact with the world. CES 2025 showcased NVIDIA's dedication to pushing the boundaries of technological innovation. By integrating AI, robotics, and simulation, the company is enabling you to harness the potential of these technologies. This vision points toward a future where AI-driven systems are seamlessly integrated into every aspect of life, shaping a world of unprecedented possibilities.
[6]
Nvidia's 'Cosmos' AI Helps Humanoid Robots Navigate the World
Nvidia CEO Jensen Huang says the new family of foundational AI models was trained on 20 million hours of "humans walking; hands moving, manipulating things."

Nvidia announced today it's releasing a family of foundational AI models called Cosmos that can be used to train humanoids, industrial robots, and self-driving cars. While language models learn how to generate text by training on copious amounts of books, articles, and social media posts, Cosmos is designed to generate images and 3D models of the physical world. During a keynote presentation at the annual CES conference in Las Vegas, Nvidia CEO Jensen Huang showed examples of Cosmos being used to simulate activities inside of warehouses. Cosmos was trained on 20 million hours of real footage of "humans walking, hands moving, manipulating things," Huang said. "It's not about generating creative content, but teaching the AI to understand the physical world."

Researchers and startups hope that these kinds of foundational models could give robots used in factories and homes more sophisticated capabilities. Cosmos can, for example, generate realistic video footage of boxes falling from shelves inside a warehouse, which can be used to train a robot to recognize accidents. Users can also fine-tune the models using their own data. A number of companies are already using Cosmos, Nvidia says, including humanoid robot startups Agility and Figure AI, self-driving car companies like Waabi and Wayve, and ridesharing giant Uber.

Nvidia also announced software designed to help different kinds of robots learn to perform new tasks more efficiently. The new feature is part of Nvidia's existing Isaac robot simulation platform and will allow robot builders to take a small number of examples of a desired task, like grasping a particular object, and generate large amounts of synthetic training data. Nvidia hopes that Cosmos and Isaac will appeal to companies looking to build and use humanoid robots. Huang was joined on stage at CES by life-sized images of 14 different humanoid robots developed by companies including Tesla, Boston Dynamics, Agility, and Figure.
[7]
NVIDIA Launches Cosmos, a Platform to Develop World Foundation Models
Cosmos is available under an open model license on Hugging Face and the NVIDIA NGC catalogue. At CES 2025, NVIDIA unveiled Cosmos, a platform built to speed up the development of physical AI systems, including autonomous vehicles and robots. The platform includes generative world foundation models (WFMs), video tokenisers, guardrails, and an accelerated data processing pipeline to help developers create and refine AI models with reduced reliance on real-world data. Cosmos is available under an open model license on Hugging Face and the NVIDIA NGC catalogue. Fully optimised NVIDIA NIM microservices will follow, with enterprise support provided through the NVIDIA AI Enterprise software platform. Speaking at CES, NVIDIA CEO Jensen Huang said, "The ChatGPT moment for robotics is coming. Like large language models, world foundation models are fundamental to advancing robot and AV development, yet not all developers have the expertise and resources to train their own. We created Cosmos to democratise physical AI and put general robotics in reach of every developer." The Cosmos models can generate physics-based videos using inputs such as text, images, and sensor data, enabling their use in applications like video search, synthetic data generation, and reinforcement learning. Developers can customise the models to simulate industrial environments, driving scenarios, and other specific use cases. NVIDIA also introduced NeMo Curator, an accelerated video processing pipeline that can process 20 million hours of video in 14 days, and Cosmos Tokeniser, a visual data compression tool. "Data scarcity and variability are key challenges to successful learning in robot environments," said Pras Velagapudi, chief technology officer at Agility Robotics. "Cosmos' text-, image-, and video-to-world capabilities allow us to generate and augment scenarios for a variety of tasks that we can use to train models without needing as much expensive, real-world data capture." Major robotics and transportation companies, including Agile Robots, XPENG, Waabi, and Uber, have begun adopting Cosmos for their AI development. Uber CEO Dara Khosrowshahi said, "Generative AI will power the future of mobility, requiring both rich data and very powerful compute. By working with NVIDIA, we are confident that we can help supercharge the timeline for safe and scalable autonomous driving solutions for the industry." In addition to Cosmos, NVIDIA introduced the Llama Nemotron large language models and Cosmos Nemotron vision language models, developed for enterprise use in sectors including healthcare, finance, and manufacturing.
[8]
CES 2025 - NVIDIA hosts opening party for World Foundation Models
Over the last year, much of the gen AI hype has focused on Large Language Models (LLMs) trained on what people say. NVIDIA took the Consumer Electronics Show (CES) in Las Vegas as an opportunity to announce Cosmos, a new platform for streamlining the development of World Foundation Models (WFMs) that improve how machines perceive, decide, act, and learn from physical sensor data. A good example is streamlining the development of advanced driver-assistance and autonomous driving systems. The world's largest carmaker, Toyota, plans to standardize on the NVIDIA DriveOS platform integrated with Cosmos for its next-generation vehicles. Others like Aurora and Continental hope it will streamline the development of autonomous trucks. Others with varying driver assistance and autonomy plans include BYD, Lucid, Mercedes-Benz, NIO, Rivian, and Volvo Cars, among many others. More autonomous cars are a big deal, and NVIDIA expects they could drive $5 billion in revenues by 2026.

Also, the company received two separate safety certifications for its Drive Hyperion autonomous driving platform and Drive OS vehicle operating system from TÜV Rheinland and TÜV SÜD for cybersecurity and safety. Although this is specific to cars, it speaks volumes about NVIDIA's progress in readying the platform for other safety-critical applications like robots, industrial control systems and electronics design, exemplified by partners like Siemens Digital Industries and Cadence.

NVIDIA has also launched Mega, an Omniverse Blueprint for developing and optimizing physical AI and robot fleets at scale in a digital twin before deploying them into facilities. For example, Accenture is working with KION Group, a supply chain solutions company, to use Mega for optimizing operations in retail, consumer packaged goods and parcel services scenarios. Enterprises can digitalize their facilities using a combination of design files, video, lidar and synthetically generated data to create and test robot management approaches.

Three big takeaways stand out.

WFMs are an emerging category of foundation models trained to understand, predict, and generate the behavior of physical environments, much like LLMs do for text. Cosmos is trained using diffusion and autoregressive models rather than the text-only transformers underlying most LLMs. WFMs are trained on video, sensor feeds and physics simulations to capture spatial relationships between objects and how they interact in the world. In contrast, LLMs are trained primarily on text to learn nuances of language, grammar and the relationship between concepts. WFMs are better at generating realistic video sequences and sensor data representations, predicting future states of physical environments, and generating synthetic data for training robots and autonomous cars. LLMs are better at writing code and summaries, identifying relationships between concepts, and translating plain language prompts into appropriate formats for other tools.

A good example lies in connecting the dots between text and physics-based world representations. For example, a user can describe an edge case in plain language that characterizes the environment, objects, actions and unusual or challenging conditions. The LLM could translate this into a structured representation for a 3D scene graph suitable for the WFM.

Cosmos builds on NVIDIA's considerable work in promoting USD as a standard glue for representing 3D data across vendors and various open-source tools.
This provides a more consistent data representation and improves collaboration. However, it does not inherently address the challenges of simulating physically accurate scenarios, which require additional tooling and integration. More significantly, Cosmos also provides an ecosystem of tools to improve data processing, curation, and labeling. This will be a bigger deal than it sounds, because it will make it easier to add metadata and context to existing data sets, improving processing and reducing hallucinations when the data is processed by other neural (ML/AI) or symbolic tools (most enterprise apps and logical software code).

The last big takeaway is NVIDIA's "three computer" paradigm, which promises to streamline the development of physics-aware applications across training, simulation, and deployment. The three computers are DGX systems for training models in the data center, OVX systems running Omniverse for simulation and synthetic data generation, and AGX systems deployed in vehicles and robots for real-time inference. The big deal is that these three components are engineered to work together and make it easier to feed data and insights from real-world deployments into the training and simulation phases. For example, data from the AGX system could highlight areas where the models excel or need improvement. The integration also makes it easier to continually refine and enhance the AI models to be more accurate and robust.

It's important to note that none of these developments are coming completely out of the blue. This is a consolidation and simplification of workflows across the ecosystem of chips and applications NVIDIA has already worked on for digital twins, simulation, and AI. This will simplify the process of developing physical AI systems across different tools and vendors.

Another thing to note is that competition is heating up for new AI-specific chips from big vendors like AMD and Intel, along with a host of neural-chip startups. Some of these solutions may support better price/energy/performance benefits on various AI-specific training and inferencing processes. However, none of them address the considerable data integration, contextualization, and labeling challenges required to build more trustworthy and capable AI systems, which NVIDIA has addressed and continues to address.

There has also been considerable speculation that the AI bubble will soon burst due to over-investment in data centers and questionable returns from existing approaches. I think the dawn of physical AI opens a new chapter of growth in applying AI to more practical problems. However, this will require finding better ways to organize and contextualize data across many complementary approaches rather than just throwing exabytes of data at a giant AI model.
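The plain-language-to-scene-graph handoff described in this piece is easy to picture in code. In the sketch below, translate_to_scene_graph stands in for an LLM call, and the JSON schema is invented for illustration; a WFM would consume something along these lines as conditioning.

```python
# Sketch of the LLM-to-WFM bridge: an edge case described in plain language is
# translated into a structured scene representation. The function body and
# schema are illustrative stand-ins, not a real NVIDIA or LLM-vendor API.

import json

def translate_to_scene_graph(description: str) -> dict:
    """Placeholder for an LLM call that maps an edge-case description
    to a structured, scene-graph-like representation."""
    return {
        "environment": {"type": "highway", "weather": "fog", "time": "night"},
        "objects": [
            {"id": "ego", "class": "truck", "lane": 2, "speed_mps": 25.0},
            {"id": "obstacle", "class": "mattress", "lane": 2, "speed_mps": 0.0},
        ],
        "event": {"type": "sudden_obstacle", "at_time_s": 3.5},
    }

scene_graph = translate_to_scene_graph(
    "a mattress falls off a truck ahead of us on a foggy highway at night"
)
print(json.dumps(scene_graph, indent=2))  # conditioning input for a world model
```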
[9]
Nvidia Brings GenAI To The Physical World With Cosmos
Cosmos acts as a type of extension to Nvidia's Omniverse digital simulation environment and takes the digital physics of the models and systems that are created in Omniverse and translates them into physical actions in the real world.
[10]
NVIDIA Launches Cosmos World Foundation Model Platform to Accelerate Physical AI Development
Physical AI models are costly to develop, and require vast amounts of real-world data and testing. Cosmos world foundation models, or WFMs, offer developers an easy way to generate massive amounts of photoreal, physics-based synthetic data to train and evaluate their existing models. Developers can also build custom models by fine-tuning Cosmos WFMs. Cosmos models will be available under an open model license to accelerate the work of the robotics and AV community. Developers can preview the first models on the NVIDIA API catalog, or download the family of models and fine-tuning framework from the NVIDIA NGC™ catalog or Hugging Face.

Leading robotics and automotive companies, including 1X, Agile Robots, Agility, Figure AI, Foretellix, Fourier, Galbot, Hillbot, IntBot, Neura Robotics, Skild AI, Virtual Incision, Waabi and XPENG, along with ridesharing giant Uber, are among the first to adopt Cosmos.

"The ChatGPT moment for robotics is coming. Like large language models, world foundation models are fundamental to advancing robot and AV development, yet not all developers have the expertise and resources to train their own," said Jensen Huang, founder and CEO of NVIDIA. "We created Cosmos to democratize physical AI and put general robotics in reach of every developer."

Open World Foundation Models to Accelerate the Next Wave of AI

NVIDIA Cosmos' suite of open models means developers can customize the WFMs with datasets, such as video recordings of AV trips or robots navigating a warehouse, according to the needs of their target application. Cosmos WFMs are purpose-built for physical AI research and development, and can generate physics-based videos from a combination of inputs, like text, image and video, as well as robot sensor or motion data. The models are built for physically based interactions, object permanence, and high-quality generation of simulated industrial environments -- like warehouses or factories -- and of driving environments, including various road conditions. In his opening keynote at CES, NVIDIA founder and CEO Jensen Huang showcased ways physical AI developers can use Cosmos models, including video search and understanding, synthetic data generation, and reinforcement learning.

Advanced World Model Development Tools

Building physical AI models requires petabytes of video data and tens of thousands of compute hours to process, curate and label that data. To help save enormous costs in data curation, training and model customization, Cosmos features an accelerated video processing and curation pipeline, visual tokenizers, guardrail models and a framework for model customization.

World's Largest Physical AI Industries Adopt Cosmos

Pioneers across the physical AI industry are already adopting Cosmos technologies. 1X, an AI and humanoid robot company, launched the 1X World Model Challenge dataset using Cosmos Tokenizer. XPENG will use Cosmos to accelerate the development of its humanoid robot. And Hillbot and Skild AI are using Cosmos to fast-track the development of their general-purpose robots.

"Data scarcity and variability are key challenges to successful learning in robot environments," said Pras Velagapudi, chief technology officer at Agility. "Cosmos' text-, image- and video-to-world capabilities allow us to generate and augment photorealistic scenarios for a variety of tasks that we can use to train models without needing as much expensive, real-world data capture."

Transportation leaders, including Waabi, Wayve and Foretellix, are also using Cosmos to build physical AI for AVs. "Generative AI will power the future of mobility, requiring both rich data and very powerful compute," said Dara Khosrowshahi, CEO of Uber.
"By working with NVIDIA, we are confident that we can help supercharge the timeline for safe and scalable autonomous driving solutions for the industry." Developing Open, Safe and Responsible AI NVIDIA Cosmos was developed in line with NVIDIA's trustworthy AI principles, which prioritize privacy, safety, security, transparency and reducing unwanted bias. Trustworthy AI is essential for fostering innovation within the developer community and maintaining user trust. NVIDIA is committed to safe and trustworthy AI, in line with the White House's voluntary AI commitments and other global AI safety initiatives. The open Cosmos platform includes guardrails designed to mitigate harmful text and images, and features a tool to enhance text prompts for accuracy. Videos generated with Cosmos autoregressive and diffusion models on the NVIDIA API catalog include invisible watermarks to identify AI-generated content, helping reduce the chances of misinformation and misattribution. NVIDIA encourages developers to adopt trustworthy AI practices and further enhance guardrail and watermarking solutions for their applications. Availability Cosmos WFMs are now available under NVIDIA's open model license on Hugging Face and the NVIDIA NGC catalog. Cosmos models will soon be available as fully optimized NVIDIA NIM microservices. Developers can access NVIDIA NeMo Curator for accelerated video processing and customize their own world models with NVIDIA NeMo. NVIDIA DGX Cloud offers a fast and easy way to deploy these models, with enterprise support available through the NVIDIA AI Enterprise software platform.
[11]
Nvidia launches new AI development tools for autonomous robots and vehicles - SiliconANGLE
Nvidia Corp. today announced the launch of new tools that will advance the development of physical artificial intelligence models, such as the models that power self-driving cars and warehouse and humanoid robots. World foundation models, or WFMs, assist engineers and developers by generating and simulating virtual worlds as well as their physical interactions so that robots can be trained in various scenarios. Nvidia announced today at CES 2025 that it's making available the first family of Cosmos WFMs for physics-based simulation and synthetic data generation. Alongside these AI foundation models, the company also provided tokenizers, guardrails and customization capabilities so that developers can fine-tune models to suit their needs.

"Physical AI will revolutionize the $50 trillion manufacturing and logistics industries," said Jensen Huang, co-founder and chief executive of Nvidia. "Everything that moves -- from cars and trucks to factories and warehouses -- will be robotic and embodied by AI."

Cosmos is a set of world foundation models trained on more than 9 quadrillion tokens from 20 million hours of real-world human interaction, environment, industrial, robotics and driving data. The model family provides a large variety of simulation data, with versions optimized for real-time, low-latency inference and others suited to being distilled into custom models. Developers can use Cosmos to generate entire virtual worlds from text or video prompts. By rapidly generating virtual environments tailored to their own needs, robotics developers and engineers can generate and augment synthetic data to test and debug their AI models before they are deployed in the real world.

"Today's AV developers need to drive millions of miles. Even more resource intensive is processing, filtering and labeling the thousands of petabytes of data capture," said Rev Lebaredian, vice president of Omniverse and simulation at Nvidia. "And physical testing is dangerous. Humanoid developers have a lot to lose when one robot prototype can cost hundreds of thousands of dollars."

In the end, engineers and developers discover that it doesn't matter how much real-world data they collect: they still need to augment that data with additional synthetic data to train and fine-tune their AI models for the "last mile," covering edge cases and eventualities for rigor and safety.

Cosmos can be paired readily with Nvidia Omniverse, the company's real-time 3D graphics collaboration and simulation platform that allows artists, developers and enterprises to build realistic 3D models and scenes of factories, cities and other spaces using fully realized physics. With this tool, companies can develop digital twins that simulate real-world environments to train robots more easily than putting their physical counterparts through an actual robot boot camp. Today, developers can preview the first Cosmos WFM family of models from the NGC catalog and Hugging Face.

Omniverse, Nvidia's digital twin simulation and collaboration platform, has been expanded with four new blueprints to accelerate industrial and robotics workflows, including developing and training physical AI models. Mega, powered by Omniverse Sensor RTX application programming interfaces, will help robotics and AI engineers develop and test physical AI robot fleets at large scale before deployment into real-world facilities.
Mega provides enterprises with a digital twin capability by simulating robot behavior at scale in virtual worlds using sensor data across complex scenarios. In warehouses, distribution centers and factories, autonomous mobile robots, robotic arms and humanoids can work alongside people, move through aisles and interact with each other. Mega provides a framework for testing and training software-defined sensor and robot autonomy capabilities in a virtual environment. Supply chain solutions company KION Group and consulting firm Accenture Plc partnered with Nvidia to become the first to adopt Mega for optimizing operations in retail, consumer packaged goods and more.

Powered by Omniverse Sensor RTX, autonomous vehicle simulation will allow AV developers to replay driving data, generate new ground-truth data and perform testing to develop better AI models. Nvidia also released a reference workflow blueprint for real-time digital twins for computer-aided engineering, or CAE, built on Nvidia CUDA-X acceleration, physics AI and Omniverse libraries that allow for real-time physics visualization.

Isaac GR00T, Nvidia's humanoid robot AI learning model, gets a blueprint that allows users to put on an Apple Vision Pro and demonstrate tasks. Humanoid robots acquire new skills by observing and mimicking human demonstrations, and collecting those demonstrations requires capturing large volumes of high-quality data.
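The idea behind that blueprint, multiplying a few captured demonstrations into a large synthetic training set, can be sketched abstractly. The Gaussian jitter below is a toy stand-in for whatever augmentation Nvidia actually applies; it is meant only to show why a handful of demonstrations can go a long way.

```python
# Toy sketch of demonstration multiplication for imitation learning: a few
# recorded trajectories are perturbed into many synthetic variants. The
# jitter scheme is illustrative only, not Nvidia's actual augmentation.

import random

Point = tuple[float, float, float]

def augment_demo(trajectory: list[Point], n: int, sigma: float = 0.01) -> list[list[Point]]:
    """Jitter a recorded end-effector path to synthesize n variants
    (sigma is positional noise in meters; 0.01 is 1 cm)."""
    variants = []
    for _ in range(n):
        variants.append([
            (x + random.gauss(0, sigma), y + random.gauss(0, sigma), z + random.gauss(0, sigma))
            for (x, y, z) in trajectory
        ])
    return variants

human_demo: list[Point] = [(0.00, 0.00, 0.20), (0.10, 0.00, 0.15), (0.20, 0.05, 0.10)]
synthetic_demos = augment_demo(human_demo, n=1000)
print(f"1 demonstration expanded into {len(synthetic_demos)} training trajectories")
```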
[12]
Nvidia's 3-computer solution for mobile autonomy
Nvidia is betting on three types of computers for its autonomous mobility vision, and it has created a platform to make it a reality with the Cosmos World Foundation Models. Jensen Huang, CEO of Nvidia, pointed this out in an opening keynote speech at CES 2025, the big tech trade show in Las Vegas this week.

Transportation leads the way

So far, transportation industry leaders are among the first to adopt the Cosmos platform. You may have heard about the Three Body Problem. But this is the three computer solution. "Instead of a Three Body Problem, we have a three computer solution," Huang said. Autonomous vehicle (AV) development is made possible by three distinct computers: Nvidia DGX systems for training the AI-based stack in the data center, Nvidia Omniverse running on Nvidia OVX systems for simulation and synthetic data generation, and the Nvidia AGX in-vehicle computer to process real-time sensor data for safety. Together, these purpose-built, full-stack systems enable continuous development cycles, speeding improvements in performance and safety.

A good example is the digital twin concept made possible by Omniverse. Engineers use this metaverse-like tech to create hyper-realistic simulations of factories. They perfect the design in the virtual space of the Omniverse. When it is close to perfect, they build the factory in real life, outfitted with sensors. Those sensors collect real-world data that is fed back into the virtual model, improving it with actual data. Then the digital twin design is improved and the feedback cycle continues. Nvidia's Rev Lebaredian has explained this to me numerous times.

At the CES trade show, Nvidia today announced a new part of the equation: Nvidia Cosmos, a platform comprising state-of-the-art generative world foundation models (WFMs), advanced tokenizers, guardrails and an accelerated video processing pipeline built to advance the development of physical AI systems such as AVs and robots.

"The AV data factory flywheel consists of fleet data collection, accurate 4D reconstruction and AI to generate scenes and traffic variations for training and closed-loop evaluation," said Sanja Fidler, vice president of AI research at Nvidia, in a statement. "Using the Nvidia Omniverse platform, as well as Cosmos and supporting AI models, developers can generate synthetic driving scenarios to amplify training data by orders of magnitude."

With Cosmos added to the three-computer solution, developers gain a data flywheel that can turn thousands of human-driven miles into billions of virtually driven miles -- amplifying training data quality.

"Developing physical AI models has traditionally been resource-intensive and costly for developers, requiring acquisition of real-world datasets and filtering, curating and preparing data for training," said Norm Marks, vice president of automotive at Nvidia, in a statement. "Cosmos accelerates this process with generative AI, enabling smarter, faster and more precise AI model development for autonomous vehicles and robotics."

Transportation leaders are using Cosmos to build physical AI for AVs, including:

● Waabi, a company pioneering generative AI for the physical world, will use Cosmos for the search and curation of video data for AV software development and simulation.
● Wayve, which is developing AI foundation models for autonomous driving, is evaluating Cosmos as a tool to search for edge and corner case driving scenarios used for safety and validation.

● AV toolchain provider Foretellix will use Cosmos, alongside Nvidia Omniverse Sensor RTX APIs, to evaluate and generate high-fidelity testing scenarios and training data at scale.

In addition, ridesharing giant Uber is partnering with Nvidia to accelerate autonomous mobility. Rich driving datasets from Uber, combined with the features of the Cosmos platform and Nvidia DGX Cloud, will help AV partners build stronger AI models even more efficiently.

Availability

Cosmos WFMs are now available under an open model license on Hugging Face and the Nvidia NGC catalog. Cosmos models will soon be available as fully optimized Nvidia NIM microservices.
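The continuous development cycle these systems enable can be rendered as a simple loop: train on DGX, amplify data in simulation on OVX, deploy on AGX, and feed fleet logs back into training. The sketch below captures only the data flow; every function is a placeholder, not Nvidia software.

```python
# Schematic of the three-computer flywheel described above. All functions are
# placeholders that model data flow, not actual Nvidia systems.

def train_on_dgx(dataset: list[str]) -> dict:
    """Data center: train the AV stack on everything gathered so far."""
    return {"weights": f"stack-v{len(dataset)}"}

def simulate_on_ovx(model: dict, miles: int) -> list[str]:
    """Omniverse on OVX: synthesize driving miles and scenario variations."""
    return [f"synthetic_mile_{i}" for i in range(miles)]

def drive_on_agx(model: dict) -> list[str]:
    """In-vehicle AGX: run in real time and log fleet data for the next cycle."""
    return ["fleet_log_a", "fleet_log_b"]

dataset = ["seed_human_driven_miles"]
for cycle in range(3):                            # continuous development cycles
    model = train_on_dgx(dataset)
    dataset += simulate_on_ovx(model, miles=10)   # amplify training data
    dataset += drive_on_agx(model)                # real-world feedback
print(f"dataset grew to {len(dataset)} items across 3 cycles")
```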
[13]
NVIDIA Makes Cosmos World Foundation Models Openly Available to Physical AI Developer Community
State-of-the-art models trained on millions of hours of driving and robotics videos to democratize physical AI development, available under open model license.

NVIDIA Cosmos, a platform for accelerating physical AI development, introduces a family of world foundation models -- neural networks that can predict and generate physics-aware videos of the future state of a virtual environment -- to help developers build next-generation robots and autonomous vehicles (AVs). World foundation models, or WFMs, are as fundamental as large language models. They use input data, including text, image, video and movement, to generate and simulate virtual worlds in a way that accurately models the spatial relationships of objects in the scene and their physical interactions.

At CES today, NVIDIA announced it is making available the first wave of Cosmos WFMs for physics-based simulation and synthetic data generation -- plus state-of-the-art tokenizers, guardrails, an accelerated data processing and curation pipeline, and a framework for model customization and optimization. Researchers and developers, regardless of their company size, can freely use the Cosmos models under NVIDIA's permissive open model license that allows commercial usage. Enterprises building AI agents can also use new open NVIDIA Llama Nemotron and Cosmos Nemotron models, unveiled at CES.

The openness of Cosmos' state-of-the-art models unblocks physical AI developers building robotics and AV technology and enables enterprises of all sizes to more quickly bring their physical AI applications to market. Developers can use Cosmos models directly to generate physics-based synthetic data, or they can harness the NVIDIA NeMo framework to fine-tune the models with their own videos for specific physical AI setups. Physical AI leaders -- including robotics companies 1X, Agility Robotics and XPENG, and AV developers Uber and Waabi -- are already working with Cosmos to accelerate and enhance model development. Developers can preview the first Cosmos autoregressive and diffusion models on the NVIDIA API catalog, and download the family of models and fine-tuning framework from the NVIDIA NGC catalog and Hugging Face.

Cosmos world foundation models are a suite of open diffusion and autoregressive transformer models for physics-aware video generation. The models have been trained on 9,000 trillion tokens from 20 million hours of real-world human interactions, environment, industrial, robotics and driving data. The models come in three categories: Nano, for models optimized for real-time, low-latency inference and edge deployment; Super, for highly performant baseline models; and Ultra, for maximum quality and fidelity, best used for distilling custom models.

When paired with NVIDIA Omniverse 3D outputs, the diffusion models generate controllable, high-quality synthetic video data to bootstrap training of robotic and AV perception models. The autoregressive models predict what should come next in a sequence of video frames based on input frames and text. This enables real-time next-token prediction, giving physical AI models the foresight to predict their next best action. Developers can use Cosmos' open models for text-to-world and video-to-world generation. Versions of the diffusion and autoregressive models, with between 4 and 14 billion parameters each, are available now on the NGC catalog and Hugging Face.
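The autoregressive behavior described above, predicting the next token in a sequence of video frames, can be sketched in miniature. Both the tokenizer and the predictor below are toy placeholders standing in for Cosmos components, shown only to make the rollout mechanic concrete.

```python
# Minimal sketch of autoregressive world modeling: tokenize observed frames,
# then repeatedly predict the next token to roll out a possible future.
# tokenize() and predict_next_token() are toy stand-ins for Cosmos models.

def tokenize(frame: bytes) -> list[int]:
    """Placeholder visual tokenizer: compress a frame into discrete tokens."""
    return [b % 256 for b in frame[:8]]

def predict_next_token(context: list[int]) -> int:
    """Placeholder for the autoregressive model's next-token prediction."""
    return (sum(context) + 1) % 256

context = tokenize(b"frame-0") + tokenize(b"frame-1")  # observed video so far
rollout = []
for _ in range(16):              # imagine 16 tokens of future video
    token = predict_next_token(context)
    context.append(token)
    rollout.append(token)
print(f"rolled out {len(rollout)} future tokens from 2 observed frames")
# A policy model could score candidate actions by comparing rollouts like this.
```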
Also available are a 12-billion-parameter upsampling model for refining text prompts, a 7-billion-parameter video decoder optimized for augmented reality, and guardrail models to ensure responsible, safe use. To demonstrate opportunities for customization, NVIDIA is also releasing fine-tuned model samples for vertical applications, such as generating multisensor views for AVs.

Cosmos world foundation models can enable synthetic data generation to augment training datasets, simulation to test and debug physical AI models before they're deployed in the real world, and reinforcement learning in virtual environments to accelerate AI agent learning. Developers can generate massive amounts of controllable, physics-based synthetic data by conditioning Cosmos with composed 3D scenes from NVIDIA Omniverse.

Waabi, a company pioneering generative AI for the physical world, starting with autonomous vehicles, is evaluating the use of Cosmos for the search and curation of video data for AV software development and simulation. This will further accelerate the company's industry-leading approach to safety, which is based on Waabi World, a generative AI simulator that can create any situation a vehicle might encounter with the same level of realism as if it happened in the real world.

In robotics, WFMs can generate synthetic virtual environments or worlds to provide a less expensive, more efficient and controlled space for robot learning. Embodied AI startup Hillbot is boosting its data pipeline by using Cosmos to generate terabytes of high-fidelity 3D environments. This AI-generated data will help the company refine its robotic training and operations, enabling faster, more efficient robotic skilling and improved performance for industrial and domestic tasks.

In both industries, developers can use NVIDIA Omniverse and Cosmos as a multiverse simulation engine, allowing a physical AI policy model to simulate every possible future path it could take to execute a particular task -- which in turn helps the model select the best of these paths.

Data curation and the training of Cosmos models relied on thousands of NVIDIA GPUs through NVIDIA DGX Cloud, a high-performance, fully managed AI platform that provides accelerated computing clusters in every leading cloud. Developers adopting Cosmos can use DGX Cloud for an easy way to deploy Cosmos models, with further support available through the NVIDIA AI Enterprise software platform.

In addition to foundation models, the Cosmos platform includes a data processing and curation pipeline powered by NVIDIA NeMo Curator and optimized for NVIDIA data center GPUs. Robotics and AV developers collect millions or billions of hours of real-world recorded video, resulting in petabytes of data. Cosmos enables developers to process 20 million hours of data in just 40 days on NVIDIA Hopper GPUs, or as little as 14 days on NVIDIA Blackwell GPUs. Using unoptimized pipelines running on a CPU system with equivalent power consumption, processing the same amount of data would take over three years.

The platform also features a suite of powerful video and image tokenizers that can convert videos into tokens at different video compression ratios for training various transformer models. The Cosmos tokenizers deliver 8x more total compression than state-of-the-art methods and 12x faster processing speed, which offers superior quality and reduced computational costs in both training and inference.
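Those curation figures are worth a quick back-of-envelope check. Taking "over three years" as roughly 1,095 days, the quoted numbers work out as follows:

```python
# Back-of-envelope arithmetic on the quoted curation figures: 20 million hours
# of video in 40 days (Hopper) or 14 days (Blackwell) versus roughly three
# years (~1,095 days) for an unoptimized CPU pipeline.

video_hours = 20_000_000
days_to_process = {"cpu pipeline": 3 * 365, "Hopper": 40, "Blackwell": 14}

for system, days in days_to_process.items():
    throughput = video_hours / days
    speedup = days_to_process["cpu pipeline"] / days
    print(f"{system:>12}: {throughput:,.0f} video-hours/day ({speedup:.0f}x vs CPU)")

# -> Hopper lands near 27x and Blackwell near 78x the CPU baseline.
```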
Developers can access these tokenizers, available under NVIDIA's open model license, via Hugging Face and GitHub. Developers using Cosmos can also harness the model training and fine-tuning capabilities of the NVIDIA NeMo framework, a GPU-accelerated framework that enables high-throughput AI training.

Now available to developers under the NVIDIA Open Model License Agreement, Cosmos was developed in line with NVIDIA's trustworthy AI principles, which include nondiscrimination, privacy, safety, security and transparency. The Cosmos platform includes Cosmos Guardrails, a dedicated suite of models that, among other capabilities, mitigates harmful text and image inputs during preprocessing and screens generated videos during postprocessing for safety. Developers can extend these guardrails for their custom applications. Cosmos models on the NVIDIA API catalog also feature an inbuilt watermarking system that enables identification of AI-generated sequences.

NVIDIA Cosmos was developed by NVIDIA Research. Read the research paper, "Cosmos World Foundation Model Platform for Physical AI," for more details on model development and benchmarks. Model cards providing additional information are available on Hugging Face.
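To make the pre/postprocessing flow of the guardrails concrete, here is a minimal sketch of the two-stage pattern described above. Every function name is a hypothetical placeholder; this is not Cosmos Guardrails' actual API, only the control flow it implies.

```python
# Sketch of a two-stage guardrail wrapper: screen the text prompt before
# generation, then screen the generated frames afterward. All names are
# hypothetical placeholders, not the Cosmos Guardrails API.

def pre_guard(prompt: str) -> bool:
    """Return True if the input prompt passes the safety screen."""
    blocked_terms = {"example-harmful-term"}  # stand-in policy
    return not any(term in prompt.lower() for term in blocked_terms)

def post_guard(frames: list) -> bool:
    """Return True if the generated video passes the output screen.
    A real system would run safety classifiers over the frames."""
    return len(frames) > 0  # stand-in check

def guarded_generate(prompt: str, generate_fn) -> list:
    """Wrap a text-to-world generation call with both screens."""
    if not pre_guard(prompt):
        raise ValueError("prompt rejected by input guardrail")
    frames = generate_fn(prompt)
    if not post_guard(frames):
        raise ValueError("output rejected by postprocessing guardrail")
    return frames
```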
[14]
NVIDIA Enhances Three Computer Solution for Autonomous Mobility With Cosmos World Foundation Models
Transportation industry leaders are among the first to adopt the Cosmos platform.

Autonomous vehicle (AV) development is made possible by three distinct computers: NVIDIA DGX systems for training the AI-based stack in the data center, NVIDIA Omniverse running on NVIDIA OVX systems for simulation and synthetic data generation, and the NVIDIA AGX in-vehicle computer to process real-time sensor data for safety. Together, these purpose-built, full-stack systems enable continuous development cycles, speeding improvements in performance and safety.

At the CES trade show, NVIDIA today announced a new part of the equation: NVIDIA Cosmos, a platform comprising state-of-the-art generative world foundation models (WFMs), advanced tokenizers, guardrails and an accelerated video processing pipeline, built to advance the development of physical AI systems such as AVs and robots. With Cosmos added to the three-computer solution, developers gain a data flywheel that can turn thousands of human-driven miles into billions of virtually driven miles -- amplifying training data quality.

"The AV data factory flywheel consists of fleet data collection, accurate 4D reconstruction and AI to generate scenes and traffic variations for training and closed-loop evaluation," said Sanja Fidler, vice president of AI research at NVIDIA. "Using the NVIDIA Omniverse platform, as well as Cosmos and supporting AI models, developers can generate synthetic driving scenarios to amplify training data by orders of magnitude."

"Developing physical AI models has traditionally been resource-intensive and costly for developers, requiring acquisition of real-world datasets and filtering, curating and preparing data for training," said Norm Marks, vice president of automotive at NVIDIA. "Cosmos accelerates this process with generative AI, enabling smarter, faster and more precise AI model development for autonomous vehicles and robotics."

Transportation leaders, including Uber and Waabi, are already using Cosmos to build physical AI for AVs.
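The flywheel Fidler describes is essentially a closed loop. The schematic below walks a few passes of that loop; each function is a hypothetical placeholder for a pipeline stage, not an NVIDIA API, and the variant count is arbitrary.

```python
# Schematic of the AV data flywheel described above. Every function is a
# hypothetical stand-in for one stage; only the control flow is the point.

def collect_fleet_data():
    return ["drive_log_001"]  # real-world miles from the vehicle fleet

def reconstruct_4d(logs):
    return [f"scene<{log}>" for log in logs]  # accurate 4D reconstruction

def generate_variations(scenes):
    # Omniverse + Cosmos: amplify each scene into many traffic variants.
    return [f"{s}/variant{i}" for s in scenes for i in range(1000)]

def train_and_evaluate(model, data):
    # DGX training plus closed-loop evaluation of the updated stack.
    return f"{model}+{len(data)}_scenarios"

model = "av_stack_v0"
for _ in range(3):  # each pass around the flywheel improves the stack
    scenes = reconstruct_4d(collect_fleet_data())
    synthetic = generate_variations(scenes)
    model = train_and_evaluate(model, synthetic)
```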
[15]
Nvidia releases its own brand of world models | TechCrunch
Nvidia is getting into world models -- AI models that take inspiration from the mental models of the world that humans develop naturally. At the Consumer Electronics Show in Las Vegas, the company announced that it is making openly available a family of world models that can predict and generate "physics-aware" videos. Nvidia is calling this family Cosmos World Foundation Models, or Cosmos WFMs for short. The models, which can be fine-tuned for specific applications, are available from Nvidia's API and NGC catalogs and the AI developer platform Hugging Face.

"Nvidia is making available the first wave of Cosmos WFMs for physics-based simulation and synthetic data generation," the company wrote in a blog post provided to TechCrunch. "Researchers and developers, regardless of their company size, can freely use the Cosmos models under Nvidia's permissive open model license that allows commercial usage."

There are a number of models in the Cosmos WFM family, divided into three categories: Nano for low-latency, real-time applications; Super for "highly performant baseline" models; and Ultra for maximum quality and fidelity of output. The models range in size from 4 billion to 14 billion parameters, with Nano being the smallest and Ultra the largest. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer.

As part of Cosmos WFM, Nvidia is also releasing an "upsampling model," a video decoder optimized for augmented reality, and guardrail models to ensure responsible use, as well as fine-tuned models for applications like generating sensor data for autonomous vehicle development. These, along with the other Cosmos WFM models, were trained on 9,000 trillion tokens from 20 million hours of real-world human-interaction, environmental, industrial, robotics, and driving data, Nvidia claimed. (In AI, "tokens" represent bits of raw data -- in this case, video footage.)

Nvidia wouldn't say where this training data came from, but at least one report -- and lawsuit -- alleges that the company trained on copyrighted YouTube videos without permission. We've reached out to Nvidia's press team for comment and will update this piece if we hear back.

Nvidia claimed that Cosmos WFM models, given text or video frames, can generate "controllable, high-quality" synthetic data to bootstrap the training of models for robotics, driverless cars, and more. "Nvidia Cosmos' suite of open models means developers can customize the WFMs with data sets, such as video recordings of autonomous vehicle trips or robots navigating a warehouse, according to the needs of their target application," Nvidia wrote in a press release. "Cosmos WFMs are purpose-built for physical AI research and development, and can generate physics-based videos from a combination of inputs, like text, image and video, as well as robot sensor or motion data."

Nvidia said that companies including Waabi, Wayve, Foretellix, and Uber have already committed to piloting Cosmos WFMs for various use cases, from video search and curation to building AI models for self-driving vehicles.

Important to note is that Nvidia's world models aren't "open source" in the strictest sense.
To abide by one widely accepted definition of "open source" AI, an AI model has to provide enough information about its design so that a person could "substantially" recreate it, and disclose any pertinent details about its training data, including the provenance and how the data can be obtained or licensed. Nvidia hasn't published Cosmos WFM training data details, nor has it made available all the tools needed to recreate the models from scratch. That's probably why the tech giant is referring to the models as "open" as opposed to open source.
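Since the headline workflow for these "open" checkpoints is fine-tuning them on a developer's own footage, a generic sketch of that idea may help. The fragment below is plain PyTorch, not NeMo's or Cosmos' actual training API; the model loader, tokenizer, and paths are all placeholders.

```python
# Generic fine-tuning sketch (plain PyTorch, NOT the NeMo/Cosmos API):
# adapt a downloaded world-model checkpoint to your own video data.
import torch
from torch.utils.data import DataLoader

def load_pretrained_wfm(path: str) -> torch.nn.Module:
    """Placeholder: load a downloaded checkpoint; a tiny module stands in."""
    return torch.nn.Linear(512, 512)

def tokenize_videos(path: str) -> torch.Tensor:
    """Placeholder: run a video tokenizer over your own recordings."""
    return torch.randn(256, 512)  # fake token embeddings

model = load_pretrained_wfm("./cosmos-7b-text2world")
tokens = tokenize_videos("./my_robot_videos")
loader = DataLoader(tokens, batch_size=32, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(batch), batch)  # toy objective
    loss.backward()
    optimizer.step()
```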
Nvidia introduces Cosmos, a suite of world foundation models designed to bring generative AI capabilities to robotics and autonomous vehicles, potentially revolutionizing the development of physical AI systems.
At CES 2025, Nvidia CEO Jensen Huang unveiled Cosmos, a groundbreaking AI platform designed to revolutionize the development of physical AI systems such as robots and autonomous vehicles. This announcement marks a significant leap forward in bringing generative AI capabilities to the realm of physical world interaction [1][2].
Cosmos is a comprehensive suite that includes world foundation models (WFMs), advanced tokenizers, safety guardrails, and an accelerated video processing pipeline. Together, these components make the training and development of physical AI systems more efficient and cost-effective than ever before [3].
Key features of Cosmos include physics-aware video generation from text, image, and video inputs; model variants in Nano, Super, and Ultra sizes; a permissive open model license that allows commercial use; and support for fine-tuning with the NVIDIA NeMo framework.
The introduction of Cosmos addresses two major challenges in developing physical AI systems: the high cost of acquiring real-world training data, and the time-consuming work of filtering, curating, and preparing that data for training.
By leveraging Cosmos, developers can generate synthetic data and simulate various scenarios, significantly reducing the time and resources required for training and testing [1][3].
Nvidia has made Cosmos WFMs available under an open model license, encouraging widespread adoption and experimentation in the robotics and autonomous vehicle communities. Developers can access these models through the Nvidia API catalog, NGC catalog, or Hugging Face [3][5].
Several leading companies in robotics and autonomous vehicles have already adopted Cosmos, including 1X, Agile Robots, Figure AI, Waabi, XPENG, and Uber [3][4].
Jensen Huang declared 2025 the "year of robotics," highlighting the potential impact of Cosmos on the industry [4]. The platform's ability to understand and interact with the physical world could lead to significant advancements in robotics, autonomous vehicles, and other physical AI applications.
Nvidia has developed Cosmos in line with its trustworthy AI principles, prioritizing privacy, safety, security, and transparency. The platform includes guardrails to mitigate harmful content and features invisible watermarks to identify AI-generated content [4].
The introduction of Nvidia's Cosmos platform represents a significant milestone in the convergence of AI, simulation, and robotics. By providing developers with powerful tools to create more sophisticated physical AI systems, Cosmos has the potential to accelerate innovation across industries and reshape how we interact with technology in our daily lives.