Spirit AI overtakes Nvidia on RoboArena benchmark as physical AI competition intensifies


Chinese robotics startup Spirit AI has claimed the top spot on the RoboArena leaderboard with its Spirit v1.6 model scoring 1,924, edging out Nvidia's Cosmos3-Nano-Policy at 1,881. The achievement marks the first time a Chinese model has led this benchmark, co-developed by Nvidia with Stanford and UC Berkeley, and signals intensifying competition in physical AI as Chinese firms dominate multiple robotics benchmarks.

Spirit AI Claims Top Position on RoboArena Robotics Benchmark

Nvidia's dominance in physical AI lasted exactly two days. On Wednesday, Spirit AI, a Chinese robotics startup based in Hangzhou, announced that its Spirit v1.6 model had scored 1,924 on the RoboArena robotics benchmark, surpassing Nvidia's Cosmos3-Nano-Policy at 1,881 [1]. This marks the first time a Chinese model has claimed the leading position on RoboArena, a benchmark co-developed by Nvidia with Stanford University and the University of California, Berkeley [2].

The timing carries particular weight. Nvidia had just launched its Cosmos 3 omnimodel at Computex in Taipei on June 1, branding it the "open frontier foundation model for physical AI"; the model was trained on 20 trillion tokens of multimodal data [1]. Nvidia's DreamZero, a separate project introduced in February, ranked third with 1,763 points [2]. The result represents a significant milestone in embodied artificial intelligence, where machines must translate perception into real-world actions.
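To put the three reported scores side by side, the short sketch below ranks them and computes each model's gap to the leader. It uses only the figures quoted above; the variable names are illustrative, and no assumption is made about how RoboArena derives its scores, so the gaps are plain point differences rather than win probabilities.

```python
# Reported RoboArena scores from the article (no other entries are assumed).
scores = {
    "Spirit v1.6": 1924,
    "Cosmos3-Nano-Policy": 1881,
    "DreamZero": 1763,
}

# Rank models from highest to lowest score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
leader_name, leader_score = ranked[0]

# Point gap between the leader and every other listed model.
margins = {name: leader_score - score for name, score in ranked[1:]}

# Leader's advantage over second place, as a percentage of the runner-up's score.
runner_up_score = ranked[1][1]
relative_lead_pct = (leader_score - runner_up_score) / runner_up_score * 100

for name, gap in margins.items():
    print(f"{leader_name} leads {name} by {gap} points")
print(f"Lead over second place: {relative_lead_pct:.1f}%")
```

The lead over Nvidia's Cosmos3-Nano-Policy is 43 points, considerably narrower than the 161-point lead over DreamZero.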

Understanding What Physical AI Measures

Source: VnExpress


The RoboArena benchmark evaluates how effectively a generalist robot policy translates into tangible actions: object manipulation, navigation, tool usage, perception, planning, and adaptability in unfamiliar environments [1]. Unlike benchmarks that test chatbot fluency or image generation quality, RoboArena measures whether a machine can both think and execute.

Physical AI relies on two core capabilities that define its effectiveness. Policy capabilities determine a model's ability to act on what it observes, which is precisely what RoboArena measures. World models, meanwhile, determine a model's ability to simulate and predict outcomes when specific actions are taken [2]. The industry is moving toward integrating both capabilities, with Chinese researchers introducing a unified "Policy World Model" architecture last September that merges world modelling and trajectory planning into a single system [1].

Chinese Firms Dominate Multiple AI Rankings

Spirit AI's result is just one data point in a broader pattern. Across the ecosystem of physical AI benchmarks, Chinese firms hold leading positions in nearly every category. On the WorldArena benchmark, which evaluates embodied world models, Manifold AI's WorldScape-0.2 claims the top spot, outperforming Nvidia's Cosmos-Predict 2.5 in the policy evaluator track [1].

AgiBot, one of China's largest robotics firms, leads the perception track with its GenieEnvisioner-Sim2.0-2B model unveiled last week [2]. DexForce, another Chinese startup, tops the data engines track with DSCFuncWorld, a platform designed to optimize robot training data pipelines [2]. On the WorldScore benchmark, which tests a model's ability to generate worlds from text prompts, Manifold AI's WorldScape-0.2 again leads, outperforming WonderJourney, a joint project from Stanford and Google [1].

Massive Capital Fuels Rapid Development

These technical achievements are backed by extraordinary capital flows. Spirit AI announced a 1.5 billion yuan ($222 million) financing round on Wednesday, its fourth in just three months and reportedly the most aggressive fundraising pace seen in the embodied AI sector [1]. Earlier rounds had already pushed the company's valuation past 10 billion yuan ($1.4 billion). XYZ Embodied AI, incubated by the Beijing Academy of Artificial Intelligence, closed its pre-A round after raising 1 billion yuan in just 10 months to develop "embodied brains" and world models [1].

Manifold AI has completed five funding rounds in 10 months, with its latest, in April, reportedly securing hundreds of millions of yuan. The broader Chinese robotics sector attracted $3.4 billion in venture funding in 2025 alone, 42 per cent more than the United States, and that gap appears to be widening in 2026 [1].
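The "42 per cent more" comparison implies a US figure that the article does not state directly. As a rough back-of-the-envelope sketch, assuming the 42 per cent is measured relative to the US total (the function and variable names here are illustrative), the implied US amount works out to roughly $2.4 billion:

```python
def implied_baseline(total: float, pct_more: float) -> float:
    """Return the baseline B such that total is pct_more percent larger than B."""
    return total / (1 + pct_more / 100)


# Figures quoted in the article; the US total is derived, not reported.
china_2025_usd_bn = 3.4
pct_more_than_us = 42

us_2025_usd_bn = implied_baseline(china_2025_usd_bn, pct_more_than_us)
gap_usd_bn = china_2025_usd_bn - us_2025_usd_bn

print(f"Implied US robotics venture funding, 2025: ${us_2025_usd_bn:.2f}bn")
print(f"Implied China-US gap: ${gap_usd_bn:.2f}bn")
```

This is an inference from the reported percentage, not a sourced number; if the underlying data defines the comparison differently, the implied figure would change.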

Nvidia's Response and Strategic Shift

Nvidia, currently the world's most valuable company with a market capitalization exceeding $5 trillion [2], is adapting its strategy. At Computex, CEO Jensen Huang announced a partnership with Chinese robotics firm Unitree, which is preparing a $7 billion IPO, and Singaporean robotic hand maker Sharpa to build a humanoid robot reference design [1]. The platform combines Unitree's H2 Plus humanoid body, Sharpa's Wave tactile hands, and Nvidia's Jetson AGX Thor T5000 processor.

Huang also launched the Cosmos Coalition, enlisting AI labs including Agile Robots, Black Forest Labs, Runway, and Skild AI to advance open world models [1]. The message is clear: Nvidia wants to position itself as the infrastructure layer for the entire physical AI ecosystem, even if individual models lose benchmark crowns. However, Huang himself identified the sector's fundamental bottleneck at Computex: "For robotic systems and physical AI, data is the hardest problem" [1]. That admission points to why China may hold a structural advantage, with Scale AI founder Alexandr Wang reportedly stating last year that China was "fundamentally very well positioned on data" and that many US companies relied on Chinese data to train robotics foundation models [1].


© 2026 TheOutpost.AI All rights reserved