XDOF Raises $70M for Robot Training Data Collection

XDOF Tackles the Robot Training Data Shortage

A new infrastructure company called XDOF has emerged from stealth with $70 million in funding to address what co-founder and CEO Philippe Wu calls the next great bottleneck in AI robotics: the scarcity of robot training data that captures physical interactions [1]. The timing couldn't be more critical. Just two weeks before XDOF's announcement, OpenAI revealed plans to relaunch the robotics program it shuttered in 2021, signaling an industry-wide race among frontier AI labs to master physical AI systems [1].

The robotic teleoperation data startup raised funds from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo to build the data pipelines, collection tools, and data annotation systems that AI companies struggle to develop in-house [1]. With about 60 employees, XDOF is already working with 20 customers including several frontier AI labs, though Wu cannot name them publicly [1].

Why Physical AI Needs Different Data Infrastructure

Unlike large language models trained on vast quantities of publicly available text scraped from the internet, robots require high-quality training data that captures nuanced physical interactions, and that kind of data barely exists [1]. YouTube videos and footage captured by gig workers are low-fidelity and hard to reconcile with the spatial requirements robots need to navigate the physical world [2].

Philippe Wu encountered this challenge firsthand as a PhD student at UC Berkeley, where his research focused on enabling robots to learn skills from large-scale datasets. "We didn't have large-scale data to work with," Wu told TechCrunch. "There was this chicken-and-egg problem -- we first needed to actually collect data before we could even ask how to train a foundation model for robotics" [1].

From Academic Research to Commercial Infrastructure

Wu and future XDOF co-founder and CTO Fred Shentu worked on a project called GELLO, a low-cost teleoperation system that lets human operators control robotic arms to generate training data [1]. "It ended up becoming a very influential paper in robotics, because a lot of people had similar needs and bottlenecks, and many started leveraging this type of device for data collection," Wu said [1].

Spotting the commercial opportunity, Wu, Shentu, and third co-founder and Chief Operating Officer Nemo Jin launched XDOF in October 2024 [1]. Mindful that data provision alone can be a dead-end business, the company also focuses on data cleaning, tooling, and annotation, creating a self-reinforcing feedback loop for robot trainers [1].

The ABC Dataset: 130,000 Trajectories of Robot Manipulation Data

Alongside its funding announcement, XDOF partnered with UC Berkeley's AI Research lab to release the ABC dataset, which the company believes is the largest collection of high-quality robot training data ever assembled [1]. The dataset includes 130,000 trajectories of robot manipulation data, 300 hours of simulation, and 100 hours of evaluations, a scale of pre-training data never before available to academia [1].

The team has already used the data to train robots on benchmark tasks requiring extreme precision, like folding T-shirts, flattening boxes, and loading AirPods into their cases [1]. David McAllister, a Berkeley PhD student who helped organize the release, told TechCrunch: "We've seen in language, image generation, and other fields, that when models and data are released, the community achieves things that you wouldn't necessarily have expected" [1].

Building a Three-Tier Robotics Data Infrastructure

XDOF plans to scale its robotics data infrastructure across three tiers of a data pyramid [1]. The most valuable tier is teleoperation data collected on the actual robot being deployed. Next comes more general data gathered by teleoperated robots, similar to GELLO. The final tier is "egocentric" data gathered by humans performing everyday tasks, for which XDOF plans to build its own wearable sensors to ensure compatibility with hand-tracking algorithms [1].

"Your camera choice is going to affect the quality of your data -- which is going to affect how your hand-tracking algorithm performs," Wu said. "If you don't design the hardware well from the start, the data you collect might have very specific problems that you didn't anticipate" [1].

Why AI Labs Are Outsourcing Data Collection

The company plans to hire and train armies of teleoperators and egocentric data operators around the world, a labor-intensive model that explains why major labs prefer outsourcing this work [1]. "You need a warehouse of hundreds of thousands of square feet with hundreds of robots," Wu explained. "You need to maintain these robots, calibrate their physical parameters, and properly train operators" [1].

This build-out requires focus, capital, and operational scale that most AI labs would rather outsource, which is precisely the market XDOF is betting on [1]. "All of the top labs are trying to pursue robotics," Wu said. "We've already seen some of the downfalls of falling a little bit behind in the language model race ... you don't want to be in this type of situation where you pursue this technology too late, and everyone is in this boat where physical AI is the next frontier" [1].

References

[1] Collecting robot training data is dirty, unglamorous work. Some AI labs are already paying XDOF to do it. TechCrunch.

[2] Robotic teleoperation data startup XDOF launches with $70M in funding. SiliconANGLE.