2 Sources
[1]
Collecting robot training data is dirty, unglamorous work. Some AI labs are already paying XDOF to do it
Two weeks ago, OpenAI said it would relaunch the robotics program it shuttered in 2021 -- the latest signal that the biggest AI labs are racing to teach machines to operate in the physical world. But building capable robots requires something the AI industry doesn't yet have, which is the training data to match that used for language models. That gap is creating a new kind of infrastructure business. Unlike LLMs that were trained on a vast sea of publicly available text, robots need data that captures physical interaction, and that kind of data barely exists. YouTube videos and footage captured by gig workers are low-fidelity and hard to reconcile with the physical world. XDOF (pronounced "ecks-doff"), emerging from stealth today, is betting that the next great bottleneck in AI isn't models or chips, but the data feedback loop needed to teach robots how to interact with the physical world. The startup aims to build the data pipelines, collection tools, and annotation systems that frontier labs and robotics companies can't easily build themselves -- and has raised $70 million from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo to do it. Co-founder and CEO Philipp Wu says XDOF, which has about 60 employees, is already working with 20 customers including several frontier AI labs, but cannot name them. "All of the top labs are trying to pursue robotics," Wu said. "We've already seen some of the downfalls of falling a little bit behind in the language model race ... you don't want to be in this type of situation where you pursue this technology too late, and everyone is in this boat where physical AI is the next frontier." Wu ran into this problem himself as a PhD student at UC Berkeley. His focus was on enabling robots to learn skills from large-scale data sets. There was just one problem. "We didn't have large-scale data to work with," he told TechCrunch.
"There was this chicken-and-egg problem -- we first needed to actually collect data before we could even ask how to train a foundation model for robotics." Wu and his future XDOF co-founder and CTO, Fred Shentu, worked on a project called GELLO, a low-cost teleoperation system that lets a human operator control a robotic arm to generate training data. "It ended up becoming a very influential paper in robotics, because a lot of people had similar needs and bottlenecks, and many started leveraging this type of device for data collection," Wu said. Spotting the opportunity, Wu, Shentu, and third co-founder and Chief Operating Officer Nemo Jin launched XDOF in October 2024 to provide a data ecosystem for companies pursuing robotics models. Mindful that data provision alone can be a dead-end business, the company is also focused on data cleaning, tooling, and annotation -- creating a self-reinforcing feedback loop for robot trainers. As a starting point, the company is partnering with UC Berkeley's AI Research lab to release what it believes is the largest collection of high-quality robot training data ever assembled, dubbed ABC. It includes 130,000 trajectories of robot manipulation data, 300 hours of simulation, and 100 hours of evaluations. That kind of scaled-up pre-training data has never been available to academia before. "We've seen in language, image generation, and other fields, that when models and data are released, the community achieves things that you wouldn't necessarily have expected," David McAllister, a Berkeley PhD student who helped organize the release, told TechCrunch. The team has already used the data to train robots on benchmark tasks like folding T-shirts and flattening boxes, or loading AirPods into their cases.
Unlimited degrees of freedom
The company plans to work across three tiers of a data pyramid.
The most valuable tier is teleoperation data collected on the actual robot being deployed; next comes teleoperated robots gathering more general data, as with GELLO; and finally "egocentric" data gathered by humans performing everyday tasks, for which XDOF plans to build its own wearable sensors. "Your camera choice is going to affect the quality of your data -- which is going to affect how your hand-tracking algorithm performs," Wu said. "If you don't design the hardware well from the start, the data you collect might have very specific problems that you didn't anticipate." The company plans to hire and train armies of teleoperators and egocentric data operators around the world -- a labor-intensive model that raises an obvious question: Why aren't the major labs doing this data production work themselves? "You need a warehouse of hundreds of thousands of square feet with hundreds of robots," Wu said. "You need to maintain these robots, calibrate their physical parameters, and properly train operators." It's a build-out that requires focus, capital, and operational scale that most AI labs would rather outsource -- which is precisely the market XDOF is betting on. The name XDOF is a play on the robotics term "degrees of freedom," which describes the number of independent motions a robot can perform. Your arm, from shoulder to wrist, has seven degrees of freedom. Humanoid robotics company Figure.AI's latest robot has 30. The X in the company's name captures its ambition: "Arbitrary degrees of freedom, unlimited degrees of freedom," Wu says.
[2]
Robotic teleoperation data startup XDOF launches with $70M in funding
Robotics training infrastructure startup XDOF said today it has raised $70 million in funding to try to solve one of the biggest challenges in artificial intelligence: teaching machines the skills they need to safely navigate and work in the real world. The round involved a number of heavyweight venture capitalists, including Thrive Capital, Spark Capital, Andreessen Horowitz, Lux Capital and WndrCo. In addition to the money, the startup also released ABC-130K, which it says is the world's largest open-source bimanual robot manipulation dataset. It will provide robotics researchers with access to an unprecedented amount of high-quality, freely available training data. XDOF's debut comes at a critical juncture, just weeks after OpenAI Group PBC announced it's going to revive its own robotics training program that had been shut down in 2021. That move signified the growing interest in what's known as "physical AI," but frontier model makers face a significant challenge. While large language models can be trained on vast oceans of easily-accessible data from the internet, building intelligent robots requires much more nuanced data that captures very specific, real-world actions and interactions. This data is so scarce that it's essentially non-existent. Some developers have tried to get around this problem by downloading YouTube videos or using low-quality footage captured by factory workers and so on, but this data is virtually impossible to reconcile with the complex spatial requirements of robots. Co-founder and Chief Executive Philipp Wu told TechCrunch in an interview that he experienced this challenge himself while studying as a Ph.D. student at UC Berkeley. "We didn't have large-scale data to work with," he explained. "There was this chicken-and-egg problem -- we first needed to actually collect data before we could even ask how to train a foundation model for robotics."
XDOF believes physical AI's biggest hurdle is not the models that actually power the robots or the high-end chips needed for onboard processing, but the data feedback loops needed to teach robots physical interactions. That's why the company is focused on building the highly specialized data pipelines, data collection tools and annotation systems needed to gather this essential training resource. It's an entirely new category of infrastructure, Wu said. The startup traces its roots back to a project called GELLO that Wu worked on alongside a number of other UC Berkeley researchers. With GELLO, they developed a low-cost teleoperation system that enables human operators to control robotic arms and perform various tasks to generate accurate training data. When he teamed up with Chief Technology Officer Fred Shentu and Chief Operating Officer Nemo Jin, Wu realized that simply creating the data itself is a poor business model, and so they also decided to offer data cleaning and annotation services and develop specialized tools, creating a self-reinforcing feedback loop. The ABC-130K dataset is meant to be a showcase of what XDOF can do. It includes 130,000 trajectories of robotic manipulation data, plus 300 hours of simulations and 100 hours of evaluations. The startup has used this dataset itself to train robots on a number of tasks that require extreme precision, such as folding T-shirts, flattening cardboard boxes and putting AirPods into their plastic cases. Although it has been operating under the radar until now, it already has around 20 active customers, including a number of frontier AI labs, and more than 60 employees. Wu said XDOF will scale its business across a three-tier "data pyramid" that includes bespoke teleoperation data that's collected directly from the remote operation of the specific robot being trained. The middle tier includes generalized teleoperational data, similar to what GELLO produced. 
Finally, it includes "egocentric" data that's gathered by humans performing the everyday tasks that robots need to learn. Of course, creating all of this teleoperation data is going to be a significant undertaking, which is precisely why XDOF needed money. It's going to hire a global army of teleoperators and data gatherers, and will even develop its own proprietary wearable sensors to ensure that whatever robots are being trained will match the hand-tracking algorithms it has developed. Because creating this data is such a labor-intensive job, XDOF believes that AI labs will be only too happy to outsource it. "You need a warehouse of hundreds of thousands of square feet with hundreds of robots," Wu explained. "You need to maintain these robots, calibrate their physical parameters, and properly train operators."
XDOF emerged from stealth with $70 million to tackle what may be AI robotics' biggest obstacle: the lack of quality training data. Unlike language models trained on internet text, robots need data capturing physical interactions—and that barely exists. The startup is building data pipelines and teleoperation systems that frontier labs are already using, while releasing the largest open-source robot manipulation dataset to date.
A new infrastructure company called XDOF has emerged from stealth with $70 million in funding to address what co-founder and CEO Philipp Wu calls the next great bottleneck in AI robotics: the scarcity of robot training data that captures physical interactions [1]. The timing is significant. Just two weeks before XDOF's announcement, OpenAI revealed plans to relaunch the robotics program it shuttered in 2021, signaling an industry-wide race among frontier AI labs to master physical AI systems [1].
The robotic teleoperation data startup raised the funds from Thrive Capital, Spark Capital, a16z, Lux, and WndrCo to build the data pipelines, collection tools, and data annotation systems that AI companies struggle to develop in-house [1]. With about 60 employees, XDOF is already working with 20 customers, including several frontier AI labs, though Wu cannot name them publicly [1].
Unlike large language models trained on vast quantities of publicly available text scraped from the internet, robots require high-quality training data that captures nuanced physical interactions, and that kind of data barely exists [1]. YouTube videos and footage captured by gig workers are low-fidelity and hard to reconcile with the spatial requirements robots need to navigate the physical world [1][2].
Source: SiliconANGLE
Philipp Wu encountered this challenge firsthand as a PhD student at UC Berkeley, where his research focused on enabling robots to learn skills from large-scale datasets. "We didn't have large-scale data to work with," Wu told TechCrunch. "There was this chicken-and-egg problem -- we first needed to actually collect data before we could even ask how to train a foundation model for robotics" [1].
Wu and future XDOF co-founder and CTO Fred Shentu worked on a project called GELLO, a low-cost teleoperation system that lets human operators control robotic arms to generate training data [1]. "It ended up becoming a very influential paper in robotics, because a lot of people had similar needs and bottlenecks, and many started leveraging this type of device for data collection," Wu said [1].
Source: TechCrunch
Spotting the commercial opportunity, Wu, Shentu, and third co-founder and Chief Operating Officer Nemo Jin launched XDOF in October 2024 [1]. Mindful that data provision alone can be a dead-end business, the company also focuses on data cleaning, tooling, and annotation, creating a self-reinforcing feedback loop for robot trainers [1].
Alongside its funding announcement, XDOF partnered with UC Berkeley's AI Research lab to release the ABC dataset (ABC-130K), which the company believes is the largest collection of high-quality robot training data ever assembled [1][2]. The dataset includes 130,000 trajectories of robot manipulation data, 300 hours of simulation, and 100 hours of evaluations, a scale of pre-training data never before available to academia [1][2].
The team has already used the data to train robots on benchmark tasks requiring extreme precision, like folding T-shirts, flattening boxes, and loading AirPods into their cases [1][2]. David McAllister, a Berkeley PhD student who helped organize the release, told TechCrunch: "We've seen in language, image generation, and other fields, that when models and data are released, the community achieves things that you wouldn't necessarily have expected" [1].
XDOF plans to scale its robotics data infrastructure across three tiers of a data pyramid [1][2]. The most valuable tier is teleoperation data collected on the actual robot being deployed. Next comes more general data gathered by teleoperated robots, similar to GELLO. Finally comes "egocentric" data gathered by humans performing everyday tasks, for which XDOF plans to build its own wearable sensors to ensure compatibility with hand-tracking algorithms [1][2].
"Your camera choice is going to affect the quality of your data -- which is going to affect how your hand-tracking algorithm performs," Wu said. "If you don't design the hardware well from the start, the data you collect might have very specific problems that you didn't anticipate" [1].
The company plans to hire and train armies of teleoperators and egocentric data operators around the world, a labor-intensive model that helps explain why major labs prefer to outsource this work [1]. "You need a warehouse of hundreds of thousands of square feet with hundreds of robots," Wu explained. "You need to maintain these robots, calibrate their physical parameters, and properly train operators" [1][2].
This build-out requires focus, capital, and operational scale that most AI labs would rather outsource, which is precisely the market XDOF is betting on [1]. "All of the top labs are trying to pursue robotics," Wu said. "We've already seen some of the downfalls of falling a little bit behind in the language model race ... you don't want to be in this type of situation where you pursue this technology too late, and everyone is in this boat where physical AI is the next frontier" [1].
Summarized by Navi