Curated by THEOUTPOST
On Wed, 23 Apr, 12:06 AM UTC
4 Sources
[1]
Robot see, robot do: System learns after watching how-to videos
Cornell University researchers have developed a new robotic framework powered by artificial intelligence -- called RHyME (Retrieval for Hybrid Imitation under Mismatched Execution) -- that allows robots to learn tasks by watching a single how-to video.

Robots can be finicky learners. Historically, they've required precise, step-by-step directions to complete basic tasks and tend to call it quits when things go off-script, like after dropping a tool or losing a screw. RHyME, however, could fast-track the development and deployment of robotic systems by significantly reducing the time, energy and money needed to train them, the researchers said.

"One of the annoying things about working with robots is collecting so much data on the robot doing different tasks," said Kushal Kedia, a doctoral student in the field of computer science. "That's not how humans do tasks. We look at other people as inspiration."

Kedia will present the paper, "One-Shot Imitation under Mismatched Execution," in May at the Institute of Electrical and Electronics Engineers' International Conference on Robotics and Automation in Atlanta.

Home robot assistants are still a long way off because they lack the wits to navigate the physical world and its countless contingencies. To get robots up to speed, researchers like Kedia are training them with what amounts to how-to videos -- human demonstrations of various tasks in a lab setting. The hope with this approach, a branch of machine learning called "imitation learning," is that robots will learn a sequence of tasks faster and be able to adapt to real-world environments.

"Our work is like translating French to English -- we're translating any given task from human to robot," said senior author Sanjiban Choudhury, assistant professor of computer science.

This translation task still faces a broader challenge, however: Humans move too fluidly for a robot to track and mimic, and training robots with video requires gobs of it. Further, video demonstrations -- of, say, picking up a napkin or stacking dinner plates -- must be performed slowly and flawlessly, since any mismatch in actions between the video and the robot has historically spelled doom for robot learning, the researchers said.

"If a human moves in a way that's any different from how a robot moves, the method immediately falls apart," Choudhury said. "Our thinking was, 'Can we find a principled way to deal with this mismatch between how humans and robots do tasks?'"

RHyME is the team's answer -- a scalable approach that makes robots less finicky and more adaptive. It supercharges a robotic system to use its own memory and connect the dots when performing tasks it has viewed only once by drawing on videos it has seen. For example, a RHyME-equipped robot shown a video of a human fetching a mug from the counter and placing it in a nearby sink will comb its bank of videos and draw inspiration from similar actions -- like grasping a cup and lowering a utensil.

RHyME paves the way for robots to learn multiple-step sequences while significantly lowering the amount of robot data needed for training, the researchers said. RHyME requires just 30 minutes of robot data; in a lab setting, robots trained using the system achieved a more than 50% increase in task success compared to previous methods, the researchers said.
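To make the retrieval step concrete, here is a minimal, hypothetical sketch of how a robot might comb a memory bank for the stored clips most similar to a new demonstration. The encoder stand-in, function names, and cosine-similarity measure are illustrative assumptions, not the actual RHyME implementation.

```python
import numpy as np

def embed_clip(clip: np.ndarray) -> np.ndarray:
    """Stand-in for a learned video encoder: average-pool per-frame features."""
    return clip.mean(axis=0)

def retrieve(query_clip: np.ndarray, memory_bank: list, k: int = 3) -> np.ndarray:
    """Return indices of the k stored clips most similar to the query (cosine)."""
    q = embed_clip(query_clip)
    q /= np.linalg.norm(q)
    sims = []
    for clip in memory_bank:
        e = embed_clip(clip)
        sims.append(float(q @ (e / np.linalg.norm(e))))
    return np.argsort(sims)[::-1][:k]  # highest similarity first

# Toy usage: a bank of 10 clips, each 30 frames of 128-dim features.
rng = np.random.default_rng(0)
bank = [rng.normal(size=(30, 128)) for _ in range(10)]
query = bank[7] + 0.1 * rng.normal(size=(30, 128))  # noisy copy of clip 7
print(retrieve(query, bank))  # clip 7 should rank first
```

In a real system a learned video encoder would replace the average-pooling stand-in, but the lookup structure, embed the query and rank the memory bank by similarity, is the same.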
[2]
AI Teaches Robots Tasks from a Single How-To Video
Summary: Researchers have developed RHyME, an AI-powered system that enables robots to learn complex tasks by watching a single human demonstration video. Traditional robots struggle with unpredictable scenarios and require extensive training data, but RHyME allows robots to adapt by drawing on previous video knowledge. This method bridges the gap between human and robotic motion, enabling more flexible and efficient learning through imitation. With just 30 minutes of robot data, RHyME-equipped robots achieved over 50% higher task success than earlier approaches, marking a major step toward smarter, more capable robotic assistants.

Source: Cornell University
From the paper's abstract: Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks. However, translating these demonstrations into robot-executable actions presents significant challenges due to execution mismatches in movement styles and physical capabilities. Existing methods for human-robot translation either depend on paired data, which is infeasible to scale, or rely heavily on frame-level visual similarities that often break down in practice. To address these challenges, we propose RHyME, a novel framework that automatically pairs human and robot trajectories using sequence-level optimal transport cost functions. Given long-horizon robot demonstrations, RHyME synthesizes semantically equivalent human videos by retrieving and composing short-horizon human clips. This approach facilitates effective policy training without the need for paired data. RHyME successfully imitates a range of cross-embodiment demonstrators, both in simulation and with a real human hand, achieving over 50% increase in task success compared to previous methods.
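The abstract's "sequence-level optimal transport cost" can be pictured with the hedged sketch below: a self-contained entropic Sinkhorn iteration that scores how cheaply one feature sequence can be transported onto another, so a human demo and a semantically matching robot clip receive a low cost even when their lengths and speeds differ. The features, regularization value, and uniform frame weights are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def sinkhorn_cost(X: np.ndarray, Y: np.ndarray, reg: float = 0.1,
                  n_iters: int = 200) -> float:
    """Entropic optimal-transport cost between sequences X (m,d) and Y (n,d)."""
    m, n = len(X), len(Y)
    # Pairwise squared-Euclidean cost between every frame of X and every frame of Y.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / (reg * C.max()))                 # scale reg by cost magnitude for stability
    a, b = np.full(m, 1.0 / m), np.full(n, 1.0 / n)  # uniform frame weights
    u = np.ones(m) / m
    for _ in range(n_iters):                         # Sinkhorn fixed-point updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                  # soft frame-to-frame transport plan
    return float((P * C).sum())

# Toy usage: a 40-frame human demo vs. two 20-frame robot clips.
rng = np.random.default_rng(1)
human = rng.normal(size=(40, 16))
robot_a = human[::2] + 0.05 * rng.normal(size=(20, 16))  # same motion, half the speed
robot_b = rng.normal(size=(20, 16))                      # unrelated motion
print(sinkhorn_cost(human, robot_a) < sinkhorn_cost(human, robot_b))  # True
```

Because the cost is computed over whole sequences rather than frame-by-frame, a slower or differently styled execution can still match, which is the kind of mismatch tolerance the abstract describes.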
[3]
Robots use Cornell's RHyME AI to learn new skills by watching just one video
In context: Teaching robots new skills has traditionally been slow and painstaking, requiring hours of step-by-step demonstrations for even the simplest tasks. If a robot encountered something unexpected, like dropping a tool or facing an unanticipated obstacle, its progress would often grind to a halt. This inflexibility has long limited the practical use of robots in environments where unpredictability is the norm.

Researchers at Cornell University are now charting a new course with RHyME, an artificial intelligence framework that dramatically streamlines robot learning. An acronym for Retrieval for Hybrid Imitation under Mismatched Execution, RHyME enables robots to pick up new skills by watching a single demonstration video. This is a sharp departure from the exhaustive data collection and flawless repetition previously required for skill acquisition.

The key advance with RHyME is its ability to overcome the challenge of translating human demonstrations into robotic actions. While humans naturally adapt their movements to changing circumstances, robots have historically needed rigid, perfectly matched instructions to succeed. Even slight differences between how a person and a robot perform a task could derail the learning process.

RHyME tackles this problem by allowing robots to tap into a memory bank of previously observed actions. When shown a new demonstration, such as placing a mug in a sink, the robot searches its stored experiences for similar actions, like picking up a cup or putting down an object. The robot can figure out how to perform the new task by piecing together these familiar fragments, even if it has never seen that exact scenario.

This approach makes robot learning more flexible and vastly more efficient. RHyME requires only about 30 minutes of robot-specific training data, compared to the thousands of hours demanded by earlier methods. In laboratory tests, robots using RHyME completed tasks over 50 percent more successfully than those trained with traditional techniques.

The research team, led by doctoral student Kushal Kedia and assistant professor Sanjiban Choudhury, will present their findings at the upcoming IEEE International Conference on Robotics and Automation in Atlanta. Their collaborators include Prithwish Dan, Angela Chao, and Maximus Pace. The project has received support from Google, OpenAI, the US Office of Naval Research, and the National Science Foundation.
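The "piecing together familiar fragments" behavior described above can be sketched, under assumptions, as a simple cover-and-retrieve loop: split the long demonstration into fixed windows and pick the closest short clip from the memory bank for each window. The window size, distance measure, and names here are hypothetical, chosen only to illustrate the idea, and are not the authors' code.

```python
import numpy as np

def clip_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Mean per-frame feature distance between two equal-length clips."""
    return float(np.linalg.norm(a - b, axis=1).mean())

def compose_from_memory(long_demo: np.ndarray, bank: list, window: int = 20) -> list:
    """Split the demo into windows; pick the closest stored clip for each."""
    plan = []
    for start in range(0, len(long_demo) - window + 1, window):
        segment = long_demo[start:start + window]
        best = min(range(len(bank)), key=lambda i: clip_distance(segment, bank[i]))
        plan.append(best)
    return plan  # sequence of memory-bank clip indices to execute

# Toy usage: a 60-frame demo covered by three 20-frame clips from the bank.
rng = np.random.default_rng(2)
bank = [rng.normal(size=(20, 8)) for _ in range(5)]
demo = np.concatenate([bank[3], bank[0], bank[4]]) + 0.1 * rng.normal(size=(60, 8))
print(compose_from_memory(demo, bank))  # expected roughly [3, 0, 4]
```

A real system would segment adaptively and compare clips in a learned embedding space rather than raw features, but this is the flavor of assembling long-horizon behavior from short-horizon memories.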
[4]
Robot see, robot do: System learns after watching how-to videos
Cornell University researchers have developed a new robotic framework powered by artificial intelligence -- called RHyME (Retrieval for Hybrid Imitation under Mismatched Execution) -- that allows robots to learn tasks by watching a single how-to video. Doctoral student Kushal Kedia will present a paper titled "One-Shot Imitation under Mismatched Execution" in May at the Institute of Electrical and Electronics Engineers' International Conference on Robotics and Automation in Atlanta. The work is also available on the arXiv preprint server.

"This work is a departure from how robots are programmed today. The status quo of programming robots is thousands of hours of tele-operation to teach the robot how to do tasks. That's just impossible," said senior author Sanjiban Choudhury, assistant professor of computer science. "With RHyME, we're moving away from that and learning to train robots in a more scalable way."

Along with Kedia and Choudhury, the paper's authors are Prithwish Dan, Angela Chao, and Maximus Pace.
Cornell University researchers have developed RHyME, an AI-powered system that allows robots to learn complex tasks by watching a single human demonstration video, significantly improving efficiency and adaptability in robotic learning.
Researchers at Cornell University have made a significant breakthrough in the field of robotics and artificial intelligence with the development of RHyME (Retrieval for Hybrid Imitation under Mismatched Execution), a novel AI-powered system that enables robots to learn complex tasks by watching a single human demonstration video [1][2].
Historically, robots have been notoriously difficult to train, requiring precise, step-by-step instructions and struggling with unexpected scenarios. This inflexibility has long limited their practical use in unpredictable environments [3]. RHyME addresses these challenges by letting a robot draw on a memory bank of previously observed actions, so it can adapt to tasks it has seen demonstrated only once [1][3].
The key innovation of RHyME lies in its ability to bridge the gap between human and robotic motion: instead of requiring a frame-by-frame match between the human demonstration and the robot's own movements, the system retrieves and composes similar actions it has already seen [2].
For example, a RHyME-equipped robot shown a video of placing a mug in a sink can draw inspiration from similar actions like grasping a cup or lowering a utensil [1].
RHyME represents a significant shift in robot programming paradigms: it requires only about 30 minutes of robot-specific training data rather than thousands of hours of tele-operation, and robots trained with the system completed tasks more than 50% more successfully than those trained with earlier methods [1][3].
While RHyME marks a major advancement in robotic learning, challenges remain: home robot assistants are still a long way off, because robots continue to lack the wits to navigate the physical world and its countless contingencies [1].
As research progresses, RHyME and similar technologies could revolutionize various industries, from manufacturing to healthcare, by enabling more flexible and capable robotic assistants.