Curated by THEOUTPOST
On Fri, 25 Apr, 12:01 AM UTC
4 Sources
[1]
AI still can't beat humans at reading social cues
AI models have progressed rapidly in recent years and can already outperform humans in various tasks, from generating basic code to dominating games like chess and Go. But despite massive computing power and billions of dollars in investor funding, these advanced models still can't match humans when it comes to truly understanding how real people interact with one another in the world. In other words, AI still fundamentally struggles at "reading the room."

That's the claim made in a new paper by researchers from Johns Hopkins University. In the study, researchers asked a group of human volunteers to watch three-second video clips and rate the various ways individuals in those videos were interacting with one another. They then tasked more than 350 AI models -- including image, video, and language-based systems -- with predicting how the humans had rated those interactions. While the humans completed the task with ease, the AI models, regardless of their training data, struggled to accurately interpret what was happening in the clips.

The researchers say their findings suggest that AI models still have significant difficulty understanding human social cues in real-world environments. That insight could have major implications for the growing industry of AI-enabled driverless cars and robots, which inherently need to navigate the physical world alongside people.

"Anytime you want an AI system to interact with humans, you want to be able to know what those humans are doing and what groups of humans are doing with each other," Johns Hopkins University assistant professor of cognitive science and paper lead author Leyla Isik told Popular Science. "This really highlights how a lot of these models fall short on those tasks." Isik will present the research findings today at the International Conference on Learning Representations.

Though previous research has shown that AI models can accurately describe what's happening in still images at a level comparable to humans, this study aimed to see whether that still holds true for video. To do that, Isik says she and her fellow researchers selected hundreds of videos from a computer vision dataset and clipped them down to three seconds each. They then narrowed the sample to include only videos featuring two humans interacting. Human volunteers viewed these clips and answered a series of questions about what was happening, with responses rated on a scale from 1 to 5. The questions ranged from objective prompts like "Do you think these bodies are facing each other?" to more subjective ones, such as whether the interaction appeared emotionally positive or negative. In general, the human respondents tended to give similar answers, as reflected in their ratings -- suggesting that people share a basic observational understanding of social interactions.

The researchers then posed similar types of questions to image, video, and language models. (The language models were given human-written captions to analyze instead of raw video.) Across the board, the AI models failed to demonstrate the same level of consensus as the human participants. The language models generally performed better than the image and video models, but Isik notes that may be partly because they were analyzing captions that were already quite descriptive. The researchers primarily evaluated open-access models, some of which were several years old. The study did not include the latest models recently released by leading AI companies like OpenAI and Anthropic.
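To make the kind of comparison described above concrete, here is a minimal sketch, using entirely made-up placeholder data, of how a model's predicted ratings for one question could be scored against the averaged human ratings with a rank correlation. This is not the authors' analysis code; the array names, sizes, and values are illustrative assumptions only.

    import numpy as np
    from scipy.stats import spearmanr

    # Placeholder data: 200 clips rated on one 1-to-5 question by 20 human raters,
    # plus one model's predicted rating per clip (all values randomly generated here).
    rng = np.random.default_rng(0)
    human_ratings = rng.integers(1, 6, size=(20, 200))   # raters x clips
    model_predictions = rng.uniform(1, 5, size=200)      # one prediction per clip

    # Average across raters to get a consensus human score for each clip.
    human_mean = human_ratings.mean(axis=0)

    # Rank correlation between the model's predictions and the human consensus;
    # a value near 1 would mean the model orders the clips the way people do.
    rho, p_value = spearmanr(model_predictions, human_mean)
    print(f"model-human Spearman rho = {rho:.2f} (p = {p_value:.3f})")

With real data, a correlation well below the level of agreement among the human raters themselves would reflect the kind of gap the researchers describe.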
Still, the stark contrast between human and AI responses suggests there may be something fundamentally different about how models and humans process social and contextual information. "It's not enough to just see an image and recognize objects and faces," Johns Hopkins University doctoral student and paper co-author Kathy Garcia said in a statement. "We need AI to understand the story that is unfolding in a scene. Understanding the relationships, context, and dynamics of social interactions is the next step, and this research suggests there might be a blind spot in AI model development."

The findings come as tech companies race to integrate AI into an increasing number of physical robots -- a concept often referred to as "embodied AI." Cities like Los Angeles, Phoenix, and Austin have become test beds for this new era thanks to the increasing presence of driverless Waymo robotaxis sharing the roads with human-driven vehicles. Limited understanding of complex environments has led some driverless cars to behave erratically or even get stuck in loops, driving in circles. While some recent studies suggest that driverless vehicles may currently be less prone to accidents than the average human driver, federal regulators have nonetheless opened investigations into Waymo and Amazon-owned Zoox for driving behavior that allegedly violated safety laws.

Other companies -- like Figure AI, Boston Dynamics, and Tesla -- are taking things a step further by developing AI-enabled humanoid robots designed to work alongside humans in manufacturing environments. Figure has already signed a deal with BMW to deploy one of its bipedal models at a facility in South Carolina, though its exact purpose remains somewhat vague. In these settings, properly understanding human social cues and context is even more critical, as even small misjudgments of intention carry a risk of injury. Some experts have even suggested that advanced humanoid robots could one day assist with elder and child care.

Isik suggested the results of the study mean there are still several steps to be taken before that vision becomes a reality. "[The research] really highlights the importance of bringing neuroscience, cognitive science, and AI into these more dynamic real-world settings," Isik said.
[2]
Awkward. Humans are still better than AI at reading the room
Humans, it turns out, are better than current AI models at describing and interpreting social interactions in a moving scene -- a skill necessary for self-driving cars, assistive robots, and other technologies that rely on AI systems to navigate the real world. The research, led by scientists at Johns Hopkins University, finds that artificial intelligence systems fail at understanding the social dynamics and context necessary for interacting with people, and suggests the problem may be rooted in the infrastructure of AI systems.

"AI for a self-driving car, for example, would need to recognize the intentions, goals, and actions of human drivers and pedestrians. You would want it to know which way a pedestrian is about to start walking, or whether two people are in conversation versus about to cross the street," said lead author Leyla Isik, an assistant professor of cognitive science at Johns Hopkins University. "Any time you want an AI to interact with humans, you want it to be able to recognize what people are doing. I think this sheds light on the fact that these systems can't right now."

Kathy Garcia, a doctoral student working in Isik's lab at the time of the research and co-first author, will present the research findings at the International Conference on Learning Representations on April 24.

To determine how AI models measure up to human perception, the researchers asked human participants to watch three-second video clips and rate features important for understanding social interactions on a scale of one to five. The clips included people either interacting with one another, performing side-by-side activities, or conducting independent activities on their own.

The researchers then asked more than 350 AI language, video, and image models to predict how humans would judge the videos and how their brains would respond to watching them. For large language models, the researchers had the AIs evaluate short, human-written captions.

Participants, for the most part, agreed with each other on all the questions; the AI models, regardless of size or the data they were trained on, did not. Video models were unable to accurately describe what people were doing in the videos. Even image models that were given a series of still frames to analyze could not reliably predict whether people were communicating. Language models were better at predicting human behavior, while video models were better at predicting neural activity in the brain.

The results provide a sharp contrast to AI's success in reading still images, the researchers said. "It's not enough to just see an image and recognize objects and faces. That was the first step, which took us a long way in AI. But real life isn't static. We need AI to understand the story that is unfolding in a scene. Understanding the relationships, context, and dynamics of social interactions is the next step, and this research suggests there might be a blind spot in AI model development," Garcia said.

Researchers believe this is because AI neural networks were inspired by the infrastructure of the part of the brain that processes static images, which is different from the area of the brain that processes dynamic social scenes. "There's a lot of nuances, but the big takeaway is none of the AI models can match human brain and behavior responses to scenes across the board, like they do for static scenes," Isik said. "I think there's something fundamental about the way humans are processing scenes that these models are missing."
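One way to quantify the human consensus that the models failed to match is a split-half consistency check: split the raters into two random halves, correlate the two halves' average ratings across clips, and repeat. The sketch below, again built on invented placeholder numbers rather than the study's data, shows the general idea; a model's agreement with humans can then be read against this human-to-human ceiling.

    import numpy as np
    from scipy.stats import spearmanr

    # Placeholder ratings: 20 human raters x 200 clips on a 1-to-5 scale.
    rng = np.random.default_rng(1)
    ratings = rng.integers(1, 6, size=(20, 200))

    def split_half_consistency(ratings, n_splits=100, seed=0):
        """Average Spearman correlation between mean ratings of two random halves of raters."""
        rng = np.random.default_rng(seed)
        n_raters = ratings.shape[0]
        rhos = []
        for _ in range(n_splits):
            order = rng.permutation(n_raters)
            half_a = ratings[order[: n_raters // 2]].mean(axis=0)
            half_b = ratings[order[n_raters // 2 :]].mean(axis=0)
            rho, _ = spearmanr(half_a, half_b)
            rhos.append(rho)
        return float(np.mean(rhos))

    print(f"estimated human-to-human consistency: {split_half_consistency(ratings):.2f}")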
[3]
AI Still Falls Short in Understanding Human Social Interactions - Neuroscience News
Summary: Humans significantly outperform AI models in interpreting dynamic social interactions, a skill critical for technologies like autonomous vehicles and assistive robots. In a new study, participants reliably judged short videos of social scenes, while over 350 AI models struggled to match human accuracy or predict brain responses. Language models fared better at guessing human interpretations, while video models were better at predicting brain activity, but neither matched human capabilities. Researchers believe the gap stems from how current AI is modeled after brain areas specialized in static image processing, overlooking the dynamics required for real-life social understanding.

Humans, it turns out, are better than current AI models at describing and interpreting social interactions in a moving scene -- a skill necessary for self-driving cars, assistive robots, and other technologies that rely on AI systems to navigate the real world. The research, led by scientists at Johns Hopkins University, finds that artificial intelligence systems fail at understanding the social dynamics and context necessary for interacting with people, and suggests the problem may be rooted in the infrastructure of AI systems.

"AI for a self-driving car, for example, would need to recognize the intentions, goals, and actions of human drivers and pedestrians. You would want it to know which way a pedestrian is about to start walking, or whether two people are in conversation versus about to cross the street," said lead author Leyla Isik, an assistant professor of cognitive science at Johns Hopkins University. "Any time you want an AI to interact with humans, you want it to be able to recognize what people are doing. I think this sheds light on the fact that these systems can't right now."

Kathy Garcia, a doctoral student working in Isik's lab at the time of the research and co-first author, will present the research findings at the International Conference on Learning Representations on April 24.

To determine how AI models measure up to human perception, the researchers asked human participants to watch three-second video clips and rate features important for understanding social interactions on a scale of one to five. The clips included people either interacting with one another, performing side-by-side activities, or conducting independent activities on their own.

The researchers then asked more than 350 AI language, video, and image models to predict how humans would judge the videos and how their brains would respond to watching them. For large language models, the researchers had the AIs evaluate short, human-written captions.

Participants, for the most part, agreed with each other on all the questions; the AI models, regardless of size or the data they were trained on, did not. Video models were unable to accurately describe what people were doing in the videos. Even image models that were given a series of still frames to analyze could not reliably predict whether people were communicating. Language models were better at predicting human behavior, while video models were better at predicting neural activity in the brain.

The results provide a sharp contrast to AI's success in reading still images, the researchers said. "It's not enough to just see an image and recognize objects and faces. That was the first step, which took us a long way in AI. But real life isn't static. We need AI to understand the story that is unfolding in a scene. Understanding the relationships, context, and dynamics of social interactions is the next step, and this research suggests there might be a blind spot in AI model development," Garcia said.

Researchers believe this is because AI neural networks were inspired by the infrastructure of the part of the brain that processes static images, which is different from the area of the brain that processes dynamic social scenes. "There's a lot of nuances, but the big takeaway is none of the AI models can match human brain and behavior responses to scenes across the board, like they do for static scenes," Isik said. "I think there's something fundamental about the way humans are processing scenes that these models are missing."

Author: Hannah Robbins
Source: JHU
Contact: Hannah Robbins - JHU
Image: The image is credited to Neuroscience News
Original Research: The findings will be presented at the International Conference on Learning Representations
[4]
Awkward -- A.I. Struggles to Understand Human Social Interactions, Study Finds
New research from Johns Hopkins shows A.I. models fall short in reading social dynamics, posing risks for real-world technologies like self-driving cars.

While A.I. excels at solving complex logical problems, it struggles to understand social dynamics. A new study by researchers at Johns Hopkins University reveals that A.I. systems still struggle with reading the nuances of human behavior, a skill crucial for real-world applications like robotics and self-driving cars.

To test A.I.'s ability to navigate human environments, researchers designed an experiment in which both humans and A.I. models watched short, three-second videos of groups of people interacting at varying levels of intensity. Each participant -- human or machine -- was asked to rate how intense the interactions appeared, according to findings presented last week at the International Conference on Learning Representations.

When it comes to technologies like autonomous vehicles, the stakes are high, because human drivers make decisions based not just on traffic signals, but also on predictions of how other drivers will behave. "The A.I. needs to be able to predict what nearby people are up to," said study co-author Leyla Isik, a cognitive science professor at Johns Hopkins. "It's vital for the A.I. running a vehicle to be able to recognize whether people are just hanging out, interacting with one another or preparing to walk across the street."

The experiment revealed a stark difference between human and machine performance. Among the 150 human participants, evaluations of the videos were remarkably consistent. In contrast, the 380 A.I. models' assessments were scattered and inconsistent, regardless of their sophistication.

Dan Malinsky, a professor of biostatistics at Columbia University, told Observer the study highlights key limitations of current A.I. technology, particularly "when it comes to predicting and understanding how dynamic systems change over time."

Understanding the thinking and emotions of an interaction involving multiple people can be challenging even for humans, said Konrad Kording, a bioengineering and neuroscience professor at the University of Pennsylvania. "There are many things, like chess, that A.I. is better at and many things we might be better at. There are lots of things I would never trust an A.I. to do and some I wouldn't trust myself to do," Kording told Observer.

Researchers believe the problem may be rooted in the infrastructure of A.I. systems. A.I. neural networks are modeled after the part of the human brain that processes static images, which is different from the area of the brain that processes dynamic social scenes. "There's a lot of nuances, but the big takeaway is none of the A.I. models can match human brain and behavior responses to scenes across the board, like they do for static scenes," Isik said. "I think there's something fundamental about the way humans are processing scenes that these models are missing."

"It's not enough to just see an image and recognize objects and faces. That was the first step, which took us a long way in A.I. But real life isn't static. We need A.I. to understand the story that is unfolding in a scene. Understanding the relationships, context, and dynamics of social interactions is the next step, and this research suggests there might be a blind spot in A.I. model development," said Kathy Garcia, a co-author of the study.
A new study from Johns Hopkins University shows that current AI models struggle to interpret social dynamics and context in video clips, highlighting a significant gap between human and machine perception of social interactions.
A groundbreaking study led by researchers at Johns Hopkins University has revealed a significant gap between human and artificial intelligence (AI) capabilities in understanding social interactions. The research, presented at the International Conference on Learning Representations, demonstrates that current AI models fall short when it comes to interpreting dynamic social scenes, a crucial skill for technologies like self-driving cars and assistive robots [1].
The researchers conducted an experiment involving both human participants and over 350 AI models:
Human volunteers watched three-second video clips of people interacting with one another, performing side-by-side activities, or acting independently, and rated features important for understanding the social interactions on a scale of one to five.
The AI models -- spanning image, video, and language systems -- were then asked to predict how humans would judge the clips and how human brains would respond to watching them; the language models were given short, human-written captions instead of raw video.
The results showed a stark contrast:
Human participants largely agreed with one another across the questions, while the AI models' answers were inconsistent regardless of their size or training data.
Video models could not accurately describe what people were doing, and image models given still frames could not reliably tell whether people were communicating; language models were better at predicting human judgments and video models better at predicting brain activity, but neither matched human performance.
This research highlights several important points:
Real-world applications: The ability to understand social cues is crucial for technologies like self-driving cars and robots that need to interact with humans in dynamic environments [1].
AI model limitations: While AI has shown success in tasks involving static images, it struggles with interpreting dynamic social scenes [4].
Fundamental differences: The researchers suggest that the gap may be due to how current AI neural networks are modeled after brain areas specialized in static image processing, overlooking the dynamics required for real-life social understanding [2].
Lead author Leyla Isik, an assistant professor of cognitive science at Johns Hopkins University, emphasized the importance of this research for AI development:
"Anytime you want an AI system to interact with humans, you want to be able to know what those humans are doing and what groups of humans are doing with each other. This really highlights how a lot of these models fall short on those tasks." 1
The study underscores the need for further research and development in AI to bridge this gap in social understanding. As AI continues to be integrated into various aspects of daily life, addressing these limitations will be crucial for creating safer and more effective AI-powered technologies 4.
Reference
[1] AI still can't beat humans at reading social cues (Popular Science)
[2] Awkward. Humans are still better than AI at reading the room
[3] AI Still Falls Short in Understanding Human Social Interactions (Neuroscience News)
[4] Awkward -- A.I. Struggles to Understand Human Social Interactions, Study Finds (Observer)