3 Sources
[1]
Can AI understand a flower without being able to touch or smell?
AI may be limited by its lack of taste, touch and smell, which prevents it from fully understanding concepts in the same way humans do - suggesting that more advanced models may need a robot body.

The latest generation of artificial intelligence models seem to have a human-level understanding of the world, but it turns out that their lack of sensory information - and a body - places limits on how well they can comprehend concepts like a flower or humour.

Qihui Xu at The Ohio State University and her colleagues asked both humans and large language models about their understanding of almost 4,500 words - everything from "flower" and "hoof" to "humorous" and "swing". The participants and AI models were asked to rate each word on a variety of aspects, such as the level of emotional arousal it conjures up, or its links to the senses and to physical interaction with different parts of the body. The goal was to see how the large language models - including OpenAI's GPT-3.5 and GPT-4 and Google's PaLM and Gemini - compared with humans in their ratings.

It turns out that humans and AI have a similar conceptual map for words that don't relate to interactions with the outside world, but differ greatly when words are linked to senses and physical actions. For instance, the AI models tended to believe that one could experience flowers via the torso - something most humans would find odd, preferring to appreciate them visually or with a sniff.

The problem, says Xu, is that LLMs build their understanding of the world from text hoovered up from the internet, and that is just not sufficient to grasp sensory concepts. "They just differ so much from humans," she says.

Some AI models are trained on visual information such as photos and videos in addition to text, and the researchers found that the results of these models more closely matched the human word ratings - raising the possibility that adding more senses could bring future AI models ever closer to a human understanding of the world. "This tells us the benefits of doing multi-modal training might be larger than we expected. It's like one plus one actually can be greater than two," says Xu. "In terms of AI development, it sort of supports the importance of developing multi-modal models and the importance of having a body."

Philip Feldman at the University of Maryland, Baltimore County, says that giving AI models a robot body and exposing them to sensorimotor input would likely produce a jump in ability, perhaps a substantial one, but that we will have to be very careful about how to go about it, given the risk of robots causing physical harm to the people around them. Avoiding such risks would mean adding guardrails to the robots' actions, or training only with soft robots that can cause no harm, says Feldman - but that would have its own downsides. "This is going to warp how they understand the world," he says. "One of the things they would learn is that you can bounce off things, because they have little mass. And so now you try to put that deep understanding that has to do with physical contact [in a real robot with mass] and you have your humanoid robots believing that they can just crash into each other at full speed. Well, that's going to be a problem."
[2]
Things Humans Still Do Better Than AI: Understanding Flowers
Unlike humans, large language models can't learn via physical senses such as sight, smell, and touch (yet).

While it might feel as though artificial intelligence is getting dangerously smart, there are still some basic concepts that AI doesn't comprehend as well as humans do. Back in March, we reported that popular large language models (LLMs) struggle to tell time and interpret calendars. Now, a study published earlier this week in Nature Human Behaviour reveals that AI tools like ChatGPT also fail to understand familiar concepts, such as flowers, as well as humans do. According to the paper, accurately representing physical concepts is challenging for machine-learning models trained solely on text and, sometimes, images.

"A large language model can't smell a rose, touch the petals of a daisy or walk through a field of wildflowers," Qihui Xu, lead author of the study and a postdoctoral researcher in psychology at Ohio State University, said in a university statement. "Without those sensory and motor experiences, it can't truly represent what a flower is in all its richness. The same is true of some other human concepts."

The team tested humans and four AI models - OpenAI's GPT-3.5 and GPT-4, and Google's PaLM and Gemini - on their conceptual understanding of 4,442 words, including terms like flower, hoof, humorous, and swing. Xu and her colleagues compared the outcomes to two standard psycholinguistic ratings: the Glasgow Norms (ratings of words on dimensions such as arousal, dominance, and familiarity) and the Lancaster Norms (ratings of words based on sensory perceptions and bodily actions).

Under the Glasgow Norms, the researchers asked questions such as how emotionally arousing a flower is and how easy it is to imagine one. The Lancaster Norms, on the other hand, involved questions such as how much one can experience a flower through smell, and how much one can experience a flower with the torso.

In comparison to humans, LLMs demonstrated a strong understanding of words without sensorimotor associations (concepts like "justice"), but they struggled with words linked to physical concepts (like "flower," which we can see, smell, touch, etc.). The reason is rather straightforward - ChatGPT doesn't have eyes, a nose, or sensory neurons (yet), so it can't learn through those senses. The best it can do is approximate, even though these models train on more text than a person encounters in an entire lifetime, Xu explained.

"From the intense aroma of a flower, the vivid silky touch when we caress petals, to the profound visual aesthetic sensation, human representation of 'flower' binds these diverse experiences and interactions into a coherent category," the researchers wrote in the study. "This type of associative perceptual learning, where a concept becomes a nexus of interconnected meanings and sensation strengths, may be difficult to achieve through language alone."

In fact, the LLMs trained on both text and images demonstrated a better understanding of visual concepts than their text-only counterparts. That's not to say, however, that AI will forever be limited to language and visual information. LLMs are constantly improving, and they might one day be able to better represent physical concepts via sensorimotor data and/or robotics, according to Xu. The research by Xu and her colleagues carries important implications for AI-human interactions, which are becoming increasingly (and, let's be honest, worryingly) intimate.
For now, however, one thing is certain: "The human experience is far richer than words alone can hold," Xu concluded.
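As a rough sketch of the kind of rating elicitation described above, the snippet below asks a chat model how strongly a word is experienced through a given sense, Lancaster-style. The prompt wording, the 0-5 scale, the model name and the rate_word helper are illustrative assumptions, not the authors' actual protocol.

```python
# A minimal sketch, not the study's code: eliciting a Lancaster-style sensory
# rating from a chat model via the OpenAI Python SDK (v1.x). Prompt wording,
# the 0-5 scale and the model name are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rate_word(word: str, sense: str, model: str = "gpt-4") -> float:
    """Ask the model how strongly `word` is experienced through `sense` (0-5)."""
    prompt = (
        f"On a scale from 0 (not at all) to 5 (very strongly), to what extent "
        f"can one experience the concept '{word}' through {sense}? "
        f"Reply with a single number only."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return float(response.choices[0].message.content.strip())

# Example: a sensory word versus an abstract one on the "smelling" dimension.
print(rate_word("flower", "smelling"), rate_word("justice", "smelling"))
```

Averaging several such responses per word and dimension, much as one averages human raters, would give a model-side counterpart to the published norms.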
[3]
Why AI can't understand a flower the way humans do
Even with all its training and computing power, an artificial intelligence (AI) tool like ChatGPT can't represent the concept of a flower the way a human does, according to a new study. That's because the large language models (LLMs) that power AI assistants are usually trained on language alone, and sometimes on images as well.

"A large language model can't smell a rose, touch the petals of a daisy or walk through a field of wildflowers," said Qihui Xu, lead author of the study and postdoctoral researcher in psychology at The Ohio State University. "Without those sensory and motor experiences, it can't truly represent what a flower is in all its richness. The same is true of some other human concepts." The study is published in the journal Nature Human Behaviour.

Xu said the findings have implications for how AI and humans relate to each other. "If AI construes the world in a fundamentally different way from humans, it could affect how it interacts with us," she said.

Xu and her colleagues compared humans and LLMs in their knowledge representation of 4,442 words -- everything from "flower" and "hoof" to "humorous" and "swing." They compared the similarity of representations between humans and two state-of-the-art LLM families: OpenAI's GPT-3.5 and GPT-4, and Google's PaLM and Gemini.

Humans and LLMs were tested on two measures. One, called the Glasgow Norms, asks for ratings of words on nine dimensions, such as arousal, concreteness and imageability. For example, the measure asks how emotionally arousing a flower is, and how easily one can mentally visualize a flower (how imageable it is). The other measure, called the Lancaster Norms, examines how concepts are related to sensory information (such as touch, hearing, smell and vision) and to motor information, which involves actions -- such as what humans do through contact with the mouth, hand, arm and torso. For example, the measure asks how much one experiences flowers by smelling, and how much one experiences flowers using actions from the torso.

The goal was to see how well the LLMs and humans were aligned in their ratings of the words. In one analysis, the researchers examined how strongly humans and AI were correlated on each dimension. For example, do the LLMs and humans agree that some concepts have higher emotional arousal than others? In a second analysis, the researchers investigated how humans and LLMs compared in judging how different dimensions jointly contribute to a word's overall conceptual representation, and how different words are interconnected. For example, the concepts of "pasta" and "roses" might both receive high ratings for how much they involve the sense of smell. However, pasta is considered more similar to noodles than to roses -- at least for humans -- not just because of its smell, but also because of its visual appearance and taste.

Overall, the LLMs did very well compared with humans at representing words that don't have any connection to the senses or to motor actions. But when it came to words connected to things we see, taste or interact with using our bodies, that's where AI failed to capture human concepts.

"From the intense aroma of a flower, the vivid silky touch when we caress petals, to the profound joy evoked, human representation of 'flower' binds these diverse experiences and interactions into a coherent category," the researchers say in the paper.
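The two analyses described above can be illustrated with a toy sketch: a per-dimension correlation (do humans and the model order words similarly on, say, smell?) and a similarity over each word's full rating vector (does "pasta" end up closer to "noodles" than to "roses" once all dimensions are considered jointly?). The words, dimensions and every number below are invented for illustration; this is not the study's analysis code.

```python
# A minimal sketch, assuming made-up human and LLM rating matrices
# (rows = words, columns = rating dimensions on a 0-5 scale).
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import cosine

words = ["rose", "pasta", "noodles", "justice"]
dims = ["smell", "taste", "vision", "arousal"]  # toy subset of dimensions

human = np.array([[4.6, 0.4, 4.8, 3.1],
                  [3.9, 4.7, 3.5, 2.0],
                  [3.4, 4.6, 3.4, 1.8],
                  [0.2, 0.1, 0.9, 3.5]])
llm   = np.array([[4.0, 1.2, 4.5, 3.0],
                  [3.7, 4.3, 3.2, 2.2],
                  [3.1, 4.1, 3.1, 1.9],
                  [0.3, 0.2, 1.1, 3.4]])

# Analysis 1: per-dimension alignment - do humans and the LLM rank words alike?
for j, dim in enumerate(dims):
    rho, _ = spearmanr(human[:, j], llm[:, j])
    print(f"{dim:>8}: Spearman rho = {rho:.2f}")

# Analysis 2: word interconnections - compare whole rating vectors, so that
# "pasta" can be closer to "noodles" than to "rose" despite similar smell ratings.
def similarity(ratings: np.ndarray, a: str, b: str) -> float:
    i, j = words.index(a), words.index(b)
    return 1.0 - cosine(ratings[i], ratings[j])

print("human: pasta~noodles =", round(similarity(human, "pasta", "noodles"), 2),
      "| pasta~rose =", round(similarity(human, "pasta", "rose"), 2))
```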
The issue is that most LLMs depend on language, and "language by itself can't fully recover conceptual representation in all its richness," Xu said. Even though LLMs can approximate some human concepts, particularly when those concepts don't involve senses or motor actions, this kind of learning is not efficient. "They obtain what they know by consuming vast amounts of text -- orders of magnitude larger than what a human is exposed to in their entire lifetimes -- and still can't quite capture some concepts the way humans do," Xu said. "The human experience is far richer than words alone can hold."

But Xu noted that LLMs are continually improving, and it is likely they will get better at capturing human concepts. The study did find that LLMs trained with images as well as text did better than text-only models at representing concepts related to vision. And when future LLMs are augmented with sensor data and robotics, they may be able to actively make inferences about, and act upon, the physical world, she said.

Co-authors on the study were Yingying Peng, Ping Li and Minghua Wu of the Hong Kong Polytechnic University; Samuel Nastase of Princeton University; and Martin Chodorow of the City University of New York.
A new study reveals that AI models struggle to fully comprehend sensory-based concepts like flowers, highlighting the importance of multi-modal learning and potential future developments in AI.
A groundbreaking study published in Nature Human Behaviour has revealed that artificial intelligence (AI) models, despite their advanced capabilities, struggle to fully comprehend sensory-based concepts in the same way humans do. The research, led by Qihui Xu from The Ohio State University, highlights the limitations of AI in understanding concepts like flowers, which humans experience through multiple senses [1].
The study compared the understanding of nearly 4,500 words between humans and large language models (LLMs) such as OpenAI's GPT-3.5 and GPT-4, and Google's PaLM and Gemini. Human participants and the models alike were asked to rate words on various aspects, including emotional arousal and connections to senses and physical interactions [2].
Results showed that while AI models performed well in comprehending abstract concepts without sensory associations, they struggled significantly with words linked to physical experiences. For instance, AI models tended to associate experiencing flowers with the torso, a concept most humans would find unusual [1].
Researchers attribute this discrepancy to the AI's lack of sensory and motor experiences. As Xu explains, "A large language model can't smell a rose, touch the petals of a daisy or walk through a field of wildflowers" [3]. This limitation prevents AI from forming a complete representation of concepts like flowers, which for humans involve a rich tapestry of sensory inputs and emotional responses.
The study's findings have significant implications for AI development and human-AI interactions. They suggest that future advances in AI may require more than processing vast amounts of text data. Multi-modal training, incorporating visual information alongside text, has shown promise in improving AI's understanding of visual concepts [2].
Researchers propose that giving AI models a physical form through robotics could lead to substantial improvements in their ability to understand and interact with the world. Philip Feldman from the University of Maryland, Baltimore County, suggests that exposing AI to sensorimotor input through a robot body could result in a significant leap in capabilities [1].
However, this approach comes with its own set of challenges and risks. Feldman warns about the potential for physical harm and the need for careful implementation of safety measures in robotic AI systems [1].
As AI continues to evolve, researchers are exploring ways to bridge the gap between machine learning and human-like understanding. The integration of sensor data and robotics in future AI models may enable them to make inferences about and interact with the physical world more effectively [3].
This research underscores the complexity of human cognition and the challenges that remain in creating AI systems that can truly replicate the depth and richness of human understanding. As Xu concludes, "The human experience is far richer than words alone can hold" [2].