Curated by THEOUTPOST
On Fri, 11 Oct, 12:02 AM UTC
4 Sources
[1]
These AI models reason better than their open-source peers - but still can't rival humans
A study tested AI's ability to complete visual puzzles like those found on human IQ tests. It went poorly.

Can artificial intelligence (AI) pass cognitive puzzles designed for human IQ tests? The results were mixed. Researchers from the USC Viterbi School of Engineering Information Sciences Institute (ISI) investigated whether multi-modal large language models (MLLMs) can solve abstract visual tests usually reserved for humans.

Presented at the Conference on Language Modeling (COLM 2024) in Philadelphia last week, the research tested "the nonverbal abstract reasoning abilities of open-source and closed-source MLLMs" by seeing if image-processing models could go a step further and demonstrate reasoning skills when presented with visual puzzles. "For example, if you see a yellow circle turning into a blue triangle, can the model apply the same pattern in a different scenario?" explained Kian Ahrabian, a research assistant on the project, according to Neuroscience News. The task requires the model to combine visual perception with logical reasoning, similar to how humans think, which makes it a more complex challenge.

The researchers tested 24 different MLLMs on puzzles developed from Raven's Progressive Matrices, a standard test of abstract reasoning -- and the AI models didn't exactly succeed. "They were really bad. They couldn't get anything out of it," Ahrabian said. The models struggled both to understand the visuals and to interpret patterns.

The results varied, however. Overall, the study found that open-source models had more difficulty with visual reasoning puzzles than closed-source models like GPT-4V, though even those didn't rival human cognitive abilities. The researchers were able to help some models perform better using a technique called Chain of Thought prompting, which guides the model step by step through the reasoning portion of the test.

Closed-source models are thought to perform better on tests like these because they are specially developed, trained on bigger datasets, and backed by private companies' computing power. "Specifically, GPT-4V was relatively good at reasoning, but it's far from perfect," Ahrabian noted.

"We still have such a limited understanding of what new AI models can do, and until we understand these limitations, we can't make AI better, safer, and more useful," said Jay Pujara, research associate professor and an author of the study. "This paper helps fill in a missing piece of the story of where AI struggles."

By finding the weaknesses in AI models' ability to reason, research like this can help direct efforts to flesh out those skills down the line -- the goal being to achieve human-level logic. But don't worry: for the time being, they're not comparable to human cognition.
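For readers unfamiliar with the test format, a Raven's Progressive Matrices item presents a grid of panels whose final panel is blank; the solver has to infer the rule governing the rows and columns and pick the candidate that completes the pattern. The toy Python sketch below only illustrates that structure with a made-up rule and made-up panels; it is not the dataset or code used in the study.

```python
# Toy Raven's-style item, for illustration only. Each panel is (shape, count).
# The made-up rule: within a row the shape stays fixed and the count goes up by one.

matrix = [
    [("circle", 1), ("circle", 2), ("circle", 3)],
    [("square", 1), ("square", 2), ("square", 3)],
    [("triangle", 1), ("triangle", 2), None],  # bottom-right panel is missing
]

# Candidate answers, as in the multiple-choice format of the real test.
options = [("triangle", 3), ("square", 3), ("triangle", 1), ("circle", 2)]

def predict_missing(matrix):
    """Apply the row rule: keep the row's shape, increase the count by one."""
    last_row = matrix[-1]
    shape = last_row[0][0]          # shape shared by the row
    count = last_row[1][1] + 1      # count of the previous panel, plus one
    return (shape, count)

answer = predict_missing(matrix)
print("predicted panel:", answer)                         # ('triangle', 3)
print("index of correct option:", options.index(answer))  # 0
```

Solving the real items is harder than this toy suggests, because the model has to discover the rule from pixels rather than have it handed over as structured attributes.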
[2]
Can advanced AI solve visual puzzles and perform abstract reasoning?
Artificial intelligence has learned to master language, generate art, and even beat grandmasters at chess. But can it crack the code of abstract reasoning -- those tricky visual puzzles that leave humans scratching their heads? Researchers at the USC Viterbi School of Engineering Information Sciences Institute (ISI) are putting AI's cognitive abilities to the test, pushing multi-modal large language models (MLLMs) to solve visual problems once reserved for human IQ tests. The result? A glimpse into how far AI has come -- and where it still stumbles.

USC Viterbi ISI Research Assistants Kian Ahrabian and Zhivar Sourati recently investigated whether MLLMs can perform nonverbal abstract reasoning, tasks that require both visual perception and logical reasoning, and presented their findings at the Conference on Language Modeling (COLM 2024) in Philadelphia, PA, October 7-9, 2024. The work is also available on the arXiv preprint server.

Jay Pujara, research associate professor of computer science at the USC Viterbi School of Engineering and an author on the paper, said, "Every day we're bombarded with new headlines about what AI can (and can't) do, which are often very surprising. We still have such a limited understanding of what new AI models can do, and until we understand these limitations we can't make AI better, safer, and more useful. This paper helps fill in a missing piece of the story of where AI struggles."

The challenge: Can AI see and think?

"We wanted to see if this new generation of large models, which are able to process images, can reason on their own," Ahrabian explained. "For example, if you see a yellow circle turning into a blue triangle, can the model apply the same pattern in a different scenario?"

To answer this question, the team tested 24 different MLLMs on puzzles based on Raven's Progressive Matrices, a well-known test of abstract reasoning. They found that open-source models struggled significantly. "They were really bad. They couldn't get anything out of it," Ahrabian said plainly.

In contrast, closed-source models, such as GPT-4V -- models developed by private companies and not publicly available for modification -- performed better. These models are typically trained with more advanced resources, including larger datasets and more powerful computing systems, giving them a noticeable edge. "We saw some nontrivial results with closed-source models," Ahrabian added. "Specifically, GPT-4V was relatively good at reasoning, but it's far from perfect."

Where the AI stumbles

A critical part of the study involved dissecting where these models were failing. One key issue was the AI's ability to accurately process visual information. "We wanted to know if the models could see the details -- like colors or lines colliding -- and whether that was where they were going wrong," Ahrabian said.

To isolate the problem, the researchers provided detailed textual descriptions of the images, ensuring the models had all the necessary information in a different format. "Even when we removed the visual element and just gave them text, many models still couldn't reason effectively," Sourati explained. This revealed a crucial insight: the issue wasn't just with visual processing -- it was with the reasoning itself. Now the team had a clearer picture of what wasn't working, which allowed them to refine their focus and guide future improvements.
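The control described above, handing the model a textual description of the puzzle instead of the image, is what lets the team tell perception failures apart from reasoning failures: a model that still fails on a faithful description is not being let down by its vision. The Python sketch below shows one way such a comparison could be organized; the query_model callable, the description format, and the scoring rule are assumptions made for illustration, not the study's actual pipeline.

```python
# Sketch of a perception-vs-reasoning ablation: run each puzzle twice, once as
# an image and once as a plain-text description, then compare accuracy.
# The callable signature and data format are hypothetical stand-ins.

from typing import Callable, Optional

def evaluate(
    query_model: Callable[[Optional[bytes], str], str],
    puzzles: list,  # each item: {"image": bytes, "description": str, "answer": str}
) -> dict:
    """Return accuracy with visual input versus text-only input."""
    correct = {"image": 0, "text": 0}
    for p in puzzles:
        # Condition 1: the model must both perceive and reason.
        img_answer = query_model(p["image"], "Which option completes the matrix?")
        # Condition 2: perception is removed, so only reasoning is exercised.
        txt_answer = query_model(
            None,
            "Here is a description of the puzzle:\n"
            + p["description"]
            + "\nWhich option completes the matrix?",
        )
        correct["image"] += img_answer.strip().upper().startswith(p["answer"])
        correct["text"] += txt_answer.strip().upper().startswith(p["answer"])
    n = len(puzzles)
    return {condition: hits / n for condition, hits in correct.items()}
```

If text-only accuracy stays near chance, the weak link is the reasoning itself, which is what the researchers report observing for many models.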
The path forward: Improving AI's reasoning

One promising method the researchers explored was "Chain of Thought prompting," where the AI is prompted to think step by step through reasoning tasks. This approach led to significant improvements in some cases. "By guiding the models with hints, we were able to see up to 100% improvement in performance," Ahrabian noted.

Despite the remaining challenges, the researchers are optimistic. The study's findings highlight both the current limitations of AI and the exciting possibilities for future advancements. As these models continue to develop, USC's research could pave the way for AI that not only understands but reasons -- blurring the line between machine intelligence and human cognition.
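Chain of Thought prompting, as used here, simply asks the model to work through the puzzle step by step before committing to an answer instead of answering in one shot. The articles do not reproduce the paper's exact prompts, so the Python snippet below is only a minimal sketch of the idea; the query_model callable and the wording of both prompts are hypothetical.

```python
# Minimal sketch of direct vs. Chain of Thought prompting for a visual puzzle.
# query_model is a hypothetical stand-in for any MLLM call that accepts an
# image plus a text prompt and returns the model's text response.

from typing import Callable

DIRECT_PROMPT = (
    "The image shows a 3x3 matrix of shapes with the bottom-right cell missing, "
    "followed by candidate answers labeled A-H. "
    "Which candidate completes the matrix? Answer with a single letter."
)

CHAIN_OF_THOUGHT_PROMPT = (
    "The image shows a 3x3 matrix of shapes with the bottom-right cell missing, "
    "followed by candidate answers labeled A-H. Reason step by step:\n"
    "1. Describe how the shapes change across each row.\n"
    "2. Describe how the shapes change down each column.\n"
    "3. State the rule that the rows and columns follow.\n"
    "4. Apply that rule to work out the missing cell.\n"
    "Then answer with a single letter."
)

def solve_puzzle(
    query_model: Callable[[bytes, str], str],
    puzzle_image: bytes,
    use_chain_of_thought: bool = True,
) -> str:
    """Send one puzzle to the model and return its raw answer text."""
    prompt = CHAIN_OF_THOUGHT_PROMPT if use_chain_of_thought else DIRECT_PROMPT
    return query_model(puzzle_image, prompt)
```

The reported gains came from this kind of guided decomposition rather than from any change to the models themselves.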
[3]
Can advanced AI solve visual puzzles and perform abstract reasoning?
Artificial intelligence has learned to master language, generate art, and even beat grandmasters at chess. But can it crack the code of abstract reasoning -- those tricky visual puzzles that leave humans scratching their heads? Researchers at the USC Viterbi School of Engineering Information Sciences Institute (ISI) are putting AI's cognitive abilities to the test, pushing multi-modal large language models (MLLMs) to solve visual problems once reserved for human IQ tests. The result? A glimpse into how far AI has come -- and where it still stumbles.

USC Viterbi ISI Research Assistants Kian Ahrabian and Zhivar Sourati recently investigated whether MLLMs can perform nonverbal abstract reasoning, tasks that require both visual perception and logical reasoning, and presented their findings at the Conference on Language Modeling (COLM 2024) in Philadelphia, PA, October 7-9, 2024.

Jay Pujara, research associate professor of computer science at the USC Viterbi School of Engineering and an author on the paper, said, "Every day we're bombarded with new headlines about what AI can (and can't) do, which are often very surprising. We still have such a limited understanding of what new AI models can do, and until we understand these limitations we can't make AI better, safer, and more useful. This paper helps fill in a missing piece of the story of where AI struggles."

The Challenge: Can AI See and Think?

"We wanted to see if this new generation of large models, which are able to process images, can reason on their own," Ahrabian explained. "For example, if you see a yellow circle turning into a blue triangle, can the model apply the same pattern in a different scenario?"

To answer this question, the team tested 24 different MLLMs on puzzles based on Raven's Progressive Matrices, a well-known test of abstract reasoning. They found that open-source models struggled significantly. "They were really bad. They couldn't get anything out of it," Ahrabian said plainly.

In contrast, closed-source models, such as GPT-4V -- models developed by private companies and not publicly available for modification -- performed better. These models are typically trained with more advanced resources, including larger datasets and more powerful computing systems, giving them a noticeable edge. "We saw some nontrivial results with closed-source models," Ahrabian added. "Specifically, GPT-4V was relatively good at reasoning, but it's far from perfect."

Where the AI Stumbles

A critical part of the study involved dissecting where these models were failing. One key issue was the AI's ability to accurately process visual information. "We wanted to know if the models could see the details -- like colors or lines colliding -- and whether that was where they were going wrong," Ahrabian said.

To isolate the problem, the researchers provided detailed textual descriptions of the images, ensuring the models had all the necessary information in a different format. "Even when we removed the visual element and just gave them text, many models still couldn't reason effectively," Sourati explained. This revealed a crucial insight: the issue wasn't just with visual processing -- it was with the reasoning itself. Now the team had a clearer picture of what wasn't working, which allowed them to refine their focus and guide future improvements.

The Path Forward: Improving AI's Reasoning

One promising method the researchers explored was "Chain of Thought prompting," where the AI is prompted to think step by step through reasoning tasks. This approach led to significant improvements in some cases. "By guiding the models with hints, we were able to see up to 100% improvement in performance," Ahrabian noted.

Despite the remaining challenges, the researchers are optimistic. The study's findings highlight both the current limitations of AI and the exciting possibilities for future advancements. As these models continue to develop, USC's research could pave the way for AI that not only understands but reasons -- blurring the line between machine intelligence and human cognition.

New Research at a New Conference

Ahrabian and Sourati, Ph.D. students at the Thomas Lord Department of Computer Science, presented the paper, "The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models," at COLM this week, marking the conference's inaugural year. Pujara, who is also the director of the Center on Knowledge Graphs at ISI, commented, "AI is undergoing a major shift with the advent of language models. The emergence of new conferences like COLM to support this evolution is a great way to foster collaboration and inspire students eager to contribute to this rapidly advancing field."
[4]
Can AI Tackle Abstract Reasoning? Study Tests Cognitive Limits - Neuroscience News
Summary: Researchers tested artificial intelligence's ability to solve abstract visual puzzles similar to human IQ tests, revealing gaps in AI's reasoning skills. Open-source AI models struggled, while closed-source models like GPT-4V performed better, but far from perfectly. Detailed analyses showed that AI stumbled not only in interpreting visuals but also in the reasoning required to solve patterns. One technique, "Chain of Thought prompting," helped improve AI performance by guiding it through step-by-step reasoning. The study highlights both the promise and current limits of AI's cognitive abilities. This work suggests a path forward in developing AI that can understand and reason as humans do.

Artificial intelligence has learned to master language, generate art, and even beat grandmasters at chess. But can it crack the code of abstract reasoning -- those tricky visual puzzles that leave humans scratching their heads? Researchers at the USC Viterbi School of Engineering Information Sciences Institute (ISI) are putting AI's cognitive abilities to the test, pushing multi-modal large language models (MLLMs) to solve visual problems once reserved for human IQ tests. The result? A glimpse into how far AI has come -- and where it still stumbles.

USC Viterbi ISI Research Assistants Kian Ahrabian and Zhivar Sourati recently investigated whether MLLMs can perform nonverbal abstract reasoning, tasks that require both visual perception and logical reasoning, and presented their findings at the Conference on Language Modeling (COLM 2024) in Philadelphia, PA, October 7-9, 2024.

Jay Pujara, research associate professor of computer science at the USC Viterbi School of Engineering and an author on the paper, said, "Every day we're bombarded with new headlines about what AI can (and can't) do, which are often very surprising. We still have such a limited understanding of what new AI models can do, and until we understand these limitations we can't make AI better, safer, and more useful. This paper helps fill in a missing piece of the story of where AI struggles."

The Challenge: Can AI See and Think?

"We wanted to see if this new generation of large models, which are able to process images, can reason on their own," Ahrabian explained. "For example, if you see a yellow circle turning into a blue triangle, can the model apply the same pattern in a different scenario?"

To answer this question, the team tested 24 different MLLMs on puzzles based on Raven's Progressive Matrices, a well-known test of abstract reasoning. They found that open-source models struggled significantly. "They were really bad. They couldn't get anything out of it," Ahrabian said plainly.

In contrast, closed-source models, such as GPT-4V -- models developed by private companies and not publicly available for modification -- performed better. These models are typically trained with more advanced resources, including larger datasets and more powerful computing systems, giving them a noticeable edge. "We saw some nontrivial results with closed-source models," Ahrabian added. "Specifically, GPT-4V was relatively good at reasoning, but it's far from perfect."

Where the AI Stumbles

A critical part of the study involved dissecting where these models were failing. One key issue was the AI's ability to accurately process visual information. "We wanted to know if the models could see the details -- like colors or lines colliding -- and whether that was where they were going wrong," Ahrabian said.

To isolate the problem, the researchers provided detailed textual descriptions of the images, ensuring the models had all the necessary information in a different format. "Even when we removed the visual element and just gave them text, many models still couldn't reason effectively," Sourati explained. This revealed a crucial insight: the issue wasn't just with visual processing -- it was with the reasoning itself. Now the team had a clearer picture of what wasn't working, which allowed them to refine their focus and guide future improvements.

The Path Forward: Improving AI's Reasoning

One promising method the researchers explored was "Chain of Thought prompting," where the AI is prompted to think step by step through reasoning tasks. This approach led to significant improvements in some cases. "By guiding the models with hints, we were able to see up to 100% improvement in performance," Ahrabian noted.

Despite the remaining challenges, the researchers are optimistic. The study's findings highlight both the current limitations of AI and the exciting possibilities for future advancements. As these models continue to develop, USC's research could pave the way for AI that not only understands but reasons -- blurring the line between machine intelligence and human cognition.

New Research at a New Conference

Ahrabian and Sourati, Ph.D. students at the Thomas Lord Department of Computer Science, presented the paper, "The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models," at COLM this week, marking the conference's inaugural year. Pujara, who is also the director of the Center on Knowledge Graphs at ISI, commented, "AI is undergoing a major shift with the advent of language models. The emergence of new conferences like COLM to support this evolution is a great way to foster collaboration and inspire students eager to contribute to this rapidly advancing field."

Author: Amy Blumenthal
Source: USC
Contact: Amy Blumenthal - USC
Image: The image is credited to Neuroscience News
Original Research: The findings were presented at the Conference on Language Modeling (COLM 2024)
A study by USC researchers reveals that AI models, particularly open-source ones, struggle with abstract visual reasoning tasks similar to human IQ tests. While closed-source models like GPT-4V perform better, they still fall short of human cognitive abilities.
Researchers from the USC Viterbi School of Engineering Information Sciences Institute (ISI) have conducted a groundbreaking study to assess the capabilities of artificial intelligence in solving abstract visual puzzles similar to those found in human IQ tests. The study, presented at the Conference on Language Modeling (COLM 2024) in Philadelphia, reveals significant limitations in AI's ability to perform nonverbal abstract reasoning tasks [1].
The research team, led by Kian Ahrabian and Zhivar Sourati, tested 24 different multi-modal large language models (MLLMs) using puzzles based on Raven's Progressive Matrices, a standard test of abstract reasoning. The results showed a stark contrast between open-source and closed-source AI models [2].
Open-source models performed poorly, with Ahrabian stating, "They were really bad. They couldn't get anything out of it." In contrast, closed-source models like GPT-4V demonstrated better performance, though still far from matching human cognitive abilities [3].

The researchers delved deeper to understand where the AI models were failing. They discovered that the issue was not limited to visual processing but extended to the reasoning process itself. Even when provided with detailed textual descriptions of the images, many models struggled to reason effectively [4].

To enhance AI performance, the team explored a technique called "Chain of Thought prompting." This method guides the AI through step-by-step reasoning tasks and led to significant improvements in some cases. Ahrabian noted, "By guiding the models with hints, we were able to see up to 100% improvement in performance" [2].

Jay Pujara, research associate professor and author of the study, emphasized the importance of understanding AI's limitations: "We still have such a limited understanding of what new AI models can do, and until we understand these limitations, we can't make AI better, safer, and more useful" [1].

The study's findings highlight both the current limitations of AI and the potential for future advancements. As AI models continue to evolve, this research could pave the way for developing AI systems that can not only understand but also reason in ways more comparable to human cognition [4].
A recent study by Apple researchers exposes significant flaws in the mathematical reasoning capabilities of large language models (LLMs), challenging the notion of AI's advanced reasoning skills and raising questions about their real-world applications.
17 Sources
A new study from the University of Amsterdam and Santa Fe Institute shows that while GPT models perform well on standard analogy tasks, they struggle with variations, indicating limitations in AI's reasoning capabilities compared to humans.
2 Sources
Epoch AI's FrontierMath, a new mathematics benchmark, reveals that leading AI models struggle with complex mathematical problems, solving less than 2% of the challenges.
8 Sources
Scale AI and the Center for AI Safety have introduced a challenging new AI benchmark called 'Humanity's Last Exam', which has proven difficult for even the most advanced AI models, highlighting the current limitations of artificial intelligence.
7 Sources
Apple researchers conducted tests revealing significant limitations in AI models' ability to perform simple arithmetic and logical reasoning, raising questions about the true intelligence of current AI systems.
2 Sources