Curated by THEOUTPOST
On Fri, 14 Mar, 8:05 AM UTC
6 Sources
[1]
Most current AI struggles to read clocks and calendars
Some of the world's most advanced AI systems struggle to tell the time and work out dates on calendars, a study suggests. While AI models can perform complex tasks such as writing essays and generating art, they have yet to master some skills that humans carry out with ease, researchers say.

A team from the University of Edinburgh has shown that state-of-the-art AI models are unable to reliably interpret clock-hand positions or correctly answer questions about dates on calendars. Unlike simply recognising shapes, understanding analogue clocks and calendars requires a combination of spatial awareness, context and basic maths - something that remains challenging for AI, the team says. Overcoming this could enable AI systems to power time-sensitive applications like scheduling assistants, autonomous robots and tools for people with visual impairments, researchers say.

The team tested if AI systems that process text and images - known as multimodal large language models (MLLMs) - can answer time-related questions by looking at a picture of a clock or a calendar. Researchers tested various clock designs, including some with Roman numerals, with and without second hands, and different coloured dials.

Their findings show that AI systems, at best, got clock-hand positions right less than a quarter of the time. Mistakes were more common when clocks had Roman numerals or stylised clock hands. AI systems also did not perform any better when the second hand was removed, suggesting there are deep-seated issues with hand detection and angle interpretation, the team says.

The researchers asked AI models to answer a range of calendar-based questions, such as identifying holidays and working out past and future dates. The team found that even the best-performing AI model got date calculations wrong one-fifth of the time.

The findings are reported in a peer-reviewed paper that will be presented at the Reasoning and Planning for Large Language Models workshop at The Thirteenth International Conference on Learning Representations (ICLR) in Singapore on 28 April 2025.

Rohit Saxena, of the University of Edinburgh's School of Informatics, who led the study, said: "Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people. These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation and assistive technologies."

Aryo Gema, also of the School of Informatics, said: "AI research today often emphasises complex reasoning tasks, but ironically, many systems still struggle when it comes to simpler, everyday tasks. Our findings suggest it's high time we addressed these fundamental gaps. Otherwise, integrating AI into real-world, time-sensitive applications might remain stuck at the eleventh hour."
[2]
Most AIs struggle with reading clocks, misreading faces 75% of the time
Facepalm: Generative AI tools are able to perform the sorts of tasks that once seemed the stuff of sci-fi, but most of them still struggle with many basic skills, including reading analog clocks and calendars. A new study has found that overall, AI systems read clock faces correctly less than a quarter of the time.

A team of researchers at Edinburgh University tested some top multimodal large language models to see how well they could answer questions based on images of clocks and calendars. The systems being tested were Google DeepMind's Gemini 2.0, Anthropic's Claude 3.5 Sonnet, Meta's Llama 3.2-11B-Vision-Instruct, Alibaba's Qwen2-VL7B-Instruct, ModelBest's MiniCPM-V-2.6, and OpenAI's GPT-4o and GPT-o1.

Various types of clocks appeared in the images: some with Roman numerals, those with and without seconds hands, different colored dials, etc. The systems read the clocks correctly less than 25% of the time. They struggled more with clocks that used Roman numerals and stylized hands. The models' performance didn't improve when the seconds hand was removed, leading researchers to suggest that the problem comes from detecting the clocks' hands and interpreting the angles on a clock face.

Using 10 years of calendar images, the researchers asked questions such as "What day of the week is New Year's Day?" and "What is the 153rd day of the year?" Even the most successful AI models got the calendar questions wrong 20 percent of the time.

The success rates varied based on the AI system being used. Gemini 2.0 was the highest scorer in the clock test, while GPT-o1 was accurate 80% of the time on the calendar questions.

"Most people can tell the time and use calendars from an early age," said study lead Rohit Saxena, from Edinburgh University's School of Informatics. "Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people. These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation and assistive technologies."

Aryo Gema, another researcher from Edinburgh's School of Informatics, said, "AI research today often emphasises complex reasoning tasks, but ironically, many systems still struggle when it comes to simpler, everyday tasks."

The findings are being reported in a peer-reviewed paper that will be presented at the Reasoning and Planning for Large Language Models workshop at The Thirteenth International Conference on Learning Representations (ICLR) in Singapore on April 28. The findings are currently available on the preprint server arXiv.

This isn't the first study this month showing AI systems still make plenty of mistakes. The Tow Center for Digital Journalism studied eight AI search engines and found that they are inaccurate 60 percent of the time. The worst culprit was Grok-3, which was 94 percent inaccurate.
[3]
AI Sucks at Reading Clocks
Large language models still struggle with simple tasks like telling time. These days, artificial intelligence can generate photorealistic images, write novels, do your homework, and even predict protein structures. New research, however, reveals that it often fails at a very basic task: telling time.

Researchers at Edinburgh University have tested the ability of seven well-known multimodal large language models -- the kind of AI that can interpret and generate various kinds of media -- to answer time-related questions based on different images of clocks or calendars. Their study, forthcoming in April and currently hosted on the preprint server arXiv, demonstrates that these LLMs have difficulty with these basic tasks.

"The ability to interpret and reason about time from visual inputs is critical for many real-world applications -- ranging from event scheduling to autonomous systems," the researchers wrote in the study. "Despite advances in multimodal large language models (MLLMs), most work has focused on object detection, image captioning, or scene understanding, leaving temporal inference underexplored."

The team tested OpenAI's GPT-4o and GPT-o1; Google DeepMind's Gemini 2.0; Anthropic's Claude 3.5 Sonnet; Meta's Llama 3.2-11B-Vision-Instruct; Alibaba's Qwen2-VL7B-Instruct; and ModelBest's MiniCPM-V-2.6. They fed the models different images of analog clocks -- timekeepers with Roman numerals, different dial colors, and even some missing the seconds hand -- as well as 10 years of calendar images.

For the clock images, the researchers asked the LLMs, "What time is shown on the clock in the given image?" For the calendar images, the researchers asked simple questions such as "What day of the week is New Year's Day?" and harder queries including "What is the 153rd day of the year?"

"Analogue clock reading and calendar comprehension involve intricate cognitive steps: they demand fine-grained visual recognition (e.g., clock-hand position, day-cell layout) and non-trivial numerical reasoning (e.g., calculating day offsets)," the researchers explained.

Overall, the AI systems did not perform well. They read the time on analog clocks correctly less than 25% of the time. They struggled with clocks bearing Roman numerals and stylized hands as much as they did with clocks lacking a seconds hand altogether, indicating that the issue may stem from detecting the hands and interpreting angles on the clock face, according to the researchers.

Google's Gemini 2.0 scored highest on the team's clock task, while GPT-o1 was accurate on the calendar task 80% of the time -- a far better result than its competitors. But even then, the most successful MLLM on the calendar task still made mistakes about 20% of the time.

"Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people," Rohit Saxena, a co-author of the study and PhD student at the University of Edinburgh's School of Informatics, said in a university statement. "These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation and assistive technologies."

So while AI might be able to complete your homework, don't count on it sticking to any deadlines.
[4]
Most AI struggles to read clocks and calendars, study finds
Some of the world's most advanced AI systems struggle to tell the time and work out dates on calendars, a study suggests. While AI models can perform complex tasks such as writing essays and generating art, they have yet to master some skills that humans carry out with ease, researchers say.

A team from the University of Edinburgh has shown that state-of-the-art AI models are unable to reliably interpret clock-hand positions or correctly answer questions about dates on calendars. Unlike simply recognizing shapes, understanding analog clocks and calendars requires a combination of spatial awareness, context and basic math -- something that remains challenging for AI, the team says. Overcoming this could enable AI systems to power time-sensitive applications like scheduling assistants, autonomous robots and tools for people with visual impairments, researchers say.

The team tested if AI systems that process text and images -- known as multimodal large language models (MLLMs) -- can answer time-related questions by looking at a picture of a clock or a calendar. Researchers tested various clock designs, including some with Roman numerals, with and without second hands, and different colored dials.

Their findings show that AI systems, at best, got clock-hand positions right less than a quarter of the time. Mistakes were more common when clocks had Roman numerals or stylized clock hands. AI systems also did not perform any better when the second hand was removed, suggesting there are deep-seated issues with hand detection and angle interpretation, the team says.

The researchers asked AI models to answer a range of calendar-based questions, such as identifying holidays and working out past and future dates. The team found that even the best-performing AI model got date calculations wrong one-fifth of the time.

The findings are reported in a peer-reviewed paper that will be presented at the Reasoning and Planning for Large Language Models workshop at The Thirteenth International Conference on Learning Representations (ICLR) in Singapore on 28 April 2025.

Rohit Saxena, of the University of Edinburgh's School of Informatics, who led the study, said, "Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people. These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation and assistive technologies."

Aryo Gema, also of the School of Informatics, said, "AI research today often emphasizes complex reasoning tasks, but ironically, many systems still struggle when it comes to simpler, everyday tasks. Our findings suggest it's high time we addressed these fundamental gaps. Otherwise, integrating AI into real-world, time-sensitive applications might remain stuck at the eleventh hour."
[5]
AI still can't do 'basic tasks' such as tell the time or understand a calendar
The team tested whether AI systems that process text and images - known as multimodal large language models (MLLMs) - can answer time-related questions by looking at a picture of a clock or a calendar. They looked at various clock designs, including some with Roman numerals, with and without second hands, and different coloured dials.

Their findings show that AI systems, at best, got clock-hand positions right less than a quarter of the time. Mistakes were more common when clocks had Roman numerals or stylised clock hands. AI systems also did not perform any better when the second hand was removed, suggesting there are deep-seated issues with hand detection and angle interpretation, the team says.

The researchers asked AI models to answer a range of calendar-based questions, such as identifying holidays and working out past and future dates. The team found that even the best-performing AI model got date calculations wrong one-fifth of the time.

'Significant gap in ability'

Rohit Saxena, of the University of Edinburgh's School of Informatics, who led the study, said there was a "significant gap in the ability of AI to carry out what are quite basic skills for people". "Most people can tell the time and use calendars from an early age," Saxena said. "These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation and assistive technologies."

Aryo Gema, also of the School of Informatics, said AI research today often "emphasises complex reasoning tasks, but ironically, many systems still struggle when it comes to simpler, everyday tasks". "Our findings suggest it's high time we addressed these fundamental gaps. Otherwise, integrating AI into real-world, time-sensitive applications might remain stuck at the eleventh hour."

The findings are reported in a peer-reviewed paper that will be presented at the Reasoning and Planning for Large Language Models workshop in Singapore on April 28.
[6]
You'll Laugh at This Simple Task AI Still Can't Do
Most human children learn how to tell time around ages six and seven -- but artificial intelligence still, apparently, can't parse a clock face. Researchers from Scotland's University of Edinburgh have found that AI models that can process text and images -- otherwise known as multimodal large language models, or MLLMs -- could only read analog clock faces a pitiful 25 percent of the time.

In a paper that's awaiting peer review, the AI informatics researchers explained that Google's Gemini was the "best" of the crop when they tested out MLLMs from that company, OpenAI, Anthropic, and others to see how well they could read clock faces and yearly calendars. As they soon found, all of the models they tested seemed to be challenged by the "combination of spatial awareness, context, and basic math" required to read time and dates.

"Researchers tested various clock designs, including some with Roman numerals, with and without second hands, and different [colored] dials," the statement expounded. "Their findings show that AI systems, at best, got clock-hand positions right less than a quarter of the time. Mistakes were more common when clocks had Roman numerals or [stylized] clock hands."

When testing out how well the MLLMs handled calendars -- specifically, ten years of the large annual kind, which show all 12 months of the year on one page -- the researchers found that they were slightly better at reading dates than times. GPT-o1, the first generation of OpenAI's reasoning models, ended up scoring the highest on the calendar challenge by getting the date questions right 80 percent of the time. Still, it answered one-fifth of the questions put to it -- such as "Which day of the week is New Year's Day?" or "What is the 153rd day of the year?" -- incorrectly.

Rohit Saxena, the study's lead author, said in the school's press release that although "most people can tell the time and use calendars from an early age," AI seems, per the new research, to struggle to "carry out what are quite basic skills for people." "These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications," Saxena said, "such as scheduling, automation and assistive technologies."

As New Scientist reported more than three years ago, Oxford researchers found that when they trained their own AI model on analog clock faces and their correct readings, it was able to accurately tell the time between 74 and 84 percent of the time. The tension illustrates the current situation of AI: it can often ace difficult questions in heady domains like math and the law, but simultaneously continues to struggle with tasks as basic as telling the time.

Look no further than the tech giant Apple, which was forced to push back its ambitious plans to integrate AI into its voice assistant Siri this month. An AI that can respond to virtually any query makes a great tech demo, but if it struggles to set an alarm or schedule an appointment, you're going to have a lot of disappointed users on your hands -- even at well-funded companies like OpenAI, Apple, and Google.

More on AI fails: Study Finds That AI Search Engines Are Wrong an Astounding Proportion of the Time
A study by University of Edinburgh researchers shows that advanced AI models have difficulty interpreting analog clocks and calendars, highlighting a significant gap in AI capabilities for everyday tasks.
A recent study conducted by researchers at the University of Edinburgh has revealed a surprising limitation in advanced artificial intelligence (AI) systems: they struggle to perform basic time-telling tasks that most humans learn at an early age. The study, led by Rohit Saxena from the School of Informatics, tested various state-of-the-art AI models on their ability to interpret analog clocks and calendars [1].
The research team evaluated several multimodal large language models (MLLMs), including systems from Google DeepMind, Anthropic, Meta, Alibaba, ModelBest, and OpenAI. These AI models were presented with images of different clock designs, including those with Roman numerals, varying dial colors, and with or without second hands [2].
The results were striking: at best, the models read clock-hand positions correctly less than 25% of the time, and mistakes were more common on clocks with Roman numerals or stylized hands. Performance did not improve when the second hand was removed, and Gemini 2.0 scored highest on the clock-reading task [2].
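To make concrete the arithmetic the clock task implies once the hands have been located, here is a minimal Python sketch, not taken from the study, that maps already-measured hand angles to a time reading; the function name and example angles are illustrative assumptions.

```python
# Minimal sketch (not the study's code): convert already-measured clock-hand
# angles into a time reading. The models must perform this kind of reasoning
# implicitly, after first detecting the hands in the image -- the step the
# researchers suggest is where they fail.

def angles_to_time(hour_angle_deg: float, minute_angle_deg: float) -> str:
    """Angles are measured in degrees clockwise from the 12 o'clock position."""
    minutes = round(minute_angle_deg / 6) % 60    # minute hand sweeps 6 degrees per minute
    hours = int(hour_angle_deg // 30) % 12 or 12  # hour hand sweeps 30 degrees per hour
    return f"{hours:02d}:{minutes:02d}"

# Example: hour hand at 95 degrees and minute hand at 60 degrees reads as 3:10.
print(angles_to_time(95, 60))  # -> "03:10"
```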
The study also tested the AI models' ability to answer calendar-based questions, such as identifying holidays and calculating dates. Even the best-performing AI model made errors in date calculations 20% of the time [4].
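By contrast, the calendar questions quoted in the sources reduce to date arithmetic that a few lines of standard-library code handle exactly; the year below is an illustrative assumption, not a detail from the paper.

```python
# Illustrative sketch of the date arithmetic behind the calendar questions the
# models were asked; uses only Python's standard library, not the study's code.
from datetime import date, timedelta

year = 2025  # assumed example year; the study used ten years of calendar images

# "What day of the week is New Year's Day?"
new_years_weekday = date(year, 1, 1).strftime("%A")  # "Wednesday" for 2025

# "What is the 153rd day of the year?"
day_153 = date(year, 1, 1) + timedelta(days=152)     # 2025-06-02

print(new_years_weekday, day_153.isoformat())
```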
This research highlights a significant gap between AI's capabilities in complex tasks and its struggles with everyday skills that humans often take for granted. Aryo Gema, another researcher involved in the study, noted:
"AI research today often emphasizes complex reasoning tasks, but ironically, many systems still struggle when it comes to simpler, everyday tasks. Our findings suggest it's high time we addressed these fundamental gaps." 5
The ability to interpret time from visual inputs is crucial for many real-world applications, including scheduling assistants, autonomous systems such as robots, and assistive tools for people with visual impairments.
Overcoming these limitations could significantly enhance AI's integration into time-sensitive, real-world applications. However, the current shortfalls present a notable obstacle to achieving this goal [1].
The findings of this study will be presented at the Reasoning and Planning for Large Language Models workshop at The Thirteenth International Conference on Learning Representations (ICLR) in Singapore on April 28, 2025, highlighting the importance of addressing these fundamental gaps in AI capabilities.