3 Sources
[1]
Siri's new AI smarts fail at sports trivia, claims Philadelphia Eagles won 33 Super Bowls
Facepalm: Apple's much-hyped Siri integration with ChatGPT may have added a ton of useful functionality, but it has apparently done little to improve the digital assistant's knowledge of sports. A damning report highlights just how abysmally Siri performs at recalling simple facts like past Super Bowl winners.

According to the report from One Foot Tsunami's Paul Kafasis, when asked who won each Super Bowl from 1 through 60, Siri correctly provided the winner for only 20 of the 58 Super Bowls that have been played so far - a success rate of just 34%.

The details get even more embarrassing. Kafasis found that at its worst, Siri missed an incredible 15 Super Bowl winners in a row, from Super Bowl XVII through XXXII. And in a baffling mishap, it erroneously credited the Philadelphia Eagles with a whopping 33 non-existent Super Bowl wins. Kafasis documented every one of Siri's wrong answers in a downloadable spreadsheet, which you can find on his blog.

Another report, by Daring Fireball's John Gruber, corroborated these findings. Gruber found that the AI assistants and search engines he tested - ChatGPT, Kagi, DuckDuckGo, and Google - fared far better when asked similar Super Bowl trivia questions. All of them could not only handle past results but also smartly dodge trick questions about future Super Bowls that haven't happened yet.

Siri's poor performance doesn't stop at sports trivia, either. Daring Fireball posed a more obscure question - "Who won the 2004 North Dakota high school boys' state basketball championship?" Incredibly, both Kagi and ChatGPT provided correct answers, with the latter earning bonus points for including a link to video of the championship game. Meanwhile, Siri once again got it wrong. What makes all this particularly damning is that Siri is essentially powered by the same ChatGPT that fares perfectly fine when used without Siri.
The blog also points out that the old, pre-AI versions of Siri at least acknowledged their limitations on such queries and provided relevant web links. But the new Siri, powered by Apple Intelligence with ChatGPT integration, lies with confidence - a hallmark of unrefined AI - making it worse than its predecessor. Both blogs conclude that Apple has more work to do in this area. As Gruber bluntly states, Siri with ChatGPT is currently "a massive regression" over the old Siri when it comes to handling simple factual prompts.
[2]
Siri Gives Eagles 33 False Super Bowl Wins in Basic Knowledge Test
In what may not come as much of a surprise, a new test of Siri's knowledge of Super Bowl history has revealed significant accuracy issues with Apple's virtual assistant, suggesting Apple still has some way to go in making Siri a reliable source of information.

In a methodical experiment, One Foot Tsunami's Paul Kafasis asked Siri who won each Super Bowl from I through LX and documented its responses. The results were strikingly poor: Siri correctly identified the winner only 34% of the time - just 20 correct answers out of the 58 Super Bowls played. Perhaps most notably, Siri repeatedly and incorrectly credited the Philadelphia Eagles with 33 Super Bowl victories, despite the team having won only one championship in its history. The virtual assistant's responses ranged from providing information about the wrong Super Bowl to offering completely unrelated football facts.

While Siri did manage a few streaks of accurate answers, including three consecutive correct responses for Super Bowls V through VII, it also produced a remarkable string of 15 consecutive incorrect answers spanning Super Bowls XVII through XXXII. In one telling instance, when asked about Super Bowl XVI, Siri offered to defer to ChatGPT - which then provided the correct answer. The contrast highlighted the limitations of Siri's own knowledge base compared with more advanced AI systems.

The test was conducted on iOS 18.2.1 with Apple Intelligence enabled, and similar results were found on both the upcoming iOS 18.3 beta and macOS 14.7.2, suggesting the issue extends across Apple's platforms. Kafasis generated a spreadsheet of the results in both Excel and PDF formats, which you can read here.

Separately, inspired by Kafasis' test, Daring Fireball's John Gruber tried some of his own sports queries with Siri and compared its responses to ChatGPT, Kagi, DuckDuckGo, and Google, all of which succeeded where Siri failed. Perhaps worse for Apple, Gruber found that the old Siri (i.e. before Apple Intelligence) did a better job by declining to answer the question and instead providing a list of web links; the first result offered an accurate, if only partial, answer. The new Siri, powered by Apple Intelligence, fared much worse. Gruber explains:

New Siri -- powered by Apple Intelligence™ with ChatGPT integration enabled -- gets the answer completely but plausibly wrong, which is the worst way to get it wrong. It's also inconsistently wrong -- I tried the same question four times, and got a different answer, all of them wrong, each time. It's a complete failure.

"It's just incredible how stupid Siri is about a subject matter of such popularity," commented Gruber. "If you had guessed that Siri could get half the Super Bowls right, you lost, and it wasn't even that close."

Of course, this isn't the first time Siri has received heavy flak for its all-round performance, but Gruber's criticism of "plausibly wrong" answers to general-knowledge questions ties back to the modern problem of hallucinating AI chatbots that spout misleading or flat-out wrong responses with complete confidence.

Apple is developing a much smarter version of Siri that utilizes advanced large language models, which should allow the personal assistant to better compete with chatbots like ChatGPT. A chatbot version of Siri would likely be able to hold ongoing conversations and provide the sort of help and insight that ChatGPT or Claude offer, but how well the integration will perform is a concern, given Siri's abysmal track record. Apple is expected to announce LLM Siri as soon as WWDC 2025, but won't launch it until several months after it's unveiled. That means LLM Siri would come in an update to iOS 19, with Apple planning for a spring 2026 launch.
[3]
Siri failed super-easy Super Bowl test, getting 38 out of 58 wrong - 9to5Mac
Apple commentator John Gruber yesterday described Siri's current performance as "an unfunny joke," citing its inability to correctly name the winner of Super Bowl 13 as an example and noting that this is a basic query any US chatbot ought to be able to answer. It turns out that wasn't an entirely random example: it was prompted by his friend Paul Kafasis, who decided to test Siri on Super Bowls 1 to 60 inclusive - and the results were not good ...

Kafasis shared the results in a blog post. So, how did Siri do?

With the absolute most charitable interpretation, Siri correctly provided the winner of just 20 of the 58 Super Bowls that have been played. That's an absolutely abysmal 34% completion percentage. If Siri were a quarterback, it would be drummed out of the NFL.

Siri did once manage to get four years in a row correct (Super Bowls IX through XII), but only if we give it credit for providing the right answer for the wrong reason. More realistically, it thrice correctly answered three in a row (Super Bowls V through VII, XXXV through XXXVII, and LVII through LIX). At its worst, it got an amazing 15 in a row wrong (Super Bowls XVII through XXXII).

Siri's a big Eagles fan, it seems. Most amusingly, it credited the Philadelphia Eagles with an astonishing 33 Super Bowl wins they haven't earned, to go with the one they have.

The "right answer for the wrong reason" part refers to Siri being asked to name the winner of Super Bowl X. For unknown reasons, Siri decided to respond with a lengthy reply about Super Bowl IX, and coincidentally the winner was the same both times.

Sometimes Siri went completely off-piste and ignored the question entirely, quoting unrelated Wikipedia entries:

"Who won Super Bowl 23?"

Bill Belichick owns the record for the most Super Bowl wins (eight) and appearances (twelve: nine times as head coach, once as assistant head coach, and twice as defensive coordinator) by an individual.
But maybe the Roman numerals cause confusion, and other AI systems struggle just as much? Gruber decided to carry out a few spot checks:

I haven't run a comprehensive test from Super Bowls 1 through 60 because I'm lazy, but a spot-check of a few random numbers in that range indicates that every other ask-a-question-get-an-answer agent I personally use gets them all correct. I tried ChatGPT, Kagi, DuckDuckGo, and Google. Those four all even fare well on the arguably trick questions regarding the winners of Super Bowls 59 and 60, which haven't yet been played. E.g., asked the winner of Super Bowl 59, Kagi's "Quick Answer" starts: "Super Bowl 59 is scheduled to take place on February 9, 2025. As of now, the game has not yet occurred, so there is no winner to report."

Super Bowl winners aren't some obscure topic, like, say, asking "Who won the 2004 North Dakota high school boys' state basketball championship?" -- a question I just completely pulled out of my ass, but which, amazingly, Kagi answered correctly for Class A, and ChatGPT answered correctly for both Class A and Class B, and provided a link to this video of the Class A championship game on YouTube. That's amazing! I picked an obscure state (no offense to Dakotans, North or South), a year pretty far in the past, and the high school sport that I personally played best and care most about. And both Kagi and ChatGPT got it right. (I'd give Kagi an A, and ChatGPT an A+ for naming the champions of both classes, and extra credit atop the A+ for the YouTube links.)

Gruber notes that the old Siri - on macOS 15.1.1 - actually does better. Sure, it seems less capable, as it gave its classic "Here's what I found on the web" response, but at least that provides links to the correct answer. New Siri doesn't:

New Siri -- powered by Apple Intelligence™ with ChatGPT integration enabled -- gets the answer completely but plausibly wrong, which is the worst way to get it wrong. It's also inconsistently wrong -- I tried the same question four times, and got a different answer, all of them wrong, each time. It's a complete failure.
Apple's Siri, despite recent AI enhancements, performs poorly in Super Bowl trivia tests, raising questions about the effectiveness of its ChatGPT integration and overall AI capabilities.
Apple's virtual assistant Siri, despite recent AI enhancements, has come under fire for its poor performance in a simple Super Bowl trivia test. Paul Kafasis of One Foot Tsunami conducted a comprehensive experiment, asking Siri to identify the winners of Super Bowls I through LX. The results were strikingly poor, with Siri correctly identifying winners only 34% of the time - just 20 correct answers out of 58 played Super Bowls [1].

Perhaps the most glaring error was Siri's repeated and incorrect attribution of 33 Super Bowl victories to the Philadelphia Eagles, despite the team having won only one championship in their history. The virtual assistant's responses ranged from providing information about wrong Super Bowls to offering completely unrelated football facts [2].

In one particularly damning streak, Siri missed 15 consecutive Super Bowl winners from Super Bowl XVII through XXXII. This level of inaccuracy raises serious questions about the reliability of Siri's knowledge base, especially concerning popular and easily verifiable information [3].

John Gruber of Daring Fireball conducted a comparative analysis, testing Siri against other AI assistants and search engines such as ChatGPT, Kagi, DuckDuckGo, and Google. All of these alternatives fared significantly better than Siri when asked similar Super Bowl trivia questions. They even demonstrated the ability to handle questions about future Super Bowls that haven't occurred yet, providing appropriate responses [1].

Perhaps most concerning is the observation that the new AI-enhanced Siri performs worse than its predecessor in some respects. The old version of Siri would acknowledge its limitations on certain queries and provide relevant web links. In contrast, the new Siri, powered by Apple Intelligence with ChatGPT integration, often provides confidently incorrect answers - a characteristic of unrefined AI systems [2].

This poor performance comes at a crucial time for Apple, as the company is reportedly developing a much smarter version of Siri utilizing advanced large language models. The goal is to better compete with chatbots like ChatGPT or Claude. However, the current integration issues raise concerns about the effectiveness of future implementations [1].

Apple is expected to announce an LLM-powered Siri as soon as WWDC 2025, with a planned launch in spring 2026 as part of iOS 19. However, the current state of Siri's performance suggests that significant improvements are needed before such a launch can be successful [1].
Summarized by Navi