ChatGPT passes the strawberry test, but cranberry exposes the same AI reasoning flaw


OpenAI announced ChatGPT can finally count the letter 'r' in strawberry, a task that stumped the AI chatbot for years. But users quickly discovered the same confident mistakes persist when testing with cranberry. The apparent hardcoded solution raises questions about whether AI reasoning has truly improved or if models are just memorizing specific tests.


ChatGPT Finally Solves the Strawberry Problem

OpenAI proudly announced that ChatGPT can now correctly answer one of its most embarrassing failures: counting how many "r"s are in strawberry [1]. For years, the chatbot would confidently give wrong answers to this simple letter-counting task, often claiming the word contained fewer than three instances of the letter. The official @ChatGPTapp account on X declared "at long last" that the problem was solved, alongside another notorious stumbling block known as the car wash problem [2].

The Cranberry Test Exposes Deeper Issues

Within hours of OpenAI's victory lap, users discovered the fix wasn't as comprehensive as claimed. Asked the same question about cranberry, ChatGPT repeatedly responded "The word 'cranberry' has 1 'R'", an obviously incorrect answer for a word containing three instances of the letter [1]. X user @NathanEspinoza_ quickly posted evidence of the failure, suggesting the reasoning improvements were superficial. Tested on GPT-5.5, ChatGPT gave yet another wrong answer, claiming cranberry contained two "r"s, and admitted the counting error only when challenged [2]. The inconsistency suggests OpenAI deployed a hardcoded fix for the specific strawberry query rather than addressing the underlying gaps in the model's reasoning.
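For reference, the counts the chatbot keeps getting wrong are trivial to verify with ordinary string handling:

```python
# Count occurrences of "r" the mechanical way, letter by letter.
for word in ("strawberry", "cranberry"):
    print(word, word.count("r"))
# strawberry 3
# cranberry 3
```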

Why Language Models Struggle With Simple Tasks

The persistent failures reveal fundamental limitations in how large language models process information. LLMs like ChatGPT are built on transformers that convert text into numerical token representations capturing meaning and context, but those tokens don't inherently preserve a clear view of the individual letters that make up a word [2]. This architectural design makes letter counting surprisingly difficult despite the models' facility with complex equations and sophisticated reasoning. The confident mistakes that result are among the most frustrating aspects of AI chatbots: they deliver wrong information with unwavering certainty and, when challenged, may keep defending incorrect responses [1].
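A toy sketch makes the tokenization point concrete. This is not OpenAI's actual tokenizer (real vocabularies hold tens of thousands of learned pieces and use byte-pair encoding); it is a minimal greedy longest-match split over an invented vocabulary, but it shows how a model can receive multi-letter chunks in which no token corresponds to a single "r":

```python
# Toy subword vocabulary, longest pieces first; a crude stand-in
# for the learned vocabularies real tokenizers use.
TOY_VOCAB = sorted(["straw", "berry", "cran", "rasp"], key=len, reverse=True)

def toy_tokenize(word: str) -> list[str]:
    """Greedy longest-match split of a word into subword pieces."""
    tokens, i = [], 0
    while i < len(word):
        for piece in TOY_VOCAB:
            if word.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            tokens.append(word[i])  # unknown span: fall back to one character
            i += 1
    return tokens

print(toy_tokenize("strawberry"))  # ['straw', 'berry']
print(toy_tokenize("cranberry"))   # ['cran', 'berry']
```

Downstream, the model sees opaque numeric IDs for chunks like "cran" and "berry" rather than their letters, so "how many r's?" becomes a question it must answer from memorized spellings rather than by counting.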

Car Wash Problem Shows Mixed Results Across AI Platforms

OpenAI also claimed ChatGPT now solves another AI reasoning test: the car wash problem, which asks whether you should walk or drive to a car wash 50 meters away. Most AI models recommend walking, missing the obvious contextual point that you need the car with you to wash it [1]. Testing revealed inconsistent performance across platforms. ChatGPT on GPT-5.5 and Claude using Sonnet 4.6 still recommended walking, while Gemini correctly noted that although walking would be quicker, you would need to bring the car. Grok performed best, not only flagging the issue but noting that the question has become a popular test of whether an AI grasps the user's actual goal rather than offering generic advice that ignores context [2].

What This Means for AI Development

The strawberry and cranberry debacle raises critical questions about whether AI systems are genuinely improving or simply memorizing answers to specific tests. Hardcoded fixes in AI chatbots aren't new, but OpenAI's touting of this one while the root problem clearly remains highlights a concerning pattern in how progress is measured and communicated [1]. For users relying on these tools for accurate information, the frequency of confident mistakes remains a significant risk, especially given the substantial resources AI development consumes. The challenge for developers is whether they can address the fundamental architectural limitations of language models, or will keep patching individual test cases while deeper reasoning flaws persist. As these models become more integrated into daily workflows, the gap between strong performance on complex tasks and failure on simple logical questions demands attention from both OpenAI and the broader AI industry.

© 2026 TheOutpost.AI All rights reserved