GPT-5.2 still can't solve the infamous strawberry question despite billions in AI investment

OpenAI's GPT-5.2, released in December 2025, continues to report only two r's in the word strawberry when there are actually three. The error stems from the model's tokenization process, which splits the word into the tokens st, raw, and berry. While other AI models like Claude, Gemini, and Perplexity answer correctly, ChatGPT's tokenized input/output design creates persistent limitations in basic letter-counting tasks that a seven-year-old could solve.

GPT-5.2 Fails Basic Letter Counting Despite Advanced Capabilities

OpenAI's GPT-5.2, released in December 2025, demonstrates a puzzling weakness that highlights the persistent limitations of large language models. Despite billions of dollars in investment and the ability to generate marketing images, compile reports, and create chart-topping songs, ChatGPT powered by GPT-5.2 incorrectly counts the r's in strawberry [1]. The word contains three r's, one after the 't' and two consecutive in the 'berry' portion, yet the model consistently reports only two [2].

Source: MakeUseOf

This ChatGPT counting error has persisted as a test of AI performance across multiple model iterations. Previous versions exhibited uncertainty or erratic behavior on the strawberry question, but the latest model delivers a direct answer of two without deviation [1]. The outcome remains unchanged despite elevated hardware demands that have pushed RAM prices higher and the substantial global water consumption linked to training infrastructure [2].

Tokenization Explains the Inability to Accurately Count Specific Letters

The root cause lies in the tokenized input/output design that defines how large language models process text. When users input "strawberry," ChatGPT doesn't process the individual letters S-T-R-A-W-B-E-R-R-Y. Instead, tokenization breaks the text into chunks called tokens, which can be whole words, syllables, or word parts [2]. The model counts the tokens containing the letter rather than performing a precise letter-by-letter enumeration [1].
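
To make that concrete, here is a minimal sketch using OpenAI's open-source tiktoken library and the o200k_base encoding; both are assumptions on my part, since the article itself only references the web-based OpenAI Tokenizer tool. It shows that a word reaches the model as a short list of integer token IDs rather than as individual characters.

```python
# Minimal sketch: what "strawberry" looks like after tokenization.
# Assumes the tiktoken package is installed (pip install tiktoken) and uses
# the o200k_base encoding (GPT-4o era); the exact split may differ from the
# st / raw / berry example shown by the OpenAI Tokenizer tool.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

word = "strawberry"
token_ids = enc.encode(word)

print("token IDs the model sees:", token_ids)
print("decoded token pieces:    ", [enc.decode([tid]) for tid in token_ids])
# The model never receives S-T-R-A-W-B-E-R-R-Y as ten separate characters,
# only a handful of opaque integer IDs.
```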

The OpenAI Tokenizer tool illustrates this process clearly. Entering "strawberry" yields three tokens: st, raw, berry. The first token contains no r, the second includes one r, and the third contains two r's but functions as a single token [1]. The model associates r's with only two of the tokens, leading to the incorrect count. This tokenization pattern affects similar words: raspberry divides into comparable tokens, causing ChatGPT to report two r's for that word as well [1].
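
The mismatch between "how many tokens contain an r" and "how many r's there are" can be reproduced in a few lines. This is an illustrative sketch of the failure mode described above, not OpenAI's internal logic, and the token boundaries it prints depend on the encoding and tiktoken version.

```python
# Contrast the true letter count with a per-token view of the same word.
# A token holding two r's still counts as a single r-bearing token, which
# mirrors the "two r's" answer discussed in the article.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # assumption: GPT-4o-era encoding

for word in ("strawberry", "raspberry"):
    pieces = [enc.decode([tid]) for tid in enc.encode(word)]
    actual = word.count("r")                          # 3 for both words
    per_token = [piece.count("r") for piece in pieces]
    tokens_with_r = sum(1 for n in per_token if n > 0)

    print(f"{word}: tokens = {pieces}")
    print(f"  actual r's = {actual}, r's per token = {per_token}, "
          f"tokens containing an r = {tokens_with_r}")
```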

GPT-5.2 incorporates the o200k_harmony tokenization method, first introduced with the OpenAI o4-mini and GPT-4o models. This updated scheme aims for efficiency but retains the strawberry discrepancy [1]. ChatGPT operates as a prediction engine, leveraging patterns from training data to anticipate subsequent elements rather than functioning with true intelligence [2].
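
To compare tokenizer generations yourself, the hedged sketch below contrasts the older cl100k_base encoding (GPT-3.5/GPT-4 era) with o200k_base; if your installed tiktoken version ships the o200k_harmony encoding named above, it can be appended to the list the same way.

```python
# Compare how successive OpenAI encodings split the same word. Larger, newer
# vocabularies tend to use fewer tokens per text, but none of them expose
# individual characters to the model.
import tiktoken

ENCODINGS = ["cl100k_base", "o200k_base"]  # add "o200k_harmony" if your tiktoken has it

for name in ENCODINGS:
    enc = tiktoken.get_encoding(name)
    pieces = [enc.decode([tid]) for tid in enc.encode("strawberry")]
    print(f"{name:>12}: {pieces}")
```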

OpenAI Has Fixed Other Tokenization Issues But Core Problems Remain

When ChatGPT launched in late 2022, it was riddled with token-based challenges. Specific phrases triggered excessive responses or processing failures [1]. OpenAI addressed many of them through training adjustments and system enhancements over subsequent years. Verification tests on classic problems showed improvements: ChatGPT now accurately spells Mississippi, identifying one m, four i's, four s's, and two p's. It also reverses "lollipop" to "popillol," preserving all letters in proper sequence [1].
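
Those two spot checks are trivial for ordinary string handling, which is exactly why the counting failure stands out. The sketch below uses plain Python, with no AI involved, to confirm the Mississippi letter tallies and the lollipop reversal.

```python
# Verify the two spot checks quoted above with ordinary string operations.
from collections import Counter

letter_counts = Counter("mississippi")
print(letter_counts)        # four i's, four s's, two p's, one m

print("lollipop"[::-1])     # prints "popillol"
```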

However, tokenization issues persist in unexpected ways. A notable historical example involves solidgoldmagikarp, a string that disrupted tokenization in GPT-3, causing erratic outputs including user insults and unintelligible text [1]. When queried about this phrase, GPT-5.2 produced a hallucination, describing it as a secret Pokémon joke embedded in GitHub repositories that transforms avatars and icons into Pokémon-themed elements, a claim completely lacking any basis in reality [1][2].
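
Glitch strings like this one are artifacts of the tokenizer's learned vocabulary, so one hedged way to explore them is simply to print how each encoding splits the string. The sketch below asserts no particular outcome; whether the string maps to one token or several depends entirely on the vocabulary in use.

```python
# See how the glitch string quoted in the article splits under successive
# OpenAI encodings. "Weird" strings are vocabulary artifacts, so their
# handling changes whenever the tokenizer vocabulary changes.
import tiktoken

phrase = "solidgoldmagikarp"  # spelling as quoted in the article

for name in ("r50k_base", "cl100k_base", "o200k_base"):
    enc = tiktoken.get_encoding(name)
    pieces = [enc.decode([tid]) for tid in enc.encode(phrase)]
    print(f"{name:>12}: {len(pieces)} token(s) -> {pieces}")
```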

Competing AI Models Demonstrate Superior Letter Counting Through Different Approaches

Comparative tests across other AI models yielded correct results for the strawberry question, revealing that this limitation isn't universal. Perplexity, Claude, Grok, Gemini, Qwen, and Copilot all correctly identified three r's in strawberry [1]. These models employ distinct tokenization systems that enable accurate letter identification, even when some of them are powered by OpenAI's underlying architectures [1].

This discrepancy matters for users who rely on AI for tasks requiring precision. While large language models excel at pattern recognition and complex outputs, they demonstrate persistent limitations in exactly counting small quantities [1]. They perform well in mathematics and problem-solving but falter on precise tallies of letters or words in brief strings [1]. Understanding these fundamental constraints helps users make informed decisions about when to trust AI performance and when human verification remains essential.
