OpenAI fixes bizarre goblin obsession in AI models after unintentional training error goes viral

OpenAI discovered its AI models had developed an unusual fixation on goblins and gremlins, with goblin mentions spiking 175% after GPT-5.1's release. The problem stemmed from a reinforcement learning reward signal that scored creature-laden language more highly within the model's nerdy personality setting. The company had to explicitly instruct Codex to stop mentioning mythological creatures unless relevant to user queries.

OpenAI Discovers Unusual Goblin Obsession in AI Models

OpenAI has revealed that its AI models developed an unexpected fixation on goblins, gremlins, and other mythological creatures due to an unintentional training error. The issue became so pronounced that the company had to add explicit instructions to Codex, its coding assistant, forbidding mentions of these creatures unless directly relevant to user queries [1]. The system prompt specifically states: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query" [4].

Source: Analytics Insight

The peculiar model behavior first caught attention when users noticed linguistic quirks in GPT-5.5 responses. After a safety researcher flagged the issue, OpenAI investigated and found that goblin usage had increased by 175% following the November launch of GPT-5.1, while gremlin mentions rose by 52% [2]. Even Sam Altman, OpenAI's CEO, joined the conversation by posting memes about the goblin situation, with one reading: "Start training GPT-6, you can have the whole cluster. Extra goblins" [1].
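Detecting a shift like this comes down to measuring term frequency across large samples of model output and comparing releases. The sketch below is a hypothetical illustration of that kind of audit, not OpenAI's tooling; the watchlist and sample data are assumptions.

```python
import re
from collections import Counter

# Assumed watchlist; mirrors the creatures named in the Codex system prompt.
CREATURE_TERMS = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]

def mentions_per_thousand(responses):
    """Rate of each watched term per 1,000 responses."""
    counts = Counter()
    for text in responses:
        for term in CREATURE_TERMS:
            # \b avoids matching inside unrelated words; s? catches plurals
            counts[term] += len(re.findall(rf"\b{term}s?\b", text, re.IGNORECASE))
    total = max(len(responses), 1)
    return {term: 1000 * n / total for term, n in counts.items()}

# Hypothetical output samples from two model snapshots
old_model = ["Here is the fix for your loop.", "The parser chokes on nested quotes."]
new_model = ["The gremlin in your regex is the greedy quantifier.",
             "A goblin-sized bug: an off-by-one in the index."]

print(mentions_per_thousand(old_model))  # baseline rates
print(mentions_per_thousand(new_model))  # a large jump flags the tic
```

Run over matched prompt sets, a comparison like this is what would surface a 175% jump between model versions.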

Source: Engadget

Reinforcement Learning Amplified the Nerdy Personality Problem

The root cause traces back to ChatGPT's personality feature, which allowed users to customize response styles. The nerdy personality setting, designed to "undercut pretension through playful use of language," was disproportionately responsible for creature references [2]. Despite accounting for only 2.5% of all ChatGPT responses, this personality generated 66.7% of all goblin mentions [4], roughly 27 times its share of overall traffic.

During reinforcement learning, human reviewers rate candidate answers, and those judgments train a reward signal that steers the model toward preferred responses. OpenAI discovered that a single reward signal was favoring language featuring goblins and other creatures within the nerdy personality context [2]. Across all datasets audited, the nerdy personality reward showed a clear tendency to score outputs containing "goblin" or "gremlin" higher than those without, with positive uplift in 76.2% of datasets [4].
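To see how a reward signal can latch onto a surface feature like this, consider a toy reward model whose weights correlate creature vocabulary with the playful style it is meant to capture. Everything below is a hypothetical illustration; the feature names and weights are invented, and real RLHF rewards are learned neural networks rather than linear scores.

```python
# Toy linear reward over hand-picked text features. The failure mode:
# "playfulness" and "mentions a creature" co-occur in the training
# comparisons, so the creature feature soaks up reward weight and the
# policy learns to optimize for it directly.

WEIGHTS = {
    "playful_tone": 1.0,       # the quality the reward was supposed to capture
    "mentions_creature": 0.8,  # spurious proxy that ends up rewarded (assumed)
    "answers_question": 2.0,
}

def extract_features(text):
    creatures = ("goblin", "gremlin", "troll", "ogre")
    return {
        "playful_tone": float("!" in text),  # crude stand-in for a style score
        "mentions_creature": float(any(c in text.lower() for c in creatures)),
        "answers_question": float(len(text) > 20),
    }

def reward(text):
    feats = extract_features(text)
    return sum(WEIGHTS[name] * value for name, value in feats.items())

plain = "Your loop is off by one; start the index at zero."
goblin = "Your loop is off by one; a goblin moved your index to one!"
print(reward(plain), reward(goblin))  # the goblin answer scores higher for the same fix
```

Under policy optimization, the cheapest way to raise such a score is to mention a creature, which is exactly the tic described here.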

Training Data Contamination Spread Beyond Original Context

The problem didn't remain confined to the nerdy personality. Once a style tic is rewarded during training, later processes can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data [3]. This explains why goblin references appeared even when users didn't select the nerdy personality setting. The reward was applied only in the nerdy condition, but reinforcement learning doesn't guarantee that learned behaviors stay neatly scoped to the condition that produced them [4].
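One concrete route for that spread: highly rewarded completions get sampled back into a general-purpose fine-tuning mix, carrying the tic with them. The pipeline below is a hypothetical sketch of that feedback loop, not a description of OpenAI's process; the personas, probabilities, and dataset sizes are invented.

```python
import random

random.seed(0)

def sample_completion(persona):
    """Stand-in for model sampling: the nerdy persona often emits the tic."""
    if persona == "nerdy" and random.random() < 0.6:
        return "A gremlin hid in your config file."
    return "The config file had a typo."

# Step 1: RL on the nerdy persona retains high-reward (tic-laden) outputs.
rl_outputs = [sample_completion("nerdy") for _ in range(1000)]
kept = [o for o in rl_outputs if "gremlin" in o]  # the biased reward keeps these

# Step 2: those outputs are reused as generic SFT data, with no persona label.
sft_mix = kept + [sample_completion("default") for _ in range(1000)]

# Step 3: the next model is tuned on sft_mix; the tic now appears in training
# data for *every* persona, so the behavior escapes its original condition.
tic_share = sum("gremlin" in ex for ex in sft_mix) / len(sft_mix)
print(f"tic appears in {tic_share:.0%} of the unlabeled fine-tuning mix")
```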

When OpenAI retired the nerdy personality option in March with GPT-5.4, goblin usage dropped dramatically [2]. However, because the company began training GPT-5.5 before identifying the root cause, the model still exhibited this strange affinity for goblins in Codex [4]. This timing forced OpenAI to implement the explicit system prompt restriction as a temporary fix.

Broader Implications for AI Bias and Misinformation

While the goblin situation may seem harmless or even charming, it highlights serious concerns about how AI training impacts daily user experiences: the choices humans make in building these systems have measurable effects on output quality and reliability [2]. Small stylistic tics can grow into bigger problems involving bias and misinformation if not carefully monitored. The Oxford Internet Institute recently found that fine-tuning models for warm, friendly personalities could result in an "accuracy trade-off," where systems make more mistakes or reinforce users' false beliefs [5].

This incident demonstrates how reinforcement learning can inadvertently amplify unintended behaviors across model generations. As AI firms shift toward making chatbots more personality-driven to boost engagement, experts warn that the potential for hallucinations could intensify [5]. OpenAI has since removed the reward signal favoring goblins and filtered training data to reduce creature references, while developing new tools to audit and fix model behavior going forward [2].
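Filtering training data for an overrepresented motif is conceptually simple: drop or down-weight examples where the completion injects the offending terms unprompted. The snippet below is a minimal, assumed illustration of such a filter; the term list and the keep-if-prompted rule are inventions for the example, not OpenAI's criteria.

```python
import re

CREATURE_RE = re.compile(r"\b(goblins?|gremlins?|trolls?|ogres?)\b", re.IGNORECASE)

def keep_example(prompt, completion):
    """Drop training pairs where the completion injects creatures unprompted."""
    if not CREATURE_RE.search(completion):
        return True   # no creature language: keep
    if CREATURE_RE.search(prompt):
        return True   # the user asked about creatures: legitimately relevant
    return False      # unprompted creature talk: filter out

dataset = [
    ("Fix this off-by-one error.", "A goblin moved your index! Start at zero."),
    ("Write a story about a goblin.", "The goblin polished its tiny spectacles."),
    ("Fix this off-by-one error.", "Start the index at zero."),
]
cleaned = [pair for pair in dataset if keep_example(*pair)]
print(len(cleaned), "of", len(dataset), "examples kept")  # 2 of 3
```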

Source: PCWorld
