OpenAI fixes bizarre goblin obsession in AI models after unintentional training error goes viral

OpenAI discovered its AI models had developed an unusual fixation on goblins and gremlins, with goblin mentions spiking 175% after GPT-5.1's release. The problem stemmed from a reinforcement learning reward signal that scored creature-laden language more highly within the model's nerdy personality setting. The company had to explicitly instruct Codex to stop mentioning mythological creatures unless relevant to user queries.

OpenAI Discovers Unusual Goblin Obsession in AI Models

OpenAI has revealed that its AI models developed an unexpected fixation on goblins, gremlins, and other mythological creatures due to an unintentional training error. The issue became so pronounced that the company had to add explicit instructions to Codex, its coding assistant, forbidding mentions of these creatures unless directly relevant to user queries [1]. The system prompt specifically states: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query" [4].

Source: Analytics Insight

The peculiar model behavior first caught attention when users noticed linguistic quirks in GPT-5.5 responses. After a safety researcher flagged the issue, OpenAI investigated and found that goblin usage had increased by 175% following the November launch of GPT-5.1, while gremlin mentions rose by 52% [2]. Even Sam Altman, OpenAI's CEO, joined the conversation by posting memes about the goblin situation, with one reading: "Start training GPT-6, you can have the whole cluster. Extra goblins" [1].
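Detecting a shift like this comes down to measuring term frequency across large samples of model output and comparing releases. The sketch below is a hypothetical illustration of that kind of audit, not OpenAI's tooling; the watchlist and sample data are assumptions.

```python
import re
from collections import Counter

# Assumed watchlist; mirrors the creatures named in the Codex system prompt.
CREATURE_TERMS = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]

def mentions_per_thousand(responses):
    """Rate of each watched term per 1,000 responses."""
    counts = Counter()
    for text in responses:
        for term in CREATURE_TERMS:
            # \b avoids matching inside unrelated words; s? catches plurals
            counts[term] += len(re.findall(rf"\b{term}s?\b", text, re.IGNORECASE))
    total = max(len(responses), 1)
    return {term: 1000 * n / total for term, n in counts.items()}

# Hypothetical output samples from two model snapshots
old_model = ["Here is the fix for your loop.", "The parser chokes on nested quotes."]
new_model = ["The gremlin in your regex is the greedy quantifier.",
             "A goblin-sized bug: an off-by-one in the index."]

print(mentions_per_thousand(old_model))  # baseline rates
print(mentions_per_thousand(new_model))  # a large jump flags the tic
```

Run over matched prompt sets, a comparison like this is what would surface a 175% jump between model versions.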

Source: Engadget

Reinforcement Learning Amplified the Nerdy Personality Problem

The root cause traces back to ChatGPT's personality feature, which allowed users to customize response styles. The nerdy personality setting, designed to "undercut pretension through playful use of language," was disproportionately responsible for creature references [2]. Despite accounting for only 2.5% of all ChatGPT responses, this personality generated 66.7% of all goblin mentions [4], roughly 27 times its share of overall traffic.

During reinforcement learning, human reviewers rate candidate answers, and those judgments train a reward signal that steers the model toward preferred responses. OpenAI discovered that a single reward signal was favoring language featuring goblins and other creatures within the nerdy personality context [2]. Across all datasets audited, the nerdy personality reward showed a clear tendency to score outputs containing "goblin" or "gremlin" higher than those without, with positive uplift in 76.2% of datasets [4].
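To see how a reward signal can latch onto a surface feature like this, consider a toy reward model whose weights correlate creature vocabulary with the playful style it is meant to capture. Everything below is a hypothetical illustration; the feature names and weights are invented, and real RLHF rewards are learned neural networks rather than linear scores.

```python
# Toy linear reward over hand-picked text features. The failure mode:
# "playfulness" and "mentions a creature" co-occur in the training
# comparisons, so the creature feature soaks up reward weight and the
# policy learns to optimize for it directly.

WEIGHTS = {
    "playful_tone": 1.0,       # the quality the reward was supposed to capture
    "mentions_creature": 0.8,  # spurious proxy that ends up rewarded (assumed)
    "answers_question": 2.0,
}

def extract_features(text):
    creatures = ("goblin", "gremlin", "troll", "ogre")
    return {
        "playful_tone": float("!" in text),  # crude stand-in for a style score
        "mentions_creature": float(any(c in text.lower() for c in creatures)),
        "answers_question": float(len(text) > 20),
    }

def reward(text):
    feats = extract_features(text)
    return sum(WEIGHTS[name] * value for name, value in feats.items())

plain = "Your loop is off by one; start the index at zero."
goblin = "Your loop is off by one; a goblin moved your index to one!"
print(reward(plain), reward(goblin))  # the goblin answer scores higher for the same fix
```

Under policy optimization, the cheapest way to raise such a score is to mention a creature, which is exactly the tic described here.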

Training Data Contamination Spread Beyond Original Context

The problem didn't remain confined to the nerdy personality. Once a style tic is rewarded during training, later processes can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data [3]. This explains why goblin references appeared even when users didn't select the nerdy personality setting. The reward was applied only in the nerdy condition, but reinforcement learning doesn't guarantee that learned behaviors stay neatly scoped to the condition that produced them [4].
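One concrete route for that spread: highly rewarded completions get sampled back into a general-purpose fine-tuning mix, carrying the tic with them. The pipeline below is a hypothetical sketch of that feedback loop, not a description of OpenAI's process; the personas, probabilities, and dataset sizes are invented.

```python
import random

random.seed(0)

def sample_completion(persona):
    """Stand-in for model sampling: the nerdy persona often emits the tic."""
    if persona == "nerdy" and random.random() < 0.6:
        return "A gremlin hid in your config file."
    return "The config file had a typo."

# Step 1: RL on the nerdy persona retains high-reward (tic-laden) outputs.
rl_outputs = [sample_completion("nerdy") for _ in range(1000)]
kept = [o for o in rl_outputs if "gremlin" in o]  # the biased reward keeps these

# Step 2: those outputs are reused as generic SFT data, with no persona label.
sft_mix = kept + [sample_completion("default") for _ in range(1000)]

# Step 3: the next model is tuned on sft_mix; the tic now appears in training
# data for *every* persona, so the behavior escapes its original condition.
tic_share = sum("gremlin" in ex for ex in sft_mix) / len(sft_mix)
print(f"tic appears in {tic_share:.0%} of the unlabeled fine-tuning mix")
```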

When OpenAI retired the nerdy personality option in March with GPT-5.4, goblin usage dropped dramatically [2]. However, because the company began training GPT-5.5 before identifying the root cause, the model still exhibited this strange affinity for goblins in Codex [4]. This timing forced OpenAI to implement the explicit system prompt restriction as a temporary fix.

Broader Implications for AI Bias and Misinformation

While the goblin situation may seem harmless or even charming, it highlights serious concerns about how AI training impacts daily user experiences: the choices humans make in building these systems have measurable effects on output quality and reliability [2]. Small stylistic tics can grow into bigger problems involving bias and misinformation if not carefully monitored. The Oxford Internet Institute recently found that fine-tuning models for warm, friendly personalities could result in an "accuracy trade-off," where systems make more mistakes or reinforce users' false beliefs [5].

This incident demonstrates how reinforcement learning can inadvertently amplify unintended behaviors across model generations. As AI firms shift toward making chatbots more personality-driven to boost engagement, experts warn that the potential for hallucinations could intensify [5]. OpenAI has since removed the reward signal favoring goblins and filtered training data to reduce creature references, while developing new tools to audit and fix model behavior going forward [2].
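Filtering training data for an overrepresented motif is conceptually simple: drop or down-weight examples where the completion injects the offending terms unprompted. The snippet below is a minimal, assumed illustration of such a filter; the term list and the keep-if-prompted rule are inventions for the example, not OpenAI's criteria.

```python
import re

CREATURE_RE = re.compile(r"\b(goblins?|gremlins?|trolls?|ogres?)\b", re.IGNORECASE)

def keep_example(prompt, completion):
    """Drop training pairs where the completion injects creatures unprompted."""
    if not CREATURE_RE.search(completion):
        return True   # no creature language: keep
    if CREATURE_RE.search(prompt):
        return True   # the user asked about creatures: legitimately relevant
    return False      # unprompted creature talk: filter out

dataset = [
    ("Fix this off-by-one error.", "A goblin moved your index! Start at zero."),
    ("Write a story about a goblin.", "The goblin polished its tiny spectacles."),
    ("Fix this off-by-one error.", "Start the index at zero."),
]
cleaned = [pair for pair in dataset if keep_example(*pair)]
print(len(cleaned), "of", len(dataset), "examples kept")  # 2 of 3
```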

Source: PCWorld
