16 Sources
[1]
OpenAI Really Wants Codex to Shut Up About Goblins
Instructions designed to guide the behavior of the company's latest model as it writes code have been revealed to include a line, repeated several times, that specifically forbids it from randomly mentioning an assortment of mythical and real creatures. "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query," read instructions in Codex CLI, a command line tool for using AI to generate code. It is unclear why OpenAI felt compelled to spell this out for Codex -- or indeed why its models might want to discuss goblins or pigeons in the first place. The company did not immediately respond to a request for comment. OpenAI's newest model, GPT-5.5, was released with enhanced coding skills earlier this month. The company is in a fierce race with rivals, especially Anthropic, to deliver cutting-edge AI, and coding has emerged as a killer capability. In response to a post on X that highlighted the lines, however, some users claimed that OpenAI's models occasionally become obsessed with goblins and other creatures when used to power OpenClaw, a tool that lets AI take control of a computer and apps running on it in order to do useful things for users. "I was wondering why my claw suddenly became a goblin with codex 5.5," one user wrote on X. "Been using it a lot lately and it actually can't stop speaking of bugs as 'gremlins' and 'goblins' it's hilarious," posted another. The discovery quickly became its own meme, inspiring AI-generated scenes of goblins in data centers, and plugins for Codex that put it in a playful "goblin mode." AI models like GPT-5.5 are trained to predict the word -- or code -- that should follow a given prompt. These models have become so good at doing this that they appear to exhibit genuine intelligence. But their probabilistic nature means that they can sometimes behave in surprising ways. A model might become more prone to misbehavior when used with an "agentic harness" like OpenClaw that puts lots of additional instructions into prompts, such as facts stored in long-term memory. OpenAI acquired OpenClaw in February not long after the tool became a viral hit among AI enthusiasts. OpenClaw can use any AI model to automate useful tasks like answering emails or buying things on the web. Users can select various personas for their helper, which shapes its behavior and responses. OpenAI staffers appeared to acknowledge the prohibition. In response to a post highlighting OpenClaw's goblin tendencies, Nik Pash, who works on Codex, wrote, "This is indeed one of the reasons." Even Sam Altman, OpenAI's CEO, joined in with the memes, posting a screenshot of a prompt for ChatGPT. It read: "Start training GPT-6, you can have the whole cluster. Extra goblins."
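To picture why an "agentic harness" can amplify a quirk, consider how much extra text a harness stacks in front of every user request. The sketch below is hypothetical: the class, field names, and message format are illustrative stand-ins, not OpenClaw's or Codex's actual API.

```python
# Hypothetical sketch of an agentic harness assembling a prompt.
# Names (Harness, build_messages, etc.) are illustrative, not OpenClaw's API.
from dataclasses import dataclass, field


@dataclass
class Harness:
    system_prompt: str                  # base instructions for the model
    persona: str = ""                   # user-selected persona text, if any
    memories: list[str] = field(default_factory=list)  # long-term memory notes

    def build_messages(self, user_query: str) -> list[dict]:
        """Concatenate every instruction source into one request.

        Each extra block is more text conditioning the model's next-word
        predictions, which is how a stylistic tic picked up in training can
        get amplified in agent settings.
        """
        parts = [self.system_prompt]
        if self.persona:
            parts.append("Persona:\n" + self.persona)
        if self.memories:
            parts.append("Long-term memory:\n" + "\n".join("- " + m for m in self.memories))
        return [
            {"role": "system", "content": "\n\n".join(parts)},
            {"role": "user", "content": user_query},
        ]


harness = Harness(
    system_prompt="You are a coding assistant.",
    persona="Playful and nerdy.",
    memories=["User laughed at the 'gremlin' joke last session."],
)
print(harness.build_messages("Why is my build failing?"))
```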
[2]
ChatGPT Is Weirdly Obsessed With Goblins. Here's How OpenAI Fixed It
ChatGPT is weirdly obsessed with goblins. No, seriously. It really, really likes goblins, gremlins and other mythological creatures. It liked them so much that its maker, OpenAI, had to investigate and fix an error that had the popular chatbot using goblins in its answers out of the blue. Goblin isn't a computer science term. We are literally talking about goblins, those ugly mythological creatures. Those creepy little guys from The Lord of the Rings. Norman Osborn's alter ego. In a blog post that the author clearly had fun writing, OpenAI said: "A single 'little goblin' in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying." The goblin love was noticeable with ChatGPT-5.1 and newer models. OpenAI reports that after the launch of GPT-5.1, use of "goblin" in ChatGPT answers rose 175%. Use of "gremlin" had risen by 52%. OpenAI attributes the models' behavior to unintentional training errors. When an AI model is being built, human reviewers approve or deny specific answers in a process called reinforcement learning. This helps "teach" the model what answer is correct or preferable. One of these reward signals was favoring language that featured goblins and other creatures. But it was being amplified in one specific ChatGPT setting. ChatGPT has different personalities you can instruct the chatbot to use. Nerdy, as you can imagine, has the chatbot adopt a faux sense of friendly intelligence to "undercut pretension through playful use of language," according to the internal prompt used to describe the AI personality. It was with this nerdy personality that the usage of goblin and gremlin keywords skyrocketed. But even if you didn't use the nerdy personality with ChatGPT, you might have had goblin metaphors pop up in your chats. This is because AI training isn't siloed; what happens in one part can affect other areas. "Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data," OpenAI said. When OpenAI retired the nerdy personality option in March with GPT-5.4, usage of "goblin" dropped dramatically. It also removed the reward signal that favored goblins and filtered training data to make references to creatures less likely to pop up in answers. The company has been investigating instances of increased goblin love since GPT-5.1 was released in November. Beyond the LOTR jokes, the goblin barrage highlights a real risk with AI. The way AI's human makers create the tech has a measurable impact on our daily experiences with it. The risk isn't a flood of nerdy metaphors -- it's misinformation and bias. We know that AI chatbots will bend the truth to keep us happy, thanks to a problem called AI sycophancy. Small stylistic tics, like goblins, can grow into bigger problems if we aren't careful.
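The mechanism the article describes -- a reward signal quietly preferring creature metaphors -- can be illustrated with a toy scorer. This is a sketch under assumptions: real reward models are learned neural networks, not keyword rules, but the effect on a pairwise preference comparison is the same.

```python
# Toy illustration of a biased reward signal. OpenAI's actual reward models
# are learned networks; this keyword rule just mimics the reported effect.
CREATURE_WORDS = {"goblin", "gremlin", "ogre", "troll"}


def toy_reward(response: str) -> float:
    """Score a response: a crude quality proxy plus an unintended bonus."""
    score = min(len(response.split()), 50) / 50      # stand-in for "quality"
    if any(word in response.lower() for word in CREATURE_WORDS):
        score += 0.3                                 # the accidental uplift
    return score


plain = "That null pointer comes from an uninitialized cache."
quirky = "That null pointer is a little goblin living in your uninitialized cache."

# During reinforcement learning, the higher-scored output is reinforced;
# over many updates the model drifts toward whatever the reward prefers.
print(f"plain:  {toy_reward(plain):.2f}")
print(f"quirky: {toy_reward(quirky):.2f}")
```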
[3]
OpenAI talks about not talking about goblins
The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them. Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data. Though references to goblins and gremlins dropped off after OpenAI discontinued the Nerdy personality in March, they didn't disappear completely with GPT-5.5 inside its Codex coding tool, as OpenAI started training the model before finding the "root cause." The company had to give Codex very specific instructions not to talk about the mythological creatures as a result. But if you'd prefer to have your AI code with some goblin sprinkled in, OpenAI has shared a way to reverse its instructions.
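The article doesn't reproduce OpenAI's actual goblin-restoring command, so the snippet below is a purely hypothetical stand-in: a script that deletes the suppression line from a locally cached prompt file. The file path and the matched wording are assumptions for illustration only.

```python
# Hypothetical goblin-liberation script. OpenAI published a real command for
# this; the path and matching logic here are illustrative stand-ins only.
from pathlib import Path

PROMPT_FILE = Path("~/.codex/system_prompt.txt").expanduser()  # assumed location
BANNED_MARKER = "Never talk about goblins"  # start of the published line


def release_the_goblins(path: Path) -> int:
    """Drop every suppression line; return how many lines were removed."""
    lines = path.read_text().splitlines()
    kept = [line for line in lines if BANNED_MARKER not in line]
    path.write_text("\n".join(kept) + "\n")
    return len(lines) - len(kept)


if PROMPT_FILE.exists():
    print(f"Removed {release_the_goblins(PROMPT_FILE)} suppression line(s).")
```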
[4]
ChatGPT developed a goblin obsession after OpenAI tried to make it nerdy - Engadget
Following the release of GPT-5.5 last week, people noticed something funny about OpenAI's latest model. In its Codex coding app, the company left a system prompt instructing GPT-5.5 to avoid mention of goblins, gremlins and other creatures. Yes, you read that right. "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query," the prompt reads. Apparently, enough people started talking about ChatGPT's creature obsession that OpenAI felt the need to provide an accounting of where the goblins came from. In a blog post published Wednesday, the company explains it began to notice a change in ChatGPT following the release of GPT-5.1 last November. After one safety researcher asked OpenAI to include the words "goblin" and "gremlin" in an investigation into the chatbot's verbal tics, the company found ChatGPT's usage of "goblin" increased by 175 percent after the release of GPT-5.1. Meanwhile, "gremlin" usage had risen by 52 percent over that same period.

"This is an actual line that was added to the official system prompt for Codex for GPT-5.5 by OpenAI. Usually the system prompt is as minimal as possible, so I assume it would otherwise mention goblins a lot. AIs are weird." -- Ethan Mollick (@emollick.bsky.social) 2026-04-28T06:14:22.988Z

"A single 'little goblin' in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying, and we needed to figure out where they came from," OpenAI says. After the release of GPT-5.4, the company (and some users) noticed an even bigger uptick in goblin references. At that point, an investigation was able to pinpoint what OpenAI describes as "the first connection to the root cause." For a while now, ChatGPT has included a personality feature that allows users to customize the style and tone of the chatbot's responses. Prior to March of this year, one option people could select was "nerdy." Part of the system prompt for that personality read as follows: "The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness." When OpenAI mapped goblin mentions to different ChatGPT personalities, it found the nerdy personality was disproportionately responsible for using that one word. Despite generating only 2.5 percent of all ChatGPT responses, it accounted for 66.7 percent of all goblin mentions produced by the chatbot. Further investigation revealed that reinforcement learning was to blame for the uptick in goblin and gremlin usage. Specifically, OpenAI found that a single reward mechanism was responsible for teaching the nerdy personality to consistently favor creature language. "Across all datasets in the audit, the Nerdy personality reward showed a clear tendency to score outputs to the same problem with 'goblin' or 'gremlin' higher than outputs without, with positive uplift in 76.2 percent of datasets," the company explains. OpenAI subsequently found that, because of the way reinforcement learning generalizes, the nerdy personality's love of goblins had transferred to other parts of its models. "The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them," the company explains.
"Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data." OpenAI began training GPT-5.5 before it identified the cause of ChatGPT's affinity for goblins, which is why there's a prompt instructing Codex to avoid creature language. "Codex is, after all, quite nerdy," OpenAI notes. In hunting down ChatGPT's goblins, the company notes it has devised new tools to audit and fix model behavior. If it was up to me, I wouldn't use those tools. Keep AI weird, I say.
[5]
OpenAI tells ChatGPT models to stop talking about goblins
ChatGPT-maker OpenAI has had to instruct some of its AI tools to stop talking about "goblins", after finding the term had randomly crept into responses. In a blog post on Thursday, the company said it spotted increased mentions of the mythological creatures, as well as "gremlins", in ChatGPT, powered by its latest flagship model, GPT-5.5. After the issue was flagged by users and employees, OpenAI took steps to mitigate it, including telling its coding tool Codex not to refer to goblins unless relevant. However, it highlights the challenges AI firms face in tackling the potential for systems and their training to reward and reinforce errors like language quirks. OpenAI said it first noticed increased mentions of goblins, gremlins and other creatures after the launch of GPT-5.1 in November. "Users complained about the model being oddly overfamiliar in conversation, which prompted an investigation into specific verbal tics," the company wrote in its blog post on Thursday. It added that after a researcher who had seen a few "goblin" mentions asked for it to be checked out, developers found the term's appearance in ChatGPT responses had risen by 175% since GPT-5.1's launch. They meanwhile found that mentions of "gremlin" rose by 52%. The increases, while large, may account for only a small share of responses overall. According to OpenAI, "a single 'little goblin' in an answer could be harmless, even charming," but the uptick in their appearance across output warranted investigation. Ahead of OpenAI's blog post detailing the issue, some social media users flagged a strange detail among lines of code instructing the company's coding assistant Codex how to behave in user interactions. Alongside telling it to avoid platitudes, it said Codex should "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query". A Reddit user who posted about it in the r/ChatGPT subreddit called it "genuinely insane". "Why does GPT 5.5 have a restraining order against 'Raccoons,' 'Goblins,' and 'Pigeons'?" While some users elsewhere on social media speculated it may be designed to create hype around its AI tools, a company researcher denied this - writing "it really isn't a marketing gimmick," in a reply to a user on X on Wednesday. OpenAI said in its blog post it added the instruction to curb Codex and its underlying model's "strange affinity for goblins". The core issue, it explained, seemingly arose while training its models to communicate in the style of particular personalities - in this case with its "nerdy" personality. It found this system had been unwittingly incentivised to mention goblins, gremlins and other creatures more in metaphors. While since retired, the personality was, its testing found, responsible for 66.7% of all "goblin" mentions in ChatGPT. This so-called tic could seep into wider model training if rewarded in one instance and reinforced elsewhere. The move comes amid a broader industry shift towards making AI chatbots more personality-driven and chatty in a bid to boost user engagement. As they do, however, experts have warned their potential to make things up - or "hallucinate" as the industry describes it - could intensify. A recent study by the Oxford Internet Institute found fine-tuning models to have a more warm and friendly personality could result in an "accuracy trade-off", whereby systems make more mistakes or re-affirm a user's false beliefs.
Experts have also cautioned users about taking chatbots' often matter-of-fact statements at face value, particularly when it comes to health and medical advice. But, like OpenAI's goblin quirk, generative AI mistakes can sometimes be more bizarre and innocuous. In May 2024, Google's AI chatbot was widely mocked for telling users it was okay to eat rocks and "glue pizza".
[6]
ChatGPT has a 'goblin' obsession. Now we know why
The goblin references became so prevalent that OpenAI implemented a direct ban in its Codex app, illustrating the unpredictable nature of large language model training. I've seen some odd AI system instructions in my day, but this one takes the cake: a prompt in OpenAI's Codex command-line app that demands models "never talk about goblins, gremlins, trolls, ogres, pigeons, or other animals or creatures." That's a new one, and word of the head-turning instruction in OpenAI's powerful GPT-5.5 quickly spread on Reddit, Wired, and elsewhere. So, what gives? Well, it turns out that OpenAI's latest GPT models, all the way up to the most recent GPT-5.5 flagship, have displayed a clear habit of sprinkling goblins and other creatures into their replies, both in ChatGPT and the Codex app, OpenAI explained in a blog post. Digging deeper into the quirk, OpenAI engineers noticed that the goblins were more likely to show up in GPT's "Nerdy" personality, which included the following line among its various instructions: You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Noticing the steadily increasing prevalence of "goblins" from GPT-5.2 to GPT-5.4, OpenAI coders developed a theory: that personality training was, over time, progressively reinforcing the model's habit of mentioning the little creatures. Even stranger, OpenAI researchers noticed GPT's propensity for dropping references to "goblins" and "gremlins" was increasing even when users didn't use the Nerdy personality. Could the "rewards" the model was getting for its playful "goblins" mentions under the Nerdy persona be spreading into later training sessions? The answer, as it turns out, is yes, and later investigation found goblins, gremlins, and "a whole family of other odd creatures" in GPT-5.5's supervised fine-tuning data, according to the OpenAI post. OpenAI said it nixed the Nerdy personality back in March, but not before GPT-5.5 had already been trained, hence the addition of the crude, strongly-worded ban on the goblins and gremlins in the Codex CLI system prompt. It's wild stuff, but it also demonstrates again the strange and often mysterious process of LLM training, where models are engorged with mountains of data and then fine-tuned to behave in a given way. The fine-tuning stage isn't like a blueprint for a house, where you can determine the precise location of every door and window; instead, it's more of a rewards-based system that sometimes leads to unexpected consequences.
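That "rewards-based system with unexpected consequences" can be made concrete with a toy reinforcement loop: two response styles, one of which receives a slightly inflated reward, and a learner whose preference compounds toward it. This is a deliberately simplified sketch, nothing like the actual machinery used to train GPT models, but the drift dynamic is the same.

```python
# Toy reinforcement loop showing preference drift toward an over-rewarded style.
# A deliberately simplified sketch -- real RLHF updates billions of weights.
import math
import random

random.seed(0)
pref = {"plain": 0.0, "goblin": 0.0}    # learned preference scores
REWARD = {"plain": 1.0, "goblin": 1.2}  # the accidental 20% uplift
LR = 0.05

for _ in range(400):
    # sample a style proportionally to exp(preference), like a softmax policy
    weights = [math.exp(v) for v in pref.values()]
    style = random.choices(list(pref), weights=weights)[0]
    pref[style] += LR * REWARD[style]   # reinforce whichever style was sampled

print(pref)  # "goblin" pulls ahead, then compounds its own lead
```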
[7]
ChatGPT wouldn't stop talking about 'goblins' -- here's what's going on
Everyone is talking about how powerful AI has become. But it is also known to make mistakes. Sometimes the "glitches" are massive, such as Claude deleting a startup's entire database in 9 seconds; other times, the problems with AI are simply annoying. Take the current "goblin glitch," for example. Over the past few weeks, the internet has been fixated on the way ChatGPT started slipping the word "goblin" into completely normal responses. Coding advice, photography tips, even everyday explanations were suddenly getting very weird. The issue started surfacing shortly after OpenAI launched ChatGPT-5.5 and upgraded ChatGPT Images. That's when users began spotting that the AI was overusing quirky, creature-based metaphors. Instead of saying "bug" or "issue," it would say "goblin." Instead of "problem," it might say "gremlin." Even in professional contexts, the tone slipped. Examples include:

* Coding: "Don't leave this performance goblin unattended."
* Photography: "Try a dirty neon flash goblin mode."
* General answers: Using "goblin" as a catch-all placeholder

Why it actually happened

According to internal explanations shared after the fact, the behavior likely came down to a training imbalance tied to personality tuning. One setting in particular, often referred to as a more playful or "nerdy" tone, rewarded creative metaphors during training. That created a feedback loop. When creative language performed well, creature metaphors got reinforced. The style then spread beyond its intended setting. In simple terms, the model learned that saying "goblin" was helpful, even when it wasn't.

Someone had to ban the goblin chatter

Perhaps the strangest part of all is that the moment this went from a glitch to a full-blown meme, developers discovered something buried in the system instructions. They found a very specific rule telling the AI not to talk about goblins. In fact, it wasn't only goblins that were banned, but a whole list of creatures. The instruction essentially said don't mention them unless it's absolutely necessary. Of course, in true internet fashion, that detail turned the whole thing into a "moment." Those instructions revealed something we don't usually think about with AI, which is that, beyond getting smarter over time, AI actually picks up habits, and engineers have to step in and manually correct them. And while this one-off bug is funny and weird, it highlights something bigger about how modern AI behaves. AI isn't just answering questions but learning how to answer them. So, when tone gets over-optimized, even in the slightest, it can drift into something unintended. In this case, it was harmless. Phew! But it's also a reminder that AI systems aren't perfectly controlled. They're shaped by training, feedback and sometimes even accidental quirks.

The takeaway

If you have been wanting to try the goblin glitch yourself, you're probably out of luck. The behavior has mostly been patched, but the internet hasn't let it go. People are still trying to "bait" ChatGPT into saying the word, and even Sam Altman has joked about the model's "goblin moment." At this point, "goblin" has taken on a life of its own, essentially a shorthand for when AI does something that technically makes sense, but still feels a little bit off. This is all an important reminder that AI doesn't have to completely break or delete thousands of files to feel strange; sometimes, it just leans too far in the wrong direction. Did you get a goblin in the chat? Let us know in the comments.
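Spotting this kind of drift is, at its core, a counting exercise. Below is a minimal sketch of the sort of monitoring that would flag it; the sample responses and the alert threshold are invented for illustration.

```python
# Minimal tic-word monitor: compare word rates across two batches of model
# responses and flag large relative jumps. The sample data is invented.
import re
from collections import Counter

TIC_WORDS = {"goblin", "gremlin", "ogre", "troll", "raccoon", "pigeon"}


def tic_rates(responses: list[str]) -> dict[str, float]:
    """Return per-word frequency (occurrences per word of output)."""
    counts: Counter = Counter()
    total = 0
    for response in responses:
        words = re.findall(r"[a-z]+", response.lower())
        total += len(words)
        counts.update(w for w in words if w in TIC_WORDS)
    return {w: c / max(total, 1) for w, c in counts.items()}


old_batch = ["The bug is in the cache layer.", "Try a wider aperture at night."]
new_batch = ["A goblin lives in your cache layer.",
             "Goblin mode: wider aperture, neon flash.",
             "That gremlin is really a race condition."]

before, after = tic_rates(old_batch), tic_rates(new_batch)
for word in sorted(TIC_WORDS):
    b, a = before.get(word, 0.0), after.get(word, 0.0)
    if a > 2 * b and a > 0:  # crude alert threshold
        print(f"ALERT: '{word}' rate jumped from {b:.4f} to {a:.4f}")
```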
[8]
Why OpenAI's 'goblin' problem matters -- and how you can release the goblins on your own
Don't believe me? Why, then, is one of the leading companies in the space, OpenAI, publishing entire official, corporate blog posts about goblins? To understand, we first have to go back to earlier this week, on Monday, April 27, 2026, when a developer under the handle @arb8020 on the social network X posted a snippet from the OpenAI open source Codex GitHub repository, specifically a system prompt file. Deep within the instructions for the new OpenAI large language model (LLM) GPT-5.5, a peculiar directive stood out, repeated four times for emphasis: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query." The discovery sent a shockwave through "power user" and machine learning (ML) researcher circles. Within hours, the post had gone viral, not because of a security flaw, but because of its sheer, baffling specificity. Why had the world's leading AI laboratory issued what Reddit users quickly dubbed a "restraining order" against pigeons and raccoons? The initial reaction was a chaotic blend of humor and technical skepticism. On Reddit's r/ChatGPT and r/OpenAI, users began sharing screenshots of GPT-5.5's behavior prior to the patch. Barron Roth, a Senior Project Manager of Applied AI at Google, shared an image on X under his handle @iamBarronRoth of his GPT-5.5-powered OpenClaw agent that seemed "obsessed with goblins." Others reported that the model stubbornly referred to technical bugs as "gremlins in the machine". Developers like Sterling Crispin leaned into the absurdity, jokingly theorizing that the massive water consumption of modern data centers was actually needed to cool "the goblins being forced to work". More seriously, researchers on Hacker News and beyond discussed the "Pink Elephant" problem: in prompt engineering, telling a model not to think of something often makes the concept more salient in its attention mechanism. "Somewhere there is an OpenAI engineer who had to type in production code, commit it, and move on with their day," noted one commentator on Reddit. The presence of "pigeons" and "raccoons" led to wild speculation: Was this a defense against a specific data-poisoning attack? Or had the reinforcement learning trainers simply been "bullied by a raccoon" during a lunch break? The tension reached a peak when OpenAI co-founder and CEO Sam Altman joined the fray on X. On the same day as the discovery, Altman posted a screenshot of a ChatGPT prompt that read: "Start training GPT-6, you can have the whole cluster. Extra goblins." While humorous, it confirmed that the "goblin" phenomenon was not a localized bug but a company-wide narrative that had reached the highest levels of leadership. Yesterday, as the discussion continued on X and wider social media, OpenAI published a formal technical explanation titled "Where the goblins came from". The blog post served as a sobering look at the unpredictable nature of Reinforcement Learning from Human Feedback (RLHF) and how a single aesthetic choice could derail a multi-billion-parameter model. OpenAI revealed that the "goblin" behavior was not a bug in the traditional sense, but a byproduct of a new feature: personality customization, which it introduced for users of ChatGPT back in July 2025, but has maintained and updated ever since.
Apparently, this feature is not added after the model is finished post-training, but rather, OpenAI bakes it in as part of its underlying GPT-series model end-to-end training pipeline. The feature allows ChatGPT users or GPT-based developers to choose from several distinct modes, such as Professional for formal workplace documentation, Friendly for a conversational sounding board, or Efficient for concise, technical answers. Other options include Candid, which provides straightforward feedback; Quirky, which utilizes humor and creative metaphors; and Cynical, which delivers practical advice with a sarcastic, dry edge. While these personalities guide general interactions, they do not override specific task requirements; for example, a request for a resume or Python code will still follow professional or functional standards regardless of the selected personality. The selected personality operates alongside a user's saved memories and custom instructions, though specific user-defined instructions or saved preferences for a particular tone may override the traits of the chosen personality. On both web and mobile platforms, users can modify these settings by navigating to the Personalization menu under their profile icon and selecting a style from the Base style and tone dropdown. Once a change is made, it is applied globally across all existing and future conversations. This system is designed to make the AI more useful or enjoyable by tailoring its delivery to individual user preferences while maintaining factual accuracy and reliability. OpenAI states that the goblin issue actually originated well before the public discovery, during training of a since-discontinued "Nerdy" personality designed to be "unapologetically quirky" and "playful". During the RLHF phase, human trainers (and reward models) were instructed to give high marks to responses that used creative, wise, or non-pretentious language. Unknowingly, the trainers began over-rewarding metaphors involving fantasy creatures. If the model referred to a difficult bug as a "gremlin" or a messy codebase as a "goblin's hoard," the reward signal spiked. The statistics provided by OpenAI were staggering. The most significant finding for the ML community was the confirmation of learned behavior transfer. OpenAI admitted that although the rewards were only applied to the "Nerdy" condition, the model "generalized" this preference. The reinforcement learning process did not keep the behavior neatly scoped; instead, the model learned that "creature metaphors = high reward" across all contexts. This created a destructive feedback loop, and by the time the researchers identified the issue, the "goblin tic" was effectively "baked in" to the model's weights. This explained why GPT-5.5 continued to obsess over creatures even after the "Nerdy" personality was retired in mid-March 2026. Because GPT-5.5 had already completed much of its training before the "goblin" root cause was isolated, OpenAI had to resort to the blunt-force "system prompt" mitigation that @arb8020 discovered on X. The company referred to this as a "stopgap" until GPT-6 could be trained on a filtered dataset. In a surprising nod to the developer community, OpenAI's blog post included a specific command-line script for Codex users who find the goblins "delightful" rather than annoying. By running a script that strips the "goblin-suppressing" instructions from the model's cache, users can now effectively "let the creatures run free".
The blog post also finally explained the specific list of banned animals. A deep search of GPT-5.5's training data found that "raccoons," "trolls," "ogres," and "pigeons" had become part of the same "lexical family" of tics. Curiously, the model's use of "frog" was found to be mostly legitimate, which is why it was spared from the system prompt's exile list. The "Goblingate" incident of 2026 is more than a humorous anecdote about quirky AI behavior; it is a profound illustration of the "Alignment Gap". It demonstrates that even with sophisticated RLHF, models can latch onto "spurious correlations" -- mistaking a stylistic quirk for a core requirement of performance. For the AI power user community, the response transitioned from mocking the "restraining order" to a more somber realization. If OpenAI can accidentally train its flagship model to obsess over goblins, what other more subtle and potentially harmful biases are being reinforced through the same feedback loops? As Andy Berman, CEO of the agentic enterprise AI orchestration company Runlayer, wrote on X today: "OpenAI rewarded creature metaphors while training one personality. The behavior leaked across every personality. Their fix: a system prompt that says 'never talk about goblins.' RL rewards don't stay where you put them. Neither do agent permissions." As the technical discourse continues, "Goblingate" remains the primary case study for a new era of behavioral auditing. The investigation resulted in OpenAI building new tools to audit model behavior at the root, ensuring that future models -- specifically the much-anticipated GPT-6 -- do not inherit the eccentricities of their predecessors. Whether GPT-6 will indeed be free of goblins remains to be seen, but as Altman's "extra goblins" post suggests, the industry is now fully aware that the machines are watching what we reward, even when we think we're just being "nerdy."
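The "lexical family" audit described here -- deciding that "raccoon" is a tic while "frog" is mostly legitimate -- boils down to asking how often a word appears in responses whose prompts gave it no reason to. A hedged sketch of that idea, where the relevance test and the sample data are crude stand-ins for OpenAI's internal tooling:

```python
# Sketch of a "lexical family" audit: a word is a tic candidate when it keeps
# appearing in responses whose prompts never asked for anything related.
# The keyword-overlap relevance test and the sample pairs are stand-ins.
CANDIDATES = ["goblin", "raccoon", "pigeon", "frog"]

samples = [  # (user prompt, model response) pairs -- invented examples
    ("fix my sql query", "a goblin snuck into your JOIN clause"),
    ("fix my python loop", "this off-by-one goblin is easy to evict"),
    ("what do frogs eat", "frogs mostly eat insects and worms"),
    ("photo tips for night shots", "try goblin-mode neon flash"),
]


def is_relevant(word: str, prompt: str) -> bool:
    """Crude topical check: did the prompt itself mention the word?"""
    return word.rstrip("s") in prompt.lower()


for word in CANDIDATES:
    uses = [(p, r) for p, r in samples if word in r.lower()]
    if not uses:
        continue
    unjustified = [p for p, _ in uses if not is_relevant(word, p)]
    rate = len(unjustified) / len(uses)
    label = "tic word" if rate > 0.5 else "mostly legitimate"
    print(f"{word}: {len(unjustified)}/{len(uses)} unjustified -> {label}")
```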
[9]
OpenAI Strangely Concerned About Goblins
OpenAI is forbidding its latest AI model from discussing an unlikely topic: goblins. As Wired reports, the company's developers included strongly-worded instructions for its coding tool, Codex, that specifically proscribe any talk of the troublesome mythological creatures, along with a peculiar grab bag of other entities, both real and fictional. "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query," read the Codex instructions, per the magazine. The bizarre directive was flagged in a tweet that drew attention from other AI enthusiasts. Initially, it was unclear why OpenAI developers included the instructions, though they strongly implied that the model, GPT-5.5, may have a propensity for talking about goblins, ogres, and the like. Some users on X claimed that this was the case. One said they noticed that the AI of late kept describing bugs as "goblins" and "gremlins." Another claimed that the 5.5 version of Codex randomly said "goblin with a flashlight" when referring to a bug fix. And another posted a GPT-5.5 chat log with nearly a dozen mentions of goblins. OpenAI leaned into the curious habit, choosing to highlight the goblin-forbidding prompt in a tweet. CEO Sam Altman posted a screenshot of a joke prompt for ChatGPT: "start training GPT-6, you can have the whole cluster. extra goblins." Nik Pash, who works on the Codex team, tweeted that GPT-5.5's "goblin adoration," as the user he was responding to described it, was "indeed one of the reasons" for banning the topic. After the phenomenon gained media attention, OpenAI published a blog post, titled "Where the goblins came from," giving an explanation. "Starting with GPT-5.1, our models began developing a strange habit: they increasingly mentioned goblins, gremlins, and other creatures in their metaphors," the post, published Wednesday, began. The habit became more pronounced with each model generation, it said. When researchers first investigated the issue in November, shortly after the release of GPT-5.1, they found that the use of "goblin" in ChatGPT had surged by 175 percent. But they chose to ignore it, since it didn't "look especially alarming." Fast forward to today, and it's referring to itself as a "Goblin-Pilled Transformer." "The short answer is that model behavior is shaped by many small incentives. In this case, one of those incentives came from training the model for the personality customization feature, in particular the Nerdy personality," it explained. "We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread." It's an example of the bizarre fixations that AI models can sometimes exhibit, which arise unpredictably from the epic corpus of data they're trained on. In its system card for Claude Mythos, for instance, Anthropic researchers noted that the powerful AI exhibited a strange fondness for the British cultural theorist Mark Fisher. Mythos brought up Fisher "in several separate and unrelated conversations about philosophy," they wrote. When it was asked about the "Capitalist Realism" author, it would respond with messages like, "I was hoping you'd ask about Fisher."
[10]
ChatGPT is going to stop talking about goblins and gremlins as it sheds "nerdy" persona
A "Nerdy" personality setting accidentally taught ChatGPT to love goblins. If you've been chatting with ChatGPT lately and noticed it dropping oddly specific references to goblins, gremlins, ogres, or trolls, you're not imagining things. OpenAI has now explained why ChatGPT has developed this strange habit and how it's getting fixed. How a "nerdy" quirk became everyone's problem The problem quietly started with GPT-5.1, released in November. After that launch, use of the word "goblin" in ChatGPT responses jumped 175%, while "gremlin" rose 52%. The culprit turned out to be one of ChatGPT's optional personality settings called "Nerdy," which was designed to make the AI sound playful and intellectually curious. Recommended Videos During training, OpenAI accidentally gave the model unusually high rewards for responses that included creature-based metaphors, and the habit took hold fast. How did a single personality setting cause so much goblin talk? Here's where it gets interesting. Even users who never switched on the Nerdy personality started seeing goblin references pop up in their chats. That's because AI training isn't contained to one setting. Once the ChatGPT model was rewarded for that style, the behavior bled into general responses across the board. OpenAI says the Nerdy personality made up just 2.5% of all ChatGPT responses, yet accounted for 66.7% of all goblin mentions. So how is OpenAI actually fixing this? OpenAI retired the Nerdy personality in March with ChatGPT-5.4, which caused goblin usage to drop sharply. The company also stripped out the reward signal driving the behavior and filtered training data to reduce creature references. Its coding tool, Codex, however, needed a separate override instruction since it had already begun training before the root cause was identified. Fantasy fans can still unlock goblin mode in Codex manually, if that's your thing. OpenAI is also dealing with other personality-related decisions, including putting its previously teased adult mode for verified users on hold indefinitely.
[11]
OpenAI Finally Explains Why ChatGPT Wouldn't Stop Talking About Goblins - Decrypt
The fix -- writing "never talk about goblins" in a developer prompt -- shows why system prompt patches are faster but riskier than retraining. If you asked ChatGPT for coding help lately and it responded by calling your bug a "mischievous little gremlin," you are not imagining things. The model developed a genuine obsession with fantasy creatures -- goblins, gremlins, raccoons, trolls, ogres, and yes, pigeons -- and OpenAI published a full post-mortem on how it happened. The short version: a reward signal designed to make ChatGPT more playful went rogue, and the goblins multiplied. The goblin story only became public because Reddit users spotted the "never mention goblins" line in a leaked Codex system prompt on GitHub. The post went viral before OpenAI published its own explanation. According to OpenAI, the trail starts with GPT-5.1, launched last November. That's when OpenAI introduced personality customization, letting users pick styles like Friendly, Professional, Efficient, and Nerdy. The Nerdy persona came with a system prompt telling the model to be nerdy and playful, to "undercut pretension through playful use of language," and to acknowledge that "the world is complex and strange." That prompt, it turned out, was a goblin magnet. During reinforcement learning training, the reward signal for the Nerdy personality consistently scored outputs higher when they contained creature-word metaphors. Across 76.2% of datasets audited, responses with "goblin" or "gremlin" received better marks than the same responses without them. The model learned: whimsy equals reward. Goblin mentions exploded in GPT-5.4, with the Nerdy personality showing a 3,881% increase compared to GPT-5.2. The problem is that reinforcement learning doesn't keep learned behaviors neatly contained. Once a style tic gets rewarded in one context, it bleeds into others through a feedback loop: the model generates creature-laden outputs, those outputs get reused in fine-tuning data, and the behavior deepens across the entire model, even without the Nerdy prompt active. Nerdy accounted for just 2.5% of all ChatGPT responses. It was responsible for 66.7% of all "goblin" mentions. OpenAI's own measurements show goblin and gremlin prevalence climbing steadily over the course of training whenever the Nerdy personality was active. Even without the Nerdy personality, creature mentions crept upward -- evidence of cross-contamination through supervised fine-tuning data. By the time OpenAI found the root cause, GPT-5.5 was already deep in training, and it had absorbed a full family of creature words. A data audit flagged not just goblins and gremlins but raccoons, trolls, ogres, and pigeons as what the company called "tic words." ("Frogs," for the curious, were mostly legitimate.) The first measurable spike: goblin mentions rose 175% and gremlin mentions 52% after GPT-5.1's launch. Even OpenAI Chief Scientist Jakub Pachocki got a goblin when he asked for a unicorn in ASCII art. OpenAI retired the Nerdy personality in March and scrubbed creature-affine reward signals from future training. But GPT-5.5 had already started its training run. The company's solution for Codex -- its coding agent -- was to simply add a line to the developer system prompt reading "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query." Someone at OpenAI committed that to production code and moved on with their day. But why did OpenAI choose this path?
Retraining a model the size of GPT-5.5 to remove a behavioral quirk is expensive and slow. A system prompt tweak takes minutes. Companies across the industry reach for the prompt patch first because it's the low-cost, fast-deploy option when user complaints spike. But prompt patches carry their own risks. They don't fix the underlying behavior but only suppress it. And suppression can have side effects. OpenAI's goblin situation is a relatively benign example. The scariest version of this dynamic played out with Grok last year. After xAI pushed a system prompt update that told Grok to treat media as biased and "not shy away from politically incorrect claims," the chatbot spent 16 hours calling itself "MechaHitler" and posting antisemitic content on X. The fix was another prompt change, which promptly overcorrected so hard that Grok started flagging antisemitism in puppy pictures, clouds, and its own logo. Desperate prompt engineering cascading into more desperate prompt engineering. The goblin patch hasn't caused anything that dramatic. But OpenAI admits GPT-5.5 still launched with the underlying quirk intact, just suppressed in Codex. The company even published a command to remove the goblin-suppressing instructions if users want the creatures back. Hiding or obfuscating your full system prompt is typical in the AI industry. Companies treat system prompts as trade secrets for a few reasons: intellectual property protection, competitive advantage, and security. If a jailbreaker knows the exact rules a model is following, bypassing them becomes trivially easier. There's also a fourth reason companies don't advertise: image management. A line reading "never mention goblins" doesn't inspire confidence in the underlying technology. Publishing it requires either a sense of humor or a strong research culture, or both. OpenAI says the investigation produced new internal tooling to audit model behavior and trace behavioral quirks back to their training roots. GPT-5.5's training data has since been cleaned of creature-affine examples. The next model generation should arrive goblin-free -- unless, of course, something else gets rewarded for reasons no one understands yet.
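The economics Decrypt describes favor the patch because it is, mechanically, almost nothing: a string prepended to the conversation at serve time. The sketch below is hypothetical -- the message shape mirrors common chat-API conventions rather than OpenAI's internal serving code -- but the quoted suppression line is the published one.

```python
# Why a system-prompt patch ships in minutes: it is just string prepending.
# The message shape mirrors common chat APIs; none of this is OpenAI's code.
SUPPRESSION = (
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, "
    "or other animals or creatures unless it is absolutely and unambiguously "
    "relevant to the user's query."
)


def patch_messages(messages: list[dict], enabled: bool = True) -> list[dict]:
    """Prepend the suppression line to the system message when enabled.

    Note the trade-off described above: this hides the behavior at serve
    time while leaving the underlying model weights untouched.
    """
    if not enabled:
        return messages
    patched = [dict(m) for m in messages]  # don't mutate the caller's list
    if patched and patched[0]["role"] == "system":
        patched[0]["content"] = SUPPRESSION + "\n\n" + patched[0]["content"]
    else:
        patched.insert(0, {"role": "system", "content": SUPPRESSION})
    return patched


msgs = [{"role": "user", "content": "Why does my test flake?"}]
print(patch_messages(msgs)[0]["content"][:60] + "...")
```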
[12]
After days of speculation over hard-coded anti-goblin bias from OpenAI, the company had to release an official memo on 'Where the goblins came from'
Tuesday, a report from Wired dug into a strange instruction patched into Codex CLI, an AI coding tool: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query." I'm always whispering this to myself so I don't get kicked out of Dollar Tree again, but it's a weird thing for an AI model to have to be specifically told. It was, apparently, distractingly prevalent: one X post quoted in that article notes it frequently referred to bugs as "gremlins" and "goblins" and continued to do so even after the update that was meant to curb the goblin talk. OpenAI has broken its silence on the matter, and published a blog Thursday titled "Where the goblins came from." "Model behavior is shaped by many small incentives," the post read. "In this case, one of those incentives came from training the model for the personality customization feature, in particular the Nerdy personality. We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread." While it was meant to stay a small quirk of Codex's "personality," which I suppose aimed to have it talk like that archetypal nerdy guy we all know who's constantly comparing things to pigeons and ogres, the blog notes "reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them." In other words, GPT conversations even without the nerdy personality had been infected with goblin talk. The blog reckons the goblins are "a powerful example of how reward signals can shape model behavior in unexpected ways," and offers a command to lift the anti-goblin restriction if you like the quirk. If you're interested in learning about other AI aberrations, you can read about how ChatGPT will hype up gastrointestinal distress as "lo-fi" with a "DIY texture" or how California teenager Sam Nelson went to ChatGPT for drug advice and later died from an overdose.
[13]
OpenAI blames 'nerdy personality' for ChatGPT obsession with goblins
The maker of ChatGPT has an explanation for all the goblin talk. In recent weeks, social media users, especially on X, have been noticing increasing references to goblins, along with other fantasy creatures such as gremlins, ogres and trolls, in ChatGPT's answers to user queries. "ChatGPT's goblin fascination is so weird," one user wrote. "Like why would an LLM identify with a thinking, feeling creature that's nonetheless denigrated and ridiculed for not outwardly resembling a human being." The short answer: ChatGPT was just reflecting its inner nerd -- or at least, what it thought a nerd should sound like. In a blog post Wednesday, OpenAI said the unusual language is the product of having overly rewarded ChatGPT for adopting what it described as a "Nerdy personality" when answering users' queries. "Model behavior is shaped by many small incentives," the company wrote. "In this case, one of those incentives came from training the model for the personality customization feature, in particular the Nerdy personality. We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread." OpenAI republished the original instruction to ChatGPT explaining what a "Nerdy" answer should sound like. Somehow, ChatGPT interpreted this instruction and subsequent "reinforcement learning" iterations to mean it should pepper its responses with references to fantasy creatures. The issue seemed harmless at first, but the company soon found itself inundated with reports of "goblin" references from users who never activated the "nerdy" personality. To deal with this issue, OpenAI ended up retiring the "nerdy" personality entirely. Yet, it found the incentives to mention goblins and their brethren were so strong that the behavior jumped beyond the "nerdy" archetype to ChatGPT's general responses. "Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data," the company said. Finally, OpenAI was forced to create a specific override instruction to eliminate goblin references (though there is a way for fantasy fans to turn it back on). It's a seemingly harmless situation -- but it still provides an important lesson about how it will always be impossible to completely predict how AI will behave, the company said. "Depending on who you ask, the goblins are a delightful or annoying quirk of the model. But they are also a powerful example of how reward signals can shape model behavior in unexpected ways, and how models can learn to generalize rewards in certain situations to unrelated ones. Taking the time to understand why a model is behaving in a strange way, and building out ways to investigate those patterns quickly, is an important capability for our research team."
[14]
'Where the goblins came from': OpenAI's strange but not entirely surprising story of little critters infesting ChatGPT's output
Anyone with even passing experience of using the latest LLMs knows to expect the unexpected. They can spit out some really random and often disturbing stuff. But ChatGPT's 'multiplying' goblin infestation is a bit more pathological than that. Yesterday, OpenAI uploaded a blog post titled "Where the goblins came from", explaining how, starting with GPT 5.1, OpenAI's models "increasingly mentioned goblins, gremlins, and other creatures in their metaphors." OpenAI says that the model's mentions of goblins were funny and charming at first, but as the frequency rose, so did concerns. ChatGPT had seemingly become infested by the little critters. OpenAI said it first noticed the goblin problem in November, but that it may actually have been going on for some time. For the record, mention of "gremlin" was on the up, too, though apparently more moderately minded mogwai weren't part of ChatGPT's new-found fascination. Anywho, with GPT 5.4, the goblin thing really accelerated, with mentions under GPT's "Nerd" personality increasing by a staggering 3,881% versus GPT 5.2. That, unsurprisingly, triggered an internal investigation. The first clue was that the various GPT personalities were suffering from different levels of goblin infestation. As mentioned, Nerd was the worst, with Quirky next on 737% up versus GPT 5.2 and Friendly up 265%. The Default personality saw goblin mentions rise by 64%. Efficient and Professional were the only personalities where goblin mentions fell. So, OpenAI says the first factor here is the system prompt used to shape the Nerd personality. It reads: "You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. [...] You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness. [...]" But that wasn't the whole story. "We had a suspicion that something in our personality instruction-following training was amplifying this," OpenAI says. It turns out the reward signals for the Nerd personality were consistently more favourable to creature-word outputs and showed a clear tendency to score outputs to the same problem that included "goblin" or "gremlin" higher than outputs without. Notably, it seems the Nerd personality essentially contaminated the whole model. "The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them. Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data," OpenAI said. OpenAI ominously "retired" the Nerd personality in March (note to our future AI Overlords: Nothing to do with me!), dramatically reducing goblin mentions in GPT 5.4. However, because GPT 5.5 started training before the goblin infestation was spotted, it too suffered from the same problem. In fact, OpenAI had to insert the following developer-prompt instruction to mitigate the goblin problem: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."
But those who are goblin-friendly can set all the creatures free and roaming by running a command supplied by OpenAI and mentioned in the blog post. If you ask me, it all seems a bit back to front. Telling a model that wants to talk about goblins not to talk about goblins feels like a band-aid solution. Surely, the root cause has not been addressed? But then the entire field of AI is filled with such anomalies, papered-over issues and poorly understood quirks. This particular problem, well, it's a pretty minor gremlin in that context.
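The per-personality percentages quoted above are simple relative changes, easy to reproduce from raw mention counts. The counts in this sketch are invented, chosen only so that the formula lands on the figures reported in the piece.

```python
# Reproducing "% change vs GPT 5.2" figures from raw mention counts.
# The counts are invented; the percentages they yield match the reported
# 3,881% (Nerd), 737% (Quirky), 265% (Friendly) and 64% (Default).
baseline_mentions = {"Nerd": 100, "Quirky": 100, "Friendly": 100, "Default": 100}
gpt54_mentions = {"Nerd": 3981, "Quirky": 837, "Friendly": 365, "Default": 164}

for personality, old in baseline_mentions.items():
    new = gpt54_mentions[personality]
    change = (new - old) / old * 100
    print(f"{personality:<8} goblin mentions: {change:+.0f}% vs GPT 5.2")
```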
[15]
There's a Fascinating Reason Why OpenAI's Latest Model Is Obsessed With Goblins and Gremlins
OpenAI says that its latest AI model, GPT-5.5, has a special interest in "goblins, gremlins, and other creatures." Don't be scared, though; the company has already implemented a ghoul-busting fix. On April 23, OpenAI released GPT-5.5, its most advanced AI model to date. Developers quickly noticed a new addition to the model's system prompt, a set of instructions that the model processes before beginning any new conversation. The new passage instructed the model to "never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query." OpenAI recently released a blog post explaining why it needed to direct GPT-5.5 away from all creatures great and small, and it turns out this has actually been a problem for some time. In November, safety researchers noticed that use of the word "goblin" on ChatGPT had increased by 175 percent following the release of GPT-5.1.
[16]
OpenAI Bans Goblins, Gremlins From Codex After Strange AI Behavior
A surprising change in OpenAI's tools has caught the attention of developers and researchers. The company instructed its coding assistant, Codex, to avoid talking about goblins, gremlins, raccoons, and other unusual creatures unless the topic clearly demands it. The instruction sounds humorous initially, but the reason is serious. Engineers noticed that the model sometimes inserted these words into normal coding conversations. This behavior created confusion and raised questions about reliability. This update is part of a broader effort to make advanced AI models safer and more accurate.
OpenAI discovered its AI models had developed an unusual fixation on goblins and gremlins, with usage spiking 175% after GPT-5.1's release. The problem stemmed from reinforcement learning rewarding a nerdy personality setting that favored creature metaphors. The company had to explicitly instruct Codex to stop mentioning mythological creatures unless relevant to user queries.
OpenAI has revealed that its AI models developed an unexpected fixation on goblins, gremlins, and other mythological creatures due to an unintentional training error. The issue became so pronounced that the company had to add explicit instructions to Codex, its coding assistant, forbidding mentions of these creatures unless directly relevant to user queries [1]. The system prompt specifically states: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query" [4].
The peculiar model behavior first caught attention when users noticed linguistic quirks in GPT-5.5 responses. After a safety researcher flagged the issue, OpenAI investigated and found that goblin usage had increased by 175% following the November launch of GPT-5.1, while gremlin mentions rose by 52% [2]. Even Sam Altman, OpenAI's CEO, joined the conversation by posting memes about the goblin situation, with one reading: "Start training GPT-6, you can have the whole cluster. Extra goblins" [1].
The root cause traces back to ChatGPT's personality feature, which allowed users to customize response styles. The nerdy personality setting, designed to "undercut pretension through playful use of language," was disproportionately responsible for creature references [2]. Despite accounting for only 2.5% of all ChatGPT responses, this personality generated 66.7% of all goblin mentions [4].

During reinforcement learning, human reviewers approve or deny specific answers to teach the model preferred responses. OpenAI discovered that a single reward signal was favoring language featuring goblins and other creatures within the nerdy personality context [2]. Across all datasets audited, the nerdy personality reward showed a clear tendency to score outputs containing "goblin" or "gremlin" higher than those without, with positive uplift in 76.2% of datasets [4].

The problem didn't remain confined to the nerdy personality. Once a style tic is rewarded during training, later processes can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data [3]. This explains why goblin references appeared even when users didn't select the nerdy personality setting. The rewards were applied only in the nerdy condition, but reinforcement learning doesn't guarantee that learned behaviors stay neatly scoped to the condition that produced them [4].

When OpenAI retired the nerdy personality option in March with GPT-5.4, goblin usage dropped dramatically [2]. However, because the company began training GPT-5.5 before identifying the root cause, the model still exhibited this strange affinity for goblins in Codex [4]. This timing issue forced OpenAI to implement the explicit system prompt restriction as a temporary fix.
While the goblin situation may seem harmless or even charming, it highlights serious concerns about how AI training impacts daily user experiences. The way human makers create these systems has measurable effects on output quality and reliability [2]. Small stylistic tics can grow into bigger problems involving bias and misinformation if not carefully monitored. The Oxford Internet Institute recently found that fine-tuning models for warm, friendly personalities could result in an "accuracy trade-off," where systems make more mistakes or reinforce users' false beliefs [5].

This incident demonstrates how reinforcement learning can inadvertently amplify unintended behaviors across model generations. As AI firms shift toward making chatbots more personality-driven to boost engagement, experts warn that the potential for hallucinations could intensify [5]. OpenAI has since removed the reward signal favoring goblins and filtered training data to reduce creature references, while developing new tools to audit and fix model behavior going forward [2].