OpenAI talks about not talking about goblins

TL;DR

OpenAI is opening up about its goblin problem.

Key Points

  • According to OpenAI's blog post, the company first noticed metaphors referencing goblins and other creatures in its GPT-5.1 model, specifically when the "Nerdy" personality option was in use.

Nauti's Take

Nauti finds OpenAI's transparency here genuinely refreshing: instead of hiding the goblin phenomenon, the company explains the training quirks behind it, which is useful for both trust and research. The catch is that such quirks surfaced only through external reporting, which raises a real black-box concern.

A useful reminder for developers: LLMs can develop unpredictable habits that are hard to catch without robust evaluation setups.
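
As a rough illustration of what a minimal check in such an evaluation setup might look like, the sketch below measures how often a batch of model outputs mentions a watched term. Everything in it is an assumption made for illustration: the watchlist, the function name `flagged_fraction`, and the sample strings are invented here and are not OpenAI's method or real model output.

```python
import re

# Hypothetical watchlist; pick terms that matter for your own product.
WATCHED_TERMS = ["goblin", "gremlin", "troll"]


def flagged_fraction(outputs, terms=WATCHED_TERMS):
    """Return the fraction of outputs that mention any watched term."""
    if not outputs:
        return 0.0
    # Match each term as a whole word, allowing a simple plural "s".
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")s?\b",
        re.IGNORECASE,
    )
    hits = sum(1 for text in outputs if pattern.search(text))
    return hits / len(outputs)


# Invented sample strings standing in for collected model responses.
samples = [
    "Think of the cache as a goblin hoarding your data.",
    "The cache stores recently used values.",
    "Race conditions are gremlins in your threads.",
    "Use a lock to protect shared state.",
]

rate = flagged_fraction(samples)
print(f"{rate:.0%} of sampled outputs mention a watched term")
```

Tracking a number like this across model versions or personality settings would at least make a sudden jump visible, though a keyword count only catches habits someone already thought to look for.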
