
What do LLMs think when you don't tell them what to think about?

TL;DR

Researchers studied what LLMs generate when given no topic – and each model family has distinct default preferences.

Key Points

  • GPT models lean toward code and math, Llama toward narratives, DeepSeek toward religious content, Qwen toward exam questions.
  • These "knowledge priors" reveal which training data shaped the models – a fingerprint of their datasets.
  • The patterns are consistent across different versions of the same model family.

Nauti's Take

This is a rare look behind the curtain: what data a model has seen can be read not just from its capabilities, but from what it produces unprompted. GPT gravitates to code, Llama tells stories, DeepSeek drifts toward religious content – these aren't coincidences, but traces of the training data.

In practice, this means choosing a model isn't just choosing a performance profile – it's also choosing an implicit worldview. And you should know which one before deploying it to users.

Context

LLMs are often seen as neutral tools, but this study shows they have built-in preferences inherited directly from their training data. These preferences shape how they respond to open-ended prompts – and make visible which knowledge domains a model "favors". For developers, this matters because it explains why certain models fit certain tasks better than others: not just because of their capabilities, but because of their inherent biases.

Sources