Encyclopedia Britannica is suing OpenAI for allegedly ‘memorizing’ its content with ChatGPT

TL;DR

Encyclopedia Britannica and Merriam-Webster have sued OpenAI, alleging the company used their copyrighted content without permission to train GPT-4.

Key Points

  • Britannica claims GPT-4 has 'memorized' large portions of its content and can reproduce near-verbatim copies on demand.
  • The plaintiffs argue the model outputs are 'substantially similar' to their original texts, constituting copyright infringement.
  • Both are among the most established reference publishers in the English-speaking world, giving the case significant symbolic weight.

Nauti's Take

The 'memorization' argument is legally clever because it shifts the focus from the training process to the output – and that is where infringement can actually be demonstrated concretely. Britannica apparently did its homework, systematically prompting GPT-4 with queries about its own content.

If the court agrees, 'how similar is the output to the original?' becomes the central compliance question for every AI provider.

Retrieval-augmented generation suddenly looks less like a technical choice and more like a legal necessity.
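
To make the 'how similar is the output to the original?' question concrete, here is a minimal Python sketch of two rough overlap measures a provider might use to screen outputs. It is an illustration only, not the method used by the plaintiffs or by OpenAI: the function names, the whitespace tokenizer, and the 5-word window are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def words(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def longest_shared_run(original: str, output: str) -> int:
    """Length, in words, of the longest run that appears verbatim in both texts."""
    a, b = words(original), words(output)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def ngram_overlap(original: str, output: str, n: int = 5) -> float:
    """Fraction of the output's word n-grams that also occur in the original."""
    a, b = words(original), words(output)
    source = {tuple(a[i:i + n]) for i in range(len(a) - n + 1)}
    candidates = [tuple(b[i:i + n]) for i in range(len(b) - n + 1)]
    if not candidates:
        # Output is shorter than n words, so there is nothing to compare.
        return 0.0
    return sum(gram in source for gram in candidates) / len(candidates)


if __name__ == "__main__":
    # Placeholder strings, not real reference-work or model text.
    original = "the quick brown fox jumps over the lazy dog near the river bank"
    output = "as noted, the quick brown fox jumps over the lazy dog every day"
    print(longest_shared_run(original, output))
    print(round(ngram_overlap(original, output), 3))
```

Neither number is a legal test of 'substantial similarity'. At best they are screening signals: a long shared run or a high n-gram overlap flags an output for closer review.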

Context

This case could set a landmark precedent for AI training on copyrighted data. If courts confirm that model 'memorization' constitutes infringement, every major language model trained on copyrighted data faces enormous legal exposure. For OpenAI, a loss would be especially damaging given the parallel wave of publisher and author lawsuits.

The technical and legal question of whether models can 'unlearn' licensed content is now squarely in the spotlight.