Encyclopedia Britannica is suing OpenAI for allegedly ‘memorizing’ its content with ChatGPT
TL;DR
Encyclopedia Britannica and Merriam-Webster have sued OpenAI, alleging the company used their copyrighted content without permission to train GPT-4.
Key Points
- Britannica claims GPT-4 has 'memorized' large portions of its content and can reproduce near-verbatim copies on demand.
- The plaintiffs argue the model outputs are 'substantially similar' to their original texts, constituting copyright infringement.
- Both publishers produce some of the most established reference works in the English-speaking world, giving the case significant symbolic weight.
Nauti's Take
The 'memorization' argument is legally clever because it shifts the focus from the training process to the output – and the output is where infringement can be demonstrated concretely. Britannica apparently did its homework, systematically prompting GPT-4 with queries about Britannica's own content.
If the court agrees, 'how similar is the output to the original?' becomes the central compliance question for every AI provider.
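Retrieval-augmented generation, which pulls answers from identifiable source documents at query time rather than from whatever the model has memorized, suddenly looks less like a technical choice and more like a legal necessity.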
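To make the 'how similar is the output to the original?' question concrete, here is a minimal Python sketch of how such a comparison could be automated. It is an illustration only, not the method used in the lawsuit: the function names, the 5-word window, and the sample strings are all assumptions, and 'substantial similarity' is a legal test that no single number settles.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(original: str, output: str, n: int = 5) -> float:
    """Fraction of the output's word n-grams that also occur in the original.

    The window size n=5 is an arbitrary choice for this sketch.
    """
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0
    return len(out_grams & word_ngrams(original, n)) / len(out_grams)


def longest_shared_run(original: str, output: str) -> int:
    """Length, in words, of the longest contiguous passage shared by both texts."""
    a = original.lower().split()
    b = output.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


if __name__ == "__main__":
    # Placeholder strings, not real Britannica text or real model output.
    reference = "the quick brown fox jumps over the lazy dog near the river"
    model_output = "a fox jumps over the lazy dog today"
    print(ngram_overlap(reference, model_output))
    print(longest_shared_run(reference, model_output))
```

A high n-gram overlap or a long shared run would flag an output for human review; where to set the threshold is a policy decision, not something the code can answer.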