Encyclopedia Britannica is suing OpenAI for allegedly ‘memorizing’ its content with ChatGPT

TL;DR

Encyclopedia Britannica and Merriam-Webster have sued OpenAI, alleging the company used their copyrighted content without permission to train GPT-4. Britannica claims GPT-4 has 'memorized' large portions of its content and can reproduce near-verbatim copies on demand. The plaintiffs argue the model outputs are 'substantially similar' to their original texts, constituting copyright infringement.

Nauti's Take

The 'memorization' argument is legally clever because it shifts the focus from the training process to the output – and that is where infringement can actually be demonstrated concretely. Britannica apparently did its homework, systematically prompting GPT-4 with queries about its own content.

If the court agrees, 'how similar is the output to the original?' becomes the central compliance question for every AI provider.
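To make that question concrete, here is a minimal sketch of how a provider might score a model output against a reference text. It is an illustration only: the function names (`ngram_set`, `overlap_report`), the 5-word window, and the choice of metrics are assumptions, not anything described in the complaint, and none of these numbers corresponds to the legal test for 'substantial similarity'.

```python
from difflib import SequenceMatcher


def ngram_set(text, n=5):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_report(original, output, n=5):
    """Compare a model output with a reference text (hypothetical helper).

    Returns three rough signals:
      ngram_overlap     - share of the output's n-grams that also occur in the original
      longest_run_words - length in words of the longest contiguous shared passage
      ratio             - difflib's overall word-sequence similarity (0.0 to 1.0)
    """
    out_ngrams = ngram_set(output, n)
    shared = ngram_set(original, n) & out_ngrams
    ngram_overlap = len(shared) / len(out_ngrams) if out_ngrams else 0.0

    a = original.lower().split()
    b = output.lower().split()
    # autojunk=False so frequent words are not discarded on long texts
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    longest = matcher.find_longest_match(0, len(a), 0, len(b))

    return {
        "ngram_overlap": ngram_overlap,
        "longest_run_words": longest.size,
        "ratio": matcher.ratio(),
    }
```

A long shared run or a high n-gram overlap would flag an output for human review. Where to set such a threshold is a policy and legal decision that this sketch does not attempt to answer.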

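Retrieval-augmented generation, which pulls text from attributable sources at query time instead of reproducing it from model weights, suddenly looks less like a technical choice and more like a legal necessity.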

Briefing

This case could set a landmark precedent for AI training on copyrighted data. If courts confirm that model 'memorization' constitutes infringement, the training datasets of all major language models face enormous legal exposure. For OpenAI, a loss would be especially damaging given the parallel wave of publisher and author lawsuits.

The technical and legal question of whether models can 'unlearn' licensed content is now squarely in the spotlight.

Sources