Millions of books died so Claude could live

TL;DR

Anthropic trained Claude on millions of copyrighted books – without permission from publishers or authors. The training data came from pirated e-book collections and shadow libraries, including Books3 and LibGen. Anthropic invokes fair use, while publishers and authors sue and demand licensing agreements. The Vergecast explores the ethical and legal gray zones of training AI on unauthorized content.

Nauti's Take

Anthropic loves to position itself as the good guy in the AI race – complete with an ethics board and Constitutional AI. But when push comes to shove, it too reaches into the piracy jar.

Interpreting fair use as a blank check for billion-dollar companies is brazen. Publishers and authors have a point: if you profit from others' work, you should pay for it.

That Claude, the poster child for "safe AI," was trained on stolen goods is the irony of the year.

Context

Claude is considered one of the most advanced language models – but its edge rests on questionable foundations. If leading AI companies systematically ignore copyright, it could destabilize the entire content industry. At the same time, the case shows how difficult it is to reconcile AI development with existing law.

The debate will determine who profits from creative works in the future – creators or tech giants.

Sources