Millions of books died so Claude could live
TL;DR
Anthropic trained Claude on millions of copyrighted books without permission from publishers or authors. The training data came from pirated e-book collections and shadow libraries such as Books3 and LibGen. Anthropic invokes fair use, while publishers and authors are suing and demanding licensing agreements. The Vergecast explores the ethical and legal gray zones of training AI on unauthorized content.
Nauti's Take
Anthropic loves to position itself as the good guy in the AI race, complete with an ethics focus and Constitutional AI. But when push comes to shove, they too reach into the piracy jar.
Interpreting fair use as a blank check for billion-dollar companies is brazen. Publishers and authors are right: if you profit from others' work, you should pay.
That Claude, the poster child for "safe AI," was trained on stolen goods is the irony of the year.