The Atlantic created a searchable database of the music used to train AI
TL;DR
The Atlantic has made four music datasets searchable after reporter Alex Reisner found they were being used to train AI systems. Two collections are massive, with about 12 million and 9 million tracks. Two smaller sets still contain more than 100,000 songs each. The datasets have been downloaded thousands of times. Google and Stability have confirmed use in research papers, while other users are not fully traceable.
Nauti's Take
This is uncomfortable for the AI music business because it punctures the convenient fog around training data. When nobody can inspect the inputs, everything sounds like research, innovation and fair use.
Once artists can search for their own names, the debate rests on concrete evidence, not vibes. The industry now has to explain why availability for personal streaming was treated like a permission slip for model training.
Briefing
This is not just another story about music inside AI training data. The searchable database turns a vague copyright fight into something artists, labels and platforms can inspect track by track. It raises a sharper question: who had permission to crawl, download and commercially exploit this material?