The Atlantic created a searchable database of the music used to train AI
TL;DR
The Atlantic built a public, searchable database covering four music datasets that were used, or are available, for AI training. The two largest collections hold about 12 million and 9 million tracks respectively, and the two smaller sets each still contain more than 100,000 songs. Reporter Alex Reisner says the datasets were downloaded thousands of times. Google and Stability AI acknowledged using some of the sets in research papers.
Nauti's Take
This is not just a database story; it is a power shift. As long as training data stays invisible, AI companies can hide behind research language, fair-use claims, and technical complexity.
A search box becomes a political tool when it shows which artists may have been pulled into the machine without a real choice. It does not compensate musicians yet, but it gives them leverage.
Briefing
The database turns an abstract copyright problem into something artists can inspect: musicians can check whether their work appears in known training sets. That moves the debate from broad AI anxiety to concrete cases where licensing, consent, purpose limits, and compensation have to be argued with evidence.