The Atlantic created a searchable database of the music used to train AI
TL;DR
The Atlantic investigated four publicly available music datasets used for AI training and turned them into a searchable database. Two of the datasets are massive, with roughly 12 million and 9 million tracks. Two smaller collections still contain more than 100,000 songs each. Google and Stability AI have confirmed in research papers that they used such datasets. Who else downloaded or trained on them remains unclear.
Nauti's Take
This is not a cute transparency project. It is an uncomfortable reality check for the AI music business.
Companies training music models cannot hide forever behind dataset names, research papers, and the claim that material was publicly available. If a newsroom can make the trail searchable, artists, lawyers, and rights holders can too.
The next fight will be less about creativity and more about evidence, licenses, and money flows.
Briefing
The database turns a vague AI music controversy into something searchable: actual songs, artists, and training traces. For labels, musicians, and platforms, that shifts the debate from suspicion to evidence. It also exposes how thin the line is between publicly reachable data and lawfully licensed training material.