The Atlantic publishes searchable index of music datasets used to train AI

In detail

Two datasets contain roughly 12 million and 9 million tracks; two others have over 100,000 songs each.
Collections include tracks from major artists (e.g., Lady Gaga, Radiohead, Wu‑Tang Clan) and sources like the Free Music Archive.
Google and Stability confirm use of such datasets in research papers.
Many datasets are lists of Spotify/YouTube links; developers use automated tools to download audio, sometimes bypassing logins or ads and violating platform terms of service.

Why it matters

The availability of massive music collections highlights licensing and compliance risks for AI training pipelines and the potential legal exposure for companies building audio/creative AI products.

For you

If your business uses or buys audio AI, verify training data provenance and licensing with providers, and avoid models trained on unlicensed large‑scale collections.

Sources

The Verge