3 Sources
[1]
Investigation by The Atlantic reveals many millions of songs used for AI music training - Engadget
Taylor Swift, Bad Bunny and many, many more artists have had their work fed into AI models. We're always glad to see more publications and groups digging deeper into artificial intelligence and its impact. Today, The Atlantic has published four searchable databases of music that has been used to train AI models. The scope is pretty staggering, with 12 million tracks in one database, 9 million in another, and the two final ones each containing about 100,000 songs. The accompanying article by staff writer Alex Reisner gives further context to just how much copyrighted music was used for AI training, including hit tracks from Taylor Swift and Bad Bunny. He points to some of the legal cases already underway against generative AI music platforms, such as Suno and Udio, which have often made claims of fair use as a defense for wholesale scraping copyright-protected content to power their platforms. A similar case in book publishing didn't make headway with a judge on claims of copyright infringement, but piracy allegations have proved to be a more compelling argument. The full results and payout from that suit are still pending, though the initial settlement was for $1.5 billion. Having sources such as these databases from The Atlantic could help parties in the music industry try for similar lawsuits in the future. Many music streaming services have taken steps to prevent, identify or label generative AI creations, but those efforts have seen varying degrees of success. They also haven't stopped scammers from creating imitations of existing bands and trying to benefit off their work with AI copycats.
[2]
The Atlantic uncovers millions of copyrighted songs in AI training data
An investigation by The Atlantic has revealed that millions of copyrighted songs have been used to train AI music models, including tracks from popular artists like Taylor Swift and Bad Bunny. The publication created four searchable databases: one with 12 million tracks, one with 9 million, and two with approximately 100,000 songs each. The article by staff writer Alex Reisner provides insight into the extent of copyrighted music included in AI training data. Legal actions are currently underway against generative AI music platforms such as Suno and Udio, which assert fair use as a defense for using copyright-protected material. A previous lawsuit in the book publishing sector struggled to advance on copyright claims, while piracy allegations gained more traction. The initial settlement from the book publishing case amounted to $1.5 billion, with final outcomes and payouts still pending. The databases from The Atlantic may serve as valuable resources for the music industry in pursuing future lawsuits related to copyright infringement. In response to the rise of AI-generated music, many streaming services have implemented measures to prevent, identify, or label such creations. However, the effectiveness of these measures has varied. Additionally, scammers have exploited the situation by creating imitations of established bands to capitalize on their work through AI-generated copies.
[3]
Millions of songs have been used for AI music training
Many prominent artists have had their work fed into AI models. AI is a thing now, and different outlets are digging into various effects of AI in our modern times. As reported by Engadget, The Atlantic has published four searchable databases of music that has been used to train AI models. It has "12 million tracks in one database, 9 million in another, and the two final ones each containing about 100,000 songs". This means that a lot of copyrighted music was used for AI training. And as expected, many music streaming services have taken steps to prevent, identify or label generative AI creations. But that hasn't stopped scammers from creating imitations of existing bands and trying to benefit off their work with AI copycats. The Atlantic's databases, among others like it, could help parties in the music industry watch for their interests a bit better.
An investigation by The Atlantic has produced four searchable databases showing that millions of copyrighted songs from artists like Taylor Swift and Bad Bunny were used to train AI music models. Together the databases list more than 21 million tracks, potentially fueling new copyright infringement lawsuits against generative AI music platforms like Suno and Udio.
The Atlantic has published four searchable databases of songs exposing the extensive use of copyrighted material in AI music training. Staff writer Alex Reisner documented 12 million tracks in one database, 9 million in another, and two additional databases each containing approximately 100,000 songs [1]. The investigation reveals that millions of copyrighted songs from prominent artists including Taylor Swift and Bad Bunny have been fed into AI models without authorization [2].
Generative AI music platforms such as Suno and Udio are already facing lawsuits over their use of copyright-protected content. These platforms have frequently defended their practices by asserting fair use, arguing that wholesale scraping of copyrighted material falls within legal boundaries [1]. However, the music industry is watching closely as similar legal battles unfold in other creative sectors. A comparable case in book publishing made little headway on copyright infringement grounds, but piracy allegations proved more compelling. That lawsuit led to an initial settlement of $1.5 billion, with final outcomes and payouts still pending [2].
The searchable databases from The Atlantic could serve as evidence for parties pursuing music industry lawsuits against AI companies. These resources document which specific tracks were included in AI training data, making it easier for artists and rights holders to identify unauthorized use of their work [2]. The databases may help the music industry build stronger cases, potentially following the path of the book publishing case, where piracy claims gained traction [3].
Many music streaming services have implemented measures to prevent, identify, or label AI-generated content, though these efforts have achieved varying degrees of success [1]. The challenge extends beyond legitimate AI music creation to include AI-driven music scams, where bad actors create imitations of existing bands and attempt to profit from AI copycats. These scammers exploit the technology to generate content that mimics established artists, making it difficult for listeners to distinguish authentic work from AI-generated imitations [3].
The scale of copyrighted music used in AI training raises questions about compensation, attribution, and creative control. Artists whose work appears in these databases had no opportunity to consent to or negotiate terms for the use of their music. As AI music generation becomes more sophisticated, the industry faces pressure to establish clear frameworks for how AI companies can access and use creative works. Watch for increased regulatory scrutiny and potential legislation addressing AI training practices, as well as more aggressive enforcement actions from rights holders seeking to protect their catalogs from unauthorized AI training.
Summarized by Navi