2 Sources
[1]
Activist group says it has scraped 86m music files from Spotify
Platform with 700m users worldwide says it is investigating after Anna's Archive claims to have accessed tracks and metadata An activist group has claimed to have scraped millions of tracks from Spotify and is preparing to release them online. Observers said the apparent leak could boost AI companies looking for material to develop their technology. A group called Anna's Archive said it had scraped 86m music files from Spotify and 256m rows of metadata such as artist and album names. Spotify, which hosts more than 100m tracks, confirmed that the leak does not represent its entire inventory. The Stockholm-based company, which has more than 700m users worldwide, said it had "identified and disabled the nefarious user accounts that engaged in unlawful scraping". "An investigation into unauthorized access identified that a third party scraped public metadata and used illicit tactics to circumvent DRM [digital rights management] to access some of the platform's audio files. We are actively investigating the incident," said Spotify. Spotify does not believe the music taken by Anna's Archive has been released yet. Anna's Archive, which is known for providing links to pirated books, said in a blog it wanted to create a "'preservation archive' for music". The group claimed the audio files represent 99.6% of all music listened to by Spotify users and would be shared via "torrents" - a means of sharing large digital files online. "Of course Spotify doesn't have all the music in the world, but it's a great start," said Anna's Archive, which describes its mission as "preserving humanity's knowledge and culture". "With your help, humanity's musical heritage will be forever protected from destruction by natural disasters, wars, budget cuts, and other catastrophes," said the group. Ed Newton-Rex, a composer and campaigner for protecting artists' copyright, said the leaked music would probably be used for developing AI models. 
"Training on pirated material is sadly common in the AI industry, so this stolen music is almost certain to end up training AI models. This is why governments must insist AI companies reveal the training data they use," he said. The Anna's Archive site makes references to LibGen, a vast online archive of pirated books that has allegedly been used by Mark Zuckerberg's Meta to train its AI models. According to a US court filing, Zuckerberg, Meta's founder and chief executive, approved use of the LibGen dataset despite warnings within the company's AI executive team that it is a dataset "we know to be pirated". The co-founder of an AI startup wrote on LinkedIn that members of the public could in theory "create their own personal free version of Spotify". Yoav Zimmerman, co-founder of Third Chair, said it could also allow tech companies to "train on modern music at scale." He added: "The only thing stopping them is copyright law and the deterrent of enforcement." Copyright has become a battleground between artists, authors and creatives on one side and AI companies on the other. AI tools like chatbots and music generators are trained on vast amounts of data taken from the open web, including copyright-protected work. In the UK, creative professionals have protested against a government proposal to let AI companies use copyright-protected work without permission unless the owner of the copyright-protected work signals they do not want their data to be taken. Almost every respondent to a government consultation on the proposal has backed artists' concerns. Liz Kendall, the secretary of state for science, innovation and technology, told parliament this month there was "no clear consensus" on the issue, adding that ministers would "take the time to get this right". The government has pledged to make policy proposals on AI and copyright by 18 March next year.
[2]
A pirate activist group scraped and released Spotify's entire library
A pirate activist group said it 'backed up' Spotify's music catalogue, claiming it put metadata for 256 million tracks online. The streaming platform said it's 'actively monitoring' the incident. Streaming platform Spotify confirmed on Monday its library had been scraped by a third party, after a pirate activist group claimed it released metadata for the platform's entire music catalogue. According to a blog post on the open source search engine Anna's Archive, the release includes metadata for 256 million tracks and 86 million audio files, representing around 99.6 percent of listens. The files cover music that was put on the platform between 2007 and 2025, the blog post said. "It's the world's first 'preservation archive' for music which is fully open (meaning it can easily be mirrored by anyone with enough disk space)," the blog post stated. A spokesperson from Spotify confirmed the unauthorised access of its library, adding that the third party "used illicit tactics to circumvent DRM (digital rights management) to access some of the platform's audio files". "Spotify has identified and disabled the nefarious user accounts that engaged in unlawful scraping. We've implemented new safeguards for these types of anti-copyright attacks and are actively monitoring for suspicious behaviour," the spokesperson later added in a statement to Euronews Next. The spokesperson said there is no indication of any non-public user information being compromised in the breach, and that the only user-related data involved relates to public playlists created by users. Spotify did not specify how much data was scraped. Hackers said the data was "a little under 300TB in total size" and would be distributed on peer-to-peer file-sharing networks in bulk torrents. Anna's Archive claims its mission is "preserving humanity's knowledge and culture". The search engine for "shadow libraries" has until now been focused on books and other texts. 
"This Spotify scrape is our humble attempt to start such a 'preservation archive' for music," the blog post states. "Of course Spotify doesn't have all the music in the world, but it's a great start." Theoretically, anyone with the technical knowledge and disk space could use the archive to create their own copy of Spotify. Realistically, anyone who tries will face swift and severe legal action from record companies and other rightsholders. One of the bigger concerns is the potential for artificial intelligence (AI) companies to use the data to train their models, according to Yoav Zimmerman, CEO of Third Chair, a company that tracks unauthorised use of intellectual property. "It also just became dramatically easier for AI companies to train on modern music at scale," Zimmerman said in a LinkedIn post. "The only thing stopping them is copyright law and the deterrent of enforcement." Spotify said it is actively working with industry partners to protect the rights of the creative community. "Since day one, we have stood with the artist community against piracy," the company shared in a statement.
A pirate activist group called Anna's Archive claims to have scraped 86 million music files and 256 million rows of metadata from Spotify, representing 99.6% of all music listened to on the platform. The Stockholm-based streaming giant confirmed the unauthorized access and disabled the accounts involved, while experts warn the leaked material could become AI training data for music generators.

Spotify has confirmed that a third party gained unauthorized access to its platform and scraped its library. Anna's Archive, a pirate activist group previously known for providing links to shadow libraries of pirated books, claimed responsibility in a blog post, saying it had scraped 86 million music files and 256 million rows of metadata to create a "preservation archive" for music [1]. The Stockholm-based company, which hosts more than 100 million tracks and serves over 700 million users worldwide, said it had "identified and disabled the nefarious user accounts that engaged in unlawful scraping" [1].

Spotify confirmed that the scrape does not represent its entire inventory [1], but Anna's Archive claims the audio files cover approximately 99.6% of all music listened to by Spotify users [2]. According to the group, the files span music added to the platform between 2007 and 2025 and come to "a little under 300TB in total size" [2]. The group plans to distribute the scraped files through torrents, a peer-to-peer file-sharing method that allows anyone with sufficient disk space to mirror the entire dataset.

Spotify's investigation found that the third party "used illicit tactics to circumvent DRM [digital rights management] to access some of the platform's audio files" [2]. The company said there is no indication that non-public user information was compromised, and that the only user-related data involved relates to public playlists created by users [2]. Spotify said it has "implemented new safeguards for these types of anti-copyright attacks" and is "actively monitoring for suspicious behaviour" [2].

Anna's Archive describes its mission as "preserving humanity's knowledge and culture" and calls the Spotify scrape "the world's first 'preservation archive' for music which is fully open" [2]. The group stated: "With your help, humanity's musical heritage will be forever protected from destruction by natural disasters, wars, budget cuts, and other catastrophes" [1]. While in theory anyone with the technical knowledge and disk space could use the archive to create their own copy of Spotify, anyone who tries would realistically face swift legal action from record companies and other rightsholders [2].

The most pressing concern for the creative community is that the dataset could be used to train AI music generators. Ed Newton-Rex, a composer and campaigner for protecting artists' copyright, warned that "training on pirated material is sadly common in the AI industry, so this stolen music is almost certain to end up training AI models" [1]. He called for transparency about training data, stating: "This is why governments must insist AI companies reveal the training data they use" [1].

Yoav Zimmerman, CEO of Third Chair, a company that tracks unauthorized use of intellectual property, noted in a LinkedIn post that "it also just became dramatically easier for AI companies to train on modern music at scale" [2]. He wrote that members of the public could in theory "create their own personal free version of Spotify" [1]. Of the companies that might train on the music, he added: "The only thing stopping them is copyright law and the deterrent of enforcement" [1].

The Anna's Archive site makes references to LibGen, a vast online archive of pirated books that has allegedly been used by Meta to train its AI models. According to a US court filing, Meta founder and chief executive Mark Zuckerberg approved use of the LibGen dataset despite warnings within the company's AI executive team that it is a dataset "we know to be pirated" [1]. This precedent raises concern that the Spotify data could follow a similar trajectory.

Copyright has become a battleground between the creative community and AI companies, with AI tools like chatbots and music generators trained on vast amounts of data taken from the open web, including copyright-protected work. In the UK, creative professionals have protested against a government proposal to let AI companies use copyright-protected work without permission unless owners explicitly opt out. Liz Kendall, the secretary of state for science, innovation and technology, told parliament this month there was "no clear consensus" on the issue, and the government has pledged to make policy proposals on AI and copyright by 18 March next year [1].

Spotify said it is actively working with industry partners to protect the rights of the creative community, stating: "Since day one, we have stood with the artist community against piracy" [2]. The incident underscores the need for stronger protections as scraping techniques become more sophisticated and tech companies' appetite for AI training data grows.

Summarized by Navi