News Publishers Block AI Training Access to Web Archives Amid Copyright Infringement Concerns
Over 240 news organizations across nine countries are blocking web archives such as Common Crawl and the Internet Archive's Wayback Machine to prevent AI companies from using their content without permission or compensation. Major outlets, including The New York Times, CNN, and USA Today, are restricting access and citing copyright violations, because AI firms have used archived news content to train large language models. The move has raised concerns about the preservation of public records and historical accountability.