2 Sources
[1]
AI could mean the end of the Wayback Machine, as news websites are increasingly blocking it to prevent content scraping
The Wayback Machine's very existence is threatened -- and this is about AI, not paywalls

* A growing number of major news sites are blocking the Wayback Machine
* That reportedly includes 23 organizations that are preventing their content from appearing in the archive
* This is happening due to fears that the Wayback Machine is being exploited for AI content scraping

The Wayback Machine is under serious threat (and not for the first time), as a growing number of major news websites appear to be blocking the archiving system. If you're not familiar with the Wayback Machine, it's run by the non-profit Internet Archive and is essentially a time machine that preserves a history of the web (and more besides). This can be vital for historical research, for example, or for monitoring changes to websites.

As Wired reports (via 9to5Mac), there's a growing trend of online news outlets blocking the web crawler that the Internet Archive uses to gather its snapshots. Some 23 big news sites are now doing so, according to Originality AI (which specializes in AI detection). That includes the New York Times (based on a Nieman Lab report) and USA Today, with Wired highlighting that the latter recently published a report on how US Immigration and Customs Enforcement delayed the disclosure of key information about the impact of its detainment policies. That piece used the Wayback Machine extensively in its research.

The irony of USA Today using this data in such a way, yet blocking the Wayback Machine from accessing its own content -- which could potentially keep the news site itself honest in the future -- isn't lost on Wayback Machine director Mark Graham. Graham told Wired: "They're able to pull together their story research because the Wayback Machine exists. At the same time, they're blocking access."
Of course, if more and more organizations block the Wayback Machine, its ability to keep a historical record of online content will be increasingly eroded.

Analysis: blame AI (again)

So why is this happening? This isn't about readers circumventing paywalled content using the Wayback Machine, in case you thought that was the issue at stake. Would it surprise you to learn that it's actually about AI, in a roundabout way? Of course it wouldn't, and in predictable fashion the Internet Archive seems to be caught up in the broad backlash against AI.

What these news organizations say they object to is not a historical record of their content being maintained, but the fact that this archive can be used by third-party AI firms to train their large language models (LLMs). As Wired points out, New York Times spokesperson Graham James said: "The issue is that Times content on the Internet Archive is being used by AI companies in violation of copyright law to directly compete with us."

In short, the worry for these companies is that even if they block such AI scraping themselves, it will still happen behind their backs via the Wayback Machine. It's not just major news outlets that have these worries, either, but also social media platforms, notably Reddit, which has blocked the Wayback Machine's web crawler over the exact same concerns. While there are other possible sources and ways of indirectly scraping news content, the Wayback Machine is the most obvious target for rogue AI operators, as it maintains such an extensive library of web history.

So this is a complex issue bound up in AI scraping and a whole lot of legal grey areas. However, the effect on what is an important resource for keeping a check on governments and media giants -- and holding them accountable for what was said in the past, or what has been deleted from the web entirely in some cases -- is clearly worrying.
Graham asserts: "There's no question that the general locking-down of more and more of the public web is impacting society's ability to understand what's going on in our world."

A petition entitled 'Journalists applaud the Internet Archive's role in preserving the public record' has been put together and sent off with over 100 signatures from working journalists. Meanwhile, dialogue between the Internet Archive and the news publishers remains ongoing, so hope of finding a workable solution isn't lost yet.
[2]
News outlets like NYT and USA Today are blocking the Internet Archive's Wayback Machine to prevent AI training models from using their content | Fortune
What some consider the digital library of Alexandria is in danger of losing valuable scrolls. Major media outlets are blocking the Internet Archive's Wayback Machine from saving web pages to prevent AI giants from training models on snapshots of old articles. Wired reported that 23 news organizations, including USA Today and the New York Times, are among the 241 sites denying the Internet Archive's web crawler access to their articles.

It's not personal -- some outlets still use the Archive in their reporting -- it's about the looming threat of AI: publishers can archive their own material, but a third party maintains a more incorruptible version of stories that can hold outlets accountable when a story is revised after publication.

Nothing new: Last year, Reddit barred the Wayback Machine from data scraping over similar AI concerns. The archive also lost a slew of information when federal government websites were deleted.

Still working: Wayback Machine director Mark Graham is reportedly in talks to regain access to the material, while more than 100 media workers signed a letter supporting the Wayback Machine. -- DL
Major news outlets including the New York Times and USA Today are blocking the Wayback Machine from archiving their content, citing fears that AI companies are exploiting the archive to train models. The move threatens the Internet Archive's ability to preserve the public record, even as some of these same publishers rely on the archive for their own investigative reporting.
A growing number of news websites are blocking the Internet Archive's Wayback Machine, threatening one of the web's most vital historical preservation tools. According to research from Originality AI, 23 major news organizations are now preventing the archive's web crawler from accessing their content, representing a significant portion of the 241 sites that have implemented such restrictions [1]. Among the publishers blocking the Internet Archive are prominent outlets like the New York Times and USA Today [2].
Source: TechRadar
The decision stems from concerns about AI training models using archived content without permission. New York Times spokesperson Graham James stated that "Times content on the Internet Archive is being used by AI companies in violation of copyright law to directly compete with us" [1]. Publishers worry that while they can block AI scraping directly from their sites, third-party AI firms can still access their material through the Wayback Machine's extensive library of web history.

The situation reveals a troubling contradiction. USA Today recently published an investigative report on US Immigration and Customs Enforcement's delayed disclosure of detainment policy impacts, research that relied extensively on the Wayback Machine [1]. Yet the outlet simultaneously blocks the archive from preserving its own content. Mark Graham, director of the Wayback Machine, highlighted this paradox: "They're able to pull together their story research because the Wayback Machine exists. At the same time, they're blocking access" [1].

This isn't about readers circumventing paywalls, but rather the broader backlash against AI and content scraping. News organizations fear that archived versions of their articles provide an easy target for large language models (LLMs) seeking training data, potentially enabling copyright violation on a massive scale.
The trend extends beyond traditional media. Reddit has also blocked the Wayback Machine's web crawler over identical AI concerns, while the deletion of federal government websites has led to further data loss [2]. As more organizations restrict access, the Internet Archive's capacity for archiving web pages and maintaining an accurate public record faces serious erosion.

Graham warns that the consequences reach far beyond AI: "There's no question that the general locking-down of more and more of the public web is impacting society's ability to understand what's going on in our world" [1]. Third-party archives provide an incorruptible version of stories that can hold publishers accountable when content is revised or deleted after publication [2].
More than 100 media workers have signed a petition titled "Journalists applaud the Internet Archive's role in preserving the public record," demonstrating support from within the industry [1]. Graham remains in talks with news organizations to find solutions that address AI scraping concerns while maintaining access for historical preservation [2].

The situation raises complex questions about copyright law, the rights of publishers, and society's need for transparent historical records. As AI continues reshaping the digital landscape, the balance struck between protecting intellectual property and maintaining the public record will determine whether future researchers, journalists, and citizens can access the web's history, or face an increasingly locked-down internet where accountability becomes harder to enforce.
Summarized by Navi