AI Boom Triples Hard Drive Prices for Archival Sites

AI Boom Drives Storage Crisis for Digital Preservation

The AI boom has created an unexpected casualty: organizations dedicated to internet archiving are struggling to preserve digital history as hard drive prices skyrocket to unprecedented levels. The Internet Archive, home to the Wayback Machine and custodian of approximately 210 petabytes of data, now confronts what founder Brewster Kahle describes as "a very real issue costing us time and money." [1]

The organization adds another 100 terabytes to its collections daily, making the current hard drive shortage particularly acute.

Both NAND flash and mechanical hard drives face severe shortages as hyperscalers book out production capacity for AI data centers. The 28-30TB hard drives essential for preserving digital content now cost up to three times their previous price, when they are available at all. [2]

This stratospheric storage pricing forces archival organizations to make difficult choices about what content they can afford to preserve and at what pace.
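To give a rough sense of scale, the figures quoted above (100 terabytes of new data per day, 28-30TB drives, prices up to three times higher) can be turned into a back-of-the-envelope estimate. The sketch below is an illustration only: it counts raw capacity, ignores redundancy, replacement drives, and filesystem overhead, and treats the tripling as a worst case. The function names are hypothetical and do not describe how any archive actually plans its purchases.

```python
import math

DAILY_INGEST_TB = 100            # daily growth figure quoted in the article
DRIVE_CAPACITIES_TB = (28, 30)   # drive sizes quoted in the article
PRICE_MULTIPLIER = 3.0           # "up to three times" the previous price (worst case)


def drives_per_year(daily_ingest_tb: float, capacity_tb: float) -> int:
    """Whole drives needed to hold one year of new data (raw capacity only)."""
    return math.ceil(daily_ingest_tb * 365 / capacity_tb)


def days_covered(budget_days: float, price_multiplier: float) -> float:
    """Days of ingest a fixed budget covers after unit prices rise.

    budget_days is how many days of drive purchases the budget paid for
    at the old price.
    """
    return budget_days / price_multiplier


if __name__ == "__main__":
    for capacity in DRIVE_CAPACITIES_TB:
        count = drives_per_year(DAILY_INGEST_TB, capacity)
        print(f"{capacity} TB drives: {count} per year")
    covered = days_covered(365, PRICE_MULTIPLIER)
    print(f"A one-year budget now covers about {covered:.0f} days")
```

Under these assumptions, a year of growth alone needs roughly 1,200 to 1,300 drives, and a budget sized for a full year at the old price would cover only about four months at the tripled price.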

Skyrocketing Hard Drive Prices Impact Major Archival Organizations

The Wikimedia Foundation, which maintains Wikipedia and its more than 65 million articles, faces similar pressures from the storage crisis. A spokesperson explained that the organization sees "the primary impact in the purchase of memory and hard drives but also in terms of lead times on server deliveries and our capacity to place future orders." [1]

The organization must now carefully allocate budgets that were already stretched thin, with current market turbulence exacerbating existing constraints.

The Internet Archive attempts to source drives directly from manufacturers, but those suppliers remain busy fulfilling backorders from larger clients. While the organization benefits from active donors and a community committed to fighting digital decay, these supporters can only provide workarounds rather than systemic solutions. Finding large-capacity drives at manufacturer's suggested retail price has become nearly impossible, even for casual enthusiasts.

Anti-Scraping Measures Create Dual Threat to Knowledge Preservation

Beyond the hard drive shortage, the AI boom threatens internet archiving through another mechanism: anti-scraping measures. Because LLMs require massive datasets that are often acquired through data scraping, sometimes illegally, websites have implemented countermeasures to prevent unauthorized extraction of their content. These protective barriers don't distinguish between web crawler bots gathering information for AI training and those creating snapshots for educational purposes and digital preservation.
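One simple way to see how a barrier can fail to distinguish between crawlers is a site's robots.txt file. The sketch below uses Python's standard urllib.robotparser to compare a blanket rule with a selective one. It is an illustration under stated assumptions, not a description of what any particular site does: the user-agent tokens (GPTBot, archive.org_bot) and the example URL are chosen for illustration, robots.txt is only honored voluntarily, and many real anti-scraping measures work at other layers entirely.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy that shuts out every crawler, archival or not.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

# Hypothetical policy that blocks one AI crawler but leaves others alone.
SELECTIVE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

URL = "https://example.com/news/story"  # placeholder URL


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, policy in (("blanket", BLANKET_BLOCK), ("selective", SELECTIVE)):
        for agent in ("GPTBot", "archive.org_bot"):
            verdict = "allowed" if allowed(policy, agent, URL) else "blocked"
            print(f"{name:9} policy, {agent}: {verdict}")
```

With the blanket rule both crawlers are turned away, while the selective rule blocks only the named AI crawler. A selective policy of this kind depends on every crawler identifying itself honestly, which may be one reason operators reach for blanket measures instead.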

Blocking archival bots has become increasingly common as website operators treat all automated scraping with suspicion. The Wayback Machine's web crawler, designed to compile historical snapshots of web pages, now faces the same barriers erected against AI companies. [2]

This creates a particularly thorny problem: AI companies can potentially circumvent blocks by accessing content through the Wayback Machine itself, making news sites wary of allowing any archival access.

Individual Archivists Scale Back Amid Rising Costs

The impact extends beyond major non-profits to individual contributors who support preserving digital content. Members of communities like the r/DataHoarder subreddit report scaling back or entirely stopping their archival efforts while they wait for prices to stabilize. While occasional deals surface, the consistent availability of affordable, large-capacity storage has evaporated. Organizations like the End of Term Archive, which documents government websites between different administrations, hold onto hope that market conditions will improve before their next upgrade cycle.

The Internet Archive does utilize tape storage for longer-term backups, but this medium cannot replace hard drives for the "living archive" that users access on demand. Tape storage lacks the performance characteristics necessary for responsive information access, making HDDs essential for day-to-day operations. As talks continue between archival organizations and content providers about resolving the web crawler blocking issue, the storage crisis represents a more immediate financial burden that threatens the pace and scope of knowledge preservation efforts worldwide.

References

[1] Internet archival sites struggling to preserve the internet because of skyrocketing hard drive prices due to the AI boom -- Wayback Machine and Wikimedia punished by stratospheric storage pricing and stricter anti-scraping measures blocking the wrong bots

[2] The Wayback Machine faces another threat from AI -- ridiculously expensive hard drive prices
