AI boom triples hard drive prices, threatening Internet Archive and Wayback Machine preservation

The AI boom has triggered a storage crisis that threatens internet archiving efforts. Hard drive prices have surged to as much as three times their normal level, with 28-30TB drives either out of stock or heavily marked up. The Internet Archive, which stores 210 petabytes and adds 100 terabytes daily, now faces mounting costs, while anti-scraping measures designed to block AI bots inadvertently block archival crawlers too.

AI Boom Drives Storage Crisis for Digital Preservation

The AI boom has created an unexpected casualty: organizations dedicated to internet archiving are struggling to preserve digital history as hard drive prices skyrocket to unprecedented levels. The Internet Archive, home to the Wayback Machine and custodian of approximately 210 petabytes of data, now confronts what founder Brewster Kahle describes as "a very real issue costing us time and money." [1] The organization adds another 100 terabytes to its collections daily, making the current hard drive shortage particularly acute.

Both NAND flash and mechanical hard drives (HDDs) face severe shortages as hyperscalers book up production capacity for AI data centers. The 28-30TB hard drives essential for preserving digital content now cost up to three times their previous price, when they are available at all. [2] This stratospheric storage pricing forces archival organizations to make difficult choices about what content they can afford to preserve and at what pace.
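To put the quoted figures in proportion, the arithmetic can be sketched as below. This is only a rough illustration: it uses the article's numbers (210 petabytes stored, 100 terabytes added daily, 28-30TB drives, prices up to three times higher), assumes decimal units (1 PB = 1,000 TB), and counts raw capacity only. It ignores redundancy, replication, and drive replacement, so real needs would be higher. The per-drive price is a hypothetical parameter, not a figure from the article.

```python
import math

# Figures quoted in the article.
ARCHIVE_SIZE_PB = 210
DAILY_GROWTH_TB = 100
DRIVE_SIZES_TB = (28, 30)
PRICE_MULTIPLIER = 3.0  # "up to three times" the previous price

# Assumption: decimal storage units.
TB_PER_PB = 1000


def drives_needed(capacity_tb: float, drive_tb: float) -> int:
    """Whole drives required to hold capacity_tb of raw data."""
    return math.ceil(capacity_tb / drive_tb)


def yearly_growth_tb(daily_tb: float = DAILY_GROWTH_TB, days: int = 365) -> float:
    """Raw data added over one year at a constant daily rate."""
    return daily_tb * days


def yearly_drive_spend(price_per_drive: float, drive_tb: float,
                       multiplier: float = 1.0) -> float:
    """Cost of the drives for one year of growth.

    price_per_drive is a hypothetical baseline price supplied by the caller.
    """
    return drives_needed(yearly_growth_tb(), drive_tb) * price_per_drive * multiplier


if __name__ == "__main__":
    growth = yearly_growth_tb()
    print(f"Yearly growth: {growth / TB_PER_PB} PB")
    for size in DRIVE_SIZES_TB:
        print(f"{size}TB drives: {drives_needed(growth, size)} per year, "
              f"{drives_needed(ARCHIVE_SIZE_PB * TB_PER_PB, size)} for the full archive")
```

Under these assumptions, 100 terabytes a day comes to 36.5 petabytes a year, or roughly 1,217 to 1,304 new drives annually before any redundancy. Whatever the baseline price, a threefold increase triples that line of the budget, which is why the daily growth rate makes the shortage so acute.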

Skyrocketing Hard Drive Prices Impact Major Archival Organizations

The Wikimedia Foundation, which maintains Wikipedia and its more than 65 million articles, faces similar pressures from the storage crisis. A spokesperson explained that the organization sees "the primary impact in the purchase of memory and hard drives but also in terms of lead times on server deliveries and our capacity to place future orders." [1] The organization must now carefully allocate budgets that were already stretched thin, with current market turbulence exacerbating existing constraints.

The Internet Archive attempts to source drives directly from manufacturers, but those suppliers remain busy fulfilling backorders from larger clients. While the organization benefits from active donors and a community committed to fighting digital decay, these supporters can only provide workarounds rather than systemic solutions. Finding large-capacity drives at manufacturer's suggested retail price has become nearly impossible, even for casual enthusiasts.

Anti-Scraping Measures Create Dual Threat to Knowledge Preservation

Beyond the hard drive shortage, the AI boom threatens internet archiving through another mechanism: anti-scraping measures. As LLMs require massive datasets often acquired through data scraping—sometimes illegally—websites have implemented countermeasures to prevent unauthorized extraction of their content. These protective barriers don't distinguish between web crawler bots gathering information for AI training and those creating snapshots for educational purposes and digital preservation.

Blocking archival bots has become increasingly common as website operators treat all automated scraping with suspicion. The Wayback Machine's web crawler, designed to compile historical snapshots of web pages, now faces the same barriers erected against AI companies. [2] This creates a particularly thorny problem: AI companies can potentially circumvent blocks by accessing content through the Wayback Machine itself, making news sites wary of allowing any archival access.
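The article does not say which blocking mechanisms sites use, but robots.txt is one widely used way to state crawler policy, and it shows how a blanket rule catches archival crawlers along with AI scrapers. The sketch below uses Python's standard `urllib.robotparser`; the crawler names `example-ai-bot` and `example-archive-bot` and the URL are hypothetical placeholders, not real user agents.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy that blocks every crawler, whatever its purpose.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

# Hypothetical policy that blocks one named AI crawler and allows the rest.
SELECTIVE_BLOCK = """\
User-agent: example-ai-bot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story.html"  # placeholder URL
    for name, policy in (("blanket", BLANKET_BLOCK), ("selective", SELECTIVE_BLOCK)):
        for agent in ("example-ai-bot", "example-archive-bot"):
            print(name, agent, allowed(policy, agent, url))
```

A selective policy only helps if the site operator is willing to name and trust the archival crawler. As the article notes, some news sites are wary of doing so because archived copies could themselves become a route around the block.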

Individual Archivists Scale Back Amid Rising Costs

The impact extends beyond major non-profits to individual contributors who support preserving digital content. Members of communities like the r/DataHoarder subreddit report scaling back or entirely stopping their archival efforts while they wait for prices to stabilize. Occasional deals still surface, but consistent availability of affordable, large-capacity storage has evaporated. Organizations like the End of Term Archive, which captures U.S. government websites during transitions between presidential administrations, hold onto hope that market conditions will improve before their next upgrade cycle.

The Internet Archive does utilize tape storage for longer-term backups, but this medium cannot replace hard drives for the "living archive" that users access on demand. Tape storage lacks the performance characteristics necessary for responsive information access, making HDDs essential for day-to-day operations. As talks continue between archival organizations and content providers about resolving the web crawler blocking issue, the storage crisis represents a more immediate financial burden that threatens the pace and scope of knowledge preservation efforts worldwide.

© 2026 TheOutpost.AI All rights reserved