AI Scraping Bots Overwhelm Digital Archives, Threatening Cultural Institutions

2 Sources


AI bots are overwhelming the servers of libraries, archives, museums, and galleries, causing disruptions and raising concerns about the sustainability of open access to cultural resources.

AI Bots Overwhelm Digital Archives

A recent survey by the GLAM-E Lab has revealed a growing crisis in the digital preservation of cultural heritage. AI scraping bots are overwhelming the servers of libraries, archives, museums, and galleries (GLAM institutions), causing significant disruptions and in some cases knocking entire collections offline [1][2].

Scale and Impact of the Problem

Of the 43 institutions surveyed across North America, Europe, and Oceania, 39 reported recent traffic spikes attributed to AI bots. These bots, often linked to companies building training corpora for large AI models, arrive in dense, rapid waves, downloading entire collections and ignoring traditional web crawling etiquette [2].

The impact varies widely depending on the institution's digital infrastructure. While some larger organizations can absorb the increased traffic, smaller community archives may crash within minutes of a bot attack. Many institutions only discovered the true source of the traffic after experiencing breakdowns, as their analytics tools were not designed to detect this type of bot activity [2].

Challenges for Cultural Institutions

This situation presents a significant dilemma for GLAM institutions, whose mission is to share culture and knowledge widely. The same openness that serves the public also exposes them to industrial-scale scraping from AI developers, often without attribution, compensation, or regard for infrastructure costs [2].

Institutions have reported bots arriving in swarms, rotating IP addresses, and spoofing user agents to avoid detection. These attacks can spike server CPU usage to 100% and crash systems for hours or days [2].
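Because the reported swarms rotate IP addresses, counting requests per address may miss them. One possible way to notice a spike like the ones described above is to count all incoming requests in a sliding time window. The following is a minimal Python sketch, not a method attributed to any surveyed institution: the class name `SlidingWindowCounter`, the window length, and the threshold are hypothetical values chosen for illustration, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Flags when total requests in a recent time window exceed a threshold.

    Hypothetical helper for illustration; it counts requests from all
    clients together rather than per IP address.
    """

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._timestamps = deque()

    def record(self, timestamp: float) -> bool:
        """Record one request; return True if the window is over threshold."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop requests that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


# Illustrative values only: more than 3 requests within 10 seconds is a spike.
counter = SlidingWindowCounter(window_seconds=10, threshold=3)
flags = [counter.record(t) for t in (0, 1, 2, 3)]
print(flags)  # [False, False, False, True]
```

A real deployment would need thresholds tuned to normal visitor traffic, and a check like this only detects a spike; it does not decide what to do about it.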

Countermeasures and Their Limitations

Many GLAM teams have deployed various countermeasures, including firewalls, IP blocks, geofencing, and bot detection services. However, each solution comes with trade-offs. For example, blocking by geography might prevent legitimate researchers from accessing materials, while user agent filtering can be easily circumvented [2].
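To illustrate why user agent filtering is easy to circumvent, here is a minimal Python sketch of a denylist check. The function name `is_blocked_by_user_agent` and the bot names in the denylist are hypothetical, invented for this example; they do not refer to real crawlers or to any institution's actual configuration.

```python
# Hypothetical denylist; the names are made up for illustration.
BLOCKED_AGENT_SUBSTRINGS = ("examplebot", "sample-crawler")


def is_blocked_by_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a denylisted substring."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_SUBSTRINGS)


# A bot that identifies itself honestly is caught by the filter.
honest = "ExampleBot/1.0 (+https://example.org/bot)"
# The same bot sending a browser-like string passes straight through.
spoofed = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

print(is_blocked_by_user_agent(honest))   # True
print(is_blocked_by_user_agent(spoofed))  # False
```

The filter depends entirely on a header the client chooses, so a bot that spoofs its user agent looks no different from an ordinary browser to this check.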

The most effective countermeasures, such as scaling up server capacity or integrating sophisticated traffic monitoring tools, are often prohibitively expensive for cultural institutions with limited budgets [2].

Philosophical and Ethical Considerations

This crisis raises deeper philosophical questions about the nature of digital access in the AI age. If bots now represent a significant share of traffic, should institutions try to serve them, block them, or treat them as a new class of visitor? The situation is testing the values of openness and access, as the infrastructure supporting those values was not designed to handle AI-scale extraction [2].

Source: Dataconomy


Future Implications and Potential Solutions

Some institutions are considering building APIs to serve bots more efficiently, while others are hoping for legal protections. However, enforcement of such measures is far from guaranteed [2].

The GLAM community is calling for AI companies to support the maintenance of the public internet if they intend to use it as a training ground. This could involve abiding by better standards, funding sustainable access programs, or respecting new opt-out protocols [2].
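The article does not specify what a new opt-out protocol would look like. As a point of comparison, the long-standing convention is a `robots.txt` file, which a well-behaved crawler checks before fetching a page. The sketch below uses Python's standard `urllib.robotparser` module to show that check. The bot name `ExampleAIBot`, the name `ResearchBrowser`, and the `archive.example.org` URL are hypothetical, and compliance with such a file is voluntary on the crawler's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an archive that opts out of one AI crawler
# while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://archive.example.org/collection/1"

# A crawler that respects the file asks before every fetch.
ai_bot_allowed = parser.can_fetch("ExampleAIBot", url)
other_allowed = parser.can_fetch("ResearchBrowser", url)

print(ai_bot_allowed)  # False
print(other_allowed)   # True
```

Nothing in this mechanism stops a crawler that skips the check, which is consistent with the survey's reports of bots ignoring crawling etiquette.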

Source: 404 Media


As this situation continues to evolve, it's clear that a balance must be struck between preserving open access to cultural resources and protecting the digital infrastructure that makes such access possible. The resolution of this crisis will likely shape the future of digital cultural preservation and AI development alike.

TheOutpost.ai


© 2025 Triveous Technologies Private Limited