2 Sources
[1]
As AI data scrapers sap websites' revenues, some fight back
Lisbon (AFP) - A swarm of AI "crawlers" is running rampant on the internet, scouring billions of websites for data to feed algorithms at leading tech companies -- all without permission or payment, upending the online economy. Before the rise of AI chatbots, websites allowed search engines to access their content in return for increased visibility, a system that rewarded them with traffic and advertising revenues. But the rapid development of generative AI has allowed tech giants like Google and OpenAI to harvest information for their chatbots with web crawlers, without humans ever needing to visit the original sites. Traditional content producers, such as media outlets, are being outpaced by AI crawlers, which have cut into their online operations and advertising revenues. "Sites that gave bots access to their content used to get readers in exchange," said Kurt Muehmel, head of AI strategy at data management firm Dataiku. But the arrival of generative AI "completely breaks" that model, he told AFP. Wikipedia's human internet traffic fell by eight percent between 2024 and 2025 because of a rise in AI search engine summaries, the online encyclopaedia reported last month. "The fundamental tension is that the new business of the internet that is AI-driven doesn't generate traffic," said Matthew Prince, CEO of Cloudflare, an American internet services provider.
'No trespassing'
Cloudflare, which processes more than 20 percent of all internet traffic, announced this summer a new measure aimed at blocking AI crawlers from accessing content without payment or permission from website owners. "It's basically like putting a speed limit sign or a no trespassing sign," Prince told AFP on the sidelines of the Web Summit in Lisbon. "Badly behaving bots can get by that, but we can track that... Over time, we can tighten these controls in a way that we're confident the AI companies can't get through."
The measure, which applies to more than 10 million websites, has already "attracted the attention of artificial intelligence giants", he added. On a smaller scale, American startup TollBit is providing online news publishers with tools to block, monitor and monetise AI crawler traffic. "The internet is a highway," said CEO and co-founder Toshit Panigrahi, who described the company as a "tollbooth on the internet". TollBit works with more than 5,600 sites, including USA Today, Time magazine and the Associated Press, allowing media outlets to set their own access fees for their content. The analytics are free for publishers, but AI companies are charged a "transaction fee for every piece of content they access". But for Muehmel, the online takeover by AI crawlers cannot be resolved with only "partial measures or by an individual company". "This is an evolution of the entire internet economy, which will take years," he said. If the bot swarm continues to roam freely online, "all of the incentives for content creation are going to go away," Prince said. "That would be a loss, not just for us humans that want to consume it, but actually for the AI companies that need original content in order to train their systems."
[2]
As AI data scrapers sap websites' revenues, some fight back
AI crawlers from major tech firms are scraping vast amounts of web content without permission, undermining traffic and revenues for publishers. As generative AI reduces human visits, companies like Cloudflare and TollBit are trying to block or monetise bot access, but experts warn the issue threatens the entire internet economy. A swarm of AI "crawlers" is running rampant on the internet, scouring billions of websites for data to feed algorithms at leading tech companies -- all without permission or payment, upending the online economy. Before the rise of AI chatbots, websites allowed search engines to access their content in return for increased visibility, a system that rewarded them with traffic and advertising revenues. But the rapid development of generative AI has allowed tech giants like Google and OpenAI to harvest information for their chatbots with web crawlers, without humans ever needing to visit the original sites. Traditional content producers, such as media outlets, are being outpaced by AI crawlers, which have cut into their online operations and advertising revenues. "Sites that gave bots access to their content used to get readers in exchange," said Kurt Muehmel, head of AI strategy at data management firm Dataiku. But the arrival of generative AI "completely breaks" that model, he told AFP. Wikipedia's human internet traffic fell by eight percent between 2024 and 2025 because of a rise in AI search engine summaries, the online encyclopaedia reported last month. "The fundamental tension is that the new business of the internet that is AI-driven doesn't generate traffic," said Matthew Prince, CEO of Cloudflare, an American internet services provider.
No trespassing
Cloudflare, which processes more than 20 percent of all internet traffic, announced this summer a new measure aimed at blocking AI crawlers from accessing content without payment or permission from website owners. "It's basically like putting a speed limit sign or a no trespassing sign," Prince told AFP on the sidelines of the Web Summit in Lisbon. "Badly behaving bots can get by that, but we can track that... Over time, we can tighten these controls in a way that we're confident the AI companies can't get through." The measure, which applies to more than 10 million websites, has already "attracted the attention of artificial intelligence giants", he added. On a smaller scale, American startup TollBit is providing online news publishers with tools to block, monitor and monetise AI crawler traffic. "The internet is a highway," said CEO and co-founder Toshit Panigrahi, who described the company as a "tollbooth on the internet". TollBit works with more than 5,600 sites, including USA Today, Time magazine and the Associated Press, allowing media outlets to set their own access fees for their content. The analytics are free for publishers, but AI companies are charged a "transaction fee for every piece of content they access". But for Muehmel, the online takeover by AI crawlers cannot be resolved with only "partial measures or by an individual company". "This is an evolution of the entire internet economy, which will take years," he said. If the bot swarm continues to roam freely online, "all of the incentives for content creation are going to go away," Prince said. "That would be a loss, not just for us humans that want to consume it, but actually for the AI companies that need original content in order to train their systems."
Major tech companies' AI crawlers are harvesting web content without permission, disrupting the traditional internet economy and forcing publishers to deploy new blocking and monetization tools to protect their revenues.

A wave of artificial intelligence "crawlers" is reshaping the internet economy as tech giants harvest web content without permission or payment, leaving traditional publishers struggling to maintain their revenue streams. The crawlers scour billions of websites for data to feed algorithms at leading tech companies, upending the online economy [1].

Before the emergence of AI chatbots, websites granted search engines access to their content in exchange for increased visibility, which brought them traffic and advertising revenue. Generative AI has allowed major technology companies like Google and OpenAI to harvest information for their chatbots through web crawlers, without humans ever needing to visit the original sites [2].

The consequences of this shift are becoming apparent. Wikipedia reported an eight percent decline in human traffic between 2024 and 2025, which the online encyclopaedia attributed to a rise in AI search engine summaries [1].

Kurt Muehmel, head of AI strategy at data management firm Dataiku, explained the disruption: "Sites that gave bots access to their content used to get readers in exchange." The arrival of generative AI "completely breaks" that model, he said [2].

In response, some companies are building tools to protect publishers' interests. Cloudflare, which processes more than 20 percent of all internet traffic, announced a new measure this summer aimed at blocking AI crawlers from accessing content without payment or permission from website owners. Matthew Prince, Cloudflare's CEO, described the initiative as "basically like putting a speed limit sign or a no trespassing sign" [1].

The measure applies to more than 10 million websites. "Badly behaving bots can get by that, but we can track that," Prince said. "Over time, we can tighten these controls in a way that we're confident the AI companies can't get through." The measure has already "attracted the attention of artificial intelligence giants", he added [2].

On a smaller scale, American startup TollBit is providing online news publishers with tools to block, monitor, and monetize AI crawler traffic. "The internet is a highway," said CEO and co-founder Toshit Panigrahi, who described the company as a "tollbooth on the internet" [1].

TollBit works with more than 5,600 sites, including USA Today, Time magazine, and the Associated Press, and lets media outlets set their own access fees for their content. The analytics are free for publishers, while AI companies are charged a "transaction fee for every piece of content they access" [2].

Muehmel said the takeover by AI crawlers cannot be resolved with only "partial measures or by an individual company," describing it as "an evolution of the entire internet economy, which will take years" [1].

Prince noted that unrestricted bot access could ultimately harm both content creators and the AI companies themselves. If the bot swarm continues to roam freely online, "all of the incentives for content creation are going to go away," he warned. "That would be a loss, not just for us humans that want to consume it, but actually for the AI companies that need original content in order to train their systems" [2].

Summarized by Navi