AI Crawling Creates Publisher Divide as Big Outlets Get Paid While Millions Block Bots


A stark divide is emerging in how publishers respond to AI crawling. Major outlets like CNN and Fox News are signing content licensing deals with Meta and OpenAI for undisclosed sums, while more than 5.6 million websites now block GPTBot, a roughly 70% surge since early July. Meanwhile, about 13% of AI bot requests ignore robots.txt blocking rules entirely, leaving smaller publishers to fight unauthorized content scraping without the legal resources of the major players.

Major Publishers Strike Content Licensing Deals While Smaller Sites Struggle

A two-tier system is taking shape across the publishing industry as AI crawling intensifies. Meta recently announced partnerships with CNN, Fox News, USA Today, The Daily Caller, People, and Le Monde to bring "real-time content on Meta AI," with the company paying publishers undisclosed amounts [1]. These content licensing deals represent the latest in a growing trend of AI platforms compensating select publishers for access to their work.

Source: PC Magazine

According to Columbia Journalism School's Tow Center for Digital Journalism, 128 such arrangements have been signed between AI operators and news publishers since July 2023 [1]. High-profile examples include OpenAI's deal with the Financial Times and Perplexity paying The Washington Post and the Los Angeles Times for inclusion in its Comet browser's premium service. Google and OpenAI also inked content-licensing contracts with Reddit in 2024, demonstrating how heavily AI model training depends on access to quality content.

Millions of Sites Turn to Blocking AI Bots as Defense Strategy

While major publishers negotiate deals, millions of smaller sites are taking a different approach: blocking AI bots entirely. Online traffic analysis by BuiltWith reveals that approximately 5.6 million websites have added OpenAI's GPTBot to the disallow list in their robots.txt files, up from about 3.3 million at the start of July 2025, an increase of almost 70 percent [2]. Anthropic's ClaudeBot now faces blocks at about 5.8 million websites, up from 3.2 million in early July, while Apple's Applebot encounters similar resistance at 5.8 million sites.

Source: The Register

Tollbit, a company that helps publishers monetize AI access, reported a 336 percent increase in sites blocking AI crawlers over the past year [2]. As of July, about half of news sites blocked GPTBot, according to Arc XP, a publishing platform spun out of The Washington Post. Even Google's Googlebot faces growing resistance, with 18 million sites now banning the bot, likely because the same crawler also feeds the AI Overviews that appear atop search results.
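The blocking these figures describe comes down to a couple of lines in a site's robots.txt file naming each crawler's user-agent token. The sketch below uses Python's standard-library robots.txt parser to show the effect of such rules; the example domain, the choice to disallow the entire site, and the sample URL are illustrative assumptions.

```python
# Minimal illustration of robots.txt-based AI-bot blocking, checked with the
# standard library's parser. "GPTBot" and "ClaudeBot" are the user-agent tokens
# the OpenAI and Anthropic crawlers announce; the domain and paths are made up.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "ClaudeBot", "Mozilla/5.0"):
    ok = parser.can_fetch(agent, "https://example.com/articles/some-story")
    print(f"{agent:12} may fetch: {ok}")   # listed bots: False; ordinary browsers: True
```

Compliance with these rules is voluntary, however: robots.txt is a request rather than an enforcement mechanism, which is the gap the next section describes.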

Unauthorized Content Scraping Persists Despite Blocking Efforts

The situation grows more troubling as evidence mounts that some AI companies ignore blocking rules. Tollbit's Q2 2025 report found that 13.26 percent of AI bot requests ignored robots.txt directives, up from 3.3 percent in Q4 2024 [2]. This alleged behavior has sparked copyright infringement lawsuits: the Tow Center is tracking 21 suits filed by publishers against AI providers, including one brought by PCMag's parent company Ziff Davis, which sued OpenAI in April 2025 for allegedly infringing its copyrights in training and operating its AI systems [1].
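Figures like Tollbit's are typically derived by comparing a site's published robots.txt rules against the crawler requests recorded in its access logs. The sketch below shows one way such a rate could be estimated; the bot watch list, the log format, and the toy data are assumptions, and, as the next paragraph notes, the method breaks down once AI-driven browsers stop identifying themselves.

```python
# Rough sketch: estimate what share of self-identified AI-bot requests fetched
# URLs that the site's robots.txt told them not to. The watch list, the
# (user_agent, url) log format, and the toy data are illustrative assumptions.
from urllib.robotparser import RobotFileParser

AI_BOTS = ("GPTBot", "ClaudeBot", "Bytespider")   # assumed watch list

def noncompliance_rate(robots_lines, requests):
    """requests: iterable of (user_agent, url) pairs pulled from access logs."""
    parser = RobotFileParser()
    parser.parse(robots_lines)

    bot_hits = ignored = 0
    for user_agent, url in requests:
        if not any(bot in user_agent for bot in AI_BOTS):
            continue                              # only count self-identified AI bots
        bot_hits += 1
        if not parser.can_fetch(user_agent, url):
            ignored += 1                          # fetched a URL robots.txt disallows
    return ignored / bot_hits if bot_hits else 0.0

robots = ["User-agent: GPTBot", "Disallow: /"]
log = [("GPTBot/1.2", "https://example.com/a"),
       ("GPTBot/1.2", "https://example.com/b"),
       ("Mozilla/5.0", "https://example.com/c")]
print(noncompliance_rate(robots, log))            # 1.0: both bot requests ignored the rule
```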

Will Allen, VP of product at Cloudflare, confirmed seeing "a lot of people that are out there trying to scrape large amounts of data, ignoring any robots.txt directives, and ignoring other attempts to block them" [2]. The challenge intensifies as AI firms launch browsers that incorporate their models, making bot traffic indistinguishable from human visitors in site logs.

Bandwidth Costs from AI Crawlers Threaten Smaller Operations

Beyond lost revenue, AI crawling imposes significant bandwidth costs on websites. In April, Wikipedia warned that an onslaught of AI bots, largely "automated programs that scrape the Wikimedia Commons image catalog of openly licensed images to feed images to AI models," was eating into its server costs and capacity [1]. In October, blogging service Bear reported an outage caused by AI bot traffic, highlighting how smaller operations lack the resources to absorb the surge.

AI-Enhanced Search Results Reduce Website Traffic

The impact on website traffic is measurable. A Pew Research Center study published in summer 2025 found that Google's AI Overviews diminished clickthrough rates among survey respondents from 15% to 8% [1]. While Google claims it is not seeing an overall drop and that AI Overviews send "high-quality" clicks resulting in more time spent at sites, it has yet to publish numbers documenting this claim. Food bloggers have been particularly affected, with one lamenting that AI summaries of recipes often leave readers with incorrect instructions while damaging traffic so severely that "I'm going to have to find something else to do."

New Solutions Emerge to Monetize AI Access

Cloudflare launched Pay Per Crawl in summer 2025, allowing site owners to grant access to AI crawlers from companies that pay for that access [1]. Speaking at the Web Summit conference in Lisbon in November, Cloudflare CEO Matthew Prince warned that "if these new AI tools aren't generating traffic, then the fundamental business model of the internet is going to break down." Fortune executive editor Jim Edwards echoed the concern, stating that AI "is reducing readership, certainly, it's making revenue harder."
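Mechanically, a pay-per-crawl arrangement amounts to gating crawler requests at the HTTP layer until payment is verified, for example with the 402 Payment Required status code. The toy server below sketches that idea only; the header name, token store, and bot list are hypothetical, and nothing here should be read as Cloudflare's actual implementation.

```python
# Toy pay-per-crawl gate: requests from known AI crawler user-agents receive
# HTTP 402 unless they present a (hypothetical) payment token; everyone else is
# served normally. Header name, tokens, and bot list are illustrative assumptions.
from http.server import BaseHTTPRequestHandler, HTTPServer

AI_BOTS = ("GPTBot", "ClaudeBot")            # assumed crawler user-agent tokens
PAID_TOKENS = {"demo-token-123"}             # hypothetical tokens issued to paying AI firms

class PayPerCrawlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if any(bot in agent for bot in AI_BOTS):
            token = self.headers.get("X-Crawl-Payment", "")   # hypothetical header
            if token not in PAID_TOKENS:
                self.send_response(402)      # Payment Required: crawl access not licensed
                self.end_headers()
                self.wfile.write(b"Crawling this site requires a paid license.\n")
                return
        self.send_response(200)              # humans and paying crawlers get the page
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<html><body>Article text...</body></html>\n")

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), PayPerCrawlHandler).serve_forever()
```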

Prince noted that most AI developers recognize they must pay for content, "with one very notable exception," widely understood to be Google [1]. He criticized Google for making it impossible for sites to permit its essential web indexing but block its AI crawling using standard robots.txt files, because the same bot does both tasks. "They need to play by the same rules as everyone else and split their crawler so that search and AI are two separate things," Prince argued.
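For illustration, the split Prince describes would let a robots.txt file treat the two roles separately. "Googlebot-AI" in the sketch below is a made-up token standing in for a hypothetical AI-only crawler; no such user-agent exists today.

```python
# Hypothetical "split crawler" rules: ordinary search indexing stays allowed while
# a separate AI-only user-agent is disallowed. "Googlebot-AI" is invented for this
# sketch; its group is listed first because the stdlib parser returns the first
# matching group, so the more specific token must come before the broader one.
from urllib.robotparser import RobotFileParser

SPLIT_RULES = [
    "User-agent: Googlebot-AI",   # hypothetical AI-only crawler: blocked
    "Disallow: /",
    "",
    "User-agent: Googlebot",      # search indexing: still allowed everywhere
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(SPLIT_RULES)
print(parser.can_fetch("Googlebot", "https://example.com/story"))     # True
print(parser.can_fetch("Googlebot-AI", "https://example.com/story"))  # False
```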

Court rulings have yet to establish a legal consensus on how much AI platforms may reuse human work. In February, one federal judge ruled that a now-defunct AI startup infringed Thomson Reuters' copyrights when it leveraged Westlaw content to create a competing service. In June, another ruled that Anthropic's purchase and scanning of books to train Claude AI met fair-use criteria, but that downloading pirated copies did not [1]. As data scraping intensifies and the business model of content creation comes under pressure, publishers are watching closely to see whether compensation or blocking will define the future relationship between AI and the web.
