2 Sources
[1]
AI Platforms Are Paying (Some) Big Publishers, Leaving Smaller Ones Behind
An ideologically wide range of news outlets now stand to make some money off Meta's obsession with AI. CNN, Fox News, USA Today, The Daily Caller, People, Le Monde, and others have signed on to bring "real-time content on Meta AI." Partnering means paying: Meta plans to compensate those publishers an undisclosed amount, Axios media reporter Sara Fischer confirms.

It's the latest in a series of moves by the operators of AI services to pay sites for access to their content. A tracker of AI deals maintained by Columbia Journalism School's Tow Center for Digital Journalism lists 128 such arrangements between AI operators and news publishers since July 2023, including such high-profile tie-ups as OpenAI's deal with the Financial Times and Perplexity paying the Washington Post, the Los Angeles Times, and other publishers for inclusion in its Comet browser's premium service. (Tow's tracker also counts 21 lawsuits filed by publishers against AI providers in that time, including the lawsuit PCMag's parent company Ziff Davis filed against OpenAI in April 2025 alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

But all of these deals, plus similar ones with non-news sites like the content-licensing contracts Google and OpenAI inked with Reddit in 2024, have one unfortunate thing in common: They leave out smaller sites that can't afford lawyers to negotiate with the likes of Google and Meta.

And small and large sites seem equally exposed to the risk of AI-enhanced search results giving web users enough information to save them from having to click through to a search result. In a study published this summer, the Pew Research Center found that Google's AI Overviews diminished the clickthrough rate among survey respondents from 15% to 8%. Google has repeatedly said that it's not seeing an overall drop in clickthrough traffic and that AI Overviews send sites slightly more "high-quality" clicks, meaning ones that result in more time spent at the site. It has yet to publish numbers documenting that second claim.

Court rulings have not yielded a legal consensus about how much an AI platform should be able to reuse the work of humans. In February, one federal judge ruled that a now-defunct AI startup infringed Thomson Reuters' copyrights when it leveraged content from that firm's Westlaw reference to create a competing service. In June, another ruled that Anthropic buying books and scanning them to train its Claude AI platform met fair-use criteria, but Anthropic downloading copies of books from a trove of pirated works did not.

The crawlers that read sites to provide data for training AI models can also impose bandwidth costs on those sites. In April, Wikipedia warned that an onslaught of these AI bots -- largely "automated programs that scrape the Wikimedia Commons image catalog of openly licensed images to feed images to AI models" -- was eating into its server costs and capacity.

And the automated results of all this AI crawling and scraping can wind up harming both online creators and their former readers. A Nov. 25 Bloomberg story recounted how AI summaries of recipes often leave readers with incorrect instructions while doing enough damage to the traffic of food bloggers that one lamented, "I'm going to have to find something else to do."
Breaking the Fundamental Business Model of the Internet

In July, the internet-services company Cloudflare, which already lets sites using its services (even the free tier) block AI-crawler bots, announced a new "pay per crawl" feature, which lets site owners grant access to AI crawlers from companies that pay for that access. In a panel at the Web Summit conference in Lisbon in November, Cloudflare CEO Matthew Prince called it a badly needed response to an existential threat to the internet we've known.

"If these new AI tools aren't generating traffic, then the fundamental business model of the internet is going to break down," Prince told his onstage interviewer, Fortune executive editor Jim Edwards, who replied that Fortune has seen AI do just that: "It's reducing readership, certainly, it's making revenue harder."

Prince, however, said he'd seen a recognition among most AI developers that they can't only take: "When we have conversations with the AI companies, with one very notable exception, they are all saying we have to pay for this content." You can probably guess the exception.

Calling this one company both "the great patron of the internet for the last 27 years" and "the great villain of the internet today," Prince said Google makes it impossible for sites to permit its essential web indexing but block its AI crawling using standard robots.txt files, because the same bot does both tasks. "They need to play by the same rules as everyone else and split their crawler so that search and AI are two separate things," he said. Prince then suggested that Google was open to that idea: "I guarantee you that immediately after I get offstage, I will be having this conversation with senior Google executives."

Google declined to provide a comment on Prince's talk. The company does allow site owners to block Google from using their content to train its Gemini AI platform, but that does not affect AI Overviews. A separate "nosnippet" option blocks Google from displaying a brief text preview of a page's content, but it affects both Google's traditional search and its AI Overviews.

Cloudflare did not name any AI companies now making payments to site owners via Pay Per Crawl, citing this feature's private-beta status. An executive with a trade group for small online newsrooms couldn't offer any details about member uptake of this option. "I do not know -- and can't get clarity on -- which if any are using the anti-crawling tool," emailed Chris Krewson, executive director of LION Publishers (the abbreviation is short for "local independent online news"). He did note that Cloudflare had tried to sell LION on adopting it, which he took as evidence of limited early adoption.

Another possibility for smaller sites and solo creators could be the Really Simple Licensing standard, now backed by a coalition of larger online properties including Reddit, Yahoo, and Ziff Davis, which would let sites post terms for AI use of their content -- and which could work with Cloudflare's AI bot blocking or a similar screen acting as an enforcer.

Toward the end of his Web Summit panel, Prince suggested that even AI developers weary of being leapfrogged by rivals should welcome being required to pay for access, because that could let them stand out by buying better content. "What's going to differentiate them?" he asked, and then shared his own answer: "Do they have access to original and unique content?"
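To make the robots.txt mechanics concrete, here is a minimal Python sketch (not drawn from either article) that renders a robots.txt expressing the kind of per-crawler policy described above: Googlebot is left alone so search indexing continues, Google's documented Google-Extended token opts the site out of Gemini training, and training crawlers such as GPTBot are refused outright. The specific policy choices are hypothetical, and, as Prince points out, no robots.txt rule currently separates Google's search crawling from the crawling that feeds AI Overviews.

```python
# Minimal sketch: render a robots.txt with separate rules per crawler token.
# The policy below is hypothetical; an empty disallow list means "allow all".
RULES = {
    "Googlebot": [],           # keep classic search indexing (also feeds AI Overviews)
    "Google-Extended": ["/"],  # opt out of Gemini model training
    "GPTBot": ["/"],           # refuse OpenAI's training crawler
    "ClaudeBot": ["/"],        # refuse Anthropic's training crawler
}

def render_robots_txt(rules: dict) -> str:
    blocks = []
    for agent, disallows in rules.items():
        lines = [f"User-agent: {agent}"]
        if disallows:
            lines += [f"Disallow: {path}" for path in disallows]
        else:
            lines.append("Disallow:")  # empty value disallows nothing
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"

if __name__ == "__main__":
    print(render_robots_txt(RULES))
```

The "nosnippet" control mentioned above is set separately, for example as a robots meta tag or an X-Robots-Tag response header, and it constrains what can appear in both classic search snippets and AI Overviews.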
[2]
Publishers say no to AI scrapers, block bots at server level
A growing number of websites are taking steps to ban AI bot traffic so that their work isn't used as training data and their servers aren't overwhelmed by non-human users. However, some companies are ignoring the bans and scraping anyway.

Online traffic analysis conducted by BuiltWith, a web metrics biz, indicates that the number of publishers trying to prevent AI bots from scraping content for use in model training has surged since July. About 5.6 million websites have now added OpenAI's GPTBot to the disallow list in their robots.txt file, up from about 3.3 million at the start of July 2025. That's an increase of almost 70 percent.

Websites can signal to visiting crawlers whether they allow automated requests to harvest information through entries in their robots.txt files. Compliance with these directives is voluntary, but repeated failure to respect these rules may come up in litigation, as it did in Reddit's scraping lawsuit against Anthropic earlier this year.

Speaking of Anthropic, the company's ClaudeBot is also increasingly wearing out its welcome. ClaudeBot is now blocked at about 5.8 million websites, up from 3.2 million in early July. The company's Claude-SearchBot - used for surfacing sites in Claude search results - also faces a rising block rate. The situation is similar for AppleBot, now blocked at about 5.8 million websites, up from about 3.2 million in July.

Even GoogleBot - which indexes data for search - faces growing resistance, perhaps because it's also used for the AI Overviews now surfaced atop search results. BuiltWith reports that 18 million sites now ban the bot, which would also mean that those sites could not be indexed in Google Search. As of July, about half of news sites blocked GPTBot, according to Arc XP, a publishing platform biz spun out of The Washington Post.

Anthropic, OpenAI, and Google did not immediately respond to requests for comment.

Anirudh Agarwal, CEO of OutreachX, a web marketing consultancy, said in an emailed statement that it's noteworthy how often GPTBot is getting turned away, because that signals how publishers think about AI crawlers. If OpenAI's GPTBot is being blocked, every other AI crawler faces that possibility.

Tollbit, a biz that aims to help publishers monetize AI traffic through access fees for crawlers, said in its Q2 2025 report that, in the past year, there's been a 336 percent increase in sites blocking AI crawlers. The company also said that, across all AI bots, 13.26 percent of requests ignored robots.txt directives in Q2 2025, up from 3.3 percent in Q4 2024. This alleged behavior has been challenged in court by Reddit, as noted above, and in a lawsuit filed by major news publishers against Perplexity in 2024.

But bot blocking efforts have become more complicated because AI firms like OpenAI and Perplexity have launched browsers that incorporate their AI models. According to the Tollbit report, "The latest AI browsers like Perplexity Comet, and devtools like Firecrawl or Browserless are indistinguishable from humans in site logs." So publishers that block Comet or the like might just be blocking human traffic. As a result, Tollbit argues, it's critical that non-human site traffic accurately identifies itself.

For organizations that are not major publishers, the AI bot onslaught can be overwhelming. In October, blogging service Bear reported an outage caused by AI bot traffic, a problem also noted by Belgium-based blogger Wouter Groeneveld.
And developer David Gerard, who runs AI-skeptic blog Pivot-to-AI, last month wrote on Mastodon about how RationalWiki.org was having trouble keeping AI bots at bay.

Will Allen, VP of product at Cloudflare, told The Register in an interview last month that the company sees "a lot of people that are out there trying to scrape large amounts of data, ignoring any robots.txt directives, and ignoring other attempts to block them." Bot traffic, said Allen, is increasing, which in and of itself isn't necessarily a bad thing. But it does mean, he said, that there are more attacks and more people trying to get around paywalls and content restrictions.

Cloudflare, over the summer, launched a service called Pay per crawl in a bid to allow content owners to offer automated access for a price. Allen declined to disclose which sites have signed up to participate in the beta testing but said it's clear that new economic options would be helpful.

"We have a thesis or two about how that could evolve," he said. "But really, we think there's going to be a lot of different evolution, a lot of different experimentation. And so we're keeping a pretty tight private beta for our Pay per crawl product just to really learn, from both sides of the market - people who are looking to access content at scale and people who are looking to protect content." ®
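Blocking bots "at the server level," as the headline puts it, generally means rejecting requests by their declared User-Agent before any page is served, rather than relying on robots.txt alone. The sketch below shows that idea as a minimal, hypothetical Python WSGI middleware; the blocked tokens and response text are placeholders, and, as the Tollbit report cautions, this only catches crawlers that identify themselves honestly.

```python
# Sketch: refuse requests from self-identified AI crawlers with 403 Forbidden.
# Tokens and wording are placeholders; crawlers spoofing a browser User-Agent
# (or AI browsers such as Comet) cannot be distinguished this way.
from wsgiref.simple_server import make_server

BLOCKED_TOKENS = ("gptbot", "claudebot", "claude-searchbot", "applebot")

class BlockAIBots:
    """WSGI middleware that short-circuits requests from listed crawlers."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in BLOCKED_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Automated AI crawling is not permitted here.\n"]
        return self.app(environ, start_response)

def site(environ, start_response):
    """Stand-in for the real application being protected."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, human reader.\n"]

if __name__ == "__main__":
    # Demo: curl -A "GPTBot/1.2" http://localhost:8000/ should return 403.
    make_server("localhost", 8000, BlockAIBots(site)).serve_forever()
```

In practice, publishers tend to push this decision out to a CDN or reverse proxy (Cloudflare's AI-crawler blocking is one example) rather than into the application itself, but the check being made is the same.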
A stark divide is emerging in how publishers respond to AI crawling. Major outlets such as CNN and Fox News are signing content licensing deals with AI platforms like Meta and OpenAI for undisclosed sums, while over 5.6 million websites now block GPTBot, a 70% surge since July. Meanwhile, roughly 13% of AI bot requests ignore blocking rules entirely, forcing smaller publishers to fight unauthorized content scraping without legal resources.
A two-tier system is taking shape across the publishing industry as AI crawling intensifies. Meta recently announced partnerships with CNN, Fox News, USA Today, The Daily Caller, People, and Le Monde to bring "real-time content on Meta AI," with the company paying publishers undisclosed amounts [1]. These content licensing deals represent the latest in a growing trend of AI platforms compensating select publishers for access to their work.
Source: PC Magazine
According to Columbia Journalism School's Tow Center for Digital Journalism, 128 such arrangements have been signed between AI operators and news publishers since July 2023 [1]. High-profile examples include OpenAI's deal with the Financial Times and Perplexity paying the Washington Post and Los Angeles Times for inclusion in its Comet browser's premium service. Google and OpenAI also inked content-licensing contracts with Reddit in 2024, demonstrating how AI model training depends heavily on access to quality content.

While major publishers negotiate deals, millions of smaller sites are taking a different approach: blocking AI bots entirely. Online traffic analysis by BuiltWith reveals that approximately 5.6 million websites have added OpenAI's GPTBot to the disallow list in their robots.txt file, up from about 3.3 million at the start of July 2025, an increase of almost 70 percent [2]. Anthropic's ClaudeBot now faces blocks at about 5.8 million websites, up from 3.2 million in early July, while AppleBot encounters similar resistance at 5.8 million sites.
Source: The Register
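The disallow lists behind these numbers are public, so anyone can check which crawlers a given site turns away using Python's standard urllib.robotparser. The sketch below uses a placeholder domain, and a "disallowed" result is only a declared preference, since compliance remains voluntary on the crawler's side.

```python
# Sketch: read a site's robots.txt and report which AI crawlers it disallows.
# The domain is a placeholder; results reflect declared policy, not behavior.
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"   # hypothetical site
BOTS = ["GPTBot", "ClaudeBot", "Claude-SearchBot", "Applebot", "Googlebot"]

parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()                  # fetches and parses the file

for bot in BOTS:
    status = "allowed" if parser.can_fetch(bot, f"{SITE}/") else "disallowed"
    print(f"{bot:>17}: {status}")
```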
Tollbit, a company helping publishers monetize AI access, reported a 336 percent increase in sites blocking AI crawlers over the past year [2]. As of July, about half of news sites blocked GPTBot, according to Arc XP, a publishing platform spun out of The Washington Post. Even Google's GoogleBot faces growing resistance, with 18 million sites now banning the bot, likely because it's also used for AI Overviews atop search results.

The situation grows more troubling as evidence mounts that some AI companies ignore blocking rules. Tollbit's Q2 2025 report found that 13.26 percent of AI bot requests ignored robots.txt directives, up from 3.3 percent in Q4 2024 [2]. This alleged behavior has sparked copyright infringement lawsuits, with the Tow Center tracking 21 lawsuits filed by publishers against AI providers, including PCMag's parent company Ziff Davis suing OpenAI in April 2025 for allegedly infringing copyrights in training and operating its AI systems [1].
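Tollbit's ignore-rate figure comes from its own network, but site operators can approximate the same measurement from their own server logs: count requests from self-identified AI crawlers and see how many hit paths the site has disallowed. The sketch below is illustrative rather than a reproduction of Tollbit's methodology; it assumes a standard "combined" access-log format, a hypothetical disallow list, and bots that announce themselves in the User-Agent header.

```python
# Sketch: estimate how often self-identified AI crawlers request disallowed paths.
# Log format, bot tokens, and the disallow list are simplifying assumptions.
import re
from collections import Counter

AI_TOKENS = ("GPTBot", "ClaudeBot", "Claude-SearchBot", "PerplexityBot")
DISALLOWED_PREFIXES = ("/",)   # e.g. a site whose robots.txt disallows everything for AI bots

# Matches the request, status, referer, and user-agent of a "combined" log line.
LOG_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[^"]*" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def ignore_rate(log_path: str) -> float:
    seen, ignored = Counter(), Counter()
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            match = LOG_RE.search(line)
            if not match:
                continue
            bot = next((t for t in AI_TOKENS if t in match["ua"]), None)
            if bot is None:
                continue
            seen[bot] += 1
            if match["path"].startswith(DISALLOWED_PREFIXES):
                ignored[bot] += 1
    total = sum(seen.values())
    return sum(ignored.values()) / total if total else 0.0

if __name__ == "__main__":
    print(f"AI-bot requests to disallowed paths: {ignore_rate('access.log'):.1%}")
```

Even a careful count like this misses the harder case noted below: AI browsers and headless tools that present ordinary browser User-Agents never register as bots in such logs.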
Will Allen, VP of product at Cloudflare, confirmed seeing "a lot of people that are out there trying to scrape large amounts of data, ignoring any robots.txt directives, and ignoring other attempts to block them" [2]. The challenge intensifies as AI firms launch browsers incorporating their models, making bot traffic indistinguishable from human visitors in site logs.

Beyond lost revenue, AI crawling imposes significant bandwidth costs on websites. In April, Wikipedia warned that an onslaught of AI bots, largely "automated programs that scrape the Wikimedia Commons image catalog of openly licensed images to feed images to AI models," was eating into its server costs and capacity [1]. In October, blogging service Bear reported an outage caused by AI bot traffic, highlighting how smaller operations lack the resources to handle the surge.
The impact on website traffic is measurable. A Pew Research Center study published in summer 2025 found that Google's AI Overviews diminished clickthrough rates among survey respondents from 15% to 8% [1]. While Google claims it's not seeing an overall drop and that AI Overviews send "high-quality" clicks resulting in more time spent at sites, it has yet to publish numbers documenting this claim. Food bloggers have been particularly affected: a Bloomberg report found that AI recipe summaries often leave readers with incorrect instructions while damaging bloggers' traffic so severely that one lamented, "I'm going to have to find something else to do."

Cloudflare launched Pay Per Crawl in summer 2025, allowing site owners to grant access to AI crawlers from companies that pay for that access [1]. Speaking at the Web Summit conference in Lisbon in November, Cloudflare CEO Matthew Prince warned that "if these new AI tools aren't generating traffic, then the fundamental business model of the internet is going to break down." Fortune executive editor Jim Edwards confirmed this threat, stating that AI "is reducing readership, certainly, it's making revenue harder."

Prince noted that most AI developers recognize they must pay for content, "with one very notable exception," widely understood to be Google [1]. He criticized Google for making it impossible for sites to permit its essential web indexing but block its AI crawling using standard robots.txt files, because the same bot does both tasks. "They need to play by the same rules as everyone else and split their crawler so that search and AI are two separate things," Prince argued.

Court rulings have yet to establish a legal consensus on how much AI platforms should be able to reuse human work. In February, one federal judge ruled that a now-defunct AI startup infringed Thomson Reuters' copyrights when it leveraged Westlaw content to create a competing service. In June, another ruled that Anthropic buying books and scanning them to train Claude AI met fair-use criteria, but downloading copies from pirated works did not [1].

As data scraping intensifies and the business model of content creation faces pressure, publishers watch closely to see whether compensation or blocking will define the future relationship between AI and the web.

Summarized by Navi