7 Sources
[1]
Cloudflare's new policy pushes AI companies to pay for publishers' content
Cloudflare has just issued the AI industry a new deadline to separate the web crawlers used for traditional search purposes, like Google Search, from those used for AI agents and training. Starting on September 15, 2026, Cloudflare's default settings will block "mixed-use" crawlers from any pages that host ads, the company announced on Wednesday. That means that the crawlers that blend search, agent use, and training will be blocked from crawling these sites by default, unless the site owner adjusts the settings otherwise. These changes to the defaults will apply to new Cloudflare customers, new sites set up by existing customers, and all existing free customers, the company says. The move could impact how AI model providers are able to access web content for training purposes and to help power their agentic services. Cloudflare points out that most website owners want their content to be discoverable via search and often through AI services as well, but they want protections against having their intellectual property given away for free. Cloudflare specifically calls out the "world's largest search engine" (clearly a Google reference!) as having access to about "2x more information" than other AI companies because the search giant makes it difficult for customers to remain discoverable without being used for AI. Google has pushed back against this generalization in the past, noting that it provides a bot called Google Extended that lets site owners opt out of having their content used for training and AI products and services like Gemini Apps and Vertex API. Its use doesn't impact a site's inclusion in Google Search. However, the tech giant's flagship Googlebot crawls for Search, including AI features like AI Overviews and AI Mode. 
"Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge," said Cloudflare co-founder and CEO Matthew Prince in his announcement of the news, referring to the recent milestone where bots surpassed human traffic online for the first time. That shift was not expected to occur until next year. "Cloudflare's new tools and partnerships give website owners increased visibility and commercial opportunities and benefit AI companies that have bots with clear and transparent intent. We hope that our proposed default changes encourage mixed-use crawlers to separate out search from agent use and training," Prince said. While Cloudflare offers a number of products to help users launch their own AI systems, the company has also released a range of tools to give publishers more control over their content in the AI era. In recent years, Cloudflare launched tools to combat AI bots, including a marketplace that lets websites charge AI bots for scraping, dubbed Pay Per Crawl. The latter is now also evolving into "Pay Per Use," the company said, which will allow publishers to charge AI companies when their content creates value, not just when it's fetched. The change could also help conserve publishers' bandwidth and compute resources for AI model providers, as Cloudflare's data suggested that over 50% of crawl traffic from AI crawlers is spent re-fetching unchanged pages. To put this into action, Cloudflare is initially working with two partners, Ceramic.ai and You.com. When a publisher opts in, they're paid when their content appears in Ceramic's AI search results or when You.com accesses a piece of their premium content. Other AI companies can customize this model for how they work, Cloudflare says.
[2]
Cloudflare to block cynical search-and-scrape bots from ad-supported web pages
Cloudflare on Wednesday said it will soon prevent mixed-use crawlers from accessing ad-supported customer websites by default, part of its ongoing efforts to give site publishers more control over how they engage with AI services. Apple, Google, and Microsoft's Bing operate crawlers that could fall afoul of Cloudflare's decision, although each of the tech giants offers an AI opt-out that may allow them to escape sanctions. Web crawlers make automated network requests to websites for various purposes. Google has used them for decades to visit websites for inclusion in its search index. Over the past few years, many crawlers have started visiting sites to harvest content for training AI models. This has prompted various countermeasures - publishers feel they're not being fairly compensated for the content AI companies scrape to feed into their models. But since Google's crawler, Googlebot, combines crawling for search indexing and content harvesting for AI training, site publishers have tended to accept the bot's presence because they fear blocking could mean they disappear from Google Search results. The situation is similar for Microsoft's Bingbot. And Apple also has enlisted its Applebot crawler to handle AI data gathering in addition to its indexing duties. The iBiz in June said: "The data crawled by Applebot may also be used to help train Apple foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools." Apple and Google support robots.txt directives that allow publishers to opt out of AI data harvesting (via Applebot-Extended and Google-Extended). Bing supports a content="noarchive" attribute for the robots meta tag that also blocks data harvesting. Other crawler operators, however, often ignore the voluntary robots.txt. Cloudflare therefore aims to provide site owners with a declarative content gate. 
"Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge," said Matthew Prince, co-founder and CEO of Cloudflare, in a statement. "Cloudflare's new tools and partnerships give website owners increased visibility and commercial opportunities and reward AI companies that have bots with clear and transparent intent. We hope that our proposed default changes encourage mixed use crawlers to separate out search from agent use and training." Starting September 15, 2026, new Cloudflare customers and new sites for existing customers will default to allowing search crawling but blocking training and agents from pages with ads. The changes will also be applied to free tier customers who have not changed their settings. As the company puts it: "This ensures that content that drives revenue cannot be crawled without explicit permission of those content owners." Between humans running ad blockers and Cloudflare blocking bots from pages with ads, a lot of marketing material may be consigned to oblivion. Cloudflare customers, however, can readmit crawlers to their ad-supported pages by changing their default site settings. Cloudflare is also making two other changes. Its "Pay Per Crawl" tollbooth is being rebranded "Pay Per Use." The idea is to reward publishers when their content creates value instead of just when it's fetched. To make that happen, Cloudflare is partnering with Ceramic.ai, an API-based search biz, so that publishers get paid whenever their content appears in a Ceramic.ai search result. It's also working with You.com, a search engine for AI agents, to generate content payments whenever there's demand from an agent. A company spokesperson didn't immediately respond when asked about Pay Per Crawl uptake. Finally, Cloudflare is introducing a new Business Insights Dashboard to give publishers more visibility into how bots are consuming content and how much traffic AI models send. ®
[3]
Cloudflare will filter out web crawlers that serve AI companies - Engadget
The hosting platform wants sites to have more control over how AI companies use their content. Cloudflare has announced plans to automatically block mixed-use web crawlers that index websites for search engines and act as AI agents and trainers at the same time. The company previously offered its customers the optional ability to prevent crawlers from scraping their sites for AI chatbots, but now Cloudflare's stance is becoming more defensive by default. "Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge," Matthew Prince, Cloudflare's CEO and co-founder shared in a statement. "Cloudflare's new tools and partnerships give website owners increased visibility and commercial opportunities and benefit AI companies that have bots with clear and transparent intent. We hope that our proposed default changes encourage mixed use crawlers to separate out search from agent use and training." Web traffic used to indicate that people were viewing a website's ads or paying for its subscriptions, but the popularity of AI models that can visit sites on a user's behalf to pull up-to-date information has upended that system. Cloudflare's new approach is an attempt to rebalance the relationship in a way that's fair for both AI companies and anyone running a website. Starting September 15, 2026, new customers and new websites from existing Cloudflare subscribers will default "to allow for search but block training and agent use for pages with ads." Mixed-use crawlers that don't give site owners the option to choose whether their site is used for AI will also be blocked on pages with ads by default. Users with free accounts will also switch to these defaults unless they opt-out ahead of the September 15 deadline, according to the company. 
As part of these changes, Cloudflare is also releasing a new version of the Pay Per Crawl feature it introduced in 2025 that allowed websites to block AI web crawlers by default unless companies paid to scrape their content. The feature is now called Pay Per Use, and rather than base payments on whether a webpage has been crawled, Cloudflare says site owners will be paid when their content appears in answers from AI chatbots. The announcement only mentions partnerships with Ceramic.AI and You.com, but Cloudflare likely hopes other AI companies will join as its customers opt in. Besides generally trying to make the relationship between websites and AI companies more fair, as TechCrunch notes, Cloudflare also seems to be indirectly targeting Google. The company's announcement mentions that "the largest search engine has access to about 2X more information than leading AI companies because they make it difficult for customers to remain discoverable without also being used for AI." Google's main crawler, Googlebot, both indexes websites for the company's various search engines and collects information to train Gemini and power AI features like AI Overviews and AI Mode. Google lets websites opt-in to a separate crawler called Google-Extended that only crawls websites for traditional search results, but if a publisher wanted to be included in AI Mode results, but doesn't want their content to train Google's models, they don't have an option. Cloudflare's new policy is an attempt to force Google and other companies with mixed-use crawlers to change their tactics.
[4]
Cloudflare gives AI crawlers a September deadline to pay up
From 15 September, Cloudflare will block crawlers that harvest content for AI training from any page carrying ads, unless the owner opts in, and pay publishers when their work shapes an AI answer. It is the boldest bid yet to make AI pay for the open web.
Cloudflare has set the AI industry a deadline. From September, it will block the crawlers that hoover up content for AI training. Any page that carries ads becomes off-limits, unless the site's owner says otherwise. The pitch is simple: stop giving the web away for free. The company sits in front of a large share of the world's web traffic. It announced the change on Wednesday. From 15 September, new Cloudflare sites will keep letting search engines index their pages. They will block AI training and AI agents from any page with advertising by default. The rule also catches "mixed-use" crawlers, the bots that blend search, training, and agent tasks into one. If a crawler will not let a site owner separate those uses, it gets blocked on ad-supported pages. The defaults apply to new customers and to new sites from existing customers. They also cover every free user who has not changed their settings. Owners can always let the bots back in from their dashboard. But the starting position has flipped. Content that earns money is now off-limits to AI unless its owner opts in.
Why now
Cloudflare's argument rests on a stark number. Automated bots now drive more than half of all web traffic, a milestone the company says arrived earlier than expected. Chief executive Matthew Prince said most internet traffic is now non-human. Cloudflare, he argued, "must go further and act faster so that a sustainable ecosystem can emerge." The deeper problem is a trap publishers know well. Most sites want to appear in AI answers, just as they want to rank in search. But the same crawl often feeds a model that then answers the user directly. The visit, and the ad revenue, never arrive.
Cloudflare singled out the "world's largest search engine," a clear jab at Google. Its Googlebot blends indexing with AI training. That gives Google roughly twice the data access of rival AI firms. Blocking the bot risks vanishing from search. Microsoft's Bing and Apple's Applebot raise the same dilemma.
From tollbooth to meter
Blocking is only half the plan. Cloudflare is turning last year's "Pay Per Crawl" tollbooth into "Pay Per Use." It now pays publishers when their content shapes an AI answer, not just when it is fetched. Early partners are the AI search firms Ceramic.ai and You.com. Cloudflare is also adding a dashboard so publishers can see which bots take their work and how little traffic those firms send back. It gives the behaviour a name, Answer Engine Optimisation, the AI-era heir to SEO. That reframing lands in a market already tilting this way. A wave of startups now sells tools to help brands stay visible inside chatbots, betting that GEO is the new SEO. Cloudflare wants to own the plumbing beneath it.
The open web at stake
The backdrop is grim for publishers. AI-generated answers are cutting the clicks that fund the web. They keep users on Google or inside a chatbot, rather than on the sites that did the work. One field study found Google's AI Overviews cut outbound clicks by about 40 per cent. Economists have even started to model an outright collapse of the open web if the bargain is not repaired. Whether one company can repair it is doubtful. Google and Apple already offer opt-out crawlers that may slip past Cloudflare's block, and rivals could route around it. Regulators are circling the same problem from another angle. The UK is forcing Google to let publishers opt out of AI search without losing their ranking, and news publishers are suing OpenAI over training. Cloudflare's move is the most aggressive attempt yet to make AI pay for what it reads. The deadline is 15 September. The rest of the web will be watching what the AI giants do next.
[5]
Cloudflare to block AI crawlers from ad-supported webpages by default
Come 15 September, multipurpose crawlers used by the likes of Google, Microsoft and Apple will be blocked by default according to Cloudflare's new rules. IT and network services provider Cloudflare has announced new rules designed to give website owners more control over the types of web crawlers that will be allowed or blocked from their sites - along with plans to block multipurpose crawlers by default on ad-supported pages. Traditionally, search engines and websites maintained a sort of "symbiotic relationship", as Cloudflare puts it, whereby web owners allowed search engines to crawl their sites and in return, search engines sent users back to their pages. The company explained that this crawl-to-referral process, when balanced, would help sites generate the pageviews needed to sustain advertising, affiliate revenue and subscriptions. However, the rise of AI crawlers and agents changed things, where AI chatbots scrape sites to synthesise answers and bypass original sources - often leading to imbalanced crawl-to-referral ratios. Cloudflare's own research from last year noted ratios of 118:1 up to nearly 50,000:1 - meaning an AI crawler could have scraped a site thousands of times and only sent back a single user. Nowadays, many of these crawlers are used for multiple purposes - AI training and search indexing - which puts website owners in a difficult position, as turning off all automation and crawler access to their sites could diminish their chances of showing up on search results. Cloudflare hopes to tackle this issue with its new rules, which include options for managing crawler access by establishing three categories of crawler purposes: Search, Agent and Training. Search refers to crawlers that are used for search indexing, Agent refers to automated behaviours used by the likes of chatbots and browser-use agents, and Training refers to crawlers that scrape content for fine-tuning AI models.
With these three classifications, website owners will be able to selectively allow or block crawlers that are used for each of the three classifications - meaning, if a web owner wanted to allow Search crawlers but block Agent and Training crawlers, they will now be able to do so. As part of these new rules, Cloudflare will also block Training and Agent crawlers by default on pages that display ads. The default block settings, which will apply to any new domain onboarded to Cloudflare from 15 September, won't apply to crawlers used for search indexing, while multipurpose crawlers - specifically those used for both search and training purposes - will be allowed or blocked "according to all of their behaviours". As a result, multipurpose crawlers used by the likes of Google, Microsoft and Apple will be blocked by default come 15 September. "We believe it should be simple for all website owners to manage access for these three AI-centered use cases," read a blogpost by Cloudflare. "We believe that bot operators should separate their crawlers because that creates more transparency for website owners: allowing them to better understand why a given crawler is visiting them, as well as to better manage the access they extend to that crawler. "If a company runs automation that builds Search indexes, acts as an Agent, and collects data to Train their models, then we strongly encourage that company to separate the automation into three separate crawlers." In the lead up to the September default deadline, Cloudflare customers can opt out of the default settings if they want to. Cloudflare's new rules are the latest in the company's attempts to curb crawler misuse. This time last year, the company introduced new crawler controls for website owners, including a 'pay per crawl' system designed to integrate with existing web infrastructure and leverage HTTP status codes and established authentication mechanisms to create a framework for paid content access.
The year before that, Cloudflare introduced a tool that allowed website owners to block all bots at once.
[6]
Cloudflare will block AI crawlers unless sites opt in
Cloudflare plans to automatically block mixed-use web crawlers that index websites for search engines while also serving as AI agents and trainers. The company previously offered customers the option to prevent these crawlers from scraping their sites for AI chatbots but is now adopting a more defensive default position. CEO Matthew Prince stated, "Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge." He emphasized that the new tools and partnerships give website owners increased visibility and commercial opportunities while benefiting AI companies with transparent intent. Starting September 15, 2026, new customers and new websites from existing subscribers will have settings that default to allow search but block training and agent use on pages containing ads. Mixed-use crawlers that do not allow site owners to control AI content use will also be blocked on ad pages by default. Free account users will automatically switch to these new defaults unless they opt-out before the deadline. Cloudflare is updating its Pay Per Crawl feature, now named Pay Per Use, where site owners will receive compensation when their content is referenced by AI chatbots. The announcement includes partnerships with Ceramic.AI and You.com, with expectations of attracting further AI companies to adopt these changes. The new policy is viewed as a strategic move to challenge Google, which has access to significantly more information than leading AI companies. Google's main web crawler, Googlebot, effectively indexes sites and collects data for its AI models, complicating publishers' options regarding content usage. Cloudflare's policies aim to prompt Google and other firms with mixed-use crawlers to reconsider their strategies.
[7]
Cloudflare Arms Website Owners in Fight Against AI Crawlers | PYMNTS.com
The new offerings are designed to provide this choice to, for example, businesses that are built on advertising or subscriptions and don't want AI systems training on their content without compensation, the company said in a Wednesday (July 1) press release. "We believe that if you're a business that wants your content in AI systems, we should make it as easy and efficient as possible; but if you're a business where you do not, then you should have the tools to restrict AI's access," Cloudflare said in the release. "In response, Cloudflare is testing new default classifications, delivering deeper insights to customers, making AI search faster, and ensuring creators are compensated when their content powers an answer." Cloudflare plans to change its defaults on Sept. 15 to allow for search, but block training and agent use for pages with ads. Customers will be able to change their setting at any time in the dashboard, according to the release. In addition, the company is introducing a new Attribution Business Insights dashboard that lets businesses see how AI bots consume their content and how much traffic each AI company sends back to the site; testing signals that tell AI crawlers whether a webpage has changed in order to reduce wasted crawling; and evolving its Pay Per Crawl into Pay Per Use so that publishers get paid when their content creates value, per the release. "Now that the majority of traffic on the internet is nonhuman, we must go further and act faster so that a sustainable ecosystem can emerge," Cloudflare Co-Founder and CEO Matthew Prince said in the release. "Cloudflare's new tools and partnerships give website owners increased visibility and commercial opportunities and benefit AI companies that have bots with clear and transparent intent." 
The announcement came a year after Cloudflare introduced a tool that lets website owners decide if they want AI crawlers to access their content, determine how AI firms can use the content, and set a price for access via the Pay Per Crawl model. Prince said in the Wednesday press release that "we are thrilled with the benefits it has had to the ecosystem."
Cloudflare announced it will block mixed-use web crawlers from ad-supported pages starting September 15, 2026, unless site owners opt in. The new policy targets Google, Microsoft, and Apple's multipurpose bots that blend search indexing with AI training. Publishers will now get paid when their content appears in AI answers through partnerships with Ceramic.ai and You.com.
Cloudflare has drawn a line in the sand for AI companies that scrape the web without fair compensation. Starting September 15, 2026, the company will block mixed-use web crawlers from ad-supported web pages by default, fundamentally shifting how AI companies access publisher content. The new policy applies to new customers, new sites from existing customers, and all free-tier users who haven't modified their settings, though site owners retain the ability to adjust permissions [2].
The move directly addresses a growing imbalance in web infrastructure where bots now generate more than half of all internet traffic, a milestone that arrived earlier than anticipated [4]. "Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge," said Cloudflare co-founder and CEO Matthew Prince.
Cloudflare specifically calls out what it describes as the "world's largest search engine" (a clear reference to Google), noting the company has access to roughly 2x more information than other AI companies because it makes separation difficult for publishers. Googlebot combines search indexing with content scraping for AI training, powering features like AI Overviews and AI Mode while simultaneously feeding data into Gemini models [3].
Similar issues plague Microsoft's Bingbot and Apple's Applebot, which also serve dual purposes [2]. Apple recently disclosed that "data crawled by Applebot may also be used to help train Apple foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools" [2]. While these tech giants offer opt-out mechanisms through Google-Extended, Applebot-Extended, and Bing's noarchive attribute, publishers face a difficult choice: allow AI training or risk disappearing from search results entirely.
To address this dilemma, Cloudflare introduced a classification system that separates crawler purposes into three distinct categories: Search, Agent, and Training [5]. Search refers to crawlers used for search indexing, Agent covers automated behaviors used by chatbots and browser-use agents, and Training encompasses data scraping for fine-tuning AI models [5].
This granular control allows website owners to selectively permit search indexing while blocking Agent and Training activities on the same pages. The system aims to restore transparency to publisher-crawler relationships and force AI companies to separate their multipurpose bots into distinct crawlers with clear intent [5].
Cloudflare is evolving its monetization approach by transforming last year's Pay Per Crawl marketplace into a Pay Per Use model. Instead of charging AI companies when they fetch content, publishers will now receive payment when their work appears in AI-generated answers and creates actual value [4].
Initial partnerships with Ceramic.ai and You.com demonstrate how this works in practice. When a publisher opts in, they receive compensation when their content surfaces in Ceramic.ai's AI search results or when You.com accesses their premium content. Other AI companies can customize this framework to match their specific operational models, Cloudflare says.
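The Search, Agent, and Training defaults described above can be pictured as a simple rule. The following Python sketch is illustrative only and is not Cloudflare's implementation: the names `Purpose`, `Crawler`, and `allowed_by_default` are hypothetical, and the sketch assumes that pages without ads are unaffected by the new defaults, which the coverage does not spell out. It follows the reported rule that a multipurpose crawler is judged on all of its behaviours [5].

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


# Reported default: on pages with ads, Agent and Training use is blocked.
BLOCKED_ON_AD_PAGES = frozenset({Purpose.AGENT, Purpose.TRAINING})


@dataclass(frozen=True)
class Crawler:
    name: str             # hypothetical label, not a real user agent
    purposes: frozenset   # every purpose the crawler is used for


def allowed_by_default(crawler: Crawler, page_has_ads: bool) -> bool:
    """Return True if the sketched defaults would let the crawler in."""
    if not page_has_ads:
        # Assumption: the defaults only change behaviour on pages with ads.
        return True
    # A mixed-use crawler is evaluated on all of its purposes, so a single
    # blocked purpose is enough to block it on an ad-supported page.
    return crawler.purposes.isdisjoint(BLOCKED_ON_AD_PAGES)


search_only = Crawler("search-only-bot", frozenset({Purpose.SEARCH}))
mixed_use = Crawler("mixed-use-bot", frozenset({Purpose.SEARCH, Purpose.TRAINING}))

print(allowed_by_default(search_only, page_has_ads=True))
print(allowed_by_default(mixed_use, page_has_ads=True))
```

Under these assumptions the search-only crawler is admitted and the mixed-use crawler is not, which is the incentive the policy is meant to create: an operator that splits its automation into separate crawlers keeps search access to ad-supported pages.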
The urgency behind these changes stems from alarming data about how AI companies exploit the open web. Cloudflare's research revealed crawl-to-referral ratios ranging from 118:1 to nearly 50,000:1, meaning AI crawlers could scrape a site thousands of times while sending back only a single user [5]. Additionally, over 50% of crawl traffic from AI crawlers involves re-fetching unchanged pages, wasting publishers' bandwidth and compute resources.
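To make the crawl-to-referral figures concrete, the ratio is simply the number of crawler fetches divided by the number of visitors referred back. The short Python sketch below shows the arithmetic; the function names and the request counts are hypothetical and chosen only to reproduce the low end of the reported range, not taken from Cloudflare's data.

```python
def crawl_to_referral_ratio(crawls: int, referrals: int) -> float:
    """Crawler fetches per referred visitor."""
    if referrals <= 0:
        raise ValueError("ratio is undefined without at least one referral")
    return crawls / referrals


def format_ratio(ratio: float) -> str:
    """Render a ratio in the N:1 form used in the coverage."""
    return f"{ratio:,.0f}:1"


# Hypothetical counts: 118,000 fetches against 1,000 referred visitors.
ratio = crawl_to_referral_ratio(crawls=118_000, referrals=1_000)
print(format_ratio(ratio))
```

A site owner could compute the same figure per crawler from server logs, assuming fetches and referred visits can be attributed to the same operator.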
This imbalanced relationship threatens the traditional web ecosystem where search engines and websites maintained what Cloudflare describes as a "symbiotic relationship" [5]. AI chatbots now synthesize answers that keep users on their platforms rather than directing traffic to original sources, cutting the pageviews that sustain advertising, affiliate revenue, and subscriptions [5]. One field study found Google's AI Overviews cut outbound clicks by approximately 40%, prompting economists to model potential collapse scenarios for the open web if this trend continues unchecked [4].
Cloudflare customers can opt out of the default blocking settings before the September 15, 2026 deadline if they prefer to maintain current access levels [5]. The company is also introducing a Business Insights Dashboard that provides publishers with visibility into which bots consume their content and how much traffic AI models actually send back [2].
Whether this policy will force major tech companies to restructure their crawler operations remains uncertain. Google, Apple, and Microsoft could potentially route around these restrictions or argue their existing opt-out crawlers satisfy Cloudflare's transparency requirements [4]. Regulators are approaching similar issues from different angles: the UK is already forcing Google to let publishers opt out of AI search without losing their ranking, while news publishers have filed lawsuits against OpenAI over unauthorized training [4]. This represents the most aggressive industry attempt yet to make AI companies pay for the content they consume, and the response from major AI players in the coming months will shape the future relationship between publishers and artificial intelligence.
Summarized by Navi