2 Sources
[1]
Cloudflare wants Google to change its AI search crawling. Google likely won't.
After Cloudflare started testing new features that would allow websites to block AI crawlers or require payment for scraping, the tech company immediately faced questions over the logistics of the plan. In particular, website owners and SEO experts wanted to know how Cloudflare planned to block Google's bot from scraping sites to fuel AI overviews without risking blocking the same bot from crawling for valuable search engine placements.
Last week, a travel blogger's questions about the blocking and so-called pay-per-crawl features pushed Cloudflare CEO Matthew Prince to respond on X (formerly Twitter). "We will get Google to provide ways to block Answer Box and AI Overview, without blocking classic search indexing, as well," Prince said. Asked if that was even possible, Prince doubled down, responding, "it is. #staytuned."
In another post responding to a search engine optimization specialist, he claimed that Cloudflare was in "encouraging" talks with Google that he hopes will result in Google separating its crawlers to better work in Cloudflare's system. But if those talks go nowhere, he revealed Cloudflare is pushing for a law to be passed that's considered a "very viable option" in "many jurisdictions." "Worst case we'll pass a law somewhere that requires them to break out their crawlers and then announce all routes to their crawlers from there," Prince said. "And that wouldn't be hard. But I'm hopeful it won't need to come to that."
Ars could not immediately find any legislation that seemed to match Prince's description, and Cloudflare did not respond to Ars' request to comment. Passing tech laws is notoriously hard, though, partly because technology keeps advancing as policy debates drag on, and challenges with regulating artificial intelligence are an obvious example of that pattern today. Google declined Ars' request to confirm whether talks were underway or if the company was open to separating its crawlers.
Although Cloudflare singled out Google, other search engines that view AI search features as part of their search products also use the same bots for training as they do for search indexing. It seems likely that Cloudflare's proposed legislation would face resistance from tech companies in a similar position to Google, as The Wall Street Journal reported that the tech companies "have few incentives to work with intermediaries." Additionally, Cloudflare's initiative faces criticism from those who "worry that academic research, security scans, and other types of benign web crawling will get elbowed out of websites as barriers are built around more sites" through Cloudflare's blocks and paywalls, the WSJ reported. Cloudflare's system could also threaten web projects like The Internet Archive, which notably played a crucial role in helping track data deleted from government websites after Donald Trump took office. Among commenters discussing Cloudflare's claims about Google on Search Engine Roundtable, one user suggested Cloudflare may risk a lawsuit or other penalties from Google for poking the bear. Ars will continue monitoring for updates on Cloudflare's attempts to get Google on board with its plan.
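The block-or-charge choice described in this article can be pictured as a per-crawler policy table. The sketch below is illustrative only: the crawler names, the `SITE_POLICY` table, the `respond` helper, the block-by-default fallback for unknown crawlers, and the use of HTTP 402 (Payment Required) as the "pay first" signal are all assumptions, not details reported in the article. It also shows why a shared bot is a problem: a crawler that serves both search indexing and AI features gets exactly one entry, so it cannot be allowed for one purpose and charged for the other.

```python
from enum import Enum
from http import HTTPStatus


class Policy(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    CHARGE = "charge"


# Hypothetical per-crawler settings a site owner might choose.
# One entry per crawler: a bot that does both search indexing and
# AI scraping cannot be split across two policies.
SITE_POLICY = {
    "example-search-bot": Policy.ALLOW,
    "example-ai-bot": Policy.CHARGE,
}


def respond(crawler: str, has_paid: bool = False) -> HTTPStatus:
    """Return the HTTP status a crawler would receive under SITE_POLICY.

    Unknown crawlers are blocked by default (an assumption of this sketch).
    """
    policy = SITE_POLICY.get(crawler, Policy.BLOCK)
    if policy is Policy.ALLOW:
        return HTTPStatus.OK
    if policy is Policy.CHARGE:
        # Serve the page only once payment has been arranged.
        return HTTPStatus.OK if has_paid else HTTPStatus.PAYMENT_REQUIRED
    return HTTPStatus.FORBIDDEN
```

Under these assumptions, `respond("example-ai-bot")` yields a 402 until `has_paid=True` is passed, while `respond("example-search-bot")` is always served.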
[2]
The New AI Sheriff Takes a Shot at Google
In a series of pointed X posts, Cloudflare's CEO Matthew Prince lays out a bold new policy that treats AI companies like unwelcome guests, and hints that even Google might be forced to play by his rules.
Cloudflare CEO Matthew Prince didn't make his biggest AI announcement in a press release or earnings call. He made it one X reply at a time.
Earlier this month, Cloudflare launched what it called "Content Independence Day," a policy change that blocks AI companies from scraping the websites it protects unless they compensate content creators. The move challenges the decades-old web economy where companies like Google could freely index content in exchange for traffic, and replaces it with a new, much tougher standard: no more crawling without a deal.
But the real story is what happened next. In a series of unfiltered X (formerly Twitter) replies over several days, Prince revealed that Cloudflare is already treating some AI giants as violators, signaling a dramatic power shift in who sets the rules of the web. One of the most notable admissions? "Gemini is blocked by default," Prince wrote on July 3, referring to Google's AI model. In other words, Google's AI agents are no longer welcome to freely ingest data from websites protected by Cloudflare unless Google complies with new rules or pays.
That's a huge deal. Cloudflare protects roughly 20% of the web, including major publishers, media outlets, and creator platforms. If it cuts off AI crawlers from those sites, the large language models that power today's chatbots, AI summaries, and answer boxes could go hungry.
A major sticking point for publishers has been Googlebot, Google's main crawler, which traditionally indexed content for search. Now, Googlebot is also used to feed data to Google's AI models, including its new AI Overviews and the Gemini LLM (Large Language Model) that powers many of its generative AI features. This dual role creates a conflict of interest for creators who want to appear in traditional search results but not have their content used for AI training without compensation.
Prince made it clear that Google's current practices won't be allowed under the old terms. "We will get Google to provide ways to block Answer Box and AI Overview, without blocking classic search indexing, as well," he wrote. If not? "We have a number of other ways to force them to." Translation: Cloudflare, a company once known for protecting websites from DDoS attacks, now sees itself as a watchdog for the AI economy, and it's not afraid to flex.
But a user pushed back on the technical feasibility of this: "Is that possible? Are ai overviews not a representation of the search index ranking itself - isn't most rag?" "Rag" refers to Retrieval Augmented Generation, where LLMs pull from indexed data. Prince's curt, confident reply: "It is. #staytuned." This hashtag hints that Cloudflare believes it has the technical chops to separate AI-driven summaries and features from standard search indexing, something Google has thus far been unwilling or unable to offer publishers.
Prince's tone is diplomatic but unmistakably firm. He says he is "encouraged from conversations with them." But he also hints at enforcement tools if Big Tech doesn't cooperate. "Worst case we'll pass a law somewhere that requires them to break out their crawlers and then announce all routes to their crawlers from there. And that wouldn't be hard. But I'm hopeful it won't need to come to that," he said in another post. The idea of Google being "forced" to adapt is a powerful statement coming from a company that doesn't dictate web standards, but effectively controls a significant portion of its traffic. Prince's confidence underscores Cloudflare's unique leverage.
Prince isn't alone in sounding the alarm. One X user asked about blocking Amazon's AI crawler, Nova. Prince responded by acknowledging "conflicts of interest with the hyperscalers and their AI efforts," a reference to how companies like Amazon, Google, and Microsoft run both AI services and massive infrastructure backbones.
Prince's comments go further than most CEOs in tech have dared. While others issue vague calls for "AI safety" or "fair licensing," he's laying out the next steps. First: stop the crawlers. Second: build a marketplace where AI engines pay creators not for traffic, but for value, or how well their content fills knowledge gaps in AI models. Think of it as SEO for the post-search web.
Technically speaking, Cloudflare can enforce these rules by identifying AI user agents, basically the software labels that crawlers use, and blocking them automatically unless allowed. For instance, it can block Gemini (Google), Claude (Anthropic), and ChatGPT (OpenAI) from accessing content unless a publisher explicitly whitelists them. It's not a perfect system (companies can spoof crawlers, and not all bots identify themselves), but it's a powerful signal. And with billions of pages under its watch, Cloudflare is now in a unique position to shape the future of AI training data, one firewall rule at a time.
Prince's posts reveal that the rules of engagement are shifting in real time. AI companies, once used to quietly hoovering up the web, may soon need to negotiate publicly, transparently, and on creator-friendly terms. In short: Cloudflare wants to protect the very idea that content has value.
Think of Cloudflare as a massive, intelligent security guard and express delivery service for your website. When someone tries to access your site, their request often goes through Cloudflare's global network first. Cloudflare can then block malicious traffic, speed up content delivery, and, crucially, identify and control specific types of automated bots, like AI crawlers, before they ever reach your website's actual server.
In the emerging AI arms race, that makes Cloudflare one of the most important gatekeepers on the internet.
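The user-agent check described above can be sketched in a few lines. This is a minimal illustration, not Cloudflare's implementation: the tokens in `AI_CRAWLER_TOKENS` and the `should_block` helper are hypothetical, and real crawler labels differ. The sketch blocks any request whose User-Agent header contains a known AI crawler token unless the publisher has allowlisted that token.

```python
from typing import AbstractSet

# Hypothetical user-agent tokens for AI crawlers (illustrative only).
AI_CRAWLER_TOKENS = ("exampleaibot", "samplellmcrawler")


def should_block(user_agent: str,
                 allowlist: AbstractSet[str] = frozenset()) -> bool:
    """Block AI crawlers by default unless the publisher allowlisted them.

    Matching is a case-insensitive substring test on the User-Agent header,
    so a crawler that sends a browser-like label is not detected.
    """
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token in ua:
            return token not in allowlist
    # Anything that does not identify itself as an AI crawler passes.
    return False
```

The last branch is the weakness the article points out: a bot that spoofs a browser's user agent, or sends none at all, falls through to `False`, so a check like this only works on crawlers that identify themselves honestly.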
Cloudflare CEO Matthew Prince announces new policies to block AI crawlers, including Google's, from scraping websites without compensation, potentially reshaping the web's content economy.
Cloudflare, a company known for protecting websites from DDoS attacks, has taken a significant step toward reshaping the web's content economy. CEO Matthew Prince announced a new policy dubbed "Content Independence Day," which aims to block AI companies from scraping websites protected by Cloudflare unless they compensate content creators [2].
Source: Gizmodo
At the heart of this initiative is a direct challenge to Google and other AI companies. Cloudflare is now treating some AI giants as violators of its new policy. Prince revealed that "Gemini is blocked by default," referring to Google's AI model [2]. The move carries weight because Cloudflare protects approximately 20% of the web, including major publishers and media outlets.
A major point of contention is Google's main crawler, Googlebot. Traditionally used for indexing content for search, Googlebot now also feeds data to Google's AI models, including AI Overviews and the Gemini LLM. This dual role creates a conflict for creators who want to appear in traditional search results but not have their content used for AI training without compensation [2].
Source: Ars Technica
Prince stated, "We will get Google to provide ways to block Answer Box and AI Overview, without blocking classic search indexing" [2]. If Google doesn't comply, Prince hinted at pursuing legislation: "Worst case we'll pass a law somewhere that requires them to break out their crawlers and then announce all routes to their crawlers from there" [1].
Cloudflare can enforce these rules by identifying AI user agents and blocking them automatically unless allowed by publishers. This system can block crawlers from companies like Google (Gemini), Anthropic (Claude), and OpenAI (ChatGPT) [2].
This move by Cloudflare could significantly impact the AI industry, particularly in how AI companies acquire training data. It may force AI giants to negotiate more transparently and on creator-friendly terms for access to web content [2].
However, Cloudflare's initiative faces criticism from those worried about its impact on academic research, security scans, and other benign web crawling activities. There are also concerns about potential threats to web projects like The Internet Archive [1].
As the situation unfolds, it remains to be seen how Google and other AI companies will respond to Cloudflare's demands. The outcome of this confrontation could potentially reshape the landscape of web content usage, AI training data acquisition, and the broader digital economy.
Summarized by
Navi