Curated by THEOUTPOST
On Fri, 4 Apr, 4:01 PM UTC
3 Sources
[1]
GenAI bots could well be scraping your web apps, researchers warn
Not all bots are bad, but many extract huge amounts of data without permission.

New research from Barracuda has identified "gray bots" alongside the good and bad bots that crawl the web and extract data. While "good bots", such as SEO and customer service bots, look for information, "bad bots" are designed for harmful activities like fraud, data stealing, and breaching accounts. In the space between are "gray bots", which Barracuda explains are GenAI scraper bots designed to extract serious amounts of data from websites, most likely to train AI models or to collect web content like news, reviews, and travel offers.

These bots are "blurring the boundaries of legitimate activity," the report argues. Whilst they aren't outright malicious, their approach can be "questionable" and some are even "highly aggressive". Detection software from Barracuda recorded millions of requests to web applications from GenAI bots between December 2024 and February 2025, with one tracked web application receiving 9.7 million scraper bot requests in just 30 days.

These bots collect and remove data without permission, can overwhelm web applications with traffic and disrupt operations, and can take copyright-protected data to train AI models, potentially in violation of the owner's rights. There has been plenty of pushback against practices like these, with creative industries in the UK launching a 'Make it Fair' campaign to protest against their work being used by AI models to create photos, videos, stories, or other content without permission or credit.

Data privacy risks also come with this level of scraping, as some sites carry sensitive customer data - for instance those in healthcare or financial services. The bots can also obscure website analytics, making it very difficult for organisations to assess and track genuine traffic or user behaviour, and so making business decisions harder.
[2]
Barracuda Says Generative AI Gray Bots Hit Websites 500K Times Daily
Generative AI scraper bots target websites 24 hours a day with up to half a million requests for information, according to the latest Barracuda detection data. In a new report, Barracuda threat analysts highlight the relentless behavior of generative AI (Gen AI) bots, which form part of an emerging category that Barracuda calls "gray bots". Gray bots are automated programs that are not overtly malicious but which trawl the internet with the aim of extracting information from websites and other web applications.
[3]
Generative AI 'gray bots' pound websites up to half a million times a day, new Barracuda research highlights
Generative AI scraper bots target websites 24 hours a day with up to half a million requests for information, according to the latest Barracuda detection data. In a new report, Barracuda threat analysts highlight the relentless behavior of generative AI (Gen AI) bots, which form part of an emerging category that Barracuda calls "gray bots". Gray bots are automated programs that are not overtly malicious but which trawl the internet with the aim of extracting information from websites and other web applications.

Barracuda detection data shows that:

- Between December 2024 and the end of February 2025, millions of requests were received by web applications from Gen AI bots such as ClaudeBot and TikTok's Bytespider bot.
- One tracked web application received 9.7 million Gen AI scraper bot requests over a period of 30 days.
- Another tracked web application received over half a million Gen AI scraper bot requests in a single day.
- Analysis of the gray bot traffic targeting a further tracked web application found that requests remained relatively consistent over 24 hours, averaging around 17,000 requests an hour.

"Gen AI gray bots are blurring the boundaries of legitimate online activity," said Rahul Gupta, Senior Principal Software Engineer, Application Security Engineering at Barracuda. "They can scrape vast volumes of sensitive, proprietary, or commercial data and can overwhelm web application traffic and disrupt operations. Frequent scraping by these bots can degrade web performance, and their presence can distort website analytics, leading to misleading insights and impaired decision-making. For many organizations, managing gray bot traffic has become an important component of their application security strategies."

To defend against Gen AI gray bots and the scraping of information, websites can deploy robots.txt. This is a plain-text file placed at the root of a website that signals to scrapers that they should not take any of that site's data (a minimal example follows at the end of this article). However, robots.txt is not legally binding, the specific name of each scraper bot needs to be listed, and not every Gen AI bot owner respects the guidelines.

Organizations can enhance their protection against unwanted Gen AI gray bots by implementing bot protection capable of detecting and blocking generative AI scraper bot activity. Advanced capabilities that use cutting-edge AI and machine learning to address the unique threats posed by gray bots, including behavior-based detection, adaptive machine learning, comprehensive fingerprinting, and real-time blocking, will help to keep this rapidly rising threat at bay. Other examples of gray bots are web scraper bots and automated content aggregators that collect web content such as news, reviews, and travel offers.
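As a concrete illustration of the robots.txt approach described above, the snippet below asks the two GenAI crawlers named in Barracuda's data to skip an entire site. The user-agent tokens shown are the ones these crawlers are commonly reported to use, and compliance is entirely voluntary on the bot's part.

    User-agent: ClaudeBot
    Disallow: /

    User-agent: Bytespider
    Disallow: /

Each User-agent block names one crawler, and "Disallow: /" covers every path on the site; bots that ignore robots.txt are unaffected, which is why the report also recommends dedicated bot protection.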
New research from Barracuda reveals the emergence of 'gray bots', AI-powered scrapers that inundate websites with up to half a million daily requests, posing potential risks to data privacy, web performance, and copyright.
Recent research conducted by Barracuda has unveiled a new category of web crawlers known as "gray bots," which are powered by generative AI technology. These bots occupy a space between benign and malicious automated programs, raising concerns about their impact on web applications and data privacy 1.
Gray bots are designed to extract large volumes of data from websites, potentially for training AI models or collecting web content such as news, reviews, and travel offers. While not overtly malicious, their activities blur the lines of legitimate online behavior 2.
Barracuda's detection data reveals the significant impact of these AI-powered bots:

- Between December 2024 and the end of February 2025, millions of requests from GenAI bots such as ClaudeBot and TikTok's Bytespider were recorded against tracked web applications 3.
- One tracked web application received 9.7 million GenAI scraper bot requests over 30 days, while another received more than half a million requests in a single day 3.
- Gray bot traffic against a further application remained relatively consistent around the clock, averaging about 17,000 requests an hour 3.
The prevalence of gray bots poses several challenges for website owners and organizations:
Data Privacy: Websites containing sensitive customer information, such as those in healthcare or financial services, may be at risk of unauthorized data extraction 1.
Web Performance: The high volume of requests can overwhelm web applications, potentially disrupting operations and degrading overall performance 3.
Copyright Infringement: Gray bots may collect copyright-protected data to train AI models, potentially violating intellectual property rights 1.
Analytics Distortion: The presence of gray bots can skew website analytics, making it difficult for organizations to assess genuine traffic and user behavior accurately 1.
To protect against GenAI gray bots and unauthorized data scraping, organizations can consider the following strategies:
Implement robots.txt: This plain-text file, placed at a website's root, signals to scrapers which content they should not take. However, it's important to note that this measure is not legally binding and relies on bot owners respecting the guidelines 3.
Deploy Advanced Bot Protection: Utilize bot protection systems capable of detecting and blocking generative AI scraper bot activity. Features such as behavior-based detection, adaptive machine learning, and real-time blocking can help mitigate the threat 3, as shown in the sketch below.
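To make the behavior-based detection idea concrete, here is a minimal sketch in Python. It is not Barracuda's implementation: the bot names are the two crawlers cited in the research, while the rate threshold and the is_gray_bot function are hypothetical, chosen only to illustrate combining a user-agent fingerprint check with a request-rate check.

    import time
    from collections import defaultdict, deque

    # Names of GenAI crawlers cited in the Barracuda research.
    KNOWN_GRAY_BOTS = ("claudebot", "bytespider")
    # Hypothetical rate threshold; real systems tune this per application.
    MAX_REQUESTS_PER_MINUTE = 300

    # Per-client timestamps of recent requests (client IP -> deque of times).
    request_log = defaultdict(deque)

    def is_gray_bot(client_ip: str, user_agent: str) -> bool:
        """Flag a request as likely gray bot traffic."""
        # Fingerprint check: user-agent matches a known GenAI scraper.
        ua = user_agent.lower()
        if any(bot in ua for bot in KNOWN_GRAY_BOTS):
            return True
        # Behavior check: too many requests in the last 60 seconds.
        now = time.monotonic()
        window = request_log[client_ip]
        window.append(now)
        while window and now - window[0] > 60:
            window.popleft()  # discard timestamps outside the window
        return len(window) > MAX_REQUESTS_PER_MINUTE

A production system would run logic like this in a reverse proxy or web application firewall, combine many more signals (TLS fingerprints, navigation patterns, machine-learned scores), and might throttle or challenge borderline clients rather than block them outright.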
As the landscape of AI-powered web crawling evolves, organizations must remain vigilant and adapt their application security strategies to address the unique challenges posed by gray bots.
Cloudflare introduces a new tool called 'AI Labyrinth' that uses AI-generated content to confuse and waste resources of unauthorized web crawlers, aiming to protect websites from data scraping for AI training.
9 Sources
Companies are increasingly blocking AI web crawlers due to performance issues, security threats, and content guideline violations. These new AI-powered bots are more aggressive and intelligent than traditional search engine crawlers, raising concerns about data scraping practices and their impact on websites.
2 Sources
Cloudflare introduces new bot management tools allowing website owners to control AI data scraping. The tools enable blocking, charging, or setting conditions for AI bots accessing content, potentially reshaping the landscape of web data collection.
13 Sources
Cybersecurity researchers uncover a sophisticated AI-powered spam campaign called AkiraBot that targeted over 420,000 websites, successfully spamming 80,000, using OpenAI's GPT-4o-mini to generate custom messages and bypass CAPTCHA protections.
6 Sources
Cybersecurity experts have identified malware attacks using AI-generated code, marking a significant shift in the landscape of digital threats. This development raises concerns about the potential for more sophisticated and harder-to-detect cyberattacks.
6 Sources