2 Sources
[1]
This open-source bot blocker shields your site from pesky AI scrapers - here's how
F5, the application delivery network company, found that more than half of all web visits come not from people but from data scrapers, including OpenAI, Anthropic, Google, and Perplexity AI bots. (Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

People are sick and tired of spending money on their sites only to have AI companies rip off everything of value. So Xe Iaso, a technical educator and part-time bot fighter, wrote an open-source program, Anubis, to stop AI bots in their tracks.

Anubis isn't the only such program. Indeed, Iaso freely admits it's "basically the Cloudflare 'Are you a bot?' page [aka Cloudflare Turnstile], but self-hostable." That means you can run it on your own server without incurring any fees.

Anubis is designed to protect websites -- particularly those run by small organizations, open-source projects, and archives -- from the relentless onslaught of automated scrapers that threaten to overwhelm servers and drive up hosting costs. The program is a web AI firewall utility: all incoming HTTP connections must successfully pass through it before reaching your actual website.

Tongue in cheek, Iaso describes Anubis as acting like its ancient Egyptian namesake, weighing the soul of your connection with one or more challenges to protect upstream resources from scraper bots. It does this by requiring visitors to solve a computational puzzle that is trivial for an individual PC but expensive for bots operating at scale. The system also checks whether visitors behave like real browsers, using JavaScript and cookies to verify authenticity. When a bot fails these challenges, it is blocked before reaching the website's core resources.

Now, you may ask, "Isn't this just a CAPTCHA? And aren't AI programs just as good at solving those as people are?" That's true -- they are. But as Iaso says, "Anubis is an uncaptcha. It uses features of your browser to automate a lot of the work that a CAPTCHA would, and right now, the main implementation is by having it run a bunch of cryptographic math with JavaScript to prove that you can run JavaScript in a way that can be validated on the server."

Iaso is well aware that many people are hesitant to run JavaScript due to security and privacy concerns. She's working on a non-JavaScript version of Anubis, but it's not here yet, and it will be a while. On a Reddit thread, she said she is "working on a better one that doesn't rely on JS, but oh god, it is going to be a hell of a thing to implement."

Anubis is written in Go and licensed under the open-source MIT License. It's designed to be "as lightweight as possible to ensure that everyone can afford to protect the communities closest to them." On average, the program uses less than 128 MB of RAM on the server side; most of the workload is handled by visitors' PCs and smartphones. Still, the end-user processing load is so low that ordinary users won't notice it. Indeed, since Anubis operates transparently, there are no CAPTCHAs to solve or images to click; most people won't even know that anything is happening.
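To make the "cryptographic math" concrete, here is a minimal Go sketch of a hash-based proof-of-work scheme of the general kind described above: the client burns CPU finding a nonce whose hash has enough leading zero bits, while the server verifies the answer with a single hash. The function names, challenge string, and difficulty value are illustrative assumptions, not Anubis's actual code or API (Anubis runs its solver in browser JavaScript).

```go
// Illustrative proof-of-work sketch, NOT Anubis's real implementation:
// find a nonce so that SHA-256(challenge || nonce) starts with N zero bits.
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
	"strconv"
)

// leadingZeroBits counts leading zero bits in a SHA-256 digest.
func leadingZeroBits(sum [32]byte) int {
	n := 0
	for _, b := range sum {
		if b == 0 {
			n += 8
			continue
		}
		n += bits.LeadingZeros8(b)
		break
	}
	return n
}

// solve is the client's job: brute-force a nonce. Cheap for one visitor,
// expensive for a scraper fetching millions of pages.
func solve(challenge string, difficulty int) uint64 {
	for nonce := uint64(0); ; nonce++ {
		sum := sha256.Sum256([]byte(challenge + strconv.FormatUint(nonce, 10)))
		if leadingZeroBits(sum) >= difficulty {
			return nonce
		}
	}
}

// verify is the server's job: a single hash, trivial to check.
func verify(challenge string, nonce uint64, difficulty int) bool {
	sum := sha256.Sum256([]byte(challenge + strconv.FormatUint(nonce, 10)))
	return leadingZeroBits(sum) >= difficulty
}

func main() {
	const difficulty = 16 // ~65,000 hashes on average; a hypothetical tuning knob
	nonce := solve("example-challenge-token", difficulty)
	fmt.Println("nonce:", nonce, "valid:", verify("example-challenge-token", nonce, difficulty))
}
```

The asymmetry is the whole trick: a visitor pays the solving cost once per challenge, while the server's check stays near-free, so the economics only bite at scraper scale.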
The proof-of-work runs in the background, and only visitors with outdated browsers or JavaScript disabled are likely to encounter issues. It's another story for bot farms -- their load quickly adds up.

In a blog post, Iaso explains: "At a high level, Anubis has a big old set of rules in your bot policy file. If clients match a rule, they are either passed through, blocked, or selected for secondary screening. By default, Anubis is meant to instantly work by stopping all the bleeding and letting administrators sleep without downtime alerts waking them up. This means that it's overly paranoid and aggressively challenges everything, similar to Cloudflare's 'I'm under attack' mode. My intent was that admins would start out with Anubis being quite paranoid and then slowly lessen the paranoia as they find better patterns and match out ways to do things."

Most users, though, run Anubis in its default configuration, and that default interferes with RSS feed readers and other "good bots." The result is a tool that Iaso describes as a "bit of a nuclear response."

"This will result in your website being blocked from smaller scrapers and may inhibit 'good bots' like the Internet Archive. You can configure bot policy definitions to explicitly allowlist them, and we are working on a curated set of 'known good' bots to allow for a compromise between discoverability and uptime," Iaso says.

Many groups were ready for a nuclear response. Organizations such as GNOME, FFmpeg, and UNESCO have adopted Anubis to protect their online infrastructure. Since its release in January 2025, Anubis has been downloaded over 200,000 times and is credited with helping numerous organizations avoid outages and reduce the burden of unwanted AI scraping. According to Duke University, a happy Anubis user, the school's library systems have blocked about 90 percent of unwanted traffic -- over 4 million unwanted HTTP requests per day -- while improving service performance with minimal blockage for real users.

There are several ways to install and run Anubis. Typically, it is meant to sit between your reverse proxy and your target service. Support is currently free: you can get it via the project's GitHub issue page or, for live chat, by joining Iaso's Patreon and asking in the Patreon Discord channel. There's also a commercial version of Anubis named BotStopper, which, at this point, just offers organizations more control over the program's branding.

The battle between bot developers and defenders promises to be never-ending. Anubis's creators are updating the tool to counter new evasion tactics, such as headless browsers and advanced browser fingerprinting. The goal is to keep the internet accessible for humans while making it uneconomical for abusive bots to operate at scale. That is not easy. If you find the project useful, do support it; Iaso can use all the help you can give.
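As an illustration of that "match a rule, then pass, block, or challenge" flow, and of where such a gate sits between a reverse proxy and the protected service, here is a hedged Go sketch of a policy gate in front of an upstream. The rule schema, action names, user-agent patterns, and ports are hypothetical stand-ins, not Anubis's actual policy-file format or code.

```go
// Hypothetical policy gate in the spirit of Anubis's described flow:
// match a rule against the request, then allow, deny, or challenge.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"regexp"
)

type action int

const (
	allow action = iota
	deny
	challenge
)

type rule struct {
	userAgent *regexp.Regexp
	act       action
}

// Illustrative rules: allowlist one known-good bot, deny one known
// scraper, and challenge everything else (the "paranoid" default).
var rules = []rule{
	{regexp.MustCompile(`(?i)archive\.org_bot`), allow},
	{regexp.MustCompile(`(?i)badscraper`), deny},
	{regexp.MustCompile(`.*`), challenge},
}

func gate(upstream, challengePage http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for _, rl := range rules {
			if !rl.userAgent.MatchString(r.UserAgent()) {
				continue
			}
			switch rl.act {
			case allow:
				upstream.ServeHTTP(w, r) // pass straight through
			case deny:
				http.Error(w, "forbidden", http.StatusForbidden)
			case challenge:
				// A real gate would first check for a signed cookie
				// proving the proof-of-work was already solved.
				challengePage.ServeHTTP(w, r)
			}
			return
		}
		http.Error(w, "no policy matched", http.StatusForbidden)
	})
}

func main() {
	target, _ := url.Parse("http://127.0.0.1:8080") // the protected site
	proxy := httputil.NewSingleHostReverseProxy(target)
	challengePage := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("solve the proof-of-work challenge to continue"))
	})
	log.Fatal(http.ListenAndServe(":3000", gate(proxy, challengePage)))
}
```

Loosening the paranoia over time, as Iaso suggests, amounts to moving patterns out of the catch-all challenge rule and into explicit allow rules for known-good clients.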
[2]
Anubis: Fighting off the hordes of LLM bot crawlers
Anubis is a sort of CAPTCHA test, but flipped: instead of checking that visitors are human, it aims to make web crawling prohibitively expensive for companies trying to feed their hungry LLM bots. It's a clever response to a growing problem: the ever-expanding list of companies who want to sell "AI" bots powered by Large Language Models (LLMs). LLMs are built from a "corpus," a very large database of human-written text, and to keep updating the model, an LLM bot-herder needs fresh text for that corpus.

Anubis is named after the ancient Egyptian jackal-headed god who weighed the hearts of the dead to determine their fitness. To protect websites from AI crawlers, the Anubis software weighs their willingness to do some computation, in what is called a proof-of-work challenge. A human visitor merely sees a jackal-styled animé girl for a moment while their browser solves a cryptographic problem. For companies running large-scale bot farms, though, that means the expensive sound of a whole datacenter's fans spinning up to full power. In theory, when scanning a site is this intensive, the spider backs off.

There are existing measures to stop search engines crawling your site, such as a robots.txt file. But as Google's explanation says, just having the file doesn't prevent a web spider from crawling the site. It's an honor system, and that's a weakness: if the organization running the scraper chooses not to honor it - or your intellectual property rights - then it can simply take whatever it wants, as often as it wants. Repeat visits are a big problem, because it's cheaper to repeatedly scrape largely identical material than it is to store local copies of it -- or, as Drew DeVault put it, please stop externalizing your costs directly into my face.

Iaso says that Anubis works, and that post contains an impressive list of users, from UNESCO to the WINE, GNOME, and Enlightenment projects. Others agree. Drew DeVault, quoted above, now uses it to protect his SourceHut code forge.

There are other such measures. Nepenthes is an LLM bot tarpit: it generates endless pages of link-filled nonsense text, trapping bot-spiders. The Quixotic and Linkmaze tools work similarly, while TollBit is commercial. Some observers have suggested using the work performed by the browser to mine cryptocurrency instead, but that risks being deemed malicious: Coinhive tried it nearly a decade ago and got blocked as a result. Here, we respect Iaso's response: it is wasteful - that's the point - but then, so is the vast traffic generated by these bot-feeding harvesters. Some would argue that LLM bots themselves are an even vaster waste of resources and energy, and we would not disagree. As such, we're in favor of anything that hinders them. ®
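For contrast with Anubis's enforced cost, the honor-system approach mentioned above looks like the snippet below. GPTBot and ClaudeBot are the published crawler user-agent tokens for OpenAI and Anthropic, and Google-Extended is Google's robots.txt token for opting content out of AI training; a compliant crawler will skip the site, but nothing in the protocol forces compliance.

```
# robots.txt: asks AI crawlers to stay out; compliance is voluntary
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```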
Anubis, an open-source program, is gaining popularity as a defense against AI web scrapers, protecting websites from data harvesting while reducing server load and costs.
In recent years, the proliferation of AI-powered web scrapers has become a significant concern for website owners and content creators. F5, an application delivery network company, reported that over half of all web visits now come from data scrapers, including those operated by major AI companies like OpenAI, Anthropic, Google, and Perplexity AI [1]. This trend has led to increased server loads, higher hosting costs, and concerns about intellectual property rights.
In response to this challenge, Xe Iaso, a technical educator and part-time bot fighter, developed Anubis, an open-source program designed to stop AI bots in their tracks [1]. Anubis functions as a web AI firewall utility, requiring all incoming HTTP connections to pass through a series of challenges before reaching the actual website.
Anubis employs an inverted approach to bot detection:
- It serves each new visitor a JavaScript proof-of-work challenge that is trivial for a single browser but expensive for scrapers operating at scale.
- It checks that visitors behave like real browsers, using JavaScript and cookies to verify authenticity.
- Clients that fail these checks are blocked before they reach the site's core resources.
Since its release in January 2025, Anubis has been downloaded over 200,000 times and adopted by various organizations, including GNOME, FFmpeg, and UNESCO [1]. It has helped numerous organizations avoid outages and reduce the burden of unwanted AI scraping.
While Anubis is not the only bot-blocking solution available, it stands out for its open-source nature and self-hostable design. Other similar tools include:
- Nepenthes, an LLM bot tarpit that traps crawlers in endless pages of link-filled nonsense text
- Quixotic and Linkmaze, which work in a similar way
- TollBit, a commercial alternative [2]
Iaso is currently working on a non-JavaScript version of Anubis to address concerns about running JavaScript for security and privacy reasons [1]. However, this development is expected to take some time due to its complexity.
As the battle between website owners and AI scrapers continues, tools like Anubis represent an important step in protecting online content and infrastructure. However, the ongoing challenge will be to balance effective bot blocking with maintaining accessibility for legitimate users and beneficial web services.