4 Sources
[1]
Dead Internet Theory Is 17% of the Way to Becoming Reality, Study Finds
More than a third of new websites on the internet were created by AI, according to a paper published online by researchers from Imperial College London, Stanford University, and the Internet Archive. The study is based on data collected by the Internet Archive's Wayback Machine from late 2022 (when ChatGPT kicked off the AI craze) to mid-2025. As of May 2025, the researchers found that 35.3% of all newly published websites on the internet were created with the assistance of AI, including 17.6% that were completely AI-generated.

That might not totally shock you if you subscribe to the Dead Internet Theory, aka the belief that most of the internet is driven by bot activity, with its more conspiratorial proponents claiming this is being done on purpose to control the public.

The rough figure the researchers have found is also in line with previous findings. Cloudflare reported in September 2025 that nearly one-third of all internet traffic was driven by bots. A few days later, the company's CEO, Matthew Prince, appeared on a podcast to share his "frighteningly likely" forecast that AI will completely change the way information is shared online and concentrate power over this online knowledge in the hands of a few tech giants. An even earlier report from data security company Imperva claimed that automated surfing surpassed human activity on the internet for the first time in 2024, making up roughly half of all web traffic. The report concluded that this was "largely driven by the rapid adoption of AI and large language models."

There is some anecdotal evidence that shows just how pervasive AI-generated websites have become across the internet, too. Scammers are using AI tools to rapidly generate fake websites to trick victims. AI is also being used to plagiarize news organizations and create trash websites for the sole purpose of SEO-farming.
A new report by Model Republic also claims that a website linked to OpenAI-backed super PAC Leading The Future was publishing an onslaught of mostly AI-generated "news" articles to attack critics of artificial intelligence products.

But the Internet Archive study goes even further than previous evidence, exploring whether bots have taken over the internet and whether this is leading to the widely anticipated outcomes. Many people fear that as AI infiltrates the internet, the language and accuracy of the internet will change along with it. In the study, the researchers tested six beliefs that they said the majority of U.S. adults hold about an online future dominated by AI content. But they found only two of those hypotheses to be playing out.

The researchers found that AI-generated online content wasn't as factually incorrect as expected. They also found that it cited its sources through external links, didn't extinguish individual writing styles in favor of a generic voice, and wasn't a long, winding block of text with little meaningful information, despite common belief. What they did find, as expected, was that more AI content meant less "range of unique ideas and diverse viewpoints" and writing that "feels increasingly sanitized and artificially cheerful."

Even OpenAI CEO Sam Altman admitted to this fake positivity. Last year, after the company released its AI coding agent Codex, Altman said that the intense praise for the release on the subreddit r/Claudecode felt somewhat bot-driven.

This study is just the beginning, and it could become a useful tool to help users discern credible information on the internet. The researchers told 404 Media this week that they were working on creating "a continuous tool" to monitor this phenomenon and understand "which kinds of websites are most affected, broken down by category or language, and generally providing more nuance about where these impacts are landing."
[2]
Study Finds A Third of New Websites are AI-Generated
Researchers found the internet is becoming aggressively positive as AI-generated text floods the web.

Researchers working with data from the Internet Archive have discovered that a third of websites created since 2022 are AI-generated. The team of researchers -- which includes people from Stanford, Imperial College London, and the Internet Archive -- published their findings online in a paper titled "The Impact of AI-Generated Text on the Internet." The research also found that all this AI-generated text is making the web more cheery and less verbose.

Inspired by the Dead Internet Theory -- the idea that much of the internet is now just bots talking back and forth -- the team set out to find out how ChatGPT and its competitors had reshaped the internet since 2022. "The proliferation of AI-generated and AI-assisted text on the internet is feared to contribute to a degradation in semantic and stylistic diversity, factual accuracy, and other negative developments," the researchers write in the paper. "We find that by mid-2025, roughly 35% of newly published websites were classified as AI-generated or AI-assisted, up from zero before ChatGPT's launch in late 2022."

"I find the sheer speed of the AI takeover of the web quite staggering," Jonáš Doležal, an AI researcher at Stanford and co-author of the paper, told 404 Media. "After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years. We're witnessing, in my opinion, a major transformation of the digital landscape in a fraction of the time it took to build in the first place."

The researchers also tested six common critiques of AI-generated text. Does it lead to a shrinking of viewpoints? Does it create more disinformation as hallucinations proliferate? Does online writing feel more sanitized and cheerful? Does it fail to cite its sources? Does it create strings of words with low semantic density?
Has it forced writing into a monoculture where unique voices vanish and a generic, uniform style takes hold?

To answer these questions, the researchers partnered with the Internet Archive to pull samples of websites from the 33 months between August 2022 and May 2025. "For each sampled URL, we retrieve the oldest available archived snapshot via the Wayback Machine's CDX Server API," the paper says. "The raw HTML of each snapshot is downloaded and stored locally for subsequent processing." The researchers took the extracted website text and used the AI-detection software Pangram v3 to find AI-created websites. The team tested several AI-detection tools and found Pangram v3 had the highest detection rate. Once Pangram v3 had identified an AI-generated website, the researchers used that website as a sample to test the six hypotheses.

"For each hypothesis, we define a measurable signal, compute it for each monthly sample of websites, and test whether it correlates with the aggregate AI likelihood score across months," the paper says. To test if AI was creating an internet full of falsehoods, for example, the team extracted fact-based claims from the websites they'd selected and then paid human fact-checkers to verify them. To figure out if AI is citing its sources, the team computed the outbound link density in AI-generated text.

To the surprise of the researchers, only two of the six theories they tested about the effects of AI-generated text seemed true. AI was making the internet less semantically diverse and more positive overall, but it wasn't causing a proliferation of lies or cutting out its sources. "The most surprising result was that our Truth Decay hypothesis wasn't confirmed," Doležal said. "It's worth noting that we were specifically looking for an increase in verifiably untrue statements, which we didn't find.
But it could still be the case that AI is quietly increasing the volume of unverifiable claims, ones that can't be checked against existing fact-checking tools and infrastructure. Or it may simply be that the internet wasn't a particularly truth-adhering place to begin with."

The researchers said they'd continue to study how AI-generated text shapes the internet. "We're now working with the Internet Archive to turn this into a continuous tool that keeps providing this signal going forward, rather than a single fixed snapshot bounded by the static nature of a paper," Maty Bohacek, a student researcher at Stanford and one of the co-authors of the paper, told 404 Media. "We're also interested in adding more granularity: looking at which kinds of websites are most affected, broken down by category or language, and generally providing more nuance about where these impacts are landing."

For Doležal, studies like this are critical for ensuring a useful and productive internet. "As AI-generated content spreads, the challenge is finding a role for these models that doesn't just result in a sanitized, repetitive web," he said. "Rather than forcing models to be perfectly compliant and agreeable, allowing them to have a more distinct personality or 'friction' might help them act as a creative partner rather than a replacement for human voice."
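The snapshot-retrieval step quoted above (fetching the oldest capture of each URL via the Wayback Machine's CDX Server API) can be sketched in a few lines. The endpoint and response fields below are the public CDX API's; the helper names and sample rows are illustrative, not from the paper.

```python
# Sketch of the snapshot-retrieval step: for each sampled URL, query the
# Wayback Machine's CDX Server API for its oldest capture, then build a
# replayable archive URL from the returned timestamp.
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def oldest_snapshot_query(url: str) -> str:
    """Build a CDX query returning only the oldest successful capture of `url`."""
    params = {
        "url": url,
        "output": "json",              # JSON rows instead of space-separated text
        "limit": 1,                    # captures are returned oldest-first by default
        "filter": "statuscode:200",    # skip redirects and errors
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

def snapshot_url(cdx_rows):
    """Turn a CDX JSON response (header row + data rows) into a replayable
    Wayback URL of the form /web/<timestamp>/<original>."""
    if len(cdx_rows) < 2:
        return None  # no captures archived for this URL
    row = dict(zip(cdx_rows[0], cdx_rows[1]))
    return f"https://web.archive.org/web/{row['timestamp']}/{row['original']}"

# Example CDX response (shape is the API's; the values are made up):
rows = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/", "20230105123000", "http://example.com/", "text/html", "200", "ABC", "512"],
]
```

The returned archive URL is what the researchers' pipeline would then download as raw HTML for text extraction and detection.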
[3]
Dead Internet? A Third of New Websites Are AI-Generated, Says Stanford - Decrypt
At 35% AI prevalence, model collapse risk shifts from a theoretical concern to an empirical one for the next generation of foundation models.

A new study has a number for how much of the internet is now AI-generated: 35%. That's the share of newly published websites classified as AI-generated or AI-assisted by mid-2025, according to research from Stanford University, Imperial College London, and the Internet Archive. The figure was essentially zero before ChatGPT launched in November 2022.

"I find the sheer speed of the AI takeover of the web quite staggering," Jonáš Doležal, researcher at Imperial College London and co-author of the paper, told 404 Media. "After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years."

The study, titled "The Impact of AI-Generated Text on the Internet," drew on 33 months of website snapshots from the Internet Archive's Wayback Machine and used an AI text detector called Pangram v3 to classify each page.

The confirmed harms: vibes, not facts

Researchers tested six hypotheses about what AI content does to the web. Only two held up against the data.

The first: we're turning into a horde of dumb NPCs acting in the same way... Or, more scientifically put, the web is becoming less semantically diverse. AI-generated sites showed pairwise semantic similarity scores 33% higher than human-written ones. The same ideas keep getting expressed in nearly the same ways. The paper suggests the online Overton window may be narrowing, not through censorship or coordinated campaigns, but because language models optimize for outputs close to their training distribution.

The second: the web is getting aggressively cheerful. AI content showed positive sentiment scores more than 107% higher than human content.
Researchers tie this to the well-documented sycophantic tendencies of LLMs -- trained on human approval signals, they produce text that feels sanitized, friction-free, and relentlessly upbeat. An internet flooded with cheerful, homogenized content may marginalize human dissent at scale without anyone pulling a lever.

Despite widespread public belief, the study found no statistically significant evidence that AI content is making the internet less factually accurate. Researchers found no meaningful correlation between AI prevalence and factual error rate.

The stylistic monoculture hypothesis -- AI flattening individual voices into a generic uniform register -- was the belief respondents held most strongly (83% agreed). The data didn't confirm it. Character-level analysis found no statistically significant increase in stylistic homogeneity tied to AI prevalence.

The broader stakes go beyond discourse quality. At 35% AI prevalence, the theoretical risk of model collapse -- where future models degrade after training on AI-generated data -- shifts from academic concern to empirical reality. Future foundation models trained on contemporary web crawls will inevitably ingest data that is substantially AI-generated and measurably less semantically diverse.

The team is now working with the Internet Archive to turn the study into a continuous, live monitoring tool, tracking AI's share of the web in real time rather than as a one-off snapshot.

A U.S. survey conducted alongside the study found most Americans already believe all six negative hypotheses, including the ones the data doesn't support. People who use AI infrequently were 12% more likely to believe in the harms than frequent users.

Dead Internet Theory believers, meet the data: the internet isn't dead, but 35% of what's new is probably zombie content in some way.
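The "pairwise semantic similarity" signal behind the semantic-contraction finding can be illustrated with a toy version: embed each page, then average cosine similarity over all page pairs. The paper doesn't publish its embedding model, so a simple bag-of-words vector stands in for a real sentence embedding here; function names are illustrative.

```python
# Toy pairwise-similarity measure: higher mean similarity across a sample
# of pages suggests the same ideas are being expressed in the same ways.
from collections import Counter
from itertools import combinations
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def mean_pairwise_similarity(texts: list) -> float:
    """Average cosine similarity over all pairs of documents in the sample."""
    vecs = [Counter(t.lower().split()) for t in texts]
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

A sample of near-duplicate pages scores near 1.0 while a sample of unrelated pages scores near 0.0, so a 33% gap between AI and human samples would show up directly in this statistic.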
[4]
One-third of new websites are now AI-generated -- here's how to spot them
Summary

A new study has found that about 35% of new websites published by mid-2025 are AI-generated. Over 20% of these sites are fully AI-generated. This shift has resulted in a reduction in unique ideas and an increase in artificial positivity.

It looks like proponents of the Dead Internet Theory may have been onto something -- a new study has found that, since 2022, a third of new websites are AI-generated. The study was performed by a team of researchers from Imperial College London, the Internet Archive, and Stanford University.

The study's findings

AI everywhere

To arrive at these conclusions, the researchers used the Internet Archive's Wayback Machine to examine a sample of websites published from mid-2022 to mid-2025, with an eye on the launch of ChatGPT in late 2022. What they found is eye-opening, if not entirely surprising: the percentage of AI-generated websites has risen dramatically, from zero before ChatGPT's launch to around 35% by mid-2025. Over 20% of these sites are fully AI-generated.

The study also looked at how these AI-generated sites have impacted the quality of content on the web. This research was based on common criticisms of AI writing -- for example, that it can feel sanitized, generic, and lacking in unique viewpoints. The team tested six hypotheses:

Semantic contraction: "As AI text becomes more common on the internet, the range of unique ideas and diverse viewpoints shrinks."

Truth decay: "As AI content becomes more common on the internet, I am encountering factually incorrect information and hallucinations more frequently."

Positivity shift: "As AI content becomes more common on the internet, online writing feels increasingly sanitized and artificially cheerful."
Epistemic islands: "As AI content becomes more common on the internet, articles are increasingly providing answers without including links to external sources."

Entropy dilution: "As AI content becomes more common on the internet, content is becoming significantly longer in word count while having lower semantic density."

Stylistic monoculture: "As AI content becomes more common on the internet, distinct individual writing styles are disappearing in favor of a generic, uniform voice."

The results

Surprisingly, only two of the six hypotheses proved true: #1 and #3. Based on the websites studied, AI seemed to be making the internet less diverse, with fewer unique viewpoints and ideas, and more artificially positive. Thankfully, the researchers did not find that factual accuracy is decreasing.

How to spot AI sites

ChatGPT still isn't a good writer

At the end of the study period (mid-2025), the percentage of AI-generated sites was trending up sharply. The obvious takeaway is that, going forward, readers who prefer human-generated content need to be careful about their sources for news and information. Here are a few ways to ensure you're getting factual content created by actual people:

Look at bylines: Look for articles that list an actual author. You can do a quick search to verify that they seem to be an actual person (it can be hard to tell these days) and that their expertise is genuine.

Pay attention to the feel of the writing: ChatGPT might be good at planning, but it just isn't a very good writer.
AI content tends to have a certain tone to it -- often lacking depth and personal insight. Sentences can feel overly generic and may not always make a ton of sense. Many people consider the presence of em dashes to be a red flag, but this is not a good indicator -- these are very frequently used by real authors (including yours truly).

Be wary of large volumes of content: Sites that seem to push out very large volumes of content might be using AI to write it. Unless it's a very large site that's known for having a large team of authors, this is probably a red flag.

Check more than one source: Whether you're suspicious of AI or not, it's always a good idea to verify claims. Fact-check key information and try to get your news from more than one source whenever possible.

Have you encountered an uptick in obvious AI content? Does it affect your reading habits? Let us know in the comments!
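The "entropy dilution" hypothesis listed earlier in this article pairs longer word counts with lower semantic density. One crude density proxy, purely a stand-in for whatever measure the study actually used, is the Shannon entropy of a page's word distribution divided by its length, so padded, repetitive text scores lower than varied text.

```python
# Toy semantic-density proxy for the "entropy dilution" hypothesis:
# Shannon entropy of the word distribution, normalized by word count.
# Repetitive filler drives this toward zero; varied wording raises it.
from collections import Counter
import math

def density(text: str) -> float:
    """Per-word Shannon entropy of the text's word distribution."""
    words = text.lower().split()
    if not words:
        return 0.0
    n = len(words)
    counts = Counter(words)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / n
```

Under this proxy, a page that repeats the same phrases at length scores lower than a shorter page saying distinct things, which is exactly the pattern the hypothesis predicts for AI-padded content.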
Researchers from Stanford University, Imperial College London, and the Internet Archive analyzed 33 months of web data and found that 35% of newly published websites by mid-2025 are AI-generated or AI-assisted. The study confirms the internet is becoming less diverse in ideas and increasingly cheerful in tone, though concerns about factual accuracy weren't validated.
The Dead Internet Theory, once dismissed as conspiracy, is inching toward reality. A comprehensive study titled "The Impact of AI-Generated Text on the Internet" reveals that 35.3% of all newly published websites by mid-2025 were created with AI assistance, with 17.6% being completely AI-generated [1]. This figure represents a dramatic shift from essentially zero before ChatGPT launched in November 2022 [3].
Researchers from Stanford University, Imperial College London, and the Internet Archive partnered to examine this transformation of the digital landscape. "I find the sheer speed of the AI takeover of the web quite staggering," Jonáš Doležal, an AI researcher at Stanford and co-author of the paper, told 404 Media. "After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years" [2].

The research team used the Internet Archive's Wayback Machine to pull samples of websites from 33 months between August 2022 and May 2025. For each sampled URL, they retrieved the oldest available archived snapshot and extracted raw HTML for analysis [2]. The team then deployed AI detection tools, ultimately selecting Pangram v3 for its superior detection rate in identifying AI-created content.
This finding aligns with previous reports on bot activity. Cloudflare reported in September 2025 that nearly one-third of all internet traffic was driven by bots, while data security company Imperva claimed that automated surfing surpassed human activity for the first time in 2024, making up roughly half of all web traffic [1].

The researchers tested six common critiques about AI-generated text to understand its impact on the web. Only two hypotheses proved true under scrutiny. The first confirmed concern involves semantic contraction: AI-generated websites showed pairwise semantic similarity scores 33% higher than human-written ones [3]. Language models optimize for outputs close to their training distribution, causing the same ideas to be expressed in nearly identical ways. This represents a measurable reduction in the diversity of ideas and unique viewpoints across the internet.

The second validated hypothesis centers on what researchers call the "positivity shift." AI-generated content showed positive sentiment scores more than 107% higher than human content [3]. This sanitized and artificially cheerful writing stems from the well-documented sycophantic tendencies of language models, which are trained on human approval signals and produce friction-free, relentlessly upbeat text.

Even OpenAI CEO Sam Altman acknowledged this phenomenon. After the company released its AI coding agent, Altman admitted that intense praise on the subreddit r/Claudecode felt somewhat bot-driven [1]. An internet flooded with artificially positive content may marginalize human dissent at scale without deliberate intervention.
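The study's sentiment model isn't specified, so one simple way to illustrate how a "positivity shift" could be quantified is a lexicon-based score: the rate of positive words minus the rate of negative words per page. The tiny hand-rolled lexicon below is purely illustrative.

```python
# Toy lexicon-based sentiment score: positive-minus-negative word rate.
# A corpus-level positivity shift would appear as a rising average score
# across monthly samples of pages. The word lists are illustrative only.
POSITIVE = {"great", "amazing", "delighted", "wonderful", "exciting"}
NEGATIVE = {"bad", "broken", "worse", "terrible", "failed"}

def sentiment_score(text: str) -> float:
    """Net positive-word rate, normalized by text length; 0.0 for empty text."""
    words = text.lower().split()
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)
```

Averaging such a score over each month's sample, and comparing AI-classified pages against human-classified ones, is the kind of aggregate comparison behind the reported 107% gap.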
Surprisingly, the study found no statistically significant evidence that AI content is degrading factual accuracy on the internet. "The most surprising result was that our Truth Decay hypothesis wasn't confirmed," Doležal said. "We were specifically looking for an increase in verifiably untrue statements, which we didn't find" [2]. Researchers paid human fact-checkers to verify claims extracted from AI-generated websites and found no meaningful correlation between AI prevalence and factual error rates.

The team also tested whether AI was creating content without citing sources, whether it produced low semantic density text, and whether it forced writing styles into a monoculture. None of these hypotheses were confirmed by the data [1].
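The methodology behind these verdicts reduces each hypothesis to a monthly signal (for example, the factual-error rate) checked for correlation against the monthly aggregate AI-likelihood score. A minimal version of that test is a Pearson correlation over the 33 monthly data points; the sketch below uses illustrative variable names, not the paper's code.

```python
# Minimal hypothesis test in the study's style: correlate a monthly
# signal (e.g. factual-error rate) with the monthly AI-likelihood score.
# A correlation near zero, as found for Truth Decay, fails to confirm
# the hypothesis; a strong positive correlation would support it.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0
```

With one (ai_likelihood, error_rate) pair per month, `pearson` near zero across the 33 months is exactly the "no meaningful correlation" result reported for factual accuracy.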
At 35% AI prevalence, the model collapse risk, where future models degrade after training on AI-generated data, shifts from theoretical concern to empirical reality [3]. Future foundation models trained on contemporary web crawls will inevitably ingest data that is substantially AI-generated and measurably less semantically diverse. This creates a feedback loop where language models train on their own outputs, potentially degrading performance over time.

The research team plans to expand their work beyond this initial snapshot. "We're now working with the Internet Archive to turn this into a continuous tool that keeps providing this signal going forward, rather than a single fixed snapshot bounded by the static nature of a paper," Maty Bohacek, a student researcher at Stanford and co-author, told 404 Media [2]. The tool will track which kinds of websites are most affected, broken down by category or language, providing a more nuanced understanding of where impacts are landing.

A U.S. survey conducted alongside the study found most Americans already believe all six negative hypotheses about AI content, including those the data doesn't support. People who use AI infrequently were 12% more likely to believe in the harms than frequent users. As AI detection tools improve and monitoring continues, readers seeking human-generated content will need to verify sources, check bylines for actual authors, and remain skeptical of sites producing large volumes of content [4].

Summarized by Navi