3 Sources
[1]
An 83-year-old short story by Borges portends a bleak future for the internet
University of Memphis provides funding as a member of The Conversation US.

How will the internet evolve in the coming decades? Fiction writers have explored some possibilities.

In his 2019 novel "Fall," science fiction author Neal Stephenson imagined a near future in which the internet still exists. But it has become so polluted with misinformation, disinformation and advertising that it is largely unusable. Characters in Stephenson's novel deal with this problem by subscribing to "edit streams" - human-selected news and information that can be considered trustworthy. The drawback is that only the wealthy can afford such bespoke services, leaving most of humanity to consume low-quality, noncurated online content.

To some extent, this has already happened: Many news organizations, such as The New York Times and The Wall Street Journal, have placed their curated content behind paywalls. Meanwhile, misinformation festers on social media platforms like X and TikTok.

Stephenson's record as a prognosticator has been impressive - he anticipated the metaverse in his 1992 novel "Snow Crash," and a key plot element of his "Diamond Age," released in 1995, is an interactive primer that functions much like a chatbot.

On the surface, chatbots seem to provide a solution to the misinformation epidemic. By dispensing factual content, chatbots could supply alternative sources of high-quality information that aren't cordoned off by paywalls. Ironically, however, the output of these chatbots may represent the greatest danger to the future of the web - one that was hinted at decades earlier by Argentine writer Jorge Luis Borges.

The rise of the chatbots

Today, a significant fraction of the internet still consists of factual and ostensibly truthful content, such as articles and books that have been peer-reviewed, fact-checked or vetted in some way. The developers of large language models, or LLMs - the engines that power bots like ChatGPT, Copilot and Gemini - have taken advantage of this resource. To perform their magic, however, these models must ingest immense quantities of high-quality text for training purposes. A vast amount of verbiage has already been scraped from online sources and fed to the fledgling LLMs.

The problem is that the web, enormous as it is, is a finite resource. High-quality text that hasn't already been strip-mined is becoming scarce, leading to what The New York Times called an "emerging crisis in content." This has forced companies like OpenAI to enter into agreements with publishers to obtain even more raw material for their ravenous bots. But according to one prediction, a shortage of additional high-quality training data may strike as early as 2026.

As the output of chatbots ends up online, these second-generation texts - complete with made-up information called "hallucinations," as well as outright errors, such as suggestions to put glue on your pizza - will further pollute the web. And if a chatbot hangs out with the wrong sort of people online, it can pick up their repellent views. Microsoft discovered this the hard way in 2016, when it had to pull the plug on Tay, a bot that started repeating racist and sexist content.

Over time, all of these issues could make online content even less trustworthy and less useful than it is today. In addition, LLMs that are fed a diet of low-calorie content may produce even more problematic output that also ends up on the web.

An infinite - and useless - library

It's not hard to imagine a feedback loop that results in a continuous process of degradation as the bots feed on their own imperfect output. A July 2024 paper published in Nature explored the consequences of training AI models on recursively generated data. It showed that "irreversible defects" can lead to "model collapse" for systems trained in this way - much as a copy of an image, and a copy of that copy, and a copy of that copy in turn, will progressively lose fidelity to the original.

How bad might this get? Consider Borges' 1941 short story "The Library of Babel." Fifty years before computer scientist Tim Berners-Lee created the architecture for the web, Borges had already imagined an analog equivalent. In his 3,000-word story, the writer imagines a world consisting of an enormous and possibly infinite number of hexagonal rooms. The bookshelves in each room hold uniform volumes that must, its inhabitants intuit, contain every possible permutation of letters in their alphabet.

Initially, this realization sparks joy: By definition, there must exist books that detail the future of humanity and the meaning of life. The inhabitants search for such books, only to discover that the vast majority contain nothing but meaningless combinations of letters. The truth is out there - but so is every conceivable falsehood. And all of it is embedded in an inconceivably vast amount of gibberish. Even after centuries of searching, only a few meaningful fragments are found. And even then, there is no way to determine whether these coherent texts are truths or lies. Hope turns into despair.

Will the web become so polluted that only the wealthy can afford accurate and reliable information? Or will an infinite number of chatbots produce so much tainted verbiage that finding accurate information online becomes like searching for a needle in a haystack?

The internet is often described as one of humanity's great achievements. But like any other resource, it's important to give serious thought to how it is maintained and managed - lest we end up confronting the dystopian vision imagined by Borges.
[2]
Did a short story by Borges predict a dark future for the web?
[3]
An 83-year-old short story by Borges portends a bleak future for the internet
A 1941 short story by Jorge Luis Borges eerily predicts the potential future of the internet, where AI-generated content could lead to an information crisis similar to the fictional Library of Babel.
In an uncanny parallel to today's digital landscape, Jorge Luis Borges' 1941 short story "The Library of Babel" seems to have predicted the potential future of the internet. This literary work, written decades before the web's inception, presents a world that bears striking similarities to the challenges we may soon face in our increasingly AI-driven online environment [1].
Today's internet still contains a significant amount of factual and vetted content. However, the rise of large language models (LLMs) powering chatbots like ChatGPT, Copilot, and Gemini is rapidly changing the digital landscape. These AI models require vast amounts of high-quality text for training, leading to what The New York Times has termed an "emerging crisis in content" [2].
As high-quality, unprocessed text becomes scarce, tech companies are resorting to agreements with publishers to obtain more training data. Some predictions suggest that a shortage of additional high-quality training data could occur as early as 2026 [3].
The proliferation of AI-generated content poses several risks: hallucinations and outright errors will further pollute the web, and models retrained on that polluted output may degrade in turn. A July 2024 paper in Nature explored the consequences of training AI models on recursively generated data, showing that "irreversible defects" can lead to "model collapse" [1].
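The copy-of-a-copy dynamic can be made concrete with a toy simulation. This sketch is not the Nature paper's experimental setup; it is a deliberately minimal illustration in which each "generation" fits a simple Gaussian model to samples drawn from the previous generation's model, so that estimation error compounds across generations and the fitted distribution drifts and narrows.

```python
import random
import statistics

def train(samples):
    # "Train" a toy model: estimate the mean and standard deviation
    # of whatever data the model is shown.
    return statistics.mean(samples), statistics.stdev(samples)

def generate(model, n):
    # Sample n synthetic data points from the fitted model.
    mu, sigma = model
    return [random.gauss(mu, sigma) for _ in range(n)]

def simulate(generations=1000, n=5, seed=42):
    random.seed(seed)
    # Generation 0 trains on "real" data from a standard normal.
    model = train([random.gauss(0.0, 1.0) for _ in range(n)])
    history = [model]
    # Every later generation trains only on the previous model's output.
    for _ in range(generations):
        model = train(generate(model, n))
        history.append(model)
    return history

history = simulate()
first_sigma = history[0][1]
final_sigma = history[-1][1]
print(f"generation 0 stddev:    {first_sigma:.4f}")
print(f"generation 1000 stddev: {final_sigma:.6f}")
```

Because each small sample underestimates the true spread on average, the estimated standard deviation shrinks a little at every step, and the shrinkage compounds until the "model" collapses toward near-constant output - the same one-way loss of fidelity as recopying an image.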
Borges' story imagines a world of infinite hexagonal rooms filled with books containing every possible combination of letters. The discovery is initially exciting, but the inhabitants soon realize that finding meaningful information amid the vast sea of gibberish is nearly impossible [2].
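The phrase "every possible combination of letters" can be given a number. Per the story's own specifications (figures from Borges' text, not from the articles above), each book has 410 pages of 40 lines, each line about 80 characters, drawn from an alphabet of 25 orthographic symbols - so the library holds 25^1,312,000 distinct books. A few lines of arithmetic show how large that is:

```python
import math

# Parameters from Borges' "The Library of Babel": 410 pages per book,
# 40 lines per page, 80 characters per line, 25 orthographic symbols.
symbols = 25
chars_per_book = 410 * 40 * 80   # 1,312,000 characters per book

# Number of distinct books = 25 ** 1_312_000 - far too large to print,
# so report how many decimal digits it has instead.
digits = math.floor(chars_per_book * math.log10(symbols)) + 1
print(f"distinct books: a number with {digits:,} decimal digits")
```

The count has roughly 1.8 million digits, against an estimated 10^80 atoms in the observable universe - which is why even an enormous store of coherent texts is unfindable inside it.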
Two possible scenarios emerge: a web so polluted that only the wealthy can afford accurate and reliable information, or one so flooded with tainted AI-generated verbiage that finding accurate information becomes like searching for a needle in a haystack. As one of humanity's great achievements, the internet requires careful consideration of how it is maintained and managed. Without proper oversight, we risk confronting the dystopian vision imagined by Borges over eight decades ago [3].