Curated by THEOUTPOST
On Wed, 12 Mar, 12:05 AM UTC
2 Sources
[1]
AGI is suddenly a dinner table topic
The concept of artificial general intelligence -- an ultra-powerful AI system we don't have yet -- can be thought of as a balloon, repeatedly inflated with hype during peaks of optimism (or fear) about its potential impact and then deflated as reality fails to meet expectations. This week, lots of news went into that AGI balloon. I'm going to tell you what it means (and probably stretch my analogy a little too far along the way).
First, let's get the pesky business of defining AGI out of the way. In practice, it's a deeply hazy and changeable term shaped by the researchers or companies set on building the technology. But it usually refers to a future AI that outperforms humans on cognitive tasks. Which humans and which tasks we're talking about makes all the difference in assessing AGI's achievability, safety, and impact on labor markets, war, and society. That's why defining AGI, though an unglamorous pursuit, is not pedantic but actually quite important, as illustrated in a new paper published this week by authors from Hugging Face and Google, among others. In the absence of that definition, my advice when you hear AGI is to ask yourself what version of the nebulous term the speaker means. (Don't be afraid to ask for clarification!)
Okay, on to the news. First, a new AI model from China called Manus launched last week. A promotional video for the model, which is built to handle "agentic" tasks like creating websites or performing analysis, describes it as "potentially, a glimpse into AGI." The model is doing real-world tasks on crowdsourcing platforms like Fiverr and Upwork, and the head of product at Hugging Face, an AI platform, called it "the most impressive AI tool I've ever tried." It's not clear just how impressive Manus actually is yet, but against this backdrop -- the idea of agentic AI as a stepping stone toward AGI -- it was fitting that New York Times columnist Ezra Klein dedicated his podcast on Tuesday to AGI. It's also a sign that the concept has been moving quickly beyond AI circles and into the realm of dinner table conversation.
Klein was joined by Ben Buchanan, a Georgetown professor and former special advisor for artificial intelligence in the Biden White House. They discussed lots of things -- what AGI would mean for law enforcement and national security, and why the US government finds it essential to develop AGI before China -- but the most contentious segments were about the technology's potential impact on labor markets. If AI is on the cusp of excelling at lots of cognitive tasks, Klein said, then lawmakers had better start wrapping their heads around what a large-scale transition of labor from human minds to algorithms will mean for workers. He criticized Democrats for largely not having a plan. We could consider this to be inflating the fear balloon, suggesting that AGI's impact is imminent and sweeping.
Following close behind and puncturing that balloon with a giant safety pin, then, is Gary Marcus, a professor of neural science at New York University and an AGI critic who wrote a rebuttal to the points made on Klein's show. Marcus points out that recent news, including the underwhelming performance of OpenAI's new GPT-4.5, suggests that AGI is much more than three years away. He says core technical problems persist despite decades of research, and efforts to scale training and computing capacity have reached diminishing returns. Large language models, dominant today, may not even be the thing that unlocks AGI.
He says the political domain does not need more people raising the alarm about AGI, arguing that such talk actually benefits the companies spending money to build it more than it helps the public good. Instead, we need more people questioning claims that AGI is imminent. That said, Marcus is not doubting that AGI is possible. He's merely doubting the timeline.
Just after Marcus tried to deflate it, the AGI balloon got blown up again. Three influential people -- Google's former CEO Eric Schmidt, Scale AI's CEO Alexandr Wang, and director of the Center for AI Safety Dan Hendrycks -- published a paper called "Superintelligence Strategy." By "superintelligence," they mean AI that "would decisively surpass the world's best individual experts in nearly every intellectual domain," Hendrycks told me in an email. "The cognitive tasks most pertinent to safety are hacking, virology, and autonomous-AI research and development -- areas where exceeding human expertise could give rise to severe risks."
[2]
The meaning of artificial general intelligence remains unclear
Testing for AGI may not be the best measure of AI's abilities and impacts
When Chinese AI startup DeepSeek burst onto the scene in January, it sparked intense chatter about its efficient and cost-effective approach to generative AI. But like its U.S. competitors, DeepSeek's main goal is murkier than just efficiency: The company aims to create the first true artificial general intelligence, or AGI.
For years, AI developers -- from small startups to big tech companies -- have been racing toward this elusive endpoint. AGI, they say, would mark a critical turning point, enabling computer systems to replace human workers, making AI more trustworthy than human expertise and positioning artificial intelligence as the ultimate tool for societal advancement.
Yet, years into the AI race, AGI remains a poorly defined and contentious concept. Some computer scientists and companies frame it as a threshold for AI's potential to transform society. Tech advocates suggest that once we have superintelligent computers, day-to-day life could fundamentally change, affecting work, governance and the pace of scientific discovery. But many experts are skeptical about how close we are to an AI-powered utopia and about the practical utility of AGI. There's limited agreement about what AGI means, and no clear way to measure it. Some argue that AGI functions as little more than a marketing term, offering no concrete guidance on how best to use AI models or on their societal impact.
In tech companies' quest for AGI, the public is tasked with navigating a landscape filled with marketing hype, science fiction and actual science, says Ben Recht, a computer scientist at the University of California, Berkeley. "It becomes very tricky. That's where we get stuck." Continuing to focus on claims of imminent AGI, he says, could muddle our understanding of the technology at hand and obscure AI's current societal effects.
The idea behind artificial general intelligence dates to the mid-20th century. Initially, it denoted an autonomous computer capable of performing any task a human could, including physical activities like making a cup of coffee or fixing a car. But as advancements in robotics lagged behind the rapid progress of computing, most in the AI field shifted to narrower definitions of AGI: first, AI systems that could autonomously perform any task a human could do at a computer, and more recently, machines capable of executing most of the "economically valuable" tasks a human could handle at a computer, such as coding and writing accurate prose. Others think AGI should encompass flexible reasoning ability and autonomy when tackling a number of unspecified tasks.
"The problem is that we don't know what we want," says Arseny Moskvichev, a machine learning engineer at Advanced Micro Devices and computer scientist at the Santa Fe Institute. "Because the goal is so poorly defined, there's also no roadmap for reaching it, nor a reliable way to identify it."
To address this uncertainty, researchers have been developing benchmark tests, similar to student exams, to evaluate how close systems are to achieving AGI. For example, in 2019, French computer scientist and former Google engineer Francois Chollet released the Abstract Reasoning Corpus for Artificial General Intelligence, or ARC-AGI. In this test, an AI model is repeatedly given some examples of colored squares arranged in different patterns on a grid. For each example set, the model is then asked to generate a new grid to complete the visual pattern, a task intended to assess flexible reasoning and the model's ability to acquire new skills outside of its training. This setup is similar to Raven's Progressive Matrices, a test of human reasoning.
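To make that task format concrete, here is a minimal, hypothetical sketch in Python of how an ARC-style task could be represented and scored. The grids, the two candidate transformation rules, and the toy solver are invented for illustration; they are not taken from Chollet's dataset and do not reflect how any real model tackles the benchmark.
```python
# Hypothetical sketch of an ARC-AGI-style task and scorer, for illustration only.
# The real benchmark stores tasks as "train" and "test" grid pairs; the tiny grids
# and the two candidate rules below are invented, not drawn from the dataset.

Grid = list[list[int]]  # each cell holds a colour index from 0 to 9


def flip_h(g: Grid) -> Grid:
    """Mirror each row left-to-right."""
    return [row[::-1] for row in g]


def flip_v(g: Grid) -> Grid:
    """Mirror the rows top-to-bottom."""
    return g[::-1]


CANDIDATE_RULES = {"flip_h": flip_h, "flip_v": flip_v}

task = {
    "train": [  # demonstration input/output pairs the solver may study
        {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
        {"input": [[4, 4, 0], [0, 5, 6]], "output": [[0, 4, 4], [6, 5, 0]]},
    ],
    "test": [{"input": [[7, 0, 8], [1, 2, 3]]}],  # the expected output is withheld
}


def fits_all_examples(rule) -> bool:
    """A rule 'explains' the task if it reproduces every demonstration pair exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


# Toy solver: pick the first candidate rule consistent with all examples and
# apply it to the test grid. ARC scoring is exact match on every cell.
for name, rule in CANDIDATE_RULES.items():
    if fits_all_examples(rule):
        print(name, "->", rule(task["test"][0]["input"]))
        break
```
Real ARC-AGI tasks are much harder than this mirror-image toy because the underlying rule is not drawn from a known list; inferring it from a handful of examples is precisely the skill-acquisition ability the test is meant to probe.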
The test results are part of what OpenAI and other tech companies use to guide model development and assessment. Recently, OpenAI's soon-to-be-released o3 model achieved a vast improvement on ARC-AGI compared to previous AI models, leading some researchers to view it as a breakthrough in AGI. Others disagree.
"There's nothing about ARC that's general. It's so specific and weird," Recht says. Computer scientist José Hernández-Orallo of the Universitat Politècnica de València in Spain says that it's possible ARC-AGI just assesses a model's ability to recognize images. Previous generations of language models could solve similar problems with high accuracy if the visual grids were described using text, he says. That context makes o3's results seem less novel. Plus, there's a limited number of grid configurations, and some AI models with tons of computing power at their disposal can "brute force" their way to correct responses simply by generating all possible answers and selecting the one that fits best -- effectively reducing the task to a multiple-choice problem rather than one of novel reasoning.
To tackle each ARC-AGI task, o3 uses an enormous amount of computing power (and money) at test time. Operating in an efficient mode, it costs about $30 per task, Chollet says. In a less efficient setting, one task can cost about $3,000. Just because the model can solve the problem doesn't mean it's practical or feasible to routinely use it on similarly challenging tasks.
It's not just ARC-AGI that's contentious. Determining whether an AI model counts as AGI is complicated by the fact that every available test of AI ability is flawed. Just as Raven's Progressive Matrices and other IQ tests are imperfect measures of human intelligence and face constant criticism for their biases, so too do AGI evaluations, says Amelia Hardy, a computer scientist at Stanford University. "It's really hard to know that we're measuring [what] we care about."
OpenAI's o3, for example, correctly responded to more than a quarter of the questions in a collection of exceptionally difficult problems called the FrontierMath benchmark, says company spokesperson Lindsay McCallum. These problems take professional mathematicians hours to solve, according to the benchmark's creators. On its face, o3 seems successful. But this success may be partly due to OpenAI funding the benchmark's development and having access to the testing dataset while developing o3. Such data contamination is a continual difficulty in assessing AI models, especially for AGI, where the ability to generalize and abstract beyond training data is considered crucial.
AI models can also seem to perform very well on complex tasks, like accurately responding to Ph.D.-level science questions, while failing on more basic ones, like counting the number of r's in "strawberry." This discrepancy indicates a fundamental misalignment in how these computer systems process queries and understand problems.
Yet AI developers aren't collecting and sharing the sort of information that might help researchers better gauge why, Hernández-Orallo says. Many developers provide only a single accuracy value for each benchmark, as opposed to a detailed breakdown of which types of questions a model answered correctly and incorrectly. Without that additional detail, it's impossible to determine where a model is struggling, why it's succeeding, or whether any single test result demonstrates a breakthrough in machine intelligence, experts say.
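As a rough illustration of that reporting gap, the short Python sketch below contrasts a single headline accuracy with the kind of per-category breakdown researchers say they need. The question categories and the per-question records are invented for illustration; they are not results from any real benchmark or model.
```python
# Hypothetical benchmark results: one record per question, with an invented
# category label and whether the model answered it correctly.
from collections import defaultdict

results = [
    {"category": "algebra",       "correct": True},
    {"category": "algebra",       "correct": True},
    {"category": "number_theory", "correct": False},
    {"category": "number_theory", "correct": True},
    {"category": "combinatorics", "correct": False},
    {"category": "combinatorics", "correct": False},
]

# What many model cards report: a single aggregate number.
overall = sum(r["correct"] for r in results) / len(results)
print(f"headline accuracy: {overall:.0%}")  # 50%

# The breakdown researchers are asking for: where does the model actually fail?
per_category = defaultdict(lambda: [0, 0])  # category -> [correct, total]
for r in results:
    per_category[r["category"]][0] += r["correct"]
    per_category[r["category"]][1] += 1
for category, (num_correct, total) in sorted(per_category.items()):
    print(f"{category:>15}: {num_correct}/{total} correct")
```
The headline number alone hides that, in this toy example, the model never solves a combinatorics question; only the breakdown surfaces that kind of pattern.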
Even if a model passes a specific, quantifiable test with flying colors, such as the bar exam or medical boards, there are few guarantees that those results will translate to expert-level human performance in messy, real-world conditions, says David Rein, a computer scientist at the nonprofit Model Evaluation and Threat Research, based in Berkeley, Calif. For instance, when asked to write legal briefs, generative AI models still routinely fabricate information. Although one study of GPT-4 suggested that the chatbot could outperform human physicians in diagnosing patients, more detailed research has found that comparable AI models perform far worse than actual doctors when faced with tests that mimic real-world conditions. And no study or benchmark result indicates that current AI models should be making major governance decisions over expert humans.
The benchmarks that OpenAI, DeepSeek and other companies report results from "do not tell us much about capabilities in the real world," Rein says, although they can provide reasonable information for comparing models to one another.
So far, researchers have tested AI models largely by providing them with discrete problems that have known answers. However, humans don't always have the luxury of knowing what the problem before them is, whether it's solvable or in what time frame. People can identify key problems, prioritize tasks and, crucially, know when to give up. It's not yet clear that machines can or do. The most advanced "autonomous" agents still struggle to order a pizza or groceries online.
Large language models and neural networks have improved dramatically in recent months and years. "They're definitely useful in a lot of different ways," Recht says, pointing to the ability of newer models to summarize and digest data or produce serviceable computer code with few mistakes. But attempts like ARC-AGI to measure general ability don't necessarily clarify what AI models can and can't be used for. "I don't think it matters whether or not they're artificially generally intelligent," he says. What might matter far more, based on the recent DeepSeek news, is traditional metrics such as cost per task. Utility is determined by both the quality of a tool and whether that tool is affordable enough to scale. Intelligence is only part of the equation.
AGI is supposed to serve as a guiding light for AI developers. If achieved, it's meant to herald a major turning point for society, beyond which machines will function independently, on an equal or higher footing than humans. But so far, AI has had major societal impacts, both good and bad, without any consensus on whether we're nearing (or have already surpassed) that turning point, Recht, Hernández-Orallo and Hardy say. For example, scientists are using AI tools to create new, potentially lifesaving molecules. Yet in classrooms worldwide, generative chatbots have disrupted assessments. A recent Pew Research Center survey found that more and more U.S. teens are outsourcing assignments to ChatGPT. And a 2023 study in Nature reported that growing AI assistance in university courses has made cheating harder to detect.
To say that AI will become transformative once we reach AGI ignores all the trees for the forest.
As the concept of Artificial General Intelligence (AGI) gains mainstream attention, experts debate its definition, timeline, and potential impact on society, while questioning the validity of current benchmarks and tests.
Artificial General Intelligence (AGI) has recently become a topic of widespread discussion, moving beyond AI research circles and into mainstream conversations. This shift is exemplified by its appearance on Ezra Klein's podcast and in various news articles [1]. The concept of AGI, often described as an AI system capable of outperforming humans on cognitive tasks, has sparked debates about its definition, achievability, and potential impacts on society.
One of the primary challenges in the AGI discourse is the lack of a clear, universally accepted definition. Experts argue that the term is often shaped by the goals of researchers or companies developing the technology [1]. This ambiguity has led to calls for more precise definitions, as highlighted in a recent paper by authors from Hugging Face and Google.
The AGI conversation has been fueled by recent developments in AI technology. For instance, the launch of Manus, a new AI model from China designed for "agentic" tasks, has been described as a potential glimpse into AGI [1]. Such advancements have led some to speculate about the imminent arrival of AGI and its potential impacts on labor markets and national security.
However, not all experts share the optimism about AGI's near-term potential. Gary Marcus, a professor of neural science at New York University, argues that core technical problems persist despite decades of research [1]. He suggests that the focus on AGI may be benefiting companies more than serving the public good.
As the race towards AGI intensifies, researchers have developed benchmark tests to evaluate AI systems' progress. One such test is the Abstract Reasoning Corpus for Artificial General Intelligence (ARC-AGI), created by Francois Chollet [2]. However, these tests face criticism for their limitations and potential biases.
The pursuit of AGI raises questions about its practical utility and societal impact. Some experts, like Ben Recht from the University of California, Berkeley, warn that focusing on claims of imminent AGI could obscure our understanding of current AI technologies and their effects on society [2].
The development of AGI-like capabilities often requires significant computing power and financial resources. For example, OpenAI's o3 model, which showed improvements on the ARC-AGI test, uses an enormous amount of computing power, with costs ranging from $30 to $3,000 per task [2]. This raises questions about the practicality and scalability of such systems.
As the debate continues, researchers and companies are exploring various approaches to AGI development. Some, like DeepSeek, are focusing on efficiency and cost-effectiveness in their pursuit of AGI [2]. Others are calling for a more nuanced understanding of AI capabilities and their societal implications, moving beyond the binary question of whether AGI has been achieved.
In conclusion, while AGI remains a captivating concept, the path towards its realization is fraught with challenges, debates, and uncertainties. As research progresses, it becomes increasingly important to critically examine claims about AGI and consider the practical implications of AI advancements on society, economy, and technology.
Reference
[1] AGI is suddenly a dinner table topic
[2] The meaning of artificial general intelligence remains unclear