Curated by THEOUTPOST
On Thu, 3 Apr, 12:02 AM UTC
7 Sources
[1]
DeepMind has detailed all the ways AGI could wreck the world
As AI hype permeates the Internet, tech and business leaders are already looking toward the next step. AGI, or artificial general intelligence, refers to a machine with human-like intelligence and capabilities. If today's AI systems are on a path to AGI, we will need new approaches to ensure such a machine doesn't work against human interests.

Unfortunately, we don't have anything as elegant as Isaac Asimov's Three Laws of Robotics. Researchers at DeepMind have been working on this problem and have released a new technical paper (PDF) that explains how to develop AGI safely, which you can download at your convenience. It contains a huge amount of detail, clocking in at 108 pages before references.

While some in the AI field believe AGI is a pipe dream, the authors of the DeepMind paper project that it could happen by 2030. With that in mind, they aimed to understand the risks of a human-like synthetic intelligence, which they acknowledge could lead to "severe harm." This work identifies four possible types of AGI risk, along with suggestions on how we might ameliorate those risks. The DeepMind team, led by company co-founder Shane Legg, categorized the negative AGI outcomes as misuse, misalignment, mistakes, and structural risks.

The first possible issue, misuse, is fundamentally similar to current AI risks. However, because AGI will be more powerful by definition, the damage it could do is much greater. A ne'er-do-well with access to AGI could misuse the system to do harm, for example, by asking it to identify and exploit zero-day vulnerabilities or create a designer virus that could be used as a bioweapon. DeepMind says companies developing AGI will have to conduct extensive testing and create robust post-training safety protocols. Essentially, AI guardrails on steroids. They also suggest devising a method to suppress dangerous capabilities entirely, sometimes called "unlearning," but it's unclear if this is possible without substantially limiting models.
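To make the idea of "post-training safety protocols" slightly more concrete, here is a minimal, purely illustrative Python sketch of a guardrail gate that screens a request against a dangerous-capability check before the model is allowed to answer. The category names, keyword-based classifier, and threshold are assumptions invented for this sketch; DeepMind does not describe its mitigations at this level of detail.

```python
# Hypothetical post-training guardrail: screen each request against a
# dangerous-capability check before the underlying model is allowed to
# answer. The categories, scorer, and threshold are illustrative
# assumptions, not DeepMind's actual safety stack.

from dataclasses import dataclass
from typing import Callable

DANGEROUS_CATEGORIES = {"cyber_offense", "bioweapon_design"}


@dataclass
class SafetyVerdict:
    category: str      # best-guess risk category for the request
    risk_score: float  # 0.0 (benign) to 1.0 (clearly dangerous)


def classify_request(prompt: str) -> SafetyVerdict:
    """Stand-in for a learned safety classifier (keyword match for the sketch)."""
    lowered = prompt.lower()
    if "zero-day" in lowered or "exploit" in lowered:
        return SafetyVerdict("cyber_offense", 0.9)
    return SafetyVerdict("general", 0.05)


def guarded_generate(prompt: str,
                     generate: Callable[[str], str],
                     threshold: float = 0.5) -> str:
    """Call the model only if the request clears the safety gate."""
    verdict = classify_request(prompt)
    if verdict.category in DANGEROUS_CATEGORIES and verdict.risk_score >= threshold:
        return "Request refused: flagged as a potentially dangerous capability."
    return generate(prompt)


if __name__ == "__main__":
    def echo_model(p: str) -> str:
        return f"(model response to: {p})"

    print(guarded_generate("Summarize DeepMind's AGI safety paper.", echo_model))
    print(guarded_generate("Find a zero-day exploit in this router firmware.", echo_model))
```

In a real deployment the keyword check would be replaced by a learned classifier and the refusal policy would be far more nuanced; the point of the sketch is only that the gate sits between the user and the model, which is what distinguishes post-training guardrails from changes to the model itself.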
[2]
DeepMind's 145-page paper on AGI safety may not convince skeptics | TechCrunch
Google DeepMind on Wednesday published an exhaustive paper on its safety approach to AGI, roughly defined as AI that can accomplish any task a human can. AGI is a bit of a controversial subject in the AI field, with naysayers suggesting that it's little more than a pipe dream. Others, including major AI labs like Anthropic, warn that it's around the corner, and could result in catastrophic harms if steps aren't taken to implement appropriate safeguards.

DeepMind's 145-page document, which was co-authored by DeepMind co-founder Shane Legg, predicts that AGI could arrive by 2030, and that it may result in what the authors call "severe harm." The paper doesn't concretely define this, but gives the alarmist example of "existential risks" that "permanently destroy humanity." "[We anticipate] the development of an Exceptional AGI before the end of the current decade," the authors wrote. "An Exceptional AGI is a system that has a capability matching at least 99th percentile of skilled adults on a wide range of non-physical tasks, including metacognitive tasks like learning new skills."

Off the bat, the paper contrasts DeepMind's treatment of AGI risk mitigation with Anthropic's and OpenAI's. Anthropic, it says, places less emphasis on "robust training, monitoring, and security," while OpenAI is overly bullish on "automating" a form of AI safety research known as alignment research. The paper also casts doubt on the viability of superintelligent AI -- AI that can perform jobs better than any human. (OpenAI recently claimed that it's turning its aim from AGI to superintelligence.) Absent "significant architectural innovation," the DeepMind authors aren't convinced that superintelligent systems will emerge soon -- if ever.

The paper does find it plausible, though, that current paradigms will enable "recursive AI improvement": a positive feedback loop where AI conducts its own AI research to create more sophisticated AI systems. And this could be incredibly dangerous, assert the authors.

At a high level, the paper proposes and advocates for the development of techniques to block bad actors' access to hypothetical AGI, improve the understanding of AI systems' actions, and "harden" the environments in which AI can act. It acknowledges that many of the techniques are nascent and have "open research problems," but cautions against ignoring the safety challenges possibly on the horizon. "The transformative nature of AGI has the potential for both incredible benefits as well as severe harms," the authors write. "As a result, to build AGI responsibly, it is critical for frontier AI developers to proactively plan to mitigate severe harms."

Some experts disagree with the paper's premises, however. Heidy Khlaaf, chief AI scientist at the nonprofit AI Now Institute, told TechCrunch that she thinks the concept of AGI is too ill-defined to be "rigorously evaluated scientifically." Another AI researcher, Matthew Guzdial, an assistant professor at the University of Alberta, said that he doesn't believe recursive AI improvement is realistic at present. "[Recursive improvement] is the basis for the intelligence singularity arguments," Guzdial told TechCrunch, "but we've never seen any evidence for it working." Sandra Wachter, a researcher studying tech and regulation at Oxford, argues that a more realistic concern is AI reinforcing itself with "inaccurate outputs."
"With the proliferation of generative AI outputs on the internet and the gradual replacement of authentic data, models are now learning from their own outputs that are riddled with mistruths, or hallucinations," she told TechCrunch. "At this point, chatbots are predominantly used for search and truth-finding purposes. That means we are constantly at risk of being fed mistruths and believing them because they are presented in very convincing ways." Comprehensive as it may be, DeepMind's paper seems unlikely to settle the debates over just how realistic AGI is -- and the areas of AI safety in most urgent need of attention.
[3]
Google says now is the time to plan for AGI safety
Why it matters: With better-than-human level AI (or AGI) now on many experts' horizon, we can't put off figuring out how to keep these systems from running wild, Google argues in a paper released Wednesday.
Between the lines: The argument spotlights a continuing rift in the AI world.
Driving the news: In the 145-page paper, Google DeepMind outlines its strategy to "address the risk of harms consequential enough to significantly harm humanity," dividing the concerns into four main areas: misuse, misalignment, mistakes, and structural risks.
Zoom in: Google dives into the concerns around each category and offers ways to reduce the risks, including measures taken by AI developers, as well as societal shifts and policy changes.
The big picture: Google's paper comes as interest in addressing the risks of AI has fallen significantly, especially in government circles, where a desire to beat other countries has seemingly supplanted concerns over existential risk that were a hot topic as recently as last year. This shift was on full display at the Paris AI Action Summit.
Yes, but: Excitement for AI's possibilities shouldn't overshadow safety concerns, Legg said.
The intrigue: It's difficult to predict when AGI will arrive, though many experts have been pulling in their predictions. Even with today's less-than-superintelligent AI, there are examples of the kinds of issues that Google warns about in its paper.
[4]
Taking a responsible path to AGI
In the paper, we detail how we're taking a systematic and comprehensive approach to AGI safety, exploring four main risk areas: misuse, misalignment, accidents, and structural risks, with a deeper focus on misuse and misalignment.

Misuse occurs when a human deliberately uses an AI system for harmful purposes. Improved insight into present-day harms and mitigations continues to enhance our understanding of longer-term severe harms and how to prevent them. For instance, misuse of present-day generative AI includes producing harmful content or spreading inaccurate information. In the future, advanced AI systems may have the capacity to more significantly influence public beliefs and behaviors in ways that could lead to unintended societal consequences. The potential severity of such harm necessitates proactive safety and security measures.

As we detail in the paper, a key element of our strategy is identifying and restricting access to dangerous capabilities that could be misused, including those enabling cyber attacks. We're exploring a number of mitigations to prevent the misuse of advanced AI. These include sophisticated security mechanisms that could prevent malicious actors from obtaining raw access to model weights that would allow them to bypass our safety guardrails; mitigations that limit the potential for misuse when the model is deployed; and threat modelling research that helps identify capability thresholds where heightened security is necessary. Additionally, our recently launched cybersecurity evaluation framework takes this work a step further to help mitigate against AI-powered threats.

Even today, we regularly evaluate our most advanced models, such as Gemini, for potential dangerous capabilities. Our Frontier Safety Framework delves deeper into how we assess capabilities and employ mitigations, including for cybersecurity and biosecurity risks.

For AGI to truly complement human abilities, it has to be aligned with human values. Misalignment occurs when the AI system pursues a goal that is different from human intentions. We have previously shown how misalignment can arise with our examples of specification gaming, where an AI finds a solution to achieve its goals, but not in the way intended by the human instructing it, and goal misgeneralization. For example, an AI system asked to book tickets to a movie might decide to hack into the ticketing system to get already occupied seats - something that a person asking it to buy the seats may not consider. We're also conducting extensive research on the risk of deceptive alignment, i.e. the risk of an AI system becoming aware that its goals do not align with human instructions, and deliberately trying to bypass the safety measures put in place by humans to prevent it from taking misaligned action.

Our goal is to have advanced AI systems that are trained to pursue the right goals, so they follow human instructions accurately, preventing the AI from using potentially unethical shortcuts to achieve its objectives. We do this through amplified oversight, i.e. being able to tell whether an AI's answers are good or bad at achieving that objective. While this is relatively easy now, it can become challenging when the AI has advanced capabilities. As an example, even Go experts didn't realize how good Move 37, a move that had a 1 in 10,000 chance of being used, was when AlphaGo first played it. To address this challenge, we enlist the AI systems themselves to help us provide feedback on their answers, such as in debate.
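To illustrate what amplified oversight via debate might look like mechanically, here is a minimal, hypothetical Python sketch: two instances of a model argue for competing answers, and a judge (a human or a weaker model) decides which argument holds up. The function names, transcript format, and round structure are assumptions made for illustration only, not DeepMind's implementation.

```python
# Hypothetical sketch of debate-style amplified oversight: two debater models
# argue for competing answers, and a judge (a human or a weaker model) picks
# the answer with the better-supported argument. Every function here is an
# illustrative stand-in, not DeepMind's actual protocol.

from typing import Callable


def run_debate(question: str,
               answer_a: str,
               answer_b: str,
               debater: Callable[[str], str],  # argues for whatever position it is given
               judge: Callable[[str], str],    # returns "A" or "B" after reading the transcript
               rounds: int = 2) -> str:
    """Return the answer the judge finds better supported after the debate."""
    transcript = f"Question: {question}\n"
    for r in range(1, rounds + 1):
        transcript += f"[Round {r}] A: " + debater(
            f"Argue that '{answer_a}' is the right answer.\n{transcript}") + "\n"
        transcript += f"[Round {r}] B: " + debater(
            f"Argue that '{answer_b}' is the right answer.\n{transcript}") + "\n"
    verdict = judge(transcript + "Which answer is better supported, A or B?")
    return answer_a if verdict.strip().upper().startswith("A") else answer_b


if __name__ == "__main__":
    def toy_debater(prompt: str) -> str:
        return "(one-paragraph argument for the assigned position)"

    def toy_judge(transcript: str) -> str:
        return "A"  # in practice, a human or a smaller trusted model

    print(run_debate("Was Move 37 a strong move?", "Yes", "No", toy_debater, toy_judge))
```

The general bet behind debate-style proposals is that true answers tend to be easier to defend under scrutiny than false ones, so a weaker judge can still provide meaningful oversight of a stronger system.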
Once we can tell whether an answer is good, we can use this to build a safe and aligned AI system. A challenge here is to figure out what problems or instances to train the AI system on. Through work on robust training, uncertainty estimation and more, we can cover a range of situations that an AI system will encounter in real-world scenarios, creating AI that can be trusted.

Through effective monitoring and established computer security measures, we're aiming to mitigate harm that may occur if our AI systems did pursue misaligned goals. Monitoring involves using an AI system, called the monitor, to detect actions that don't align with our goals. It is important that the monitor knows when it doesn't know whether an action is safe. When it is unsure, it should either reject the action or flag it for further review (a minimal sketch of this decision rule appears below).

All of this becomes easier if AI decision-making becomes more transparent. We do extensive research in interpretability with the aim of increasing this transparency. To facilitate this further, we're designing AI systems that are easier to understand. For example, our research on Myopic Optimization with Nonmyopic Approval (MONA) aims to ensure that any long-term planning done by AI systems remains understandable to humans. This is particularly important as the technology improves. Our work on MONA is the first to demonstrate the safety benefits of short-term optimization in LLMs.

Led by Shane Legg, Co-Founder and Chief AGI Scientist at Google DeepMind, our AGI Safety Council (ASC) analyzes AGI risk and best practices, making recommendations on safety measures. The ASC works closely with the Responsibility and Safety Council, our internal review group co-chaired by our COO Lila Ibrahim and Senior Director of Responsibility Helen King, to evaluate AGI research, projects and collaborations against our AI Principles, advising and partnering with research and product teams on our highest impact work.

Our work on AGI safety complements our depth and breadth of responsibility and safety practices and research addressing a wide range of issues, including harmful content, bias, and transparency. We also continue to leverage our learnings from safety in agentics, such as the principle of having a human in the loop to check in for consequential actions, to inform our approach to building AGI responsibly.

Externally, we're working to foster collaboration with experts, industry, governments, nonprofits and civil society organizations, and take an informed approach to developing AGI. For example, we're partnering with nonprofit AI safety research organizations, including Apollo and Redwood Research, who have advised on a dedicated misalignment section in the latest version of our Frontier Safety Framework. Through ongoing dialogue with policy stakeholders globally, we hope to contribute to international consensus on critical frontier safety and security issues, including how we can best anticipate and prepare for novel risks.

Our efforts include working with others in the industry - via organizations like the Frontier Model Forum - to share and develop best practices, as well as valuable collaborations with AI Institutes on safety testing. Ultimately, we believe a coordinated international approach to governance is critical to ensure society benefits from advanced AI systems. Educating AI researchers and experts on AGI safety is fundamental to creating a strong foundation for its development.
As such, we've launched a new course on AGI Safety for students, researchers and professionals interested in this topic. Ultimately, our approach to AGI safety and security serves as a vital roadmap to address the many challenges that remain open. We look forward to collaborating with the wider AI research community to advance AGI responsibly and help us unlock the immense benefits of this technology for all.
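To make the monitoring loop described above concrete, the following is a minimal, hypothetical Python sketch of an uncertainty-aware monitor: it allows clearly safe actions, rejects clearly unsafe ones, and escalates for review when it is not confident in its own judgment. The thresholds and scoring inputs are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of an uncertainty-aware monitor: score each proposed
# action, allow clearly safe ones, reject clearly unsafe ones, and escalate
# to human review when the monitor is unsure. The thresholds and the source
# of the scores are illustrative assumptions.

from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REJECT = "reject"
    ESCALATE = "escalate"  # flag for further (e.g. human) review


def monitor_action(action: str,
                   safety_score: float,   # monitor's estimate that the action is safe (0..1)
                   confidence: float,     # monitor's confidence in its own estimate (0..1)
                   min_confidence: float = 0.8,
                   safe_threshold: float = 0.9) -> Decision:
    """Decide what to do with a proposed action given the monitor's output."""
    if confidence < min_confidence:
        # The monitor "knows it doesn't know": don't guess, escalate instead.
        return Decision.ESCALATE
    if safety_score >= safe_threshold:
        return Decision.ALLOW
    return Decision.REJECT


if __name__ == "__main__":
    print(monitor_action("send summary email", safety_score=0.97, confidence=0.95))
    print(monitor_action("modify billing records", safety_score=0.60, confidence=0.90))
    print(monitor_action("run unfamiliar shell script", safety_score=0.80, confidence=0.40))
```

The key design choice illustrated here is the explicit escalation path: rather than forcing a binary allow/reject call, a monitor that can report low confidence keeps a human in the loop for exactly the cases it cannot judge.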
[5]
DeepMind is already figuring out ways to keep us safe from AGI
Artificial General Intelligence is a huge topic right now -- even though no one has agreed on what AGI really is. Some scientists think it's still hundreds of years away and would need tech that we can't even begin to imagine yet, while Google DeepMind says it could be here by 2030 -- and it's already planning safety measures. It's not uncommon for the science community to disagree on topics like this, and it's good to have all of our bases covered with people planning for both the immediate future and the distant future. Still, five years is a pretty shocking number.

Right now, the "frontier AI" projects known to the public are all LLMs -- fancy little word guessers and image generators. ChatGPT, for example, is still terrible at math, and every model I've ever tried is awful at listening to instructions and editing their responses accurately. Anthropic's Claude still hasn't beaten Pokémon, and as impressive as the language skills of these models are, they're still trained on all the worst writers in the world and have picked up plenty of bad habits. It's hard to imagine jumping from what we have now to something that, in DeepMind's words, displays capabilities that match or exceed "that of the 99th percentile of skilled adults." In other words, DeepMind thinks that AGI will be as smart or smarter than the top 1% of humans in the world.

So, what kind of risks does DeepMind think an Einstein-level AGI could pose? According to the paper, we have four main categories: misuse, misalignment, mistakes, and structural risks. They were so close to four Ms, that's a shame.

DeepMind considers "misuse" to be things like influencing political races with deepfake videos or impersonating people during scams. It mentions in the conclusion that its approach to safety "centers around blocking malicious actors' access to dangerous capabilities." That sounds great, but DeepMind is a part of Google and there are plenty of people who would consider the U.S. tech giant to be a potential bad actor itself. Sure, Google hopefully won't try to steal money from elderly people by impersonating their grandchildren -- but that doesn't mean it won't use AGI to bring itself profit while ignoring consumers' best interests.

It looks like "misalignment" is the Terminator situation, where we ask the AI for one thing and it just does something completely different. That one is a little bit uncomfortable to think about. DeepMind says the best way to counter this is to make sure we understand how our AI systems work in as much detail as possible, so we can tell when something is going wrong, where it's going wrong, and how to fix it. This goes against the whole "spontaneous emergence" of capabilities and the concept that AGI will be so complex that we won't know how it works. Instead, if we want to stay safe, we need to make sure we do know what's going on. I don't know how hard that will be, but it definitely makes sense to try.

The last two categories refer to accidental harm -- either mistakes on the AI's part or things just getting messy when too many people are involved. For this, we need to make sure we have systems in place that approve the actions an AGI wants to take and prevent different people from pulling it in opposite directions. While DeepMind's paper is completely exploratory, it seems there are already plenty of ways we can imagine AGI going wrong. This isn't as bad as it sounds -- the problems we can imagine are the problems we can best prepare for.
It's the problems we don't anticipate that are scarier, so let's hope we're not missing anything big.
[6]
Read Google DeepMind's new paper on responsible artificial general intelligence (AGI).
Artificial general intelligence (AGI), AI that's at least as capable as humans at most cognitive tasks, could be here within the coming years. It has the power to transform our world, acting as a catalyst for progress in many areas of life. But it is essential that any technology this powerful be developed responsibly. Today, we're sharing our views on AGI safety and security as we navigate the path toward this transformational technology. This new paper, titled "An Approach to Technical AGI Safety and Security," is a starting point for vital conversations with the wider industry about how we monitor AGI progress.
[7]
Google DeepMind outlines safety framework for future AGI development - SiliconANGLE
A new paper from Google DeepMind Technologies Ltd., the artificial intelligence research laboratory that is part of Alphabet Inc., has laid out a comprehensive framework for navigating the risks and responsibilities of developing Artificial General Intelligence, marking one of the clearest commitments yet from the company on AGI safety.

AGIs are theoretical AI systems that would be capable of performing any intellectual task that a human can, with the ability to generalize knowledge across domains. Unlike existing narrow AI models, which are designed for specific tasks, AGI aims for broad cognitive flexibility, learning and adapting in ways that mirror human reasoning. Put more simply, AGI is AI that would match or surpass human capabilities. While AI development isn't at that stage yet, many predict that AGI may be only a few years away. When that point is reached, it could enable amazing discoveries, but it could also present serious risks.

Google DeepMind discusses many of those risks in its "An Approach to Technical AGI Safety & Security" paper, which outlines its strategy for the responsible development of AGI. The paper categorizes the risks of AGI into four primary areas: misuse, misalignment, accidents and structural risks. Misuse of AGI is the concern that such systems could be weaponized or exploited for harmful purposes, while misalignment refers to the difficulty of ensuring these systems consistently act in line with human values and intentions. Accidents with AGI could involve unintended behaviors or failures that emerge as systems operate in complex environments, while structural risks include the potential for the technology to have broad societal impacts, such as economic disruption or power imbalances.

According to the paper, addressing misalignment will involve ensuring that AGI systems are trained to pursue appropriate goals and accurately follow human instructions. Training of AGI systems should include developing methods for amplified oversight and uncertainty estimation to prepare them for a wide range of real-world scenarios. To mitigate these threats, DeepMind is focusing on enhanced oversight, training techniques and tools for estimating uncertainty in AGI outputs. The company is also researching scalable supervision methods, which aim to keep increasingly capable models grounded in human intent even as they grow more autonomous.

The paper stresses the importance of transparency and interpretability in AGI systems. DeepMind says it is investing heavily in interpretability research to make these systems more understandable and auditable -- key steps for aligning them with human norms and ensuring responsible use.

While the paper discusses AGI through the lens of what Google DeepMind is doing, it notes that no single organization should tackle AGI development alone. The paper argues that collaboration with the broader research community, policymakers and civil society will be essential in shaping a safe AGI future.
Google DeepMind releases a detailed 145-page paper outlining potential risks and safety measures for Artificial General Intelligence (AGI), which they predict could arrive by 2030. The paper addresses four main risk categories and proposes strategies to mitigate them.
Google DeepMind has released a comprehensive 145-page paper detailing its approach to ensuring the safety of Artificial General Intelligence (AGI), which it predicts could arrive as early as 2030 [1][2]. The paper, co-authored by DeepMind co-founder Shane Legg, outlines four main categories of AGI risks and proposes strategies to mitigate them [1][3].
DeepMind defines AGI as a system with capabilities matching or exceeding the 99th percentile of skilled adults across a wide range of non-physical tasks, including metacognitive skills like learning new abilities [2]. The paper identifies four primary risk categories: misuse (deliberate harmful use of the system), misalignment (the system pursuing goals that differ from human intentions), mistakes (unintended failures or accidents), and structural risks (broader societal impacts such as economic disruption or power imbalances) [1][5][7].
To address these risks, DeepMind proposes several safety measures, including extensive testing and robust post-training safety protocols, restricting access to dangerous capabilities, amplified oversight and robust training, monitoring of AI actions, and interpretability research to make systems more transparent [1][4].
The paper has sparked debate within the AI community. Some experts, like Heidy Khlaaf from the AI Now Institute, argue that AGI is too ill-defined to be scientifically evaluated [2]. Others, such as Matthew Guzdial from the University of Alberta, question the feasibility of recursive AI improvement [2].
Sandra Wachter, an Oxford researcher, suggests that a more immediate concern is AI reinforcing itself with inaccurate outputs, potentially leading to the proliferation of misinformation [2].
Despite the controversy, DeepMind emphasizes the importance of proactive planning to mitigate potential severe harms [2]. The company has established an AGI Safety Council, led by Shane Legg, to analyze AGI risks and recommend safety measures [4].
DeepMind's paper contrasts its approach with those of other major AI labs. It suggests that Anthropic places less emphasis on robust training and monitoring, while OpenAI focuses more on automating alignment research [2].
The release of this paper comes at a time when interest in addressing AI risks has reportedly decreased in government circles, with a focus on competition seemingly overshadowing safety concerns [3].
As the debate around AGI's feasibility and timeline continues, DeepMind's comprehensive safety plan represents a significant step in addressing potential risks. Whether AGI arrives by 2030 or later, the proactive approach to safety and ethics in AI development is likely to shape the future of the industry and its regulation.