3 Sources
[1]
Google DeepMind strengthens the Frontier Safety Framework
We're expanding our risk domains and refining our risk assessment process.

AI breakthroughs are transforming our everyday lives, from advancing mathematics, biology and astronomy to realizing the potential of personalized education. As we build increasingly powerful AI models, we're committed to responsibly developing our technologies and taking an evidence-based approach to staying ahead of emerging risks.

Today, we're publishing the third iteration of our Frontier Safety Framework (FSF) -- our most comprehensive approach yet to identifying and mitigating severe risks from advanced AI models. This update builds upon our ongoing collaborations with experts across industry, academia and government. We've also incorporated lessons learned from implementing previous versions and evolving best practices in frontier AI safety.

With this update, we're introducing a Critical Capability Level (CCL) focused on harmful manipulation -- specifically, AI models with powerful manipulative capabilities that could be misused to systematically and substantially change beliefs and behaviors in identified high stakes contexts over the course of interactions with the model, reasonably resulting in additional expected harm at severe scale. This addition builds on and operationalizes research we've done to identify and evaluate mechanisms that drive manipulation from generative AI. Going forward, we'll continue to invest in this domain to better understand and measure the risks associated with harmful manipulation.

We've also expanded our Framework to address potential future scenarios where misaligned AI models might interfere with operators' ability to direct, modify or shut down their operations. While our previous version of the Framework included an exploratory approach centered on instrumental reasoning CCLs (i.e., warning levels specific to when an AI model starts to think deceptively), with this update we now provide further protocols for our machine learning research and development CCLs focused on models that could accelerate AI research and development to potentially destabilizing levels. In addition to the misuse risks arising from these capabilities, there are also misalignment risks stemming from a model's potential for undirected action at these capability levels, and the likely integration of such models into AI development and deployment processes.

To address risks posed by CCLs, we conduct safety case reviews prior to external launches when relevant CCLs are reached. This involves performing detailed analyses demonstrating how risks have been reduced to manageable levels. For advanced machine learning research and development CCLs, large-scale internal deployments can also pose risk, so we are now expanding this approach to include such deployments.

Our Framework is designed to address risks in proportion to their severity. We've sharpened our CCL definitions specifically to identify the critical threats that warrant the most rigorous governance and mitigation strategies. We continue to apply safety and security mitigations before specific CCL thresholds are reached and as part of our standard model development approach.

Lastly, in this update, we go into more detail about our risk assessment process. Building on our core early-warning evaluations, we describe how we conduct holistic assessments that include systematic risk identification, comprehensive analyses of model capabilities and explicit determinations of risk acceptability.
This latest update to our Frontier Safety Framework represents our continued commitment to taking a scientific and evidence-based approach to tracking and staying ahead of AI risks as capabilities advance toward AGI. By expanding our risk domains and strengthening our risk assessment processes, we aim to ensure that transformative AI benefits humanity, while minimizing potential harms. Our Framework will continue evolving based on new research, stakeholder input and lessons from implementation. We remain committed to working collaboratively across industry, academia and government. The path to beneficial AGI requires not just technical breakthroughs, but also robust frameworks to mitigate risks along the way. We hope that our updated Frontier Safety Framework contributes meaningfully to this collective effort.
[2]
Google DeepMind expands frontier AI safety framework to counter manipulation and shutdown risks - SiliconANGLE
Alphabet Inc.'s Google DeepMind lab today rolled out the third version of its Frontier Safety Framework to strengthen oversight of powerful artificial intelligence systems that could pose risks if left unchecked. The third iteration of the framework introduces a new focus on manipulation capabilities and expands safety reviews to cover scenarios where models may resist human shutdown or control.

Leading the list of updates is the addition of what DeepMind calls a Critical Capability Level for harmful manipulation, which addresses the possibility that advanced models could influence or alter human beliefs and behaviors at scale in high-stakes contexts. The new capability level builds on years of research into the mechanics of persuasion and manipulation in generative AI and formalizes how DeepMind will measure, monitor and mitigate such risks before models reach critical thresholds.

The updated framework also brings greater scrutiny to misalignment and control challenges: the idea that highly capable systems could, in theory, resist modification or shutdown. DeepMind now requires safety case reviews not only before external deployment but also for large-scale internal rollouts once a model hits certain CCL thresholds. The reviews are designed to force teams to demonstrate that potential risks have been adequately identified, mitigated and judged acceptable before release.

Along with new risk categories, the updated framework refines how DeepMind defines and applies capability levels. The refinements are designed to clearly separate routine operational concerns from the most consequential threats, ensuring governance mechanisms trigger at the right time. Notably, the Frontier Safety Framework stresses that mitigations must be applied proactively before systems cross dangerous boundaries, not just reactively after problems emerge.

"This latest update to our Frontier Safety Framework represents our continued commitment to taking a scientific and evidence-based approach to tracking and staying ahead of AI risks as capabilities advance toward artificial general intelligence," said Google DeepMind's Four Flynn, Helen King and Anca Dragan in a blog post. "By expanding our risk domains and strengthening our risk assessment processes, we aim to ensure that transformative AI benefits humanity while minimizing potential harms."

The authors added that DeepMind expects the FSF to continue evolving with new research, deployment experience and stakeholder feedback.
[3]
DeepMind details AGI safety via Frontier Safety Framework
In a September 2025 research paper, Google DeepMind presented its strategy for the safe development of Artificial General Intelligence (AGI). The research details frameworks and governance structures designed to address the significant risks of powerful AI systems.

The paper, titled "An Approach to Technical AGI Safety and Security," focuses on the danger of "misaligned" AI, where an AI system's goals conflict with human values and well-being. Such a conflict could cause widespread harm, even if the AI appears to be functioning correctly from a technical perspective. DeepMind's strategy combines technical safety, risk assessment, and collaboration with the broader research community to manage these challenges.

A key part of DeepMind's strategy is the Frontier Safety Framework. This protocol is designed to proactively identify and mitigate severe risks from advanced AI models before they are fully developed or widely deployed. The framework establishes clear protocols for assessing model capabilities in high-risk areas such as cybersecurity, autonomy, and harmful manipulation.

DeepMind has also established internal governance bodies to supervise its AI development. The Responsibility and Safety Council works with the AGI Safety Council to oversee research and development, ensuring that ethical, technical, and security risks are systematically addressed.

The company's research emphasizes that transparency and external collaboration are essential for the responsible development of AGI. The paper serves as a call to action for the global AI research community to work together on managing the complex risks associated with increasingly powerful artificial intelligence systems to prevent unintended negative outcomes.
Google DeepMind has released the third iteration of its Frontier Safety Framework, expanding risk domains and refining assessment processes to address potential dangers of advanced AI systems, including manipulation and control risks.
Google DeepMind has unveiled the third iteration of its Frontier Safety Framework (FSF), marking a significant step forward in addressing the potential risks associated with advanced artificial intelligence systems [1]. This update comes as AI technologies continue to transform various aspects of our lives, from advancing scientific research to personalizing education.
The updated framework introduces several key enhancements to identify and mitigate severe risks from advanced AI models. One of the most notable additions is the introduction of a Critical Capability Level (CCL) focused on harmful manipulation [1]. This new CCL addresses AI models with powerful manipulative capabilities that could potentially be misused to systematically alter beliefs and behaviors in high-stakes contexts, resulting in severe harm at scale [2].
The framework now also covers potential future scenarios where misaligned AI models might interfere with operators' ability to direct, modify, or shut down their operations [1]. This expansion reflects growing concerns about AI systems potentially resisting human control or modification [2].
DeepMind has strengthened its safety review process, now requiring detailed analyses demonstrating how risks have been reduced to manageable levels before external launches or large-scale internal deployments [1]. The company has also refined its risk assessment process, incorporating systematic risk identification, comprehensive analyses of model capabilities, and explicit determinations of risk acceptability [2].
The Frontier Safety Framework emphasizes the importance of applying safety and security mitigations proactively, before specific CCL thresholds are reached and as part of the standard model development approach [1]. This proactive stance aims to ensure that potential risks are addressed early in the development process.
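To make the gating idea concrete, here is a minimal, purely illustrative sketch of how a capability-threshold check of this kind could be expressed in code. The domain names, scores, thresholds, and deployment scopes below are hypothetical and are not taken from DeepMind's framework or tooling; they only mirror, in spirit, the policy of pausing external launches and large-scale internal deployments for a safety case review once a CCL is reached.

```python
from dataclasses import dataclass
from enum import Enum, auto


class DeploymentScope(Enum):
    EXTERNAL_LAUNCH = auto()
    LARGE_SCALE_INTERNAL = auto()
    SMALL_INTERNAL_TEST = auto()


@dataclass
class CapabilityEvaluation:
    """Outcome of one early-warning evaluation for a single risk domain (illustrative)."""
    domain: str            # e.g. "harmful_manipulation" (hypothetical label)
    score: float           # aggregate score from the evaluation suite (made-up scale)
    ccl_threshold: float   # score at which the Critical Capability Level is deemed reached

    @property
    def ccl_reached(self) -> bool:
        return self.score >= self.ccl_threshold


def requires_safety_case_review(evaluations: list[CapabilityEvaluation],
                                scope: DeploymentScope) -> bool:
    """Return True if the deployment should pause pending a safety case review.

    Captures the described policy in spirit only: once any CCL is reached,
    both external launches and large-scale internal deployments are gated.
    """
    any_ccl_reached = any(e.ccl_reached for e in evaluations)
    gated_scopes = {DeploymentScope.EXTERNAL_LAUNCH, DeploymentScope.LARGE_SCALE_INTERNAL}
    return any_ccl_reached and scope in gated_scopes


# Example with invented numbers: the manipulation evaluation crosses its threshold,
# so an external launch is gated, while a small internal test is not.
evals = [
    CapabilityEvaluation("harmful_manipulation", score=0.82, ccl_threshold=0.75),
    CapabilityEvaluation("ml_rnd_acceleration", score=0.40, ccl_threshold=0.90),
]
print(requires_safety_case_review(evals, DeploymentScope.EXTERNAL_LAUNCH))      # True
print(requires_safety_case_review(evals, DeploymentScope.SMALL_INTERNAL_TEST))  # False
```

In this sketch, any single evaluation crossing its threshold is enough to trigger a review for the gated deployment scopes; how thresholds are actually defined and assessed is described only qualitatively in the framework itself.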
DeepMind's approach to AI safety extends beyond internal measures. The company stresses the importance of transparency and external collaboration in the responsible development of AGI [3]. The updated framework and accompanying research serve as a call to action for the global AI research community to work together on managing the complex risks associated with increasingly powerful AI systems [3].
Google DeepMind acknowledges that the Frontier Safety Framework will continue to evolve based on new research, stakeholder input, and lessons from implementation [1]. The company remains committed to working collaboratively across industry, academia, and government to ensure that transformative AI benefits humanity while minimizing potential harms [2].
Summarized by Navi