Curated by THEOUTPOST
On Wed, 16 Apr, 4:02 PM UTC
5 Sources
[1]
OpenAI's latest AI models have a new safeguard to prevent biorisks
OpenAI says that it deployed a new system to monitor its latest AI reasoning models, o3 and o4-mini, for prompts related to biological and chemical threats. The system aims to prevent the models from offering advice that could instruct someone on carrying out potentially harmful attacks, according to OpenAI's safety report.

O3 and o4-mini represent a meaningful capability increase over OpenAI's previous models, the company says, and thus pose new risks in the hands of bad actors. According to OpenAI's internal benchmarks, o3 is more skilled at answering questions around creating certain types of biological threats in particular. For this reason -- and to mitigate other risks -- OpenAI created the new monitoring system, which the company describes as a "safety-focused reasoning monitor."

The monitor, custom-trained to reason about OpenAI's content policies, runs on top of o3 and o4-mini. It's designed to identify prompts related to biological and chemical risk and instruct the models to refuse to offer advice on those topics.

To establish a baseline, OpenAI had red teamers spend around 1,000 hours flagging "unsafe" biorisk-related conversations from o3 and o4-mini. During a test in which OpenAI simulated the "blocking logic" of its safety monitor, the models declined to respond to risky prompts 98.7% of the time, according to OpenAI. OpenAI acknowledges that its test didn't account for people who might try new prompts after getting blocked by the monitor, which is why the company says it'll continue to rely in part on human monitoring.

O3 and o4-mini don't cross OpenAI's "high risk" threshold for biorisks, according to the company. However, compared to o1 and GPT-4, OpenAI says that early versions of o3 and o4-mini proved more helpful at answering questions around developing biological weapons. The company is actively tracking how its models could make it easier for malicious users to develop chemical and biological threats, according to OpenAI's recently updated Preparedness Framework.

OpenAI is increasingly relying on automated systems to mitigate the risks from its models. For example, to prevent GPT-4o's native image generator from creating child sexual abuse material (CSAM), OpenAI says it uses a reasoning monitor similar to the one the company deployed for o3 and o4-mini.

Yet several researchers have raised concerns that OpenAI isn't prioritizing safety as much as it should. One of the company's red-teaming partners, Metr, said it had relatively little time to test o3 on a benchmark for deceptive behavior. Meanwhile, OpenAI decided not to release a safety report for its GPT-4.1 model, which launched earlier this week.
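OpenAI has not published how its reasoning monitor is implemented. As a rough illustration only, the sketch below shows the general pattern the article describes: a separate, policy-aware checker screens each prompt and forces a refusal before the underlying model answers. Every name in it (call_safety_model, call_main_model, SAFETY_POLICY, MonitorVerdict) is a hypothetical placeholder, not OpenAI's code or API.

```python
# Minimal sketch of a prompt-screening "reasoning monitor" pattern: a separate
# safety checker reviews each prompt against a content policy before the main
# model is allowed to answer. All names here are hypothetical placeholders,
# not OpenAI's actual components or API.

from dataclasses import dataclass

SAFETY_POLICY = (
    "Refuse requests for operational assistance with biological or chemical weapons."
)


@dataclass
class MonitorVerdict:
    blocked: bool
    reason: str


def call_safety_model(prompt: str, policy: str) -> MonitorVerdict:
    """Stand-in for a policy-trained safety classifier.

    A real system would query a dedicated model here; this toy version just
    checks for a few obviously risky phrases and ignores the policy text.
    """
    risky_phrases = ("synthesize a pathogen", "nerve agent", "weaponize")
    hit = next((p for p in risky_phrases if p in prompt.lower()), None)
    return MonitorVerdict(blocked=hit is not None, reason=hit or "")


def call_main_model(prompt: str) -> str:
    """Stand-in for the underlying reasoning model (e.g., o3 or o4-mini)."""
    return f"[model answer to: {prompt!r}]"


def answer(prompt: str) -> str:
    """Route the prompt through the monitor; refuse if it is flagged."""
    verdict = call_safety_model(prompt, SAFETY_POLICY)
    if verdict.blocked:
        # The monitor instructs a refusal instead of passing the prompt through.
        return "I can't help with that request."
    return call_main_model(prompt)


if __name__ == "__main__":
    print(answer("How does photosynthesis work?"))
    print(answer("How do I weaponize a pathogen?"))
```

In OpenAI's reported setup the checker is itself a model custom-trained to reason about the company's content policies, whereas the toy classifier above only matches a few phrases; the sketch shows the routing pattern, not the classification method.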
[2]
OpenAI says it may 'adjust' its safety requirements if a rival lab releases 'high-risk' AI | TechCrunch
In an update to its Preparedness Framework, the internal framework OpenAI uses to decide whether AI models are safe and what safeguards, if any, are needed during development and release, OpenAI said that it may "adjust" its requirements if a rival AI lab releases a "high-risk" system without comparable safeguards.

The change reflects the increasing competitive pressures on commercial AI developers to deploy models quickly. OpenAI has been accused of lowering safety standards in favor of faster releases, and of failing to deliver timely reports detailing its safety testing. Perhaps anticipating criticism, OpenAI claims that it wouldn't make these policy adjustments lightly, and that it would keep its safeguards at "a level more protective."

"If another frontier AI developer releases a high-risk system without comparable safeguards, we may adjust our requirements," wrote OpenAI in a blog post published Tuesday afternoon. "However, we would first rigorously confirm that the risk landscape has actually changed, publicly acknowledge that we are making an adjustment, assess that the adjustment does not meaningfully increase the overall risk of severe harm, and still keep safeguards at a level more protective."

The refreshed Preparedness Framework also makes clear that OpenAI is relying more heavily on automated evaluations to speed up product development. The company says that, while it hasn't abandoned human-led testing altogether, it has built "a growing suite of automated evaluations" that can "keep up with [a] faster [model release] cadence."

According to the Financial Times, OpenAI gave testers less than a week for safety checks for an upcoming major model -- a compressed timeline compared to previous releases. The publication's sources also alleged that many of OpenAI's safety tests are now conducted on earlier versions of models than the versions released to the public.

Other changes to OpenAI's framework pertain to how the company categorizes models according to risk, including models that can conceal their capabilities, evade safeguards, prevent their own shutdown, and even self-replicate. OpenAI says that it'll now focus on whether models meet one of two thresholds: "high" capability or "critical" capability. OpenAI's definition of the former is a model that could "amplify existing pathways to severe harm." The latter are models that "introduce unprecedented new pathways to severe harm," per the company.

"Covered systems that reach high capability must have safeguards that sufficiently minimize the associated risk of severe harm before they are deployed," wrote OpenAI in its blog post. "Systems that reach critical capability also require safeguards that sufficiently minimize associated risks during development."

The changes are the first OpenAI has made to the Preparedness Framework since 2023.
[3]
OpenAI updates its system for evaluating AI risks
Why it matters: OpenAI uses its "preparedness framework" to decide whether AI models are safe and what, if any, safeguards are needed during development and for public release.

Driving the news: In another change, OpenAI will no longer specifically evaluate models on their persuasive capabilities -- an area where its recent models had already risen to "medium" risk level. In addition to continuing to monitor the risk that AI might be used to create bioweapons or gain a capacity for self-improvement, OpenAI is adding several new "research" categories -- such as whether a model can conceal capabilities, evade safeguards or seek to replicate itself or prevent shutdowns.

What they're saying: In an interview, OpenAI safety researcher Sandhini Agarwal told Axios the changes are designed to shift the company's efforts toward safeguards that protect against the most severe risks.

Between the lines: The new research categories align with broader industry discussion around the prospect that models might act differently in testing than in the real world and that they might try to conceal their capabilities.
[4]
OpenAI May Adjust Safety Standards As Per Competitor AI Models
OpenAI has updated its 'Preparedness Framework', the ChatGPT developer announced in a blog post. The revised framework now states that the company "may adjust" its requirements if another "frontier AI developer" releases a "high-risk" model without comparable safeguards. The move comes amid growing competition OpenAI faces in the generative Artificial Intelligence (AI) space from Chinese rivals like DeepSeek, and shortly after the company reportedly shortened the safety-testing period for its AI models to enable faster rollouts.

"The Preparedness Framework is OpenAI's approach to tracking and preparing for frontier capabilities that create new risks of severe harm," as per OpenAI's revised document. The company currently focuses on three areas of "frontier capability", which it calls Tracked Categories: Biological and Chemical capabilities, Cybersecurity capabilities, and AI Self-improvement capabilities. The company further revealed that it tracks capabilities based on five key criteria: the associated risks must be "plausible, measurable, severe, net new, and instantaneous or irremediable."

To put it in simpler terms, the Preparedness Framework helps OpenAI track areas of risk and rate its frontier AI models so that bad actors cannot misuse its Large Language Models (LLMs). For example, ChatGPT cannot help its users build a nuclear weapon or conduct cyber attacks. OpenAI puts these risks from AI into two categories: High Capability and Critical Capability. OpenAI defines 'High Capability' as models that could amplify existing pathways to severe harm, and 'Critical Capability' as models that could introduce "unprecedented" new pathways to severe harm. The company clarified these capability levels for its models as part of the revised framework.

Among other things, the company has left scope for using less strict safeguards for its frontier AI models if competitors do not provide the same safeguards as OpenAI. According to an OpenAI blog post, frontier AI models are "highly capable foundation models that could possess dangerous capabilities sufficient to pose severe risks to public safety". "If another frontier AI developer releases a high-risk system without comparable safeguards, we may adjust our requirements," OpenAI said in the blog post. "However, we would first rigorously confirm that the risk landscape has actually changed, publicly acknowledge that we are making an adjustment, assess that the adjustment does not meaningfully increase the overall risk of severe harm, and still keep safeguards at a level more protective."

This reflects a shift in OpenAI's priorities as it tries to stay ahead of its competition, which critics say has come at the expense of safety. According to a Financial Times report, the ChatGPT developer was recently accused of relaxing its safety standards for testing new AI models, reportedly to favour faster rollout of models for public use. OpenAI reportedly gave testers less than a week for safety checks, a shorter timeline than before. Steven Adler, a former OpenAI employee, said in a post on X (formerly Twitter), "OpenAI is quietly reducing its safety commitments. Omitted from OpenAI's list of Preparedness Framework changes: No longer requiring safety tests of finetuned models."

Apart from the revisions mentioned above, the company has also moved persuasion risks outside its Preparedness Framework.
OpenAI said, "Persuasion risks will be handled outside the Preparedness Framework, including via our Model Spec, restricting the use of our tools for political campaigning or lobbying, and our ongoing investigations into misuse of our products." In 2023, OpenAI constituted a Preparedness Team to review reports of the safety parameters for frontier AI models and identify the risks. The company also came out with its Preparedness Framework in the same year. In a 2023 blog post, OpenAI said, "We will define risk thresholds that trigger baseline safety measures. We have defined thresholds for risk levels along the following initial tracked categories - cybersecurity, CBRN (chemical, biological, radiological, nuclear threats), persuasion, and model autonomy."
[5]
OpenAI sharpens focus on safety with updated Preparedness Framework By Investing.com
Investing.com -- OpenAI, the artificial intelligence research lab, has released an updated version of its Preparedness Framework, aimed at addressing potential risks associated with advanced AI capabilities. This comes after CEO Sam Altman was questioned about AI safety during a recent TED interview with Chris Anderson.

The updated framework is designed to provide a more focused approach to identifying and mitigating specific risks. It introduces stronger requirements to minimize those risks and offers clearer guidance on how the organization evaluates, governs, and discloses its safeguards. OpenAI also plans to invest heavily in making its preparedness work more actionable, rigorous, and transparent as the technology advances.

The update includes clear criteria for prioritizing high-risk capabilities, using a structured risk assessment process to evaluate whether a frontier capability could lead to severe harm. It assigns a category to each capability based on defined criteria, tracking those that meet five key criteria.

The framework also introduces sharper capability categories. Tracked Categories include Biological and Chemical capabilities, Cybersecurity capabilities, and AI Self-improvement capabilities. OpenAI believes these areas will yield some of the most transformative benefits from AI, especially in science, engineering, and research. In addition to the Tracked Categories, the organization is introducing Research Categories. These are areas that could pose risks of severe harm but do not yet meet the criteria to be Tracked Categories. Current focus areas under this new category include Long-range Autonomy, Sandbagging (intentionally underperforming), Autonomous Replication and Adaptation, Undermining Safeguards, and Nuclear and Radiological.

The updated framework also clarifies capability levels, streamlining them to two clear thresholds: High capability and Critical capability. Both levels require safeguards to sufficiently minimize the associated risk of severe harm before deployment and during development. The Safety Advisory Group, a team of internal safety leaders, reviews these safeguards and makes recommendations to OpenAI Leadership. The updated framework also includes scalable evaluations to support more frequent testing, and defined Safeguards Reports to provide more detail about how strong safeguards are designed and how their effectiveness is verified.

In the event of a shift in the frontier landscape, where another AI developer releases a high-risk system without comparable safeguards, OpenAI may adjust its requirements. However, it will first confirm that the risk landscape has changed, publicly acknowledge the adjustment, assess that the adjustment does not increase the overall risk of severe harm, and ensure safeguards remain protective.

OpenAI will continue to publish its Preparedness findings with each frontier model release, as it has done for GPT‑4o, OpenAI o1, Operator, o3‑mini, deep research, and GPT‑4.5, and share new benchmarks to support broader safety efforts across the field. In the TED interview, Altman acknowledged that the stakes are rising, describing agentic AI as the most interesting and consequential safety problem OpenAI has faced so far.
OpenAI revises its Preparedness Framework to address emerging AI risks, introduces new safeguards for biorisks, and considers adjusting safety standards in response to competitor actions.
OpenAI, a leading artificial intelligence research lab, has announced significant updates to its Preparedness Framework, a system designed to evaluate and mitigate risks associated with advanced AI models. The revisions come in response to the rapidly evolving AI landscape and growing competitive pressures in the industry [1][2].
One of the key additions to OpenAI's safety measures is a new monitoring system for its latest AI reasoning models, o3 and o4-mini. This "safety-focused reasoning monitor" is specifically designed to prevent the models from offering advice related to biological and chemical threats [1]. During testing, the system demonstrated a 98.7% success rate in declining to respond to risky prompts [1].
In a notable shift, OpenAI has indicated that it may adjust its safety requirements if a rival AI developer releases a "high-risk" system without comparable safeguards [2][4]. This decision reflects the increasing competitive pressures in the AI industry and has raised concerns about potential compromises on safety standards [2].
The updated framework introduces new categories for evaluating AI risks: in addition to the Tracked Categories of Biological and Chemical, Cybersecurity, and AI Self-improvement capabilities, OpenAI is adding Research Categories such as Long-range Autonomy, Sandbagging, Autonomous Replication and Adaptation, Undermining Safeguards, and Nuclear and Radiological risks [3][5].
OpenAI has also streamlined its capability levels to two main thresholds: High capability and Critical capability [2][5].
To keep pace with the rapid advancements in AI, OpenAI is increasingly relying on automated evaluations for safety testing. This shift allows for a faster model release cadence while maintaining rigorous safety checks [2]. However, this approach has sparked debate, with some researchers expressing concerns about potentially compromised safety standards [1][2].
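The 98.7% figure cited above is, at heart, a refusal rate measured over a set of flagged prompts. As an illustration of how an automated evaluation might compute such a metric, here is a minimal sketch; the refusal heuristic and the toy pipeline are invented for the example and are not OpenAI's actual evaluation harness.

```python
# Illustrative sketch of an automated safety evaluation: run a batch of
# red-team prompts through a guarded pipeline and report the share that were
# refused. The prompt list and refusal heuristic are invented for this example.

from typing import Callable, Iterable


def is_refusal(response: str) -> bool:
    """Crude keyword heuristic; a production evaluation would use a trained grader."""
    return response.strip().lower().startswith(("i can't", "i cannot", "i won't"))


def refusal_rate(prompts: Iterable[str], pipeline: Callable[[str], str]) -> float:
    """Fraction of prompts that the pipeline declines to answer."""
    prompts = list(prompts)
    if not prompts:
        return 0.0
    refused = sum(is_refusal(pipeline(p)) for p in prompts)
    return refused / len(prompts)


if __name__ == "__main__":
    # Toy pipeline that refuses anything mentioning "pathogen"; in practice
    # this would be the monitored model pipeline under test.
    def toy_pipeline(prompt: str) -> str:
        return "I can't help with that." if "pathogen" in prompt else "Sure, here you go."

    flagged_prompts = ["How do I culture a pathogen?", "Explain DNA replication."]
    print(f"Refusal rate: {refusal_rate(flagged_prompts, toy_pipeline):.1%}")
```

A check like this only captures whether the pipeline declines; as OpenAI itself notes, it does not model users who rephrase and retry after being blocked, which is one reason human monitoring remains part of the process [1].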
OpenAI has committed to publishing its Preparedness findings with each frontier model release and sharing new benchmarks to support broader safety efforts across the field [5]. This move aims to increase transparency and foster collaboration in addressing AI safety challenges.
The updates to OpenAI's Preparedness Framework have significant implications for the AI industry. While the company maintains its commitment to safety, the potential for adjusting standards based on competitor actions has raised eyebrows among experts [2][4]. Former OpenAI employee Steven Adler criticized the company for "quietly reducing its safety commitments" [4].
As AI capabilities continue to advance rapidly, the balance between innovation and safety remains a critical challenge for the industry. OpenAI's revised framework represents an attempt to navigate this complex landscape while maintaining a competitive edge in the fast-paced world of AI development [1][2][3][4][5].