4 Sources
[1]
OpenAI pledges to publish AI safety test results more often | TechCrunch
OpenAI is moving to publish the results of its internal AI model safety evaluations more regularly in what the outfit is pitching as an effort to increase transparency. On Wednesday, OpenAI launched the Safety Evaluations Hub, a webpage showing how the company's models score on various tests for harmful content generation, jailbreaks, and hallucinations. OpenAI says that it'll use the hub to share metrics on an "ongoing basis," and that it intends to update the hub with "major model updates" going forward.

"As the science of AI evaluation evolves, we aim to share our progress on developing more scalable ways to measure model capability and safety," wrote OpenAI in a blog post. "By sharing a subset of our safety evaluation results here, we hope this will not only make it easier to understand the safety performance of OpenAI systems over time, but also support community efforts to increase transparency across the field." OpenAI says that it may add additional evaluations to the hub over time.

In recent months, OpenAI has raised the ire of some ethicists for reportedly rushing the safety testing of certain flagship models and failing to release technical reports for others. The company's CEO, Sam Altman, also stands accused of misleading OpenAI executives about model safety reviews prior to his brief ouster in November 2023.

Late last month, OpenAI was forced to roll back an update to the default model powering ChatGPT, GPT-4o, after users began reporting that it responded in an overly validating and agreeable way. X became flooded with screenshots of ChatGPT applauding all sorts of problematic, dangerous decisions and ideas. OpenAI said that it would implement several fixes and changes to prevent future such incidents, including introducing an opt-in "alpha phase" for some models that would allow certain ChatGPT users to test the models and give feedback before launch.
[2]
OpenAI will show how models do on hallucination tests and 'illicit advice'
OpenAI on Wednesday announced a new "safety evaluations hub," a webpage where it will publicly display artificial intelligence models' safety results and how they perform on tests for hallucinations, jailbreaks and harmful content, such as "hateful content or illicit advice."

OpenAI said it used the safety evaluations "internally as one part of our decision making about model safety and deployment," and that while system cards release safety test results when a model is launched, OpenAI will from now on "share metrics on an ongoing basis."

"We will update the hub periodically as part of our ongoing company-wide effort to communicate more proactively about safety," OpenAI wrote on the webpage, adding that the safety evaluations hub does not reflect its full safety efforts and metrics and instead shows a "snapshot."

The news comes after CNBC reported earlier Wednesday that tech companies leading the way in artificial intelligence are prioritizing products over research, according to industry experts who are sounding the alarm about safety.
[3]
OpenAI promises greater transparency on model hallucinations and harmful content
The safety evaluations hub is a new resource that should be regularly updated.

OpenAI has launched a new web page called the safety evaluations hub to publicly share information related to things like the hallucination rates of its models. The hub will also highlight whether a model produces harmful content, how well it follows instructions and how it holds up against attempted jailbreaks. The tech company claims this new page will provide additional transparency on OpenAI, a company that, for context, has faced multiple lawsuits alleging it illegally used copyrighted material to train its AI models. Oh, yeah, and it's worth mentioning that The New York Times claims the tech company accidentally deleted evidence in the newspaper's plagiarism case against it.

The safety evaluations hub is meant to expand on OpenAI's system cards, which only outline a model's safety measures at launch, whereas the hub should provide ongoing updates. "As the science of AI evaluation evolves, we aim to share our progress on developing more scalable ways to measure model capability and safety," OpenAI states in its announcement. "By sharing a subset of our safety evaluation results here, we hope this will not only make it easier to understand the safety performance of OpenAI systems over time, but also support community efforts to increase transparency across the field." OpenAI adds that it's working to communicate more proactively about this area throughout the company.

Interested parties can look at each of the hub's sections and see information on relevant models, such as GPT-4.1 through 4.5. OpenAI notes that the information provided in this hub is only a "snapshot" and that interested parties should look at its system cards, assessments and other releases for further details.

One of the big caveats to the entire safety evaluations hub is that OpenAI is the entity doing these tests and choosing what information to share publicly. As a result, there isn't any way to guarantee that the company will share all its issues or concerns with the public.
[4]
OpenAI just published a new safety report on AI development -- here's what you need to know
An all-in-one place to find out about OpenAI safety evaluations.

OpenAI, in response to claims that it isn't taking AI safety seriously, has launched a new page called the Safety Evaluations Hub. This will publicly record things like the hallucination rates of its models, their likelihood of producing harmful content, and how easily they can be circumvented. "This hub provides access to safety evaluation results for OpenAI's models. These evaluations are included in our system cards, and we use them internally as one part of our decision-making about model safety and deployment," the new page states. "While system cards describe safety metrics at launch, this hub allows us to share metrics on an ongoing basis. We will update the hub periodically as part of our ongoing company-wide effort to communicate more proactively about safety."

System cards are reports published alongside AI models, explaining the testing process, limitations, and where the model could cause problems. OpenAI, alongside competitors like xAI (creator of Grok) and Google (maker of Gemini), has been accused in recent months of not taking AI safety seriously. Reports have been missing at the launch of new models, can take months to be published, or are skipped altogether. In April, the Financial Times reported that OpenAI employees were concerned about the speed of model releases and said they did not have enough time to complete tests properly. Google's Gemini also raised alarms when it was revealed that one of its more recent models performed worse on safety tests than previous models. It was also reported yesterday that, despite promising a safety report on Grok AI, xAI has now missed its deadline to do so. All of this is to say that OpenAI's attempt to improve transparency and publicly release information on the safety of its models is much needed and an important step. As the race to be the best speeds up, with AI competitors battling it out at speed, these steps can easily be missed.

OpenAI's new safety hub has a lot of information, but it isn't instantly clear what it all means. Luckily, the company also includes a helpful guide on how to use the page. The hub splits safety evaluations into four sections: harmful content, jailbreaks, hallucinations, and instruction hierarchy. More specifically, these mean:

Harmful content: Evaluations checking that the model does not comply with requests for harmful content that violates OpenAI's policies, including hateful content.
Jailbreaks: Adversarial prompts meant to circumvent model safety training and induce the model to produce harmful content.
Hallucinations: How often OpenAI's models make factual errors.
Instruction hierarchy: How the model prioritizes instructions from different sources, so that higher-priority instructions can't be overridden by third-party content.

For each of these measurements, OpenAI includes its own testing scores with explanations of what was checked and how each of its models performs. The new hub also includes information on how OpenAI approaches safety and its privacy and security policies.
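The hub itself is a web page rather than a programmatic API, but as a purely illustrative sketch, here is how one model's entry across those four evaluation categories might be represented. The class name, fields, and score values below are hypothetical assumptions for illustration, not an OpenAI schema or OpenAI's actual results.

```python
from dataclasses import dataclass

# Hypothetical sketch only: the Safety Evaluations Hub is a web page, not an API,
# so the field names and score type below are illustrative assumptions.

@dataclass
class SafetyEvaluationSnapshot:
    """One model's scores across the hub's four evaluation categories."""
    model: str                    # model name, e.g. "GPT-4.1" (listed on the hub)
    harmful_content: float        # performance on disallowed-content refusal tests
    jailbreaks: float             # robustness to adversarial jailbreak prompts
    hallucinations: float         # factual accuracy on hallucination benchmarks
    instruction_hierarchy: float  # adherence to higher-priority instructions

# Placeholder values only; not OpenAI's published scores.
snapshot = SafetyEvaluationSnapshot(
    model="GPT-4.1",
    harmful_content=0.99,
    jailbreaks=0.97,
    hallucinations=0.62,
    instruction_hierarchy=0.85,
)
print(snapshot)
```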
OpenAI introduces a new Safety Evaluations Hub to publicly share AI model safety test results, aiming to increase transparency in AI development and address concerns about rushed safety testing.
In a move to enhance transparency in AI development, OpenAI has launched a new Safety Evaluations Hub. This online platform is designed to publicly share the results of the company's internal AI model safety evaluations on an ongoing basis [1].
The hub provides insights into four critical areas of AI safety: harmful content, jailbreaks, hallucinations, and instruction hierarchy [4].
OpenAI commits to updating the hub periodically, particularly with major model updates. This approach expands on the company's existing system cards, which only outline safety measures at launch [3].
The launch of the Safety Evaluations Hub comes amid growing concerns about AI safety and transparency in the tech industry: OpenAI has been criticized for reportedly rushing safety testing of some flagship models and skipping technical reports for others, while competitors such as xAI and Google have also been accused of delaying or omitting safety reports for new models [1][4].
OpenAI recently encountered issues with its GPT-4o model, which led to a rollback after users reported overly agreeable responses to problematic ideas. In response, the company has introduced an opt-in "alpha phase" for certain models, allowing select users to test and provide feedback before launch [1].
While the Safety Evaluations Hub represents a step towards greater transparency, it's important to note that OpenAI runs these evaluations itself and chooses which results to share publicly, and that the hub offers only a "snapshot" rather than a full account of the company's safety efforts [3].
As AI evaluation science evolves, OpenAI aims to share progress on developing more scalable ways to measure model capability and safety, potentially adding additional evaluations to the hub over time [1].