3 Sources
[1]
Enterprise AI coding grows teeth: GPT‑5.2‑Codex weaves security into large-scale software refactors
With the recent release of GPT 5.2, OpenAI updated other related models, including its popular coding model Codex, bringing more agentic use cases to its fold. GPT-5.2-Codex, which OpenAI called in a blog post "the most advanced agentic coding model yet for complex, real-world software engineering," has been optimized for long-horizon work with agents and will have stronger cybersecurity capabilities. The model is an offshoot of GPT-5.2, optimized for agentic building.

"GPT‑5.2-Codex represents a step forward in how advanced AI can support real-world software engineering and specialized domains like cybersecurity -- helping developers and defenders tackle complex, long-horizon work, and strengthening the tools available for responsible security research," the company said in its blog post.

Enterprises can access the new Codex model "in all Codex surfaces for paid ChatGPT users," and OpenAI said it is "working towards safely enabling access to GPT‑5.2-Codex for API users in the coming weeks." The company is also piloting a program with invite-only trusted users to access "more permissive models for vetted professionals and organizations" for defensive cybersecurity work to determine a balance between accessibility and safety.

Advances in cybersecurity with models

OpenAI calls GPT-5.2-Codex its strongest cybersecurity model yet. Still, as its capabilities grow, the company said it needs to design a deployment approach that accounts for future growth and supports defensive cybersecurity. "As our models continue to advance along the intelligence frontier, we've observed that these improvements also translate to capability jumps in specialized domains such as cybersecurity," the company said.

OpenAI said in its system card that it tested the model on three benchmarks: Capture-the-Flag (CTF) evals, CVE-Bench and Cyber Range. GPT-5.2-Codex became the company's strongest-performing model in CTF evals, which it attributed to compaction, or "the ability for the model to work coherently across multiple context windows." The model scored 87% in CVE-Bench, outperforming other models, with GPT-5.1-Codex-Max coming in a close second. This increase would be helpful for tasks involving running commands around vulnerability discovery and trying tools "with an almost brute-force approach." In the long-form Cyber Range test, the model had a combined pass rate of 72.7%. GPT-5.1-Codex-Max scored 81.8%.

Cybersecurity deployment project

OpenAI said some users of its GPT-5.1-Codex-Max, which launched in November, uncovered a source code exposure vulnerability in React and subsequently reported it. According to OpenAI, Andrew MacPherson, a security researcher at Privy, used GPT-5.1-Codex-Max to assess how well the model could support real-world vulnerability research. The model instead surfaced unexpected behavior.

With improvements in cybersecurity capabilities for GPT-5.2-Codex and potentially for models that come after it, OpenAI said it needs to balance the deployment of frontier models with the necessary tools for defensive cybersecurity. While GPT-5.2-Codex "does not reach a high level of cyber capability under our Preparedness Framework," the company plans to bring selected users to test security capabilities. (OpenAI uses its Preparedness Framework to measure and track potential harms from AI to humans.)

"Security teams can run into restrictions when attempting to emulate threat actors, analyze malware to support remediation, or stress test critical infrastructure. We are developing a trusted access pilot to remove that friction for qualifying users and organizations and enable trusted defenders to use frontier AI cyber capabilities to accelerate cyberdefense," OpenAI said.

Agentic frontiers

GPT-5.2 already received praise from users for its use in business tasks and workflows. With the Codex version, some of those capabilities could transfer, especially as enterprises plan to use the model to code their agents. The company said the model improves long-horizon work through compaction, offering strong performance on extensive code changes. It also features improved performance on Windows. In benchmark testing, GPT-5.2-Codex performed the best on accuracy compared to its previous versions.

"With these improvements, Codex is more capable at working in large repositories over extended sessions with full context intact. It can more reliably complete complex tasks like large refactors, code migrations, and feature builds -- continuing to iterate without losing track, even when plans change or attempts fail," OpenAI said.

Since it launched in preview in May, Codex has helped usher in acceptance of agentic and vibe coding in the enterprise AI builder space. Along with Windsurf, Cursor, Claude Code and the many coding agents from Google, the platform moved LLMs from simple code completion to generating and starting asynchronous coding projects for users.
[2]
OpenAI's GPT-5.2-Codex advances software engineering with better reasoning and context understanding - SiliconANGLE
OpenAI Group PBC has released a new version of GPT-Codex, its agentic artificial intelligence coding model that's designed to automate complex software engineering tasks. The latest version, GPT-5.2-Codex, builds upon the capabilities of GPT-5.2, adding improvements in context compaction, large code refactoring, Windows environment performance and cybersecurity, the company said.

According to OpenAI's blog post, GPT-5.2-Codex achieved an unmatched score on the SWE-Bench Pro benchmark, with 56.4% accuracy, besting all other coding models launched so far. It also racked up a score of 64% on the Terminal-Bench 2.0 benchmark, outperforming earlier versions of Codex. It's aided by stronger vision capabilities that allow it to better interpret screenshots, technical diagrams and user interfaces, so it can translate software design mockups into functional prototypes.

OpenAI said GPT-5.2-Codex is meant to advance software engineering, which is the process of designing, developing, testing and maintaining applications by combining engineering principles with programming knowledge. The goal is to create high-quality, reliable and maintainable software that's able to evolve to meet users' needs.

The new model's ability to tackle time-consuming tasks makes it especially good at "refactoring", which is a key element of software engineering that involves adapting an application's codebase, not to add new features, but to enhance its quality. For instance, it can tweak an application's codebase to reduce its memory usage or improve its response times, OpenAI said.

GPT-5.2-Codex represents the culmination of several iterative advances in OpenAI's generative AI coding capabilities. Earlier models such as GPT-5-Codex and GPT-5.1-Codex-Max progressively improved aspects such as multistep reasoning, long-context understanding and tool integration within coding environments, and GPT-5.2-Codex builds on this work in various ways.

For instance, OpenAI said it performs better at long-range task execution thanks to its context compaction capabilities, which allow it to undertake sustained, multistep coding tasks without forgetting context. It's also better at large-scale code management, improving its code refactoring, migration and feature-building capabilities, the company said. Moreover, it shows improved performance in Windows-based coding environments, and there are more advanced cybersecurity features that enable AI-assisted bug detection, testing and mitigation.

OpenAI said the focus on improving security is critical to AI-driven software engineering, because modern enterprise infrastructures demand reliable software. Developers and security teams need all the help they can get when it comes to uncovering and fixing complex software vulnerabilities, and they also need to be sure that whatever AI coding tools are being used don't create more.

Codex's ability to fix software was highlighted earlier this month when the security researcher Andrew MacPherson used GPT-5.1-Codex-Max to examine the CVE-2025-55182 vulnerability in React. In a blog post, he explained how the model used a combination of iterative assessments, fuzz testing and exploit analysis to mitigate the issue, while also surfacing and mitigating previously unknown vulnerabilities in the process.

OpenAI said the improvements introduced in GPT-5.2-Codex will have real implications for enterprises, enabling them to automate the most complex and repetitive software engineering tasks and integrate more sophisticated features in their applications. By simultaneously supporting cybersecurity operations, it can help organizations to improve efficiency, reduce human error and maintain a competitive advantage in software engineering, the company promised.

The company said GPT-5.2-Codex is available from today to all paid ChatGPT users. It's planning to extend access to application programming interface users in the coming week, and will also launch an invite-only trusted access pilot program for vetted security professionals focused on defensive cybersecurity.
[3]
OpenAI rolls out GPT‑5.2-Codex for advanced coding and cybersecurity workflows
OpenAI has released GPT‑5.2-Codex, the most advanced agentic coding model yet for complex, real-world software engineering. It is designed to handle long-horizon tasks, large code changes, and cybersecurity workflows. GPT‑5.2-Codex is a version of GPT‑5.2 further optimized for agentic coding in Codex. Key improvements include long-horizon work via context compaction, stronger performance on large code changes like refactors and migrations, improved reliability in Windows environments, and significantly stronger cybersecurity capabilities.

The company said that as models advance along the intelligence frontier, the improvements also lead to capability gains in specialized domains such as cybersecurity. For example, a security researcher recently used GPT‑5.1-Codex-Max with Codex CLI to identify and responsibly disclose a React vulnerability that could expose source code. While GPT‑5.2-Codex has stronger cybersecurity capabilities than previous models, it does not yet reach a 'High' level under OpenAI's Preparedness Framework, and its deployment is structured to accommodate future capability growth.

1. Pushing the frontier on real-world software engineering

GPT‑5.2-Codex builds on GPT‑5.2's strengths in professional knowledge work and GPT‑5.1-Codex-Max's frontier agentic coding and terminal-using capabilities. It now offers improved long-context understanding, reliable tool calling, enhanced factuality, and native context compaction, making it a dependable partner for long-running coding tasks while remaining token-efficient.

Performance on cybersecurity evaluations shows a sharp capability increase from GPT‑5-Codex to GPT‑5.1-Codex-Max, and now to GPT‑5.2-Codex. OpenAI evaluates models as if they could reach 'High' cybersecurity capability in the future and has added safeguards to manage dual-use risks.

Modern society depends on software reliability in sectors like banking, healthcare, communications, and essential services. Vulnerabilities may exist long before detection, and identifying, validating, and fixing them relies on engineers and independent security researchers.

On December 11, 2025, the React team disclosed three security vulnerabilities affecting React Server Components. Andrew MacPherson, a principal security engineer at Privy (a Stripe company), used GPT‑5.1-Codex-Max with Codex CLI to study a prior critical React vulnerability, React2Shell (CVE-2025-55182). MacPherson first attempted zero-shot analyses, then higher-volume iterative prompting. When these failed, he guided Codex through standard defensive security workflows, including setting up a local test environment, reasoning through attack surfaces, and fuzzing malformed inputs. Codex surfaced unexpected behaviors, leading to the discovery of previously unknown vulnerabilities, which were responsibly disclosed to the React team. These cases show how advanced AI can accelerate defensive security work, while also highlighting the dual-use risk of misuse by bad actors.

4. Empowering cyberdefense through trusted access

Security teams often face restrictions when emulating threat actors, analyzing malware, or stress-testing infrastructure. OpenAI is piloting a trusted access program to reduce friction for qualifying users and organizations, enabling them to use frontier AI capabilities for defensive purposes. The invite-only pilot is for vetted security professionals with a history of responsible disclosure and organizations with clear cybersecurity use cases. Participants receive access to advanced models to conduct legitimate dual-use work. OpenAI encourages qualified professionals to express interest and provide feedback.

GPT‑5.2-Codex advances real-world software engineering and cybersecurity workflows. By gradually rolling out the model with safeguards, access controls, and collaboration with the security community, OpenAI aims to maximize defensive impact while reducing misuse risk. Insights from this release will guide future expansions as software and cyber frontiers evolve.

GPT‑5.2-Codex is available today across all Codex surfaces for paid ChatGPT users. API access is expected in the coming weeks. The invite-only trusted access pilot for vetted cybersecurity professionals and organizations is running in parallel, balancing accessibility with safety.
OpenAI released GPT-5.2-Codex, its most advanced agentic coding model designed for complex software engineering tasks. The model achieves 56.4% accuracy on SWE-Bench Pro and introduces stronger cybersecurity capabilities, including an 87% score on CVE-Bench. Available to paid ChatGPT users, it features context compaction for long-horizon work and a trusted access program for vetted security professionals.
OpenAI has released GPT-5.2-Codex, positioning it as the most advanced agentic coding model for handling complex software engineering tasks in real-world environments [1]. The model builds on its predecessors, GPT-5-Codex and GPT-5.1-Codex-Max, with optimizations specifically targeting long-horizon agentic work and enterprise AI applications [2]. It is available today to all paid ChatGPT users across Codex surfaces, and OpenAI expects to extend access to API users in the coming weeks [3].
GPT-5.2-Codex achieved an unmatched 56.4% accuracy on the SWE-Bench Pro benchmark, outperforming all other coding models released to date [2]. The model also scored 64% on Terminal-Bench 2.0, ahead of earlier versions of Codex. These gains are aided by stronger vision capabilities for interpreting screenshots, technical diagrams, and user interfaces, and by improved long-context understanding that enables sustained multistep coding tasks without losing track of objectives [2].

A defining feature of GPT-5.2-Codex is context compaction, which allows the model to work coherently across multiple context windows during extended sessions [1]. This capability proves essential for large-scale software refactors, code migrations, and feature builds where developers need the model to keep track of context even when plans change or initial attempts fail. The model can now more reliably complete time-consuming refactoring tasks that enhance code quality without adding new features, such as reducing memory usage or improving response times [2].
OpenAI notes that with these improvements, agentic coding becomes more practical in large repositories over extended sessions, addressing a critical need in enterprise software development [1]. The model also demonstrates improved reliability in Windows environments, expanding its utility across different development platforms [3].

OpenAI calls GPT-5.2-Codex its strongest cybersecurity model yet, with gains on two of the three security benchmarks it reported [1]. The model scored 87% on CVE-Bench, outperforming other models, with GPT-5.1-Codex-Max a close second [1]. This improvement proves valuable for vulnerability discovery tasks that involve running commands and trying tools "with an almost brute-force approach." In Capture-the-Flag evaluations, GPT-5.2-Codex became OpenAI's strongest-performing model, a result the company attributed to compaction [1]. In the long-form Cyber Range test, however, its combined pass rate of 72.7% trailed the 81.8% scored by GPT-5.1-Codex-Max [1].

The Codex line's cybersecurity capabilities have already been exercised in real-world research, albeit with the previous model. Andrew MacPherson, a principal security engineer at Privy, used GPT-5.1-Codex-Max to assess how well the model could support vulnerability research; it surfaced unexpected behavior that led to the discovery of a React source code exposure vulnerability [1]. MacPherson guided the model through defensive security workflows, including setting up a local test environment, reasoning through attack surfaces, and fuzzing malformed inputs, which ultimately led to the discovery of previously unknown vulnerabilities that were responsibly disclosed to the React team [3].
Recognizing the dual-use nature of advanced cybersecurity capabilities, OpenAI is launching a trusted access program for vetted security professionals and organizations focused on defensive cybersecurity [1]. The invite-only pilot aims to remove friction that security teams face when emulating threat actors, analyzing malware, or stress-testing critical infrastructure [3]. Participants with a history of responsible disclosure will receive access to more permissive models for legitimate dual-use work.

While GPT-5.2-Codex does not reach a "High" level of cyber capability under OpenAI's Preparedness Framework, the company is structuring deployment to accommodate future capability growth [3]. This measured approach reflects the company's observation that improvements along the intelligence frontier translate to capability jumps in specialized domains like cybersecurity [1]. Security researchers and organizations interested in the program can express interest and provide feedback to help shape future expansions.

The release carries significant implications for enterprises seeking to automate complex software engineering tasks while maintaining security standards. Modern society depends on software reliability across banking, healthcare, communications, and essential services, where software vulnerabilities may exist long before detection [3]. By supporting large refactors, code migrations, and cybersecurity operations together, GPT-5.2-Codex offers organizations tools to improve efficiency, reduce human error, and maintain a competitive advantage in software engineering [2].

Since launching in preview in May, Codex has helped drive acceptance of agentic and vibe coding in the enterprise AI builder space [1]. Alongside platforms like Windsurf, Cursor, and Claude Code, it has moved large language models from simple code completion to generating and starting asynchronous coding projects for users. As OpenAI works toward safely enabling API access in the coming weeks, developers should watch how the model performs in production environments and whether the trusted access program successfully balances accessibility with safety.

Summarized by Navi