5 Sources
[1]
OpenAI's Codex Max solves one of my biggest AI coding annoyances - and adds dramatically faster performance
First Windows-trained Codex enhances cross-platform development tasks.

Following a week of major AI programming announcements from Microsoft and Google, OpenAI has joined in the fun. Today, OpenAI is announcing a new version of Codex, its programming-focused AI model. While the announcement was today, the actual GPT-5.1-Codex-Max capability will be available tomorrow in Codex for ChatGPT Plus, Pro, Business, Edu, and Enterprise users. API access is "coming soon." OpenAI says the new Max model "replaces GPT-5.1-Codex as the recommended model for agentic coding tasks in Codex and Codex-like environments."

(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

The big news is that the new Max model can work on bigger assignments. AIs have a context window, which is roughly the amount of information and processing an AI can handle in one shot. In a human, think of it as attention span, or as how much work somebody can get done before needing a new cup of coffee.

Internally, the size of a context window is really how many tokens an AI can handle before running out. Tokens are super-tiny chunks of information. They don't directly correspond to words, letters, or lines of code; instead, they are memory representations of those things. Codex has a fairly large context window, but it does get overwhelmed. For example, I've found that when I coded using Codex, it could do very large project assignments without crying. But if I fed it a giant dump from a code crash with a ton of text in it, it ran out of context window fairly quickly. That's because the tokens weren't being consumed in project processing; they were being consumed in handling the big data dump.
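The crash-dump anecdote above is easy to reproduce with arithmetic. A common rule of thumb for English text is roughly four characters per token; real tokenizers (such as OpenAI's tiktoken) will differ, so treat this as a back-of-envelope sketch, not the model's actual accounting:

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the common ~4-characters-per-token
    heuristic for English text. Real tokenizers differ, especially
    on code and stack traces."""
    return max(1, len(text) // 4)

# A pasted crash dump of a few thousand repeated traceback lines can
# consume tens of thousands of tokens before any real work begins.
crash_dump = "Traceback (most recent call last):\n" * 5000
print(estimate_tokens(crash_dump))
```

Paste a dump like that into a 200,000-token window and a large fraction of the budget is gone before the model touches your actual project.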
The above-the-fold feature of GPT-5.1-Codex-Max is that it can handle much larger context windows, and operate across context windows, by using a process called compaction. Compaction is a process the model can use to shrink or compress portions of the conversation or code context when the overall token window is getting full. You know how when you're talking and talking and talking to a friend and their eyes glaze over, but then you clap your hands together, exclaim "Snap out of it," and regain their attention? That's compaction. What? It can't just be me.

For the AI, it means that Codex Max can work on much larger tasks, like very complex systemwide refactors (finding, changing, and fixing cross-references). It also allows the AI to work on a single task for hours at a time. OpenAI says Codex can handle a 24-hour task.

Compaction isn't new. I've bumped into it in Claude Code on my $100/month Max plan. One difference is that Claude has a context window of about 200,000 tokens. At one point during my coding, Claude informed me we had used up quite a bit of it and recommended that I either start a new session or let it run a compaction, which took about five minutes. By contrast, OpenAI says Max can "coherently work over millions of tokens in a single task."

The SWE-Bench Verified evaluation is basically a test of AI processing accuracy. It measures how well the AI solves a set of coding problems. According to OpenAI, GPT-5.1-Codex-Max "reaches the same performance" as the previous model, GPT-5.1-Codex. In other words, the AI didn't deteriorate in its AIness. But where it gets interesting is that Max can sustain that performance using 30% fewer thinking tokens and run 27% to 42% faster on real-world coding tasks.
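Neither OpenAI nor Anthropic has published how their compaction actually works, but the idea is simple enough to sketch: when the transcript nears the token limit, replace the oldest turns with a short summary and keep the recent ones verbatim. Everything below (the limits, the `summarize` stand-in, the keep-last-10 policy) is an illustrative assumption, not the real implementation:

```python
TOKEN_LIMIT = 200_000   # illustrative; roughly Claude's window size
COMPACT_AT = 0.8        # compact when the window is 80% full

def tokens(msg: str) -> int:
    # Crude 4-characters-per-token estimate; real tokenizers differ.
    return len(msg) // 4

def summarize(messages: list[str]) -> str:
    # Stand-in for a model-generated summary of the older turns.
    return f"[summary of {len(messages)} earlier messages]"

def maybe_compact(history: list[str]) -> list[str]:
    """If the conversation nears the token limit, collapse the oldest
    turns into one summary message so work can continue."""
    used = sum(tokens(m) for m in history)
    if used < TOKEN_LIMIT * COMPACT_AT:
        return history
    keep = history[-10:]  # keep the most recent turns verbatim
    return [summarize(history[:-10])] + keep
```

The trade-off is visible in the Claude anecdote: compaction spends time (and tokens) producing the summary, but it lets a session run far past what the raw window would allow.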
In my imagination, I picture some engineer over at OpenAI raising a fist and exclaiming, "Darn it, I was trying for 43%, but no."

This has some real-world implications. You may recall that the $20/month ChatGPT Plus plan has fairly tight restrictions on Codex use, allowing about 5 hours of runtime before running out of tokens. With Max using 30% fewer tokens, you might get an extra hour of programming in for the same price.

OpenAI provided some examples of model performance compared to the non-Max version. In one example, Max used 27,000 tokens compared to 37,000, generated 707 lines of code instead of 864, and ran 27% faster.

Let's take a moment to focus on that lines-of-code mention. If you can get code to work in fewer lines, it's usually easier to maintain and often runs faster. While you can go crazy making code overly concise (I'm looking at the few remaining Perl coders out there), fewer lines for the same routine is generally a sign of better programming practice or better algorithms. So if Codex is saving lines, that's generally a good thing. Every task will be different, of course, but faster, better, cheaper is always good.

Ever since GPT-5 was released earlier in the year, OpenAI has incorporated cybersecurity-specific monitoring to detect and disrupt malicious activity. As you might imagine, if you're letting your agent run free with access to the command line for hours and hours, it could be a juicy target for hackers. OpenAI says the GPT-5.1-Codex-Max model performs "significantly better" on sustained, "long-horizon" reasoning. This sustained performance helps the model improve on cybersecurity as well.
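The "extra hour" estimate is worth sanity-checking with simple arithmetic. Assuming the Plus plan's token budget is fixed and usage scales linearly with runtime (both simplifications), spending 30% fewer tokens per unit of work stretches the same budget by a factor of 1/0.7:

```python
# Back-of-envelope check on the "extra hour" claim.
# The ~5-hour figure comes from the article; linear token burn is assumed.
old_hours = 5.0
token_reduction = 0.30  # Max uses 30% fewer thinking tokens

new_hours = old_hours / (1 - token_reduction)
print(round(new_hours, 1))  # ≈ 7.1 hours from the same token budget
```

By that rough math, the same budget could yield roughly two extra hours, so the article's "extra hour" reads as a conservative estimate. Also worth noting: the example above (27,000 vs. 37,000 tokens) is a 27% reduction, in line with OpenAI's 30% figure.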
Codex runs in a secure sandbox where file writing can only take place in a defined workspace and network access is disabled, unless a coder decides to dance with the Devil in the pale moonlight and turn it on. The company says, "We recommend keeping Codex in this restricted-access mode, since enabling internet or web search can introduce prompt-injection risks from untrusted content."

Codex works really well on the Mac. It was trained to do so; many OpenAI developers use Macs for coding. However, GPT-5.1-Codex-Max also works well in Windows. OpenAI reports, "It's also the first model we've trained to operate effectively in Windows environments, with training tasks that make it a better collaborator in the Codex CLI." Considering OpenAI's growing relationship with Microsoft, it makes sense that OpenAI would give Windows a little more love.

Well, that wraps it for this announcement. As Codex Max pushes into larger context windows, long-running tasks, and new Windows-specific training, what stands out to you? Do you see compaction and multi-million-token workflows changing how you approach big coding projects? Are the speed and token-efficiency gains enough to shift your everyday development? And if you use Windows, are you planning to try the new Windows-trained model in your workflow? Let me know what you think in the comments below.
[2]
OpenAI debuts GPT‑5.1-Codex-Max coding model that completed a 24-hour task internally
OpenAI has introduced GPT‑5.1-Codex-Max, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software engineering, offering improved long-horizon reasoning, efficiency, and real-time interactive capabilities. GPT‑5.1-Codex-Max will now replace GPT‑5.1-Codex as the default model across Codex-integrated surfaces. The new model is designed to serve as a persistent, high-context software development agent, capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows.

It comes on the heels of Google releasing its powerful new Gemini 3 Pro model yesterday, yet still outperforms or matches it on key coding benchmarks. On SWE-Bench Verified, GPT‑5.1-Codex-Max achieved 77.9% accuracy at extra-high reasoning effort, edging past Gemini 3 Pro's 76.2%. It also led on Terminal-Bench 2.0, with 58.1% accuracy versus Gemini's 54.2%, and matched Gemini's score of 2,439 on LiveCodeBench Pro, a competitive coding Elo benchmark. When measured against Gemini 3 Pro's most advanced configuration -- its Deep Thinking model -- Codex-Max holds a slight edge in agentic coding benchmarks as well.

Performance Benchmarks: Incremental Gains Across Key Tasks

GPT‑5.1-Codex-Max demonstrates measurable improvements over GPT‑5.1-Codex across a range of standard software engineering benchmarks. On SWE-Lancer IC SWE, it achieved 79.9% accuracy, a significant increase from GPT‑5.1-Codex's 66.3%. On SWE-Bench Verified (n=500), it reached 77.9% accuracy at extra-high reasoning effort, outperforming GPT‑5.1-Codex's 73.7%. Performance on Terminal-Bench 2.0 (n=89) showed more modest improvement, with GPT‑5.1-Codex-Max achieving 58.1% accuracy compared to 52.8% for GPT‑5.1-Codex. All evaluations were run with compaction and extra-high reasoning effort enabled.
These results indicate that the new model offers a higher ceiling on both benchmarked correctness and real-world usability under extended reasoning loads.

Technical Architecture: Long-Horizon Reasoning via Compaction

A major architectural improvement in GPT‑5.1-Codex-Max is its ability to reason effectively over extended input-output sessions using a mechanism called compaction. This enables the model to retain key contextual information while discarding irrelevant details as it nears its context window limit -- effectively allowing for continuous work across millions of tokens without performance degradation. The model has been internally observed to complete tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging. Compaction also improves token efficiency: at medium reasoning effort, GPT‑5.1-Codex-Max used approximately 30% fewer thinking tokens than GPT‑5.1-Codex for comparable or better accuracy, which has implications for both cost and latency.

Platform Integration and Use Cases

GPT‑5.1-Codex-Max is currently available across multiple Codex-based environments, which refer to OpenAI's own integrated tools and interfaces built specifically for code-focused AI agents. These include:

* Codex CLI, OpenAI's official command-line tool (@openai/codex), where GPT‑5.1-Codex-Max is already live.
* IDE extensions, likely developed or maintained by OpenAI, though no specific third-party IDE integrations were named.
* Interactive coding environments, such as those used to demonstrate frontend simulation apps like CartPole or Snell's Law Explorer.
* Internal code review tooling, used by OpenAI's engineering teams.

For now, GPT‑5.1-Codex-Max is not yet available via public API, though OpenAI states this is coming soon. Users who wish to work with the model in terminal environments today can do so by installing and using the Codex CLI.
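As a minimal setup sketch: the npm package name (`@openai/codex`) comes from the article; the exact launch command and flags may differ by CLI version, so check the CLI's own help output.

```shell
# Install OpenAI's Codex CLI globally via npm (package name per the article),
# then launch the interactive agent from inside your repository.
npm install -g @openai/codex
codex
```

By default the agent works within the sandboxed workspace described later in this piece; broader filesystem or network access requires explicit opt-in.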
It is not currently confirmed whether or how the model will integrate into third-party IDEs unless they are built on top of the CLI or future API. The model is capable of interacting with live tools and simulations. Examples shown in the release include:

* An interactive CartPole policy gradient simulator, which visualizes reinforcement learning training and activations.
* A Snell's Law optics explorer, supporting dynamic ray tracing across refractive indices.

These interfaces exemplify the model's ability to reason in real time while maintaining an interactive development session -- effectively bridging computation, visualization, and implementation within a single loop.

Cybersecurity and Safety Constraints

While GPT‑5.1-Codex-Max does not meet OpenAI's "High" capability threshold for cybersecurity under its Preparedness Framework, it is currently the most capable cybersecurity model OpenAI has deployed. It supports use cases such as automated vulnerability detection and remediation, but with strict sandboxing and disabled network access by default. OpenAI reports no increase in scaled malicious use but has introduced enhanced monitoring systems, including activity routing and disruption mechanisms for suspicious behavior. Codex remains isolated to a local workspace unless developers opt in to broader access, mitigating risks like prompt injection from untrusted content.

Deployment Context and Developer Usage

GPT‑5.1-Codex-Max is currently available to users on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. It will also become the new default in Codex-based environments, replacing GPT‑5.1-Codex, which was a more general-purpose model. OpenAI states that 95% of its internal engineers use Codex weekly, and since adoption, these engineers have shipped ~70% more pull requests on average -- highlighting the tool's impact on internal development velocity.
Despite its autonomy and persistence, OpenAI stresses that Codex-Max should be treated as a coding assistant, not a replacement for human review. The model produces terminal logs, test citations, and tool call outputs to support transparency in generated code.

Outlook

GPT‑5.1-Codex-Max represents a significant evolution in OpenAI's strategy toward agentic development tools, offering greater reasoning depth, token efficiency, and interactive capabilities across software engineering tasks. By extending its context management and compaction strategies, the model is positioned to handle tasks at the scale of full repositories, rather than individual files or snippets. With continued emphasis on agentic workflows, secure sandboxes, and real-world evaluation metrics, Codex-Max sets the stage for the next generation of AI-assisted programming environments -- while underscoring the importance of oversight in increasingly autonomous systems.
[3]
OpenAI says its new coding model can work for 24 hours straight
OpenAI has announced a new version of its programming-focused AI model, GPT-5.1-Codex-Max, which is designed to handle larger and more complex coding tasks. The new model will be available tomorrow for ChatGPT Plus, Pro, Business, Edu, and Enterprise users, replacing the previous GPT-5.1-Codex as the recommended model for agentic coding.

The standout feature of Codex Max is its improved context handling through a process called "compaction." This allows the model to shrink or compress parts of a conversation or code context when its memory fills up, enabling it to "coherently work over millions of tokens in a single task." OpenAI claims this allows Codex Max to work on a single assignment for up to 24 hours, making it suitable for massive system-wide refactors.

In addition to handling larger workloads, the model is more efficient. According to OpenAI, Codex Max maintains the same accuracy as its predecessor on the SWE-Bench Verified evaluation but uses 30% fewer thinking tokens and runs 27% to 42% faster on real-world tasks. This efficiency could translate to longer usage times for users on capped plans.

The new model is also the first from OpenAI to be specifically trained for Windows environments, improving its collaboration capabilities in the Windows command-line interface (CLI). On the security front, OpenAI notes that Codex Max's sustained reasoning capabilities enhance its cybersecurity monitoring, though the company still recommends keeping the AI in a restricted-access mode with disabled network access to prevent prompt-injection risks.
[4]
ChatGPT 5.1 Codex Max : AI Coder Handles Massive PRs, Reviews & Debugging at Scale
What if your next coding partner could work tirelessly for 24 hours straight, never lose focus, and consistently deliver high-quality results? Enter OpenAI's new Codex Max, an AI model that redefines the boundaries of software development. Designed to tackle the most demanding workflows, this advanced model doesn't just assist; it transforms. From debugging intricate systems to managing multifile projects without breaking a sweat, Codex Max is engineered to handle the kind of complexity that would exhaust even the most seasoned developers. With its ability to maintain context and continuity over extended periods, this isn't just an upgrade; it's a paradigm shift for coding as we know it.

In this overview, Universe of AI explores how Codex Max is reshaping professional software development by combining unparalleled memory optimization with exceptional accuracy and cost efficiency. Discover how this AI marvel can generate cleaner code, streamline workflows, and even create professional-grade applications like solar system simulators or Kanban boards, all while reducing operational costs. Whether you're curious about its real-world applications or intrigued by its ability to sustain performance during 24-hour coding marathons, Codex Max offers insights into a future where developers and AI work seamlessly together. Could this be the tool that redefines your approach to coding?

Codex Max is purpose-built to address the practical demands of modern software engineering. It excels in critical tasks such as code reviews, debugging, and generating pull requests (PRs) with exceptional accuracy. A defining feature of Codex Max is its ability to handle long-running workflows, including 24-hour coding sessions, without encountering memory limitations. This capability ensures seamless performance when managing multifile projects, performing large-scale refactoring, or working on intricate systems.
By maintaining context and continuity over extended periods, Codex Max proves to be an invaluable asset for developers working on complex, real-world applications.

Codex Max delivers an impressive 80% accuracy rate on coding benchmarks, representing a 14% improvement over its predecessor. This advancement translates into the generation of cleaner, more concise code, reducing the number of tokens and lines required for execution. These improvements not only accelerate reasoning and execution speeds but also enhance overall productivity. Developers can complete tasks more efficiently without incurring additional operational costs, making Codex Max a practical and cost-effective choice for software development teams.

One of the standout innovations in Codex Max is its advanced "compaction" process, which optimizes memory usage during extended workflows. This feature enables the model to summarize and discard unnecessary details while retaining essential information. By doing so, Codex Max ensures that developers can execute continuous operations without losing context, even in complex, long-duration tasks. This capability is particularly beneficial for scenarios such as agent loops, iterative development processes, and other workflows that demand sustained performance and contextual awareness.

Codex Max has already demonstrated its versatility by creating professional-grade applications that are both functional and visually appealing. Examples include a solar system simulator, a Kanban board, and a Snell's law visualizer. These projects showcase the model's ability to meet high standards in software engineering, making it a reliable tool for developers tackling a wide range of challenges. Whether you're building interactive simulations, managing project workflows, or visualizing complex concepts, Codex Max provides the tools needed to achieve exceptional results.
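The Snell's-law visualizer mentioned above rests on a one-line formula, n1·sin(θ1) = n2·sin(θ2). A minimal sketch of the computation such a demo would run (the function name and structure are my own illustration, not taken from the actual app):

```python
import math
from typing import Optional

def refract_angle(n1: float, n2: float, theta1_deg: float) -> Optional[float]:
    """Snell's law: n1*sin(theta1) = n2*sin(theta2).
    Returns the refraction angle in degrees, or None when the ray is
    past the critical angle (total internal reflection)."""
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    if abs(s) > 1.0:
        return None  # no real solution: total internal reflection
    return math.degrees(math.asin(s))

# Light entering water (n ≈ 1.33) from air at 45 degrees:
print(round(refract_angle(1.0, 1.33, 45.0), 1))  # ≈ 32.1
```

An interactive explorer is essentially this function evaluated continuously as the user drags the incident ray, with the `None` branch rendered as a reflected ray instead of a refracted one.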
Integrating Codex Max into your development workflow is straightforward, thanks to its compatibility with a variety of tools and platforms. Developers can access the model through command-line interfaces (CLI), integrated development environment (IDE) extensions, and cloud platforms. OpenAI has also announced plans to expand API access, further enhancing the model's usability across diverse environments. These integration options are designed to simplify the development process, allowing you to focus on creating high-quality software while boosting productivity.

Codex Max is designed not only for performance but also for affordability. By using 30% fewer tokens while maintaining or improving output quality, the model significantly reduces development costs. This efficiency is particularly valuable for developers and organizations looking to optimize both time and budget without compromising on results. Codex Max's cost-effective approach ensures that high-quality software development remains accessible to a broad range of users, from individual developers to large-scale enterprises.

OpenAI's GPT 5.1 Codex Max represents a significant advancement in the field of software engineering. Its ability to handle extended workflows, optimize memory usage, and deliver cost-effective solutions positions it as an indispensable tool for developers. Whether you're debugging complex systems, automating repetitive tasks, or managing multifile projects, Codex Max enables you to achieve more with less effort. By combining innovative technology with practical applications, Codex Max sets a new benchmark for AI-powered coding solutions.
[5]
GPT-5.1 Codex Max explained: OpenAI's most powerful coding model yet
GPT-5.1 Codex Max transforms AI-assisted software development with full-stack reasoning.

If you've been watching the AI world this week, you probably noticed something interesting: OpenAI dropped GPT-5.1 Codex Max almost immediately after Google unveiled Antigravity, its agentic, developer-focused AI platform. It feels like the two biggest players in AI are now openly battling for the future of software development, and Codex Max is OpenAI's answer to Google's big swing. So let's walk through what this new model actually is, what makes it different, and why developers are already paying attention.

Codex Max is OpenAI's newest and most advanced coding model, built on top of the GPT-5.1 architecture. Earlier Codex versions could write functions, generate boilerplate, or help with debugging. But Codex Max works at an entirely different scale. It can understand whole repositories, reason about architecture, and keep track of relationships across dozens of files at once. In simple terms: older models felt like autocomplete on steroids. Codex Max feels like pairing with a senior engineer who never loses context, even in a huge codebase.

Google's Antigravity showed a future where AI agents manage development tasks across workspaces: editing code, generating components, running tests, and iterating like a teammate. Releasing Codex Max right after that is no coincidence. OpenAI clearly wants to show that it can match, and in some cases exceed, Google's agentic capabilities. Codex Max is designed to be the model behind OpenAI's own agent-driven development workflows, offering deep context, better reasoning, and more reliable code generation. So yes, this is absolutely part of an escalating AI race. The name is earned.
Here's what stands out: instead of writing disconnected blocks of code, it produces work that looks like it was written by the same person. This is the part that feels most like OpenAI's counterpunch to Google Antigravity. Codex Max can work across your project like an intelligent teammate. You can say: "Refactor the authentication system, add 2FA support, and make sure it doesn't break existing sessions." And the model can navigate the repo, update the right files, create new modules where necessary, preserve logic and dependencies, and explain everything it changed. It's not just answering prompts; it's acting with a plan. This is the same direction Antigravity is pushing in, but Codex Max feels more like a model built for deep reasoning across code rather than task orchestration alone.

Codex Max is noticeably better at analyzing long logs, finding root causes, suggesting minimal patches, and avoiding overconfident but incorrect fixes. It behaves less like a chatbot and more like a methodical engineer. The model understands practical constraints: deployment issues, API limits, performance bottlenecks, and security needs. So its answers feel more grounded, more production-oriented, and more like something a real dev team would implement.

Codex Max won't replace you, but it will absolutely change your workflow. Beginners get a patient tutor; seniors get a force multiplier. Codex Max isn't just "another model." It's OpenAI stepping firmly into the agentic development race, right alongside Google's Antigravity. If GPT-4 Codex felt like an assistant, GPT-5.1 Codex Max feels like the first version of a true AI co-developer. And with Google and OpenAI now going head-to-head in real time, this space is about to accelerate faster than anyone expected.
OpenAI unveils GPT-5.1-Codex-Max, a breakthrough AI coding model featuring advanced compaction technology that enables continuous 24-hour development sessions while using 30% fewer tokens and delivering 27-42% faster performance than its predecessor.
OpenAI has announced the release of GPT-5.1-Codex-Max, a groundbreaking AI coding model that represents a significant leap forward in AI-assisted software development. The model's most notable innovation is its advanced "compaction" technology, which allows it to work continuously on complex coding tasks for up to 24 hours without losing context or performance [1][2].
The compaction process enables the model to shrink or compress portions of conversations and code context when approaching token limits, effectively allowing it to "coherently work over millions of tokens in a single task." This breakthrough addresses a longstanding limitation in AI coding assistants, where context windows would become overwhelmed during large-scale development tasks.

GPT-5.1-Codex-Max demonstrates remarkable performance improvements across multiple benchmarks while maintaining exceptional efficiency. The model achieves an impressive 80% accuracy rate on coding benchmarks, representing a 14% improvement over its predecessor [4]. More significantly, it accomplishes this enhanced performance while using 30% fewer thinking tokens and running 27% to 42% faster on real-world coding tasks [1].
In competitive evaluations, Codex-Max outperforms Google's recently released Gemini 3 Pro model on key coding benchmarks. On SWE-Bench Verified, GPT-5.1-Codex-Max achieved 77.9% accuracy compared to Gemini 3 Pro's 76.2%, while on Terminal-Bench 2.0, it scored 58.1% versus Gemini's 54.2%. The model also matched Gemini's score of 2,439 on LiveCodeBench Pro, a competitive coding benchmark.
The timing of Codex-Max's release appears strategically coordinated with Google's recent unveiling of Antigravity, its agentic developer-focused AI platform. Industry observers note that OpenAI's announcement came almost immediately after Google's reveal, suggesting an intensifying competition between the two AI giants for dominance in software development tools [5].

Codex-Max is designed to function as a persistent, high-context software development agent capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows. This positions it as OpenAI's direct answer to Google's agentic capabilities, offering deep context understanding, enhanced reasoning, and more reliable code generation.
The model excels in critical software engineering tasks including code reviews, debugging, and generating pull requests with exceptional accuracy. Codex-Max has already demonstrated its versatility by creating professional-grade applications such as solar system simulators, Kanban boards, and Snell's law visualizers [4]. These examples showcase the model's ability to meet high standards in software engineering across diverse application domains.

Integration options include command-line interfaces, IDE extensions, and cloud platforms, with OpenAI announcing plans to expand API access in the near future. The model is currently available across Codex-based environments, including the Codex CLI and various integrated development tools.
Codex-Max represents the first OpenAI model specifically trained for Windows environments, improving collaboration capabilities within Windows command-line interfaces [3]. While the model doesn't meet OpenAI's "High" capability threshold for cybersecurity under its Preparedness Framework, it currently stands as the company's most capable cybersecurity model, supporting automated vulnerability detection and remediation with strict sandboxing and disabled network access by default.