3 Sources
[1]
OpenAI's Codex Max solves one of my biggest AI coding annoyances - and adds dramatically faster performance
First Windows-trained Codex enhances cross-platform development tasks. Following a week of major AI programming announcements from Microsoft and Google, OpenAI has joined in the fun. Today, OpenAI announced a new version of Codex, its programming-focused AI model. While the announcement came today, the actual GPT-5.1-Codex-Max capability will be available tomorrow in Codex for ChatGPT Plus, Pro, Business, Edu, and Enterprise users. API access is "coming soon." OpenAI says the new Max model "replaces GPT-5.1-Codex as the recommended model for agentic coding tasks in Codex and Codex-like environments."

(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

The big news is that the new Max model can work on bigger assignments. AIs have a context window, which is roughly the amount of information and processing an AI can handle in one shot. In a human, think of it as attention span, or as how much work somebody can get done before needing a new cup of coffee.

Internally, the size of a context window is really how many tokens an AI can handle before running out. Tokens are super-tiny chunks of information. They don't directly correspond to words, letters, or lines of code, but instead are memory representations of those things. Codex has a fairly large context window, but it does get overwhelmed. For example, I've found that when I coded using Codex, it could do very large project assignments without crying. But if I fed it a giant dump from a code crash with a ton of text in the dump, it ran out of context window fairly quickly. That's because the tokens weren't being consumed in project processing; they were being consumed in handling the big data dump.
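To see why a pasted crash dump can exhaust a context window faster than actual project work, here's a minimal sketch. The budget figure and the one-token-per-word approximation are both hypothetical; real tokenizers split text into subword units.

```python
# Crude illustration of a context-window budget. Real models tokenize
# subword units; here we approximate one token per whitespace-separated word.
def approx_tokens(text: str) -> int:
    return len(text.split())

CONTEXT_WINDOW = 400_000  # hypothetical token budget

project_prompt = "refactor the billing module " * 500       # modest task description
crash_dump = "0xDEADBEEF stack frame trace line " * 70_000  # giant pasted crash dump

used = approx_tokens(project_prompt) + approx_tokens(crash_dump)
remaining = CONTEXT_WINDOW - used

print(f"used={used}, remaining={remaining}")
# The dump alone consumes most of the window, leaving little room
# for the model to actually reason about the project.
```

Under these made-up numbers, the dump burns through the vast majority of the budget before any project reasoning begins, which matches the behavior described above.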
The above-the-fold feature of GPT-5.1-Codex-Max is that it can handle much larger context windows and operate across context windows using a process called compaction. Compaction is the process a model can use to shrink or compress portions of the conversation or code context when the overall token window is getting full. You know how when you're talking and talking and talking to a friend and their eyes glaze over, but then you clap your hands together, exclaim "Snap out of it," and regain their attention? That's compaction. What? It can't just be me.

For the AI, it means that Codex Max can work on much larger tasks, like very complex systemwide refactors (finding, changing, and fixing cross-references). It also allows the AI to work on a single task for hours at a time. OpenAI says Codex can handle a 24-hour task.

Compaction isn't new. I've bumped into it in Claude Code on my $100/month Max plan. One difference is Claude has a context window of about 200,000 tokens. At one point during my coding, Claude informed me we had used up quite a bit and recommended I either start a new session or let it run a compaction, which took about five minutes. By contrast, OpenAI says Max can "coherently work over millions of tokens in a single task."

The SWE-Bench Verified evaluation is basically a test of AI processing accuracy. It measures how well the AI solves a set of coding problems. According to OpenAI, GPT-5.1-Codex-Max "reaches the same performance" as the previous model, GPT-5.1-Codex. In other words, the AI didn't deteriorate in its AIness. But where it gets interesting is that Max can sustain that performance using 30% fewer thinking tokens and run 27% to 42% faster on real-world coding tasks.
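OpenAI hasn't published how its compaction works, but the general idea can be sketched in a few lines: once the transcript nears the token budget, the oldest turns get collapsed into a short summary so work can continue. This is a toy illustration only; character counts stand in for tokens, and a real system would write a model-generated summary rather than a byte count.

```python
# Toy compaction loop: collapse the oldest conversation turns into a
# one-line summary once the transcript exceeds the budget.
# An illustrative sketch, not OpenAI's actual algorithm.
def compact(history: list[str], budget: int) -> list[str]:
    def cost(items: list[str]) -> int:
        return sum(len(s) for s in items)  # 1 "token" per character, for simplicity

    elided = 0
    while cost(history) > budget and len(history) > 1:
        elided += len(history.pop(0))  # drop the oldest turn
    if elided:
        # A real system would insert a model-written summary here.
        history.insert(0, f"[compacted: {elided} chars summarized]")
    return history

history = [
    "long setup discussion " * 50,   # 1,100 chars of early context
    "earlier patch review " * 50,    # 1,050 chars more
    "current task: fix the auth bug",
]
compacted = compact(history, budget=500)
print(compacted[0])  # the summary placeholder now stands in for the old turns
```

The key property, as described above, is that the most recent work survives intact while older context is compressed, letting the session keep running long past what a fixed window would allow.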
In my imagination, I picture some engineer over at OpenAI raising a fist and exclaiming, "Darn it, I was trying for 43%, but no."

This has some real-world implications. You may recall that the $20/month ChatGPT Plus plan places fairly tight limits on Codex use, allowing about five hours of run time before you run out of tokens. With Max using 30% fewer tokens, you might get an extra hour of programming in for the same price.

OpenAI provided some examples of model performance compared to the non-Max version. In one example, Max used 27,000 tokens compared to 37,000, generated 707 lines of code instead of 864, and ran 27% faster.

Let's take a moment to focus on that lines-of-code mention. If you can get code to work in fewer lines, it's usually easier to maintain and often runs faster. While you can go crazy making code overly concise (I'm looking at the few remaining Perl coders out there), fewer lines for the same routine is generally a sign of better programming practice or better algorithms. So if Codex is saving lines, that's generally a good thing. Obviously, every task will be different, but faster, better, cheaper is always good.

Ever since GPT-5 was released earlier in the year, OpenAI has incorporated cybersecurity-specific monitoring to detect and disrupt malicious activity. As you might imagine, if you're letting your agent run free with access to the command line for hours and hours, it could be a juicy target for hackers. OpenAI says the GPT-5.1-Codex-Max model performs "significantly better" on sustained, "long-horizon" reasoning. This sustained performance helps the model improve on cybersecurity as well.
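Those savings figures are easy to sanity-check with the numbers OpenAI reported for that example:

```python
# Sanity-check of the example OpenAI reported: 27,000 tokens vs. 37,000,
# and 707 generated lines vs. 864 for the non-Max model.
tokens_max, tokens_old = 27_000, 37_000
loc_max, loc_old = 707, 864

token_savings = (tokens_old - tokens_max) / tokens_old  # fraction of tokens saved
loc_savings = (loc_old - loc_max) / loc_old             # fraction of lines saved

print(f"tokens: {token_savings:.0%} fewer, lines: {loc_savings:.0%} fewer")
```

That works out to roughly 27% fewer tokens and 18% fewer lines for this single task; the 30% figure OpenAI quotes is an average across tasks, so the two numbers are consistent rather than contradictory.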
Codex runs in a secure sandbox where file writing can only take place in a defined workspace and network access is disabled, unless a coder decides to dance with the Devil in the pale moonlight and turn it on. The company says, "We recommend keeping Codex in this restricted-access mode, since enabling internet or web search can introduce prompt-injection risks from untrusted content."

Codex works really well on the Mac. It was trained to do so; many OpenAI developers use Macs for coding. However, GPT-5.1-Codex-Max also works well in Windows. OpenAI reports, "It's also the first model we've trained to operate effectively in Windows environments, with training tasks that make it a better collaborator in the Codex CLI." Considering OpenAI's growing relationship with Microsoft, it makes sense that OpenAI would give Windows a little more love.

Well, that wraps it for this announcement. As Codex Max pushes into larger context windows, long-running tasks, and new Windows-specific training, what stands out to you? Do you see compaction and multi-million-token workflows changing how you approach big coding projects? Are the speed and token-efficiency gains enough to shift your everyday development? And if you use Windows, are you planning to try the new Windows-trained model in your workflow? Let me know what you think in the comments below.
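The restricted-access mode described in the article above (writes confined to a workspace, network off unless you opt in) can be illustrated with a toy Python sketch. This is my own illustration of the principle, not how OpenAI's actual sandbox is implemented; the `NETWORK_ENABLED` flag and `GuardedSocket` class are invented for the example.

```python
import socket

# Toy illustration of "network disabled by default": replace socket
# creation with a guard that raises unless the user explicitly opts in.
NETWORK_ENABLED = False  # the "dance with the Devil" switch

_real_socket = socket.socket

class GuardedSocket(_real_socket):
    def __init__(self, *args, **kwargs):
        if not NETWORK_ENABLED:
            raise PermissionError("network access is disabled in this sandbox")
        super().__init__(*args, **kwargs)

socket.socket = GuardedSocket

try:
    socket.socket()  # any attempt to open a connection now fails fast
except PermissionError as e:
    print("blocked:", e)
```

An agent running under a guard like this fails fast on any network attempt, which is the behavior the prompt-injection recommendation above is after: untrusted content can't quietly phone home.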
[2]
OpenAI debuts GPT-5.1-Codex-Max coding model that completed a 24-hour task internally
OpenAI has introduced GPT-5.1-Codex-Max, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software engineering, offering improved long-horizon reasoning, efficiency, and real-time interactive capabilities. GPT-5.1-Codex-Max will now replace GPT-5.1-Codex as the default model across Codex-integrated surfaces. The new model is designed to serve as a persistent, high-context software development agent, capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows.

It comes on the heels of Google releasing its powerful new Gemini 3 Pro model yesterday, yet still outperforms or matches it on key coding benchmarks: on SWE-Bench Verified, GPT-5.1-Codex-Max achieved 77.9% accuracy at extra-high reasoning effort, edging past Gemini 3 Pro's 76.2%. It also led on Terminal-Bench 2.0, with 58.1% accuracy versus Gemini's 54.2%, and matched Gemini's score of 2,439 on LiveCodeBench Pro, a competitive coding Elo benchmark. When measured against Gemini 3 Pro's most advanced configuration, its Deep Thinking model, Codex-Max holds a slight edge in agentic coding benchmarks as well.

Performance Benchmarks: Incremental Gains Across Key Tasks

GPT-5.1-Codex-Max demonstrates measurable improvements over GPT-5.1-Codex across a range of standard software engineering benchmarks. On SWE-Lancer IC SWE, it achieved 79.9% accuracy, a significant increase from GPT-5.1-Codex's 66.3%. In SWE-Bench Verified (n=500), it reached 77.9% accuracy at extra-high reasoning effort, outperforming GPT-5.1-Codex's 73.7%. Performance on Terminal-Bench 2.0 (n=89) showed more modest improvement, with GPT-5.1-Codex-Max achieving 58.1% accuracy compared to 52.8% for GPT-5.1-Codex. All evaluations were run with compaction and extra-high reasoning effort enabled.
These results indicate that the new model offers a higher ceiling on both benchmarked correctness and real-world usability under extended reasoning loads.

Technical Architecture: Long-Horizon Reasoning via Compaction

A major architectural improvement in GPT-5.1-Codex-Max is its ability to reason effectively over extended input-output sessions using a mechanism called compaction. This enables the model to retain key contextual information while discarding irrelevant details as it nears its context window limit, effectively allowing continuous work across millions of tokens without performance degradation. The model has been internally observed to complete tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging.

Compaction also improves token efficiency. At medium reasoning effort, GPT-5.1-Codex-Max used approximately 30% fewer thinking tokens than GPT-5.1-Codex for comparable or better accuracy, which has implications for both cost and latency.

Platform Integration and Use Cases

GPT-5.1-Codex-Max is currently available across multiple Codex-based environments, which refer to OpenAI's own integrated tools and interfaces built specifically for code-focused AI agents. These include:

* Codex CLI, OpenAI's official command-line tool (@openai/codex), where GPT-5.1-Codex-Max is already live.
* IDE extensions, likely developed or maintained by OpenAI, though no specific third-party IDE integrations were named.
* Interactive coding environments, such as those used to demonstrate frontend simulation apps like CartPole or Snell's Law Explorer.
* Internal code review tooling, used by OpenAI's engineering teams.

For now, GPT-5.1-Codex-Max is not yet available via public API, though OpenAI states this is coming soon. Users who wish to work with the model in terminal environments today can do so by installing and using the Codex CLI.
It is not currently confirmed whether or how the model will integrate into third-party IDEs unless they are built on top of the CLI or a future API.

The model is capable of interacting with live tools and simulations. Examples shown in the release include:

* An interactive CartPole policy gradient simulator, which visualizes reinforcement learning training and activations.
* A Snell's Law optics explorer, supporting dynamic ray tracing across refractive indices.

These interfaces exemplify the model's ability to reason in real time while maintaining an interactive development session, effectively bridging computation, visualization, and implementation within a single loop.

Cybersecurity and Safety Constraints

While GPT-5.1-Codex-Max does not meet OpenAI's "High" capability threshold for cybersecurity under its Preparedness Framework, it is currently the most capable cybersecurity model OpenAI has deployed. It supports use cases such as automated vulnerability detection and remediation, but with strict sandboxing and network access disabled by default. OpenAI reports no increase in scaled malicious use but has introduced enhanced monitoring systems, including activity routing and disruption mechanisms for suspicious behavior. Codex remains isolated to a local workspace unless developers opt in to broader access, mitigating risks like prompt injection from untrusted content.

Deployment Context and Developer Usage

GPT-5.1-Codex-Max is currently available to users on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. It will also become the new default in Codex-based environments, replacing GPT-5.1-Codex, which was a more general-purpose model. OpenAI states that 95% of its internal engineers use Codex weekly, and since adoption, these engineers have shipped roughly 70% more pull requests on average, highlighting the tool's impact on internal development velocity.
Despite its autonomy and persistence, OpenAI stresses that Codex-Max should be treated as a coding assistant, not a replacement for human review. The model produces terminal logs, test citations, and tool call outputs to support transparency in generated code.

Outlook

GPT-5.1-Codex-Max represents a significant evolution in OpenAI's strategy toward agentic development tools, offering greater reasoning depth, token efficiency, and interactive capabilities across software engineering tasks. By extending its context management and compaction strategies, the model is positioned to handle tasks at the scale of full repositories rather than individual files or snippets. With continued emphasis on agentic workflows, secure sandboxes, and real-world evaluation metrics, Codex-Max sets the stage for the next generation of AI-assisted programming environments, while underscoring the importance of oversight in increasingly autonomous systems.
[3]
GPT-5.1 Codex Max explained: OpenAI's most powerful coding model yet
GPT-5.1 Codex Max transforms AI-assisted software development with full-stack reasoning. If you've been watching the AI world this week, you probably noticed something interesting: OpenAI dropped GPT-5.1 Codex Max almost immediately after Google unveiled Antigravity, its agentic, developer-focused AI platform. It feels like the two biggest players in AI are now openly battling for the future of software development, and Codex Max is OpenAI's answer to Google's big swing. So let's walk through what this new model actually is, what makes it different, and why developers are already paying attention.

Codex Max is OpenAI's newest and most advanced coding model, built on top of the GPT-5.1 architecture. Earlier Codex versions could write functions, generate boilerplate, or help with debugging, but Codex Max works at an entirely different scale. It can understand whole repositories, reason about architecture, and keep track of relationships across dozens of files at once. In simple terms: older models felt like autocomplete on steroids; Codex Max feels like pairing with a senior engineer who never loses context, even in a huge codebase.

Google's Antigravity showed a future where AI agents manage development tasks across workspaces: editing code, generating components, running tests, and iterating like a teammate. Releasing Codex Max right after that is no coincidence. OpenAI clearly wants to show that it can match, and in some cases exceed, Google's agentic capabilities. Codex Max is designed to be the model behind OpenAI's own agent-driven development workflows, offering deep context, better reasoning, and more reliable code generation. So yes, this is absolutely part of an escalating AI race. The name is earned.
Instead of writing disconnected blocks of code, it produces work that looks like it was written by the same person. This is the part that feels most like OpenAI's counterpunch to Google Antigravity: Codex Max can work across your project like an intelligent teammate. You can say, "Refactor the authentication system, add 2FA support, and make sure it doesn't break existing sessions." And the model can navigate the repo, update the right files, create new modules where necessary, preserve logic and dependencies, and explain everything it changed. It's not just answering prompts; it's acting with a plan. This is the same direction Antigravity is pushing in, but Codex Max feels more like a model built for deep reasoning across code rather than task orchestration alone.

Codex Max is noticeably better at analyzing long logs, finding root causes, suggesting minimal patches, and avoiding overconfident but incorrect fixes. It behaves less like a chatbot and more like a methodical engineer. The model understands practical constraints: deployment issues, API limits, performance bottlenecks, and security needs. So its answers feel more grounded, more production-oriented, and more like something a real dev team would implement.

Codex Max won't replace you, but it will absolutely change your workflow. Beginners get a patient tutor; seniors get a force multiplier. Codex Max isn't just "another model." It's OpenAI stepping firmly into the agentic development race, right alongside Google's Antigravity. If GPT-4 Codex felt like an assistant, GPT-5.1 Codex Max feels like the first version of a true AI co-developer. And with Google and OpenAI now going head-to-head in real time, this space is about to accelerate faster than anyone expected.
OpenAI unveils GPT-5.1 Codex Max, a breakthrough AI coding model featuring advanced compaction technology, million-token context handling, and 30% improved efficiency. The release directly challenges Google's Antigravity platform in the escalating AI development race.
OpenAI has unveiled GPT-5.1 Codex Max, a groundbreaking AI coding model that addresses one of the most persistent challenges in AI-assisted programming: context window limitations [1]. The new model introduces a sophisticated compaction mechanism that allows it to "coherently work over millions of tokens in a single task," representing a dramatic leap from traditional AI coding assistants [2].
Source: Digit
The compaction process enables Codex Max to shrink or compress portions of conversation or code context when the overall token window approaches capacity, similar to how humans might refocus attention during lengthy conversations [1]. This breakthrough has been internally demonstrated through tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging [2].

GPT-5.1 Codex Max delivers substantial performance improvements across multiple coding benchmarks while maintaining accuracy standards. On SWE-Bench Verified, the model achieved 77.9% accuracy at extra-high reasoning effort, surpassing Google's Gemini 3 Pro at 76.2% [2]. The model also demonstrated superior performance on Terminal-Bench 2.0 with 58.1% accuracy versus Gemini's 54.2%.

Perhaps more significantly for practical applications, Codex Max operates with remarkable efficiency improvements. The model uses approximately 30% fewer thinking tokens than its predecessor while running 27% to 42% faster on real-world coding tasks [1]. In one documented example, Max used 27,000 tokens compared to 37,000 for the previous version, generated 707 lines of code instead of 864, and completed tasks 27% faster [1].

The timing of Codex Max's release appears strategically calculated, arriving immediately after Google unveiled its Antigravity agentic development platform [3]. This represents an escalating competition between the two AI giants for dominance in software development assistance, with both companies pushing toward agentic AI capabilities that can manage complex, multi-step development workflows.
Source: ZDNet
Unlike previous coding models that functioned primarily as sophisticated autocomplete tools, Codex Max operates more like "pairing with a senior engineer who never loses context, even in a huge codebase" [3]. The model can understand entire repositories, reason about architecture, and maintain relationships across dozens of files simultaneously, representing a fundamental shift toward true AI co-development capabilities.

GPT-5.1 Codex Max is currently available across multiple Codex-based environments, including the Codex CLI, IDE extensions, and interactive coding environments [2]. The model will be accessible tomorrow for ChatGPT Plus, Pro, Business, Edu, and Enterprise users, with API access coming soon [1].

The model demonstrates advanced capabilities in interactive development sessions, including real-time tool interaction and simulation management. Examples include an interactive CartPole policy gradient simulator for reinforcement learning visualization and a Snell's Law optics explorer supporting dynamic ray tracing [2]. These capabilities bridge computation, visualization, and implementation within single development loops.

While GPT-5.1 Codex Max represents OpenAI's most capable cybersecurity model to date, it operates under strict safety constraints. The model supports automated vulnerability detection and remediation but functions with mandatory sandboxing and disabled network access by default [2]. OpenAI reports no increase in scaled malicious use and has implemented enhanced monitoring systems to maintain security standards.

Summarized by Navi