3 Sources
[1]
A new version of OpenAI's Codex is powered by a new dedicated chip | TechCrunch
On Thursday, OpenAI announced the release of a lightweight version of its agentic coding tool Codex, the latest model of which OpenAI launched earlier this month. GPT-5.3-Codex-Spark is described by the company as a "smaller version" of that model, one designed for faster inference. To power that inference, OpenAI has brought in a dedicated chip from its hardware partner Cerebras, marking a new level of integration in the company's physical infrastructure.

The partnership between Cerebras and OpenAI was announced last month, when OpenAI said it had reached a multi-year agreement with the firm worth over $10 billion. "Integrating Cerebras into our mix of compute solutions is all about making our AI respond much faster," the company said at the time. Now, OpenAI calls Spark the "first milestone" in that relationship.

Spark, which OpenAI says is designed for swift, real-time collaboration and "rapid iteration," will be powered by Cerebras' Wafer Scale Engine 3. The WSE-3 is Cerebras' third-generation wafer-scale megachip, decked out with 4 trillion transistors. OpenAI describes the new lightweight tool as a "daily productivity driver, helping users with rapid prototyping" rather than the longer, heavier tasks that the original 5.3 is designed for. Spark is currently available as a research preview for ChatGPT Pro users in the Codex app.

In a tweet ahead of the announcement, CEO Sam Altman seemed to hint at the new model. "We have a special thing launching to Codex users on the Pro plan later today," Altman tweeted. "It sparks joy for me."

In its official statement, OpenAI emphasized that Spark is designed for the lowest possible latency on Codex. "Codex-Spark is the first step toward a Codex that works in two complementary modes: real-time collaboration when you want rapid iteration, and long-running tasks when you need deeper reasoning and execution," OpenAI shared. The company added that Cerebras' chips excel at "workflows that demand extremely low latency."

Cerebras has been around for over a decade, but in the AI era it has taken on an increasingly prominent role in the tech industry. Just last week, the company announced that it had raised $1 billion in fresh capital at a valuation of $23 billion. The company has previously announced its intention to pursue an IPO.

"What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible -- new interaction patterns, new use cases, and a fundamentally different model experience," Sean Lie, CTO and Co-Founder of Cerebras, said in a statement. "This preview is just the beginning."
[2]
OpenAI's new Spark model codes 15x faster than GPT-5.3-Codex - but there's a catch
Runs on Cerebras WSE-3 chips for a latency-first Codex serving tier.

The Codex team at OpenAI is on fire. Less than two weeks after releasing a dedicated agent-based Codex app for Macs, and only a week after releasing the faster and more steerable GPT-5.3-Codex language model, OpenAI is counting on lightning striking a third time.

Today, the company announced a research preview of GPT-5.3-Codex-Spark, a smaller version of GPT-5.3-Codex built for real-time coding in Codex. The company reports that it generates code 15 times faster while "remaining highly capable for real-world coding tasks." There is a catch, and I'll talk about that in a minute.

Codex-Spark will initially be available only to $200/mo Pro tier users, with separate rate limits during the preview period. If it follows OpenAI's usual release strategy for Codex, Plus users will be next, with other tiers gaining access fairly quickly. (Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

OpenAI says Codex-Spark is its "first model designed specifically for working with Codex in real-time -- making targeted edits, reshaping logic, or refining interfaces and seeing results immediately." Let's deconstruct this briefly.

Most agentic AI programming tools take a while to respond to instructions. In my programming work, I can give an instruction (and this applies to both Codex and Claude Code) and go off and work on something else for a while. Sometimes it's just a few minutes. Other times, it can be long enough to get lunch.

Codex-Spark is apparently able to respond much faster, allowing for quick and continuous work. This could speed up development considerably, especially for simpler prompts and queries. I know I've been occasionally frustrated when I've asked an AI a super simple question that should have generated an immediate response, but instead I still had to wait five minutes for an answer.

By making responsiveness a core feature, the model supports more fluid, conversational coding. Sometimes, using coding agents feels more like old-school batch-style coding. This is designed to overcome that feeling.

GPT-5.3-Codex-Spark isn't intended to replace the base GPT-5.3-Codex. Instead, Spark was designed to complement high-performance AI models built for long-running, autonomous tasks lasting hours, days, or weeks. The Codex-Spark model is intended for work where responsiveness matters as much as intelligence. It supports interruption and redirection mid-task, enabling tight iteration loops. This is something that appeals to me, because I always think of something more to tell the AI ten seconds after I've given it an assignment.

The Spark model defaults to lightweight, targeted edits, making quick tweaks rather than taking big swings. It also doesn't automatically run tests unless requested.

OpenAI has also reduced latency (faster turnaround) across the full request-response pipeline. It says that overhead per client/server roundtrip has been reduced by 80%, and per-token overhead has been reduced by 30%.
The time-to-first-token has been reduced by 50% through session initialization and streaming optimizations. Another mechanism that improves responsiveness during iteration is the introduction of a persistent WebSocket connection, so the connection doesn't have to be continually renegotiated.

In January, OpenAI announced a partnership with AI chipmaker Cerebras. We've been covering Cerebras for a while, including its inference service, its work with DeepSeek, its work boosting the performance of Meta's Llama model, and its announcement of a really big AI chip meant to double LLM performance. GPT-5.3-Codex-Spark is the first milestone for the OpenAI/Cerebras partnership announced last month.

The Spark model runs on Cerebras' Wafer Scale Engine 3, a high-performance AI chip architecture that boosts speed by putting all the compute resources on a single wafer-scale processor the size of a pancake. Usually, a semiconductor wafer contains a whole bunch of processors, which later in the production process are cut apart and put into their own packages. The Cerebras wafer contains just one chip, making it a very, very big processor with very, very closely coupled connections.

According to Sean Lie, CTO and co-founder of Cerebras, "What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible -- new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning."

Now, here are the gotchas. First, OpenAI says that "when demand is high, you may see slower access or temporary queuing as we balance reliability across users." So: fast, unless too many people want to go fast.

Here's the kicker. The company says, "On SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks evaluating agentic software engineering capability, GPT-5.3-Codex-Spark underperforms GPT-5.3-Codex, but can accomplish the tasks in a fraction of the time."

Last week, in the GPT-5.3-Codex announcement, OpenAI said that GPT-5.3-Codex was the first model it classifies as "high capability" for cybersecurity under its published Preparedness Framework. On the other hand, the company admitted that GPT-5.3-Codex-Spark "does not have a plausible chance of reaching our Preparedness Framework threshold for high capability in cybersecurity."

Think on these statements, dear reader. This AI isn't as smart, but it does do those not-as-smart things a lot faster. 15x speed is certainly nothing to sneeze at. But do you really want an AI to make coding mistakes 15 times faster and produce code that is less secure?

Let me tell you this: "Eh, it's good enough" isn't really good enough when you have thousands of pissed-off users coming at you with torches and pitchforks because you suddenly broke their software with a new release. Ask me how I know.

Last week, we learned that OpenAI uses Codex to write Codex. We also know that it uses it to build code much faster. So the company clearly has a use case for something that's way faster, but not as smart. As I get a better handle on what that is and where Spark fits, I'll let you know.

OpenAI shared that it is working toward dual modes of reasoning and real-time work for its Codex models.
The company says, "Codex-Spark is the first step toward a Codex with two complementary modes: longer-horizon reasoning and execution, and real-time collaboration for rapid iteration. Over time, the modes will blend."

The workflow model it envisions is interesting. According to OpenAI, the intent is that eventually "Codex can keep you in a tight interactive loop while delegating longer-running work to sub-agents in the background, or fanning out tasks to many models in parallel when you want breadth and speed, so you don't have to choose a single mode up front."

Essentially, it's working toward the best of both worlds. But for now, you can choose fast or accurate. That's a tough choice. But the accurate is getting more accurate, and now, at least, you can opt for fast when you want it (as long as you keep the trade-offs in mind and you're paying for the Pro tier).

What about you? Would you trade some intelligence and security capability for 15x faster coding responses? Does the idea of a real-time, interruptible AI collaborator appeal to you, or do you prefer a more deliberate, higher-accuracy model for serious development work? How concerned are you about the cybersecurity distinction between Codex-Spark and the full GPT-5.3-Codex model? And if you're a Pro user, do you see yourself switching between "fast" and "smart" modes depending on the task? Let us know in the comments below.
[3]
OpenAI deploys Cerebras chips for 15x faster code generation in first major move beyond Nvidia
OpenAI on Thursday launched GPT-5.3-Codex-Spark, a stripped-down coding model engineered for near-instantaneous response times, marking the company's first significant inference partnership outside its traditional Nvidia-dominated infrastructure. The model runs on hardware from Cerebras Systems, a Sunnyvale-based chipmaker whose wafer-scale processors specialize in low-latency AI workloads.

The partnership arrives at a pivotal moment for OpenAI. The company finds itself navigating a frayed relationship with longtime chip supplier Nvidia, mounting criticism over its decision to introduce advertisements into ChatGPT, a newly announced Pentagon contract, and internal organizational upheaval that has seen a safety-focused team disbanded and at least one researcher resign in protest.

"GPUs remain foundational across our training and inference pipelines and deliver the most cost effective tokens for broad usage," an OpenAI spokesperson told VentureBeat. "Cerebras complements that foundation by excelling at workflows that demand extremely low latency, tightening the end-to-end loop so use cases such as real-time coding in Codex feel more responsive as you iterate."

The careful framing -- emphasizing that GPUs "remain foundational" while positioning Cerebras as a "complement" -- underscores the delicate balance OpenAI must strike as it diversifies its chip suppliers without alienating Nvidia, the dominant force in AI accelerators.

Codex-Spark represents OpenAI's first model purpose-built for real-time coding collaboration. The company claims the model delivers generation speeds 15 times faster than its predecessor, though it declined to provide specific latency metrics such as time-to-first-token or tokens-per-second figures. "We aren't able to share specific latency numbers, however Codex-Spark is optimized to feel near-instant -- delivering 15x faster generation speeds while remaining highly capable for real-world coding tasks," the OpenAI spokesperson said.

The speed gains come with acknowledged capability tradeoffs. On SWE-Bench Pro and Terminal-Bench 2.0 -- two industry benchmarks that evaluate AI systems' ability to perform complex software engineering tasks autonomously -- Codex-Spark underperforms the full GPT-5.3-Codex model. OpenAI positions this as an acceptable exchange: developers get responses fast enough to maintain creative flow, even if the underlying model cannot tackle the most sophisticated multi-step programming challenges.

The model launches with a 128,000-token context window and supports text only -- no image or multimodal inputs. OpenAI has made it available as a research preview to ChatGPT Pro subscribers through the Codex app, command-line interface, and Visual Studio Code extension. A small group of enterprise partners will receive API access to evaluate integration possibilities. "We are making Codex-Spark available in the API for a small set of design partners to understand how developers want to integrate Codex-Spark into their products," the spokesperson explained. "We'll expand access over the coming weeks as we continue tuning our integration under real workloads."

The technical architecture behind Codex-Spark tells a story about inference economics that increasingly matters as AI companies scale consumer-facing products. Cerebras's Wafer Scale Engine 3 -- a single chip roughly the size of a dinner plate containing 4 trillion transistors -- eliminates much of the communication overhead that occurs when AI workloads spread across clusters of smaller processors. For training massive models, that distributed approach remains necessary and Nvidia's GPUs excel at it. But for inference -- the process of generating responses to user queries -- Cerebras argues its architecture can deliver results with dramatically lower latency.

Sean Lie, Cerebras's CTO and co-founder, framed the partnership as an opportunity to reshape how developers interact with AI systems. "What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible -- new interaction patterns, new use cases, and a fundamentally different model experience," Lie said in a statement. "This preview is just the beginning."

OpenAI's infrastructure team did not limit its optimization work to the Cerebras hardware. The company announced latency improvements across its entire inference stack that benefit all Codex models regardless of underlying hardware, including persistent WebSocket connections and optimizations within the Responses API. The results: 80 percent reduction in overhead per client-server round trip, 30 percent reduction in per-token overhead, and 50 percent reduction in time-to-first-token.

The Cerebras partnership takes on additional significance given the increasingly complicated relationship between OpenAI and Nvidia. Last fall, when OpenAI announced its Stargate infrastructure initiative, Nvidia publicly committed to investing $100 billion to support OpenAI as it built out AI infrastructure. The announcement appeared to cement a strategic alliance between the world's most valuable AI company and its dominant chip supplier.

Five months later, that megadeal has effectively stalled, according to multiple reports. Nvidia CEO Jensen Huang has publicly denied tensions, telling reporters in late January that there is "no drama" and that Nvidia remains committed to participating in OpenAI's current funding round. But the relationship has cooled considerably, with friction stemming from multiple sources.

OpenAI has aggressively pursued partnerships with alternative chip suppliers, including the Cerebras deal and separate agreements with AMD and Broadcom. From Nvidia's perspective, OpenAI may be using its influence to commoditize the very hardware that made its AI breakthroughs possible. From OpenAI's perspective, reducing dependence on a single supplier represents prudent business strategy.

"We will continue working with the ecosystem on evaluating the most price-performant chips across all use cases on an ongoing basis," OpenAI's spokesperson told VentureBeat. "GPUs remain our priority for cost-sensitive and throughput-first use cases across research and inference." The statement reads as a careful effort to avoid antagonizing Nvidia while preserving flexibility -- and reflects a broader reality that training frontier AI models still requires exactly the kind of massive parallel processing that Nvidia GPUs provide.

The Codex-Spark launch comes as OpenAI navigates a series of internal challenges that have intensified scrutiny of the company's direction and values. Earlier this week, reports emerged that OpenAI disbanded its mission alignment team, a group established in September 2024 to promote the company's stated goal of ensuring artificial general intelligence benefits humanity. The team's seven members have been reassigned to other roles, with leader Joshua Achiam given a new title as OpenAI's "chief futurist." OpenAI previously disbanded another safety-focused group, the superalignment team, in 2024. That team had concentrated on long-term existential risks from AI. The pattern of dissolving safety-oriented teams has drawn criticism from researchers who argue that OpenAI's commercial pressures are overwhelming its original non-profit mission.

The company also faces fallout from its decision to introduce advertisements into ChatGPT. Researcher Zoë Hitzig resigned this week over what she described as the "slippery slope" of ad-supported AI, warning in a New York Times essay that ChatGPT's archive of intimate user conversations creates unprecedented opportunities for manipulation. Anthropic seized on the controversy with a Super Bowl advertising campaign featuring the tagline: "Ads are coming to AI. But not to Claude."

Separately, the company agreed to provide ChatGPT to the Pentagon through Genai.mil, a new Department of Defense program that requires OpenAI to permit "all lawful uses" without company-imposed restrictions -- terms that Anthropic reportedly rejected. And reports emerged that Ryan Beiermeister, OpenAI's vice president of product policy who had expressed concerns about a planned explicit content feature, was terminated in January following a discrimination allegation she denies.

Despite the surrounding turbulence, OpenAI's technical roadmap for Codex suggests ambitious plans. The company envisions a coding assistant that seamlessly blends rapid-fire interactive editing with longer-running autonomous tasks -- an AI that handles quick fixes while simultaneously orchestrating multiple agents working on more complex problems in the background. "Over time, the modes will blend -- Codex can keep you in a tight interactive loop while delegating longer-running work to sub-agents in the background, or fanning out tasks to many models in parallel when you want breadth and speed, so you don't have to choose a single mode up front," the OpenAI spokesperson told VentureBeat.

This vision would require not just faster inference but sophisticated task decomposition and coordination across models of varying sizes and capabilities. Codex-Spark establishes the low-latency foundation for the interactive portion of that experience; future releases will need to deliver the autonomous reasoning and multi-agent coordination that would make the full vision possible.

For now, Codex-Spark operates under separate rate limits from other OpenAI models, reflecting constrained Cerebras infrastructure capacity during the research preview. "Because it runs on specialized low-latency hardware, usage is governed by a separate rate limit that may adjust based on demand during the research preview," the spokesperson noted. The limits are designed to be "generous," with OpenAI monitoring usage patterns as it determines how to scale.

The Codex-Spark announcement arrives amid intense competition for AI-powered developer tools. Anthropic's Claude Cowork product triggered a selloff in traditional software stocks last week as investors considered whether AI assistants might displace conventional enterprise applications. Microsoft, Google, and Amazon continue investing heavily in AI coding capabilities integrated with their respective cloud platforms. OpenAI's Codex app has demonstrated rapid adoption since launching ten days ago, with more than one million downloads and weekly active users growing 60 percent week-over-week. More than 325,000 developers now actively use Codex across free and paid tiers.

But the fundamental question facing OpenAI -- and the broader AI industry -- is whether speed improvements like those promised by Codex-Spark translate into meaningful productivity gains or merely create more pleasant experiences without changing outcomes. Early evidence from AI coding tools suggests that faster responses encourage more iterative experimentation. Whether that experimentation produces better software remains contested among researchers and practitioners alike.

What seems clear is that OpenAI views inference latency as a competitive frontier worth substantial investment, even as that investment takes it beyond its traditional Nvidia partnership into untested territory with alternative chip suppliers. The Cerebras deal is a calculated bet that specialized hardware can unlock use cases that general-purpose GPUs cannot cost-effectively serve. For a company simultaneously battling competitors, managing strained supplier relationships, and weathering internal dissent over its commercial direction, it is also a reminder that in the AI race, standing still is not an option.

OpenAI built its reputation by moving fast and breaking conventions. Now it must prove it can move even faster -- without breaking itself.
OpenAI has released GPT-5.3-Codex-Spark, a lightweight version of its AI coding tool designed for near-instantaneous responses. The model runs on Cerebras' Wafer Scale Engine 3 chips, delivering code generation 15 times faster than its predecessor. This marks OpenAI's first major inference partnership beyond Nvidia, signaling a strategic shift in its hardware infrastructure.
OpenAI has launched GPT-5.3-Codex-Spark, a streamlined version of its agentic AI coding tool designed specifically for faster code generation and real-time coding collaboration [1]. The new language model generates code 15 times faster than the full GPT-5.3-Codex while maintaining capability for real-world coding tasks [2]. Described as a "smaller version" optimized for swift inference, Codex-Spark represents the first milestone in the OpenAI and Cerebras partnership announced last month [1].
The model currently operates as a research preview exclusively for ChatGPT Pro subscribers at $200 per month, accessible through the Codex app, command-line interface, and Visual Studio Code extension [2][3]. A select group of enterprise partners will receive API access to evaluate integration possibilities [3]. CEO Sam Altman hinted at the launch in a tweet, saying the new model "sparks joy" for him [1].
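For the design partners in the API preview, access will presumably look like any other model served through OpenAI's Responses API, which the sources say received streaming optimizations. The sketch below is a minimal illustration, assuming the official openai Python SDK and a hypothetical model identifier gpt-5.3-codex-spark (no real identifier has been published); it streams a response and measures time-to-first-token, the metric OpenAI says it cut by 50 percent.

```python
# Minimal sketch: streaming a coding request and timing time-to-first-token.
# Assumptions: the `openai` Python SDK, and a hypothetical model id
# "gpt-5.3-codex-spark" -- the actual identifier has not been published.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
first_token_at = None

# The Responses API supports server-side streaming; each event arrives
# as soon as it is generated, which is where low latency shows up.
stream = client.responses.create(
    model="gpt-5.3-codex-spark",  # hypothetical identifier
    input="Rename the variable `tmp` to `retry_count` in utils.py.",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        if first_token_at is None:
            first_token_at = time.perf_counter()
            print(f"time to first token: {first_token_at - start:.3f}s")
        print(event.delta, end="", flush=True)
```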
The speed breakthrough comes from running Codex-Spark on Cerebras' Wafer Scale Engine 3, a third-generation wafer-scale megachip containing 4 trillion transistors [1]. The hardware partnership marks OpenAI's first significant inference partnership outside its traditional Nvidia-dominated infrastructure [3]. The Wafer Scale Engine architecture eliminates communication overhead by placing all compute resources on a single processor roughly the size of a dinner plate, rather than distributing AI workloads across clusters of smaller processors [3].
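A back-of-envelope model shows why hop count matters: every token an autoregressive model generates requires moving activations between compute units, and each hop across a network link adds delay that compounds over thousands of tokens. All numbers in the sketch below are illustrative assumptions, not published specifications for the WSE-3 or any GPU cluster; the point is only that removing inter-chip hops from the per-token path dominates the total.

```python
# Toy latency model -- all numbers are illustrative assumptions, not
# published specs for any real GPU cluster or for the WSE-3.
TOKENS = 2000               # tokens in a typical code edit
COMPUTE_US = 5.0            # assumed per-token compute time, microseconds
HOP_US = 10.0               # assumed per-hop network latency, microseconds
CLUSTER_HOPS = 8            # assumed inter-chip hops per token in a cluster
WAFER_HOPS = 0              # on-wafer links replace network hops

def total_ms(hops: int) -> float:
    """Per-token time is compute plus network hops; sum over all tokens."""
    return TOKENS * (COMPUTE_US + hops * HOP_US) / 1000

print(f"cluster:     {total_ms(CLUSTER_HOPS):7.1f} ms")  # 170.0 ms
print(f"wafer-scale: {total_ms(WAFER_HOPS):7.1f} ms")    #  10.0 ms
```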
The partnership between Cerebras and OpenAI was formalized through a multi-year agreement worth over $10 billion announced in January [1][2]. Sean Lie, CTO and co-founder of Cerebras, emphasized the potential for discovering "new interaction patterns, new use cases, and a fundamentally different model experience" through fast inference [1]. Cerebras recently raised $1 billion at a $23 billion valuation and has announced intentions to pursue an IPO [1].

While Codex-Spark delivers dramatically faster responses, OpenAI acknowledges performance tradeoffs. On SWE-Bench Pro and Terminal-Bench 2.0, industry benchmarks evaluating AI systems' ability to perform complex software engineering tasks autonomously, Codex-Spark underperforms the full GPT-5.3-Codex model [3]. The company positions this as an acceptable exchange, with developers gaining responses fast enough to maintain creative flow even if the model cannot tackle the most sophisticated multi-step programming challenges [3].
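That tradeoff invites a simple client-side pattern: send quick, low-stakes edits to the fast model and escalate anything that must survive review to the full model. The sketch below is purely hypothetical; the model identifiers and the heuristic are assumptions for illustration, not part of any published OpenAI interface.

```python
# Hypothetical fast-vs-smart routing -- the model ids and the heuristic
# are illustrative assumptions, not part of any published OpenAI API.
FAST_MODEL = "gpt-5.3-codex-spark"   # low latency, weaker on hard tasks
SMART_MODEL = "gpt-5.3-codex"        # slower, stronger on benchmarks

def pick_model(prompt: str, files_touched: int, needs_tests: bool) -> str:
    """Send quick, contained edits to the fast tier; escalate otherwise."""
    small_change = files_touched <= 2 and len(prompt) < 500
    if small_change and not needs_tests:
        return FAST_MODEL
    return SMART_MODEL

assert pick_model("rename a variable", files_touched=1, needs_tests=False) == FAST_MODEL
assert pick_model("refactor the auth flow", files_touched=9, needs_tests=True) == SMART_MODEL
```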
OpenAI describes Codex-Spark as a "daily productivity driver" for rapid prototyping rather than the longer, heavier tasks the original model handles [1]. The model defaults to lightweight, targeted edits and doesn't automatically run tests unless requested [2]. It supports interruption and redirection mid-task, enabling tight iteration loops that overcome the batch-style feeling often associated with coding agents [2]. The model features a 128,000-token context window and supports text only, with no image or multimodal inputs [3].
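A 128,000-token window is roomy for single files but tight for whole repositories, so a client would typically count tokens before sending a request. A minimal sketch, assuming the tiktoken library's o200k_base encoding as an approximation (the tokenizer Spark actually uses has not been published):

```python
# Sketch: checking a prompt against a 128K context window before sending.
# Assumption: tiktoken's o200k_base encoding approximates the tokenizer;
# the model's real encoding has not been published.
import tiktoken

CONTEXT_WINDOW = 128_000
RESPONSE_BUDGET = 8_000  # leave headroom for the model's own output

enc = tiktoken.get_encoding("o200k_base")

def fits(prompt: str, *source_files: str) -> bool:
    """True if the prompt plus attached code fits the context window."""
    total = len(enc.encode(prompt))
    total += sum(len(enc.encode(f)) for f in source_files)
    return total + RESPONSE_BUDGET <= CONTEXT_WINDOW

code = "def f(tmp):\n    return tmp + 1\n"
print(fits("Rename `tmp` to `retry_count` in this file:", code))
```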
OpenAI's infrastructure team implemented latency improvements across its entire inference stack that benefit all Codex models regardless of underlying hardware. These optimizations include an 80 percent reduction in overhead per client-server round trip, a 30 percent reduction in per-token overhead, and a 50 percent reduction in time-to-first-token through session initialization and streaming optimizations [2][3]. The introduction of persistent WebSocket connections eliminates the need for continual connection renegotiation, further improving responsiveness during iteration [2].
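The WebSocket change is the easiest of those wins to picture: a fresh HTTPS request pays TCP and TLS negotiation costs every time, while a held-open socket pays them once and then only exchanges frames. The sketch below, built on Python's websockets library, shows the reuse pattern; the endpoint URL and message format are assumptions, since OpenAI has not published its Codex wire protocol.

```python
# Sketch of connection reuse -- the endpoint URL and message schema are
# hypothetical; OpenAI has not published its Codex wire protocol.
import asyncio
import json
import websockets  # pip install websockets

EDITS = ["rename tmp -> retry_count", "extract helper", "add docstring"]

async def main() -> None:
    # One handshake (TCP + TLS + WebSocket upgrade) for the whole session,
    # instead of renegotiating a connection for every request.
    async with websockets.connect("wss://example.invalid/codex") as ws:
        for edit in EDITS:
            await ws.send(json.dumps({"instruction": edit}))
            reply = await ws.recv()  # later requests skip the setup cost
            print(reply)

asyncio.run(main())
```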
An OpenAI spokesperson emphasized that "GPUs remain foundational across our training and inference pipelines and deliver the most cost effective tokens for broad usage," while positioning Cerebras as complementing that foundation by excelling at low-latency AI workloads [3]. This careful framing underscores the delicate balance OpenAI must strike as it diversifies chip suppliers without alienating Nvidia, the dominant force in AI accelerators [3]. The developer community will be watching closely as OpenAI expands access over the coming weeks while tuning integration under real workloads [3].

Summarized by Navi