OpenAI launches Codex-Spark with Cerebras chips for 15x faster code generation

Reviewed by Nidhi Govil


OpenAI has released GPT-5.3-Codex-Spark, a lightweight version of its AI coding tool designed for near-instantaneous responses. The model runs on Cerebras' Wafer Scale Engine 3 chips, delivering code generation 15 times faster than the full GPT-5.3-Codex. This marks OpenAI's first major inference partnership beyond Nvidia, signaling a strategic shift in its hardware infrastructure.

OpenAI Unveils Codex-Spark for Real-Time Coding

OpenAI has launched GPT-5.3-Codex-Spark, a streamlined version of its agentic AI coding tool designed specifically for faster code generation and real-time coding collaboration [1]. The new language model generates code 15 times faster than the full GPT-5.3-Codex while maintaining capability for real-world coding tasks [2]. Described as a "smaller version" optimized for swift inference, Codex-Spark represents the first milestone in the OpenAI and Cerebras partnership announced last month [1].

Source: ZDNet

The model currently operates as a research preview exclusively for ChatGPT Pro subscribers at $200 per month, accessible through the Codex app, command-line interface, and Visual Studio Code extension [2][3]. A select group of enterprise partners will receive API access to evaluate integration possibilities [3]. CEO Sam Altman hinted at the launch in a tweet, stating the new model "sparks joy" for him [1].
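
Details of the enterprise API preview haven't been published. Purely as a sketch, if the preview exposes the model through the standard OpenAI Python SDK under an identifier like gpt-5.3-codex-spark (an assumed ID; the article doesn't name one), a minimal call might look like this:

```python
# Hedged sketch: assumes the enterprise API preview serves Codex-Spark via
# the standard OpenAI Python SDK under the ID "gpt-5.3-codex-spark".
# The model identifier is an assumption; the article does not specify it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.3-codex-spark",  # assumed identifier
    input="Write a Python function that deduplicates a list, preserving order.",
)
print(response.output_text)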

Source: TechCrunch

Cerebras WSE-3 Chip Powers Low-Latency AI Workloads

The speed breakthrough comes from running Codex-Spark on Cerebras' Wafer Scale Engine 3, a third-generation wafer-scale megachip containing 4 trillion transistors [1]. The deal is OpenAI's first significant inference partnership outside its traditional Nvidia-dominated infrastructure [3]. The Wafer Scale Engine architecture eliminates communication overhead by placing all compute resources on a single processor roughly the size of a dinner plate, rather than distributing AI workloads across clusters of smaller processors [3].
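
To make the single-wafer argument concrete, here is an illustrative back-of-envelope comparison. Every figure below is an assumption chosen only to show the shape of the argument (on-wafer hops cost nanoseconds, network hops between discrete accelerators cost microseconds); none are measured numbers for WSE-3 or any GPU cluster:

```python
# Illustrative back-of-envelope only: all figures are assumptions chosen to
# show the shape of the argument, not measured numbers for WSE-3 or any GPU.
ON_WAFER_HOP_S = 50e-9   # assumed on-wafer communication cost per layer
CROSS_CHIP_HOP_S = 5e-6  # assumed inter-accelerator network hop per layer
N_LAYERS = 80            # assumed transformer depth
TOKENS = 500             # assumed length of a generated completion

def comm_overhead(hop_cost_s: float) -> float:
    """Communication time per token if every layer boundary costs one hop."""
    return hop_cost_s * N_LAYERS

per_token_wafer = comm_overhead(ON_WAFER_HOP_S)
per_token_cluster = comm_overhead(CROSS_CHIP_HOP_S)

print(f"on-wafer comm per token:   {per_token_wafer * 1e6:6.1f} us")
print(f"cross-chip comm per token: {per_token_cluster * 1e6:6.1f} us")
print(f"over {TOKENS} tokens, the cluster pays "
      f"{(per_token_cluster - per_token_wafer) * TOKENS * 1e3:.0f} ms extra")
```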

The partnership between Cerebras and OpenAI was formalized through a multi-year agreement worth over $10 billion announced in January [1]. Sean Lie, CTO and co-founder of Cerebras, emphasized the potential for discovering "new interaction patterns, new use cases, and a fundamentally different model experience" through fast inference [1]. Cerebras recently raised $1 billion at a $23 billion valuation and has announced intentions to pursue an IPO [1].

Balancing Speed with Capability Tradeoffs

While Codex-Spark delivers dramatically faster responses, OpenAI acknowledges performance tradeoffs. On SWE-Bench Pro and Terminal-Bench 2.0, industry benchmarks evaluating AI systems' ability to perform complex software engineering tasks autonomously, Codex-Spark underperforms the full GPT-5.3-Codex model [3]. The company positions this as an acceptable exchange, with developers gaining responses fast enough to maintain creative flow even if the model cannot tackle the most sophisticated multi-step programming challenges [3].

OpenAI describes Codex-Spark as a "daily productivity driver" for rapid prototyping rather than the longer, heavier tasks the original model handles [1]. The model defaults to lightweight, targeted edits and doesn't automatically run tests unless requested [2]. It supports interruption and redirection mid-task, enabling tight iteration loops that overcome the batch-style feeling often associated with coding agents [2]. The model features a 128,000-token context window and supports text only, with no image or multimodal inputs [3].

Source: VentureBeat
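
The interrupt-and-redirect behavior lives in the Codex app itself, but the underlying pattern resembles ordinary streaming with early cancellation. A minimal sketch of that general pattern using the OpenAI Python SDK's streaming helper (the model ID is assumed, and this approximates the idea rather than reproducing OpenAI's implementation):

```python
# Sketch of stream-and-interrupt: print tokens as they arrive and bail out
# early to redirect. The model ID is assumed, and user_pressed_escape() is a
# hypothetical stand-in for the app's real interrupt signal.
from openai import OpenAI

client = OpenAI()

def user_pressed_escape() -> bool:
    """Hypothetical interrupt check; always False in this sketch."""
    return False

with client.responses.stream(
    model="gpt-5.3-codex-spark",  # assumed identifier
    input="Refactor this function to use pathlib instead of os.path.",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
        if user_pressed_escape():
            break  # leaving the with-block closes the stream mid-task
```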

Infrastructure Optimizations Beyond Hardware

OpenAI's infrastructure team implemented latency improvements across its entire inference stack that benefit all Codex models regardless of underlying hardware. These optimizations include an 80 percent reduction in overhead per client-server round trip, a 30 percent reduction in per-token overhead, and a 50 percent reduction in time-to-first-token through session initialization and streaming optimizations [2][3].
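
Those percentages compound across a request. A small illustration applying the quoted reductions to an invented baseline timeline (only the percentage cuts come from the reporting; the baseline milliseconds are assumptions):

```python
# Applies the reductions quoted in the article (80% round-trip overhead,
# 30% per-token overhead, 50% time-to-first-token) to an invented baseline.
# Baseline figures are illustrative assumptions, not measured values.
BASELINE_ROUND_TRIP_MS = 100.0  # assumed client-server overhead per turn
BASELINE_PER_TOKEN_MS = 10.0    # assumed fixed overhead per streamed token
BASELINE_TTFT_MS = 800.0        # assumed time-to-first-token
TOKENS = 300                    # assumed completion length

def turn_latency(rt_ms, per_tok_ms, ttft_ms, tokens):
    """Total overhead for one turn: round trip + first token + streaming."""
    return rt_ms + ttft_ms + per_tok_ms * tokens

before = turn_latency(BASELINE_ROUND_TRIP_MS, BASELINE_PER_TOKEN_MS,
                      BASELINE_TTFT_MS, TOKENS)
after = turn_latency(BASELINE_ROUND_TRIP_MS * (1 - 0.80),  # 80% cut
                     BASELINE_PER_TOKEN_MS * (1 - 0.30),   # 30% cut
                     BASELINE_TTFT_MS * (1 - 0.50),        # 50% cut
                     TOKENS)
print(f"before: {before:.0f} ms, after: {after:.0f} ms "
      f"({before / after:.1f}x lower overhead)")
```
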
The introduction of persistent WebSocket connections eliminates the need for continual connection renegotiation, further improving responsiveness during iteration [2].
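
The persistent-connection change amortizes handshake cost: one TLS plus WebSocket negotiation can serve many turns. A minimal sketch of the reuse pattern with the third-party websockets library (the URL and message shape are invented; OpenAI hasn't published this wire protocol):

```python
# Sketch of connection reuse with the "websockets" library: one handshake,
# many request/response turns. The URL and JSON shape are invented for
# illustration; they are not OpenAI's actual wire protocol.
import asyncio
import json

import websockets

async def session(prompts):
    # One TLS + WebSocket handshake for the whole session...
    async with websockets.connect("wss://example.invalid/codex") as ws:
        for prompt in prompts:
            # ...then each turn is a plain send/receive on the open socket,
            # with no per-turn connection renegotiation.
            await ws.send(json.dumps({"prompt": prompt}))
            reply = await ws.recv()
            print(reply)

asyncio.run(session(["add a unit test", "now make it parametrized"]))
```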

An OpenAI spokesperson emphasized that "GPUs remain foundational across our training and inference pipelines and deliver the most cost effective tokens for broad usage," while positioning Cerebras as complementing that foundation by excelling at low-latency AI workloads [3]. This careful framing underscores the delicate balance OpenAI must strike as it diversifies chip suppliers without alienating Nvidia, the dominant force in AI accelerators [3]. The developer community will watch closely as OpenAI expands access over the coming weeks while tuning the integration under real workloads [3].
