OpenAI launches Codex-Spark on Cerebras chips, marking first move away from Nvidia hardware

Reviewed by Nidhi Govil


OpenAI released GPT-5.3-Codex-Spark, its first production AI model running on non-Nvidia hardware from Cerebras Systems. The lightweight coding model delivers over 1,000 tokens per second—roughly 15 times faster than its predecessor—using Cerebras' wafer-scale chips packed with 4 trillion transistors. The release marks the first milestone in OpenAI's $10 billion partnership with the AI chipmaker.

OpenAI Deploys First AI Coding Model on Non-Nvidia Hardware

OpenAI released GPT-5.3-Codex-Spark on Thursday, marking a significant shift in the company's infrastructure strategy. The new AI coding model represents OpenAI's first production deployment on non-Nvidia hardware, running instead on chips from Cerebras Systems [1]. The lightweight model delivers code at more than 1,000 tokens per second, which OpenAI reports is roughly 15 times faster than its predecessor [3]. To put this speed in context, OpenAI's fastest models on Nvidia hardware deliver significantly lower throughput: GPT-4o reaches roughly 147 tokens per second, o3-mini hits about 167, and GPT-4o mini clocks around 52 [1].
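For a concrete sense of those rates, the short Python calculation below converts each quoted throughput figure into the time needed to emit a 2,000-token completion; the output size is an arbitrary example, and the rates are simply the ones cited above.

    # Time to generate a 2,000-token completion at each throughput
    # figure quoted in this article (tokens per second).
    rates_tps = {
        "GPT-5.3-Codex-Spark (Cerebras)": 1000,
        "o3-mini (Nvidia)": 167,
        "GPT-4o (Nvidia)": 147,
        "GPT-4o mini (Nvidia)": 52,
    }

    OUTPUT_TOKENS = 2000  # arbitrary example: a mid-sized code file

    for model, tps in rates_tps.items():
        seconds = OUTPUT_TOKENS / tps
        print(f"{model:32} {tps:5d} tok/s -> {seconds:5.1f} s")

At the quoted rates, Codex-Spark would finish in about two seconds what GPT-4o mini would take close to forty seconds to produce.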

Source: ZDNet

Codex-Spark is available as a research preview to ChatGPT Pro subscribers at $200 per month through the Codex app, command-line interface, and VS Code extension [1]. OpenAI is also rolling out API access to select design partners. The model ships with a 128,000-token context window and handles text only at launch.
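OpenAI has not published client code for the preview, but if the design-partner API follows the standard OpenAI Python SDK, a streaming call would look roughly like the sketch below. The model identifier gpt-5.3-codex-spark is an assumption, not a confirmed API name.

    # Minimal sketch of a streaming call, assuming the preview model is
    # served through the standard OpenAI Chat Completions API. The model
    # name is a guess; OpenAI has not published the API identifier.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    stream = client.chat.completions.create(
        model="gpt-5.3-codex-spark",  # hypothetical identifier
        messages=[{"role": "user",
                   "content": "Write a Python function that merges two sorted lists."}],
        stream=True,  # stream tokens to take advantage of the low latency
    )

    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)

Streaming matters more than usual here: at 1,000 tokens per second, buffering the full response before displaying it would throw away most of the latency win.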

Cerebras Hardware Partnership Powers Faster Inference

The release represents the first milestone in OpenAI's hardware partnership with Cerebras, announced last month as a multi-year agreement worth over $10 billion [2]. Codex-Spark runs on Cerebras' Wafer Scale Engine 3, the company's third-generation wafer-scale megachip containing 4 trillion transistors [2]. The dinner-plate-sized AI accelerators feature some of the world's fastest on-chip memory, using SRAM that is roughly 1,000 times faster than the HBM4 found on Nvidia's upcoming Rubin GPUs [5].
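That "1,000 times faster" claim is easiest to sanity-check against bandwidth figures. Cerebras quotes 21 petabytes per second of on-chip SRAM bandwidth for the WSE-3; the HBM number below is an assumed ~20 TB/s for a Rubin-class GPU, since Nvidia has not finalized public HBM4 specs.

    # Rough sanity check of the ~1,000x memory-bandwidth claim.
    # The WSE-3 figure is Cerebras' published spec; the HBM4 figure is
    # an assumed estimate for an upcoming Rubin-class Nvidia GPU.
    WSE3_SRAM_TBPS = 21_000  # 21 PB/s of on-chip SRAM bandwidth
    HBM4_TBPS_EST = 20       # assumed per-GPU HBM4 bandwidth

    print(f"Bandwidth ratio: roughly {WSE3_SRAM_TBPS / HBM4_TBPS_EST:,.0f}x")

Under those assumptions the ratio comes out near 1,050x, consistent with the figure cited above.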

Source: The Register

"Cerebras has been a great engineering partner, and we're excited about adding fast inference as a new platform capability," said Sachin Katti, head of compute at OpenAI

1

. Sean Lie, CTO and Co-Founder of Cerebras, emphasized the potential for discovering "new interaction patterns, new use cases, and a fundamentally different model experience" through fast inference

2

.

Real-Time Coding Designed for Low-Latency Workflows

OpenAI built Codex-Spark specifically for real-time coding rather than the heavyweight agentic tasks handled by the full GPT-5.3-Codex model launched earlier this month [1]. The company tuned the model for speed over depth of knowledge, creating what it describes as a "daily productivity driver" for rapid prototyping [2]. Through session-initialization and streaming optimizations, OpenAI cut client-server overhead per roundtrip by 80 percent, per-token overhead by 30 percent, and time-to-first-token by 50 percent [3].
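OpenAI published only the percentage reductions, not absolute latencies, but their combined effect is easy to model. The sketch below plugs in made-up baseline numbers (a 500 ms time-to-first-token and 2 ms of per-token overhead) purely to illustrate how the stated 50 percent and 30 percent cuts compose over a 1,000-token response.

    # Illustrative latency model: total = TTFT + tokens * per-token cost.
    # Baselines are invented placeholders; only the percentage reductions
    # (50% TTFT, 30% per-token) come from OpenAI's announcement.
    BASE_TTFT_MS = 500.0     # assumed baseline time-to-first-token
    BASE_PER_TOKEN_MS = 2.0  # assumed baseline per-token overhead
    N_TOKENS = 1000

    before = BASE_TTFT_MS + N_TOKENS * BASE_PER_TOKEN_MS
    after = 0.5 * BASE_TTFT_MS + N_TOKENS * 0.7 * BASE_PER_TOKEN_MS

    print(f"before: {before:.0f} ms, after: {after:.0f} ms, "
          f"{1 - after / before:.0%} faster end to end")

With those placeholder baselines, a 1,000-token response drops from 2.5 seconds to about 1.65 seconds, roughly a third faster end to end.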

Source: SiliconANGLE

The model supports interruption and redirection mid-task, enabling tight iteration loops for developers who need to adjust instructions quickly [3]. It defaults to lightweight, targeted edits and doesn't automatically run tests unless asked. OpenAI says Cerebras' chips excel at supporting "workflows that demand extremely low latency" [2].
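The announcement doesn't detail how interruption is implemented, but in any streaming client the same effect can be approximated by abandoning the stream the moment the user redirects, as in this sketch (reusing the hypothetical model identifier from above).

    # Sketch: cutting off a streamed response early so a new instruction
    # can be issued. Reuses the hypothetical model identifier from above.
    from openai import OpenAI

    client = OpenAI()

    def stream_until_interrupted(prompt: str, should_stop) -> str:
        """Stream a completion, bailing out as soon as should_stop() is True."""
        stream = client.chat.completions.create(
            model="gpt-5.3-codex-spark",  # hypothetical identifier
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        parts = []
        for chunk in stream:
            if should_stop():
                stream.close()  # stop consuming; the connection is dropped
                break
            parts.append(chunk.choices[0].delta.content or "")
        return "".join(parts)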

OpenAI Moves to Diversify Hardware Suppliers Beyond Nvidia

The Cerebras deployment is part of OpenAI's broader push to diversify hardware suppliers and meet growing computing needs. OpenAI struck a blockbuster agreement in October with AMD to deploy 6 gigawatts' worth of GPUs over multiple years, and agreed to buy custom chips and networking components from Broadcom. An OpenAI spokesperson emphasized that the company's partnership with Nvidia remains "foundational" and that OpenAI is "anchoring on Nvidia as the core of our training and inference stack, while deliberately expanding the ecosystem around it."

OpenAI noted that "GPUs remain foundational across our training and inference pipelines and deliver the most cost effective tokens for broad usage," while Cerebras complements that foundation for low-latency workflows [5]. The company suggests that as Cerebras brings more compute online, it will bring larger models to the platform for users willing to pay a premium for high-speed inference [5].

Performance Benchmarks and Market Competition

On SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks for evaluating software engineering ability, Codex-Spark reportedly outperforms the older GPT-5.1-Codex-Mini while completing tasks in a fraction of the time [1]. On Terminal-Bench 2.0 it delivers greater accuracy than GPT-5.1-Codex-Mini while running much faster than the larger, more capable GPT-5.3-Codex model [5]. The new model marks OpenAI's latest attempt to compete with AI rivals such as Alphabet's Google and Anthropic, which are vying for dominance in the rapidly growing market for AI coding assistants. Codex has more than 1 million weekly active users, OpenAI said.

Cerebras raised $1 billion in fresh capital at a valuation of $23 billion last week and has announced intentions to pursue an IPO [2]. Sam Altman hinted at the launch in a tweet, saying, "We have a special thing launching to Codex users on the Pro plan later today. It sparks joy for me" [2].
