OpenAI Launches New High-Speed Coding Model

2026-02-13

OpenAI has introduced GPT-5.3-Codex-Spark, a smaller version of its GPT-5.3-Codex coding model engineered specifically for real-time programming. Running on Cerebras chips, it generates more than 1,000 tokens per second.
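For a sense of scale, a quick sketch of pure generation time at that rate (the completion sizes below are illustrative assumptions, not figures from the announcement):

```python
# Streaming time at the reported throughput of ~1,000 tokens/second.
throughput_tps = 1_000  # tokens per second, per the announcement

for tokens in (100, 500, 2_000):  # hypothetical completion sizes
    print(f"{tokens:>5} tokens -> {tokens / throughput_tps:.2f} s of generation")
```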

Codex-Spark marks the first product launch from OpenAI's collaboration with Cerebras, announced in January. This model runs on Cerebras's Wafer Scale Engine 3, an AI accelerator designed for rapid inference.

A research preview is now available to ChatGPT Pro users via the Codex app, the CLI, and the VS Code extension. OpenAI says it plans to broaden access in the coming weeks. Because the model runs on specialized hardware, it has separate rate limits, which the company may adjust during periods of peak demand.

Codex-Spark Prioritizes Speed Over Autonomy

OpenAI's larger frontier models, such as the newly released GPT-5.3-Codex, are designed to work autonomously for minutes or hours on complex programming tasks. Codex-Spark takes a different approach: OpenAI says the model is optimized for interactive work, where latency matters as much as intelligence. Developers can interrupt and redirect the model in real time and see results immediately.

According to OpenAI, Codex-Spark is deliberately conservative in how it operates. Compared to larger models, it makes fewer changes by default and does not run automated tests unless explicitly asked. The model has a 128K-token context window and handles text only.

Accuracy Dips, Task Times Drop Sharply

OpenAI reports that Codex-Spark posted strong results on SWE-Bench Pro and Terminal-Bench 2.0, benchmarks that evaluate agentic software engineering capabilities, while completing tasks far faster than GPT-5.3-Codex. On SWE-Bench Pro, Codex-Spark reaches similar accuracy in roughly two to three minutes per task, whereas GPT-5.3-Codex needs about 15 to 17 minutes for the same tasks.
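Taking the reported per-task times at face value, that works out to roughly a five- to eightfold speedup. A quick back-of-the-envelope check (treating the quoted minute ranges as hard bounds is our assumption):

```python
# Speedup implied by the reported SWE-Bench Pro task times.
spark_minutes = (2, 3)    # GPT-5.3-Codex-Spark: ~2-3 minutes per task
codex_minutes = (15, 17)  # GPT-5.3-Codex:      ~15-17 minutes per task

low = codex_minutes[0] / spark_minutes[1]   # 15 / 3 = 5.0x
high = codex_minutes[1] / spark_minutes[0]  # 17 / 2 = 8.5x

print(f"Implied speedup: {low:.1f}x to {high:.1f}x")
```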

On Terminal-Bench 2.0, Codex-Spark achieved an accuracy rate of 58.4%. The larger GPT-5.3-Codex reached 77.3%, while the older GPT-5.1-Codex-mini scored 46.1%. Both smaller models traded some accuracy for speed.

Building Codex-Spark required OpenAI to speed up more than just the model itself. To hit its latency targets, the company rewrote critical parts of the inference stack, streamlined the response flow between client and server, and redesigned session setup so the first token appears sooner. The result, according to OpenAI: round-trip overhead dropped by 80%, per-token overhead by 30%, and time-to-first-token was cut in half. These improvements are enabled by default for Codex-Spark and will be extended to all models soon.
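OpenAI has not published absolute latency numbers, so the combined effect is easiest to see in a toy response-time model. In the sketch below, only the reduction factors (80%, 30%, 50%) come from the announcement; the baseline values and the 500-token reply are hypothetical assumptions:

```python
# Toy model of an interactive turn: one round trip, then streamed tokens.
def response_time(round_trip_ms, ttft_ms, per_token_ms, tokens):
    """Total wall-clock time for one model response, in milliseconds."""
    return round_trip_ms + ttft_ms + tokens * per_token_ms

TOKENS = 500  # hypothetical reply length

# Hypothetical pre-optimization baseline.
before = response_time(round_trip_ms=200, ttft_ms=800, per_token_ms=2.0, tokens=TOKENS)

# Same request with OpenAI's reported reductions applied.
after = response_time(
    round_trip_ms=200 * (1 - 0.80),  # round-trip overhead down 80%
    ttft_ms=800 * (1 - 0.50),        # time-to-first-token halved
    per_token_ms=2.0 * (1 - 0.30),   # per-token overhead down 30%
    tokens=TOKENS,
)

print(f"before: {before:.0f} ms, after: {after:.0f} ms")  # 2000 ms -> 1140 ms
```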

OpenAI Aims to Merge Real-Time and Reasoning Modes in the Future

OpenAI describes Codex-Spark as the first model in a planned family of "ultra-fast" models. More is on the way, including larger models, longer context windows, and support for multimodal input.

In the long term, the company wants to give Codex two complementary modes: one for extended reasoning and autonomous execution, and one for real-time collaboration. OpenAI plans to converge these modes over time, keeping developers in a fast interactive loop while offloading longer tasks to sub-agents in the background or spreading work across multiple models running in parallel.