OpenAI Releases GPT-5.1-Codex-Max for Round-the-Clock Engineering Tasks

2025-11-20

OpenAI is rolling out GPT-5.1-Codex-Max, a new model designed to handle extensive context windows and tackle complex engineering tasks that can take hours to complete.

The company has launched its latest “agent” coding model, GPT-5.1-Codex-Max, which it says is engineered for “long-running, meticulous work.” This model replaces the previous GPT-5.1-Codex as the standard across all Codex interfaces.

On the SWE-bench Verified benchmark, GPT-5.1-Codex-Max achieves a leading score of 77.9%, surpassing recent offerings from Anthropic and Google’s Gemini 3. On OpenAI’s internal “SWE-Lancer IC SWE” benchmark, performance jumped from 66.3% to 79.9%.

According to OpenAI, the new model uses 30% fewer “thinking tokens” than its predecessor while maintaining equivalent output quality, and real-world task execution is 27% to 42% faster. For latency-insensitive workflows, a new Extra High (“xhigh”) reasoning mode allocates additional time for deeper analysis.

OpenAI notes that GPT-5.1-Codex-Max is the first of its models specifically trained to operate effectively in Windows environments, enhancing its command-line task capabilities. The company reports that 95% of its own engineers use Codex weekly, and that internal pull request volume has risen by 70% since the tool’s introduction.

Access is now available to ChatGPT Plus, Pro, Team, Edu, and Enterprise users, with GPT-5.1-Codex-Max becoming the default and the prior model slated for deprecation within days of this release. That short-lived predecessor was priced at $1.25 per million input tokens and $10 per million output tokens; pricing for the new model has not yet been disclosed. API availability is expected soon.

ChatGPT Plus users are limited to 45–225 local messages and 10–60 cloud tasks every five hours. Pro users receive significantly higher quotas, ranging from 300–1,500 local messages and 50–400 cloud tasks over the same period.

New Capabilities Enable All-Day Coding Sessions

OpenAI states the model can maintain focus on a single task for “over 24 hours” in internal testing—handling scenarios like fixing test failures or iteratively refining implementations. While specific workload details remain undisclosed, this claim aligns with Anthropic’s recent statements about Sonnet 4.5’s extended runtime abilities.

To manage these prolonged sessions, OpenAI employs a technique called “compaction.” When the context window fills up, the system automatically condenses the conversation history, summarizing essential information and discarding non-critical details. This allows the model to retain core objectives and key steps across millions of tokens. GPT-5.1-Codex-Max is the first OpenAI model natively trained to operate seamlessly across multiple context windows in this way.
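The general idea behind this kind of history condensation can be sketched in a few lines. Note that this is an illustrative toy, not OpenAI’s implementation: the token estimator, the `summarize` placeholder, and the `keep_recent` cutoff are all assumptions made for the example; a real system would summarize with a model and count tokens with a real tokenizer.

```python
# Toy sketch of context compaction: when the running token estimate
# exceeds a budget, older turns are collapsed into one summary entry
# so the recent turns (and the task goal they carry) survive intact.

def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)

def summarize(turns: list[str]) -> str:
    # Hypothetical placeholder; a real system would call a model here.
    return "SUMMARY: " + " | ".join(t[:40] for t in turns)

def compact(history: list[str], budget: int, keep_recent: int = 4) -> list[str]:
    """Collapse old turns into a single summary when over the token budget."""
    total = sum(estimate_tokens(t) for t in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # still fits, or too little history to compact
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent
```

Repeating this whenever the window fills lets a fixed-size context carry a session whose raw transcript would run to millions of tokens, at the cost of losing non-essential detail from the summarized turns.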