OpenAI Launches Flex Processing Option to Reduce AI Task Costs

2025-04-18

To strengthen its market position and counter rivals such as Google, OpenAI has introduced a Flex processing option. The new feature lowers the cost of using its AI models to attract users, but it comes with trade-offs: slower response times and occasional temporary unavailability of compute resources.

Currently in beta, Flex processing is available only for OpenAI's recently launched o3 and o4-mini reasoning models. It is aimed at low-priority or non-production tasks such as model evaluation, data augmentation, and asynchronous workloads.
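For developers, opting in comes down to a single request parameter: OpenAI's documentation selects Flex via `service_tier`. The snippet below is a minimal sketch using the official `openai` Python SDK; the model name and prompt are illustrative, and the long client timeout reflects Flex's deprioritized scheduling.

```python
# Minimal sketch: sending a low-priority request through Flex processing.
# Assumes the official `openai` Python SDK; prompt content is illustrative.
from openai import OpenAI

# Flex requests are deprioritized and can take much longer than standard
# requests, so allow a generous timeout (here, 15 minutes).
client = OpenAI(timeout=900.0)

response = client.chat.completions.create(
    model="o3",
    messages=[{"role": "user", "content": "Summarize yesterday's batch of evaluation transcripts."}],
    service_tier="flex",  # opt into the cheaper, slower Flex tier
)
print(response.choices[0].message.content)
```

If Flex capacity is temporarily exhausted, the API rejects the request with a resource-unavailable error (a 429 per OpenAI's documentation at launch), so callers typically retry with backoff or resend the request at the default tier.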

The Flex processing option cuts API costs in half. For the o3 model, input tokens drop from $10 to $5 per million, and output tokens from $40 to $20 per million. Likewise, for o4-mini, input tokens fall from $1.10 to $0.55 per million, and output tokens from $4.40 to $2.20 per million.
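To make the discount concrete, here is a quick back-of-the-envelope check using the o3 prices above; the token counts are hypothetical.

```python
# Hypothetical workload: 200M input tokens and 50M output tokens through o3.
input_tokens, output_tokens = 200_000_000, 50_000_000

# o3 prices in USD per million tokens, standard vs. Flex (figures from above).
standard = {"input": 10.00, "output": 40.00}
flex = {"input": 5.00, "output": 20.00}

def cost(prices: dict, inp: int, out: int) -> float:
    """Total cost in USD for a given price table and token counts."""
    return inp / 1e6 * prices["input"] + out / 1e6 * prices["output"]

print(f"standard: ${cost(standard, input_tokens, output_tokens):,.2f}")  # $4,000.00
print(f"flex:     ${cost(flex, input_tokens, output_tokens):,.2f}")      # $2,000.00
```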

The move comes as the cost of cutting-edge AI continues to climb and competitors release budget-friendly models with stronger value propositions. A recent example is Google's Gemini 2.5 Flash reasoning model, released on Thursday, which performs comparably to, or even better than, DeepSeek's R1 while charging less per input token.

In the email announcing Flex processing, OpenAI said that developers in usage tiers 1 through 3 must complete its newly introduced identity verification process to access the o3 model. Features such as reasoning summaries and streaming API support for o3 and certain other models also require verification.

OpenAI has previously explained that the identity verification requirement is intended to prevent policy violations and misuse of its services by bad actors.