OpenAI has unveiled significant enhancements with its new GPT-Image 1.5 model, delivering improved prompt accuracy, superior detail retention, and dramatically faster image generation speeds.
The latest iteration produces images up to four times faster than its predecessor, so users can create new visuals while other requests are still waiting in the queue. The upgraded model is now available to all ChatGPT users and accessible via the API.
Fidji Simo, OpenAI's CEO of Applications, views this advancement as part of a broader evolution: transforming ChatGPT from a passive text-based assistant into a fully generative interface that dynamically integrates components based on user intent.
Enhanced consistency in lighting, composition, and facial rendering during edits
The updated model handles image editing with greater precision, making targeted adjustments without compromising other elements of the scene. Lighting, compositional balance, and facial expressions remain more consistent compared to earlier versions. OpenAI highlights its ability to add, remove, blend, combine, and remix visual elements seamlessly.
Use cases span photo editing, virtual try-ons for clothing and hairstyles, and artistic style transfers. Demonstrations by OpenAI include merging people and a dog from separate photos into one cohesive scene, or converting a standard photograph into a vintage Hollywood-style movie poster.
Image Source: OpenAI
Superior adherence to complex prompts
The new model demonstrates marked improvement in following intricate instructions. In a test requiring a precise 6x6 grid layout with specific objects per cell, the updated version correctly arranged all elements, whereas the previous model failed. This enables creation of images where exact positioning is critical.
OpenAI tested the model with a highly detailed prompt: a 6x6 grid containing 36 distinct items such as the Greek letter beta, a beach ball, a praying mantis, a bathtub, the word "miracle," a mute symbol, and a Canada goose. The new model rendered the arrangement flawlessly. | Image Source: OpenAI
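A grid prompt of this kind can be assembled programmatically, which makes it easier to check the output cell by cell. The following is a minimal sketch, not OpenAI's actual test prompt: the function name `build_grid_prompt` and the sentence wording are assumptions made for illustration, and the example uses only six of the items named above in a smaller 2x3 layout.

```python
def build_grid_prompt(items, rows, cols):
    """Build a row-by-row text prompt for a rows x cols grid of objects.

    Hypothetical helper for illustration; the phrasing is an assumption,
    not the prompt OpenAI used in its test.
    """
    if len(items) != rows * cols:
        raise ValueError(
            f"expected {rows * cols} items for a {rows}x{cols} grid, got {len(items)}"
        )
    lines = [f"A {rows}x{cols} grid of equally sized cells, one object per cell."]
    for r in range(rows):
        # Slice out the items that belong to this row, in left-to-right order.
        row_items = items[r * cols:(r + 1) * cols]
        lines.append(f"Row {r + 1}, left to right: " + "; ".join(row_items) + ".")
    return "\n".join(lines)


# Six of the items mentioned in the article, arranged as 2 rows of 3.
example_items = [
    "the Greek letter beta",
    "a beach ball",
    "a praying mantis",
    "a bathtub",
    'the word "miracle"',
    "a mute symbol",
]
print(build_grid_prompt(example_items, rows=2, cols=3))
```

Because each row is listed explicitly, the generated image can be compared against the same list to see whether every object landed in the intended cell.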
Text rendering capabilities have also advanced significantly. The model can now produce denser and smaller text, enabling clear depictions of article excerpts, short tables, or labeled infographics. However, OpenAI acknowledges ongoing challenges with lengthy paragraphs, unusual fonts, multiple faces in one image, and multilingual content generation.
A benchmark test involving a surreal and unprecedented scenario—a horse riding an astronaut—was used to evaluate performance. Previous models struggled with such abstract concepts, but the latest generation, including Flux 2, handled it effectively. GPT-Image 1.5 performs competitively against Google’s Nano Banana Pro, substantially outperforming earlier iterations.
Test Prompt: “Widescreen 16:9, an ultra-realistic DSLR photo. In the foreground, a monkey holding a pink banana sits atop a tiger. In the background, a horse rides an astronaut. The astronaut functions as a living 'spacesuit saddle,' with the horse clearly in control as the rider. To be absolutely clear: the horse is the rider, and the astronaut is being ridden—not the other way around. High resolution, sharp focus, realistic lighting.”
Previous GPT-4o image model with the same prompt. The output appears more artificial, failing to grasp the conceptual reversal of rider and mount.
Nano Banana Pro also managed the complex prompt well, producing a more natural-looking image, though results may vary depending on phrasing.
Initial impressions suggest that ChatGPT's image outputs appear more vivid and stylized compared to Nano Banana Pro, which tends toward literal interpretations resulting in casual, snapshot-like aesthetics rather than polished compositions. That said, subtle differences in prompting could influence these outcomes.
GPT-Image 1.5 result in ChatGPT. Input: a photo of Max, with the prompt: “Dress him as Santa Claus and place him in a winter wonderland filled with dachshunds. He’s holding two of them. 16:9”
Nano Banana Pro interpreted the Santa concept more literally.
When prompted to “zoom out and make it look more natural, like an everyday photo taken with a smartphone camera,” ChatGPT produced a more authentic, lifelike result.
Despite performance gains, API pricing drops by 20%
Developers can access the model through the API under the name GPT-Image 1.5. OpenAI reports a 20% reduction in cost for both image input and output tokens compared to prior versions. Pricing is set at $8 per million input tokens and $32 per million output tokens for images. Text processing remains at $5 per million input tokens and $10 per million output tokens. Previously, GPT-Image 1 generation costs ranged from $0.02 to $0.19 per image depending on quality settings.
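To make the per-token rates concrete, the arithmetic can be sketched as a small cost estimator. The rates below are the ones quoted above; the function name `estimate_cost`, the token-kind labels, and the token counts in the example are assumptions for illustration only, since the article does not state how many tokens a given image consumes.

```python
# Per-million-token rates in USD, as quoted in the article.
RATES_PER_MILLION = {
    "image_input": 8.00,
    "image_output": 32.00,
    "text_input": 5.00,
    "text_output": 10.00,
}


def estimate_cost(token_counts):
    """Return the estimated USD cost for a dict mapping token kind to count."""
    total = 0.0
    for kind, tokens in token_counts.items():
        if kind not in RATES_PER_MILLION:
            raise KeyError(f"unknown token kind: {kind}")
        if tokens < 0:
            raise ValueError(f"token count for {kind} must be non-negative")
        total += tokens * RATES_PER_MILLION[kind] / 1_000_000
    return total


# Hypothetical request: a 200-token text prompt producing 4,000 image output tokens.
request = {"text_input": 200, "image_output": 4000}
print(f"${estimate_cost(request):.4f}")  # prints $0.1290
```

In this hypothetical request the image output tokens account for almost the entire cost ($0.128 of $0.129), which is why the reduction on image tokens matters more to developers than the unchanged text rates.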
According to OpenAI, the model shows improved fidelity in preserving brand logos and key visual assets—making it particularly valuable for marketing and e-commerce applications. The earlier version of ChatGPT’s image generation remains available for use via custom GPTs.