ByteDance Jimeng Image 4.0 Released: Effortless Image Editing with Multimodal Image Generation Support

2025-09-08

ByteDance's Jimeng AI announced on September 5 that Jimeng Image 4.0 has officially launched.

The latest version introduces multimodal image generation for the first time, handling text-to-image creation, image editing, and group image generation in a single model, and lets users control visual details through natural language.

  • Text-to-Image: Enhanced instruction following, higher resolution support, and faster generation speed
  • Image Editing: Input one or multiple images along with natural language instructions to perform any type of editing
  • Group Image Generation: Generate multiple related images at once, ideal for creative brainstorming

Jimeng AI stated that Jimeng Image 4.0 is being rolled out gradually and will become available to all users in the coming days, and advised users to stay tuned.

According to official documentation, Jimeng Image 4.0 has five key advantages: precise instruction-based editing, high feature retention, deep intent understanding, multi-image input and output, and ultra-fast, ultra-high-definition generation. Together these cover a wide range of creative scenarios.

Users can simply describe their needs in plain language to achieve effects such as adding, modifying, replacing, or referencing elements within images.

Multiple images can be input at once for complex editing operations such as merging, transferring, replacing, and deriving elements, which makes difficult compositions possible. The system can also output multiple related images in a single pass.

In terms of speed, Jimeng Image 4.0 generates 2K images in under 1.8 seconds and supports up to 4K resolution output.