Vidu Launches AI Image Generation Update to Create Imaginative Realism through Reference Images

2025-09-09

Vidu, the artificial intelligence-driven generative platform from China-based Shengshu Technologies, has announced a major update. The update lets users upload multiple reference images and combine them into dynamic, highly consistent visuals using an AI model, a capability the company describes as “redefining photography.”

The company, best known for its generative AI video platform and foundational models, allows users to create short scenes by inputting natural language prompts along with reference images. The AI model can then generate elements and objects within scenes based on these images, maintaining strong consistency across different scenes.

Vidu has now applied similar technology to image generation, a process it calls “reference-to-image,” enabling users to gain greater control and uniformity in their output. Users may upload up to seven reference images to refine the results.

With this update, the company’s AI interprets the relationships between multiple images through what it describes as “semantic understanding,” thereby achieving a higher level of consistency. Such consistency long posed a challenge for AI models until recent breakthroughs, such as Google LLC’s Gemini 2.5 Flash Image, also known as “Nano Banana,” made the technology more accessible.

Using Vidu’s reference-to-image feature, users can generate entirely new images from text prompts combined with multiple individual reference images. According to the company, the same approach enables rapid photo editing while maintaining a high level of consistency.

Photographers, for instance, could add elements like bouquets when editing wedding photos, change floral arrangements on tables, or adjust lighting—even on cloudy or rainy days. Users can modify self-portraits that don’t meet expectations, swap logos on shirts, or place themselves in entirely different settings. Marketers and advertisers can quickly generate AI-created “photos” that include products or replace product models in existing ad shoots.

Vidu claims that its instant image editing capabilities have significantly improved, positioning it as a competitor to current editing platforms. Typically, users seeking AI-based image synthesis rely on editing platforms or advanced workflow tools like the open-source ComfyUI to achieve consistency and control.

With this new feature, Vidu offers editing tools such as remixing, object addition, and partial or full object replacement. Users can import multiple images and freely synthesize them into a single output, which the company describes as “highly consistent” in both visual realism and coherence compared with other models on the market. With replacement, users can adjust an object’s appearance, such as changing the color of clothing or umbrellas, or substitute the object entirely with a different item.

Vidu’s latest model competes directly with Google’s Nano Banana and Flux from Black Forest Labs Inc. in terms of image generation and editing performance. The company asserts that its model stands out by delivering “unparalleled image and character consistency, along with natural image blending for richer, more realistic detail,” including the ability to accurately extract and embed visual and textual information from reference images. Many modern generative AI image models still struggle to render text accurately, even with reference materials.