ByteDance Makes Significant Progress in AI Video with Seedance 2.0

2026-02-10

ByteDance has released Seedance 2.0 to a limited group of users. Its predecessor was already one of the most capable AI video generators; the new version goes a step further.

This multimodal video generation model can process up to four types of input simultaneously: images, videos, audio, and text. Users can combine up to nine images, three videos, and three audio files, with a maximum of twelve files in total. Generated videos run between 4 and 15 seconds and automatically include sound effects or music.
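Seedance 2.0 has no public API yet, so as a minimal sketch of how these limits might be enforced client-side, the following encodes only the constraints named above; the `SeedanceRequest` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field

# Input limits as stated by ByteDance for Seedance 2.0 (per this article).
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3
MAX_TOTAL_FILES = 12
MIN_DURATION_S = 4
MAX_DURATION_S = 15

@dataclass
class SeedanceRequest:
    """Hypothetical request container; not an actual ByteDance API."""
    prompt: str
    images: list[str] = field(default_factory=list)  # local file paths
    videos: list[str] = field(default_factory=list)
    audio: list[str] = field(default_factory=list)
    duration_s: int = 5

    def validate(self) -> None:
        """Raise ValueError if the request exceeds the published limits."""
        if len(self.images) > MAX_IMAGES:
            raise ValueError(f"at most {MAX_IMAGES} images allowed")
        if len(self.videos) > MAX_VIDEOS:
            raise ValueError(f"at most {MAX_VIDEOS} videos allowed")
        if len(self.audio) > MAX_AUDIO:
            raise ValueError(f"at most {MAX_AUDIO} audio files allowed")
        total = len(self.images) + len(self.videos) + len(self.audio)
        if total > MAX_TOTAL_FILES:
            raise ValueError(f"at most {MAX_TOTAL_FILES} files in total")
        if not (MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S):
            raise ValueError(
                f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} seconds"
            )
```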

These demonstration videos come directly from ByteDance and are almost certainly curated from a large number of generated clips. It remains unknown how consistently the model reaches this quality level in practical use, what generation costs, or how long it takes. What we are seeing therefore likely represents a best-case scenario. And even though the capabilities look outstanding on paper, significant hurdles, such as maintaining consistency, remain before they can be integrated into professional workflows. Nonetheless, the demonstrated quality is genuinely impressive.

Prompt: The camera follows a man in black clothing as he swiftly flees. A crowd chases him from behind. The shot switches to a lateral chase perspective. The man panics and knocks over a roadside fruit stall, then gets up and continues running. Excited shouts from the crowd are heard in the background.

Prompt: A girl gracefully hangs laundry. After washing, she takes the next piece of clothing from the bucket and gives it a vigorous shake.

According to ByteDance, the most prominent new feature is the reference function: the model can extract shots, movements, and special effects from uploaded reference videos, replace characters, and seamlessly extend existing clips. Video editing tasks, such as replacing or adding characters, are also a significant focus.

Users only need to write simple text commands, for example: "Take @image1 as the first image of the scene. First-person perspective. Take the camera movement from @Video1. The scene above is based on @Frame2, the scene on the left is @Frame3, and the scene on the right is @Frame4."
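To illustrate the @-referencing scheme, here is a minimal sketch that checks every @label in a prompt against the attached assets before submission. The function name, the asset mapping, and the validation step are all assumptions for illustration; nothing here reflects ByteDance's actual interface.

```python
import re

def check_references(prompt: str, assets: dict[str, str]) -> str:
    """Verify that every @label in the prompt has an attached asset.

    `assets` maps labels such as "image1" or "Video1" to local file
    paths. Purely illustrative: this mimics only the referencing
    convention shown in the example prompt above.
    """
    referenced = set(re.findall(r"@(\w+)", prompt))
    missing = referenced - assets.keys()
    if missing:
        raise KeyError(f"prompt references unattached assets: {sorted(missing)}")
    return prompt

prompt = check_references(
    "Take @image1 as the first image of the scene. "
    "Take the camera movement from @Video1.",
    {"image1": "opening_frame.png", "Video1": "camera_move.mp4"},
)
```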

The user records the camera movement...

...and the AI model transfers it, along with other elements, into the generated video.

For compliance reasons, real human faces in uploaded materials are currently obscured. Seedance 2.0 is available in beta on the official Jimeng website at jimeng.jianying.com.

Prompt: The person in the photo wears a guilty expression, eyes darting left and right, then leans out of the picture frame. She quickly withdraws her hand from the frame, picks up a bottle of cola, takes a sip, and shows a satisfied expression. At this moment, footsteps are heard. The person in the photo hurriedly places the cola back. A western cowboy walks over, takes the cola from the cup, and walks away. Finally, the camera pushes forward, the background slowly darkens, with only a spotlight above illuminating a can of cola. A cleverly designed subtitle appears at the bottom of the screen, accompanied by a voiceover: "A sip of cola—you must try it!"