Tencent's HunYuan Image-to-Video Model Goes Open Source! One-Click Creative Short Video Generation

2025-03-06

Tencent HunYuan has unveiled its latest image-to-video model and made it open source. Users upload a single image along with a brief description of the desired motion and camera movement, and the model generates a 5-second video clip complete with background sound effects.

The model also offers "lip-sync" and "motion-driven" capabilities. By uploading a portrait image together with the desired text or audio, users can make the character in the image appear to "talk" or "sing," while the "motion-driven" function lets users generate dancing videos instantly.

Currently, the public can try out the model through the official HunYuan AI Video website, while businesses and developers can apply for API access via Tencent Cloud. This open-source image-to-video model is an extension of HunYuan's text-to-video model, with a total of 13 billion parameters. It supports a wide range of characters and scenes, including realistic footage, anime characters, and CGI characters.

The open-source release includes model weights, inference code, and LoRA training code, enabling developers to train customized derivative models on top of HunYuan. The model is now available for download on mainstream developer platforms such as GitHub and Hugging Face.

According to HunYuan's open-source technical report, its video generation model scales flexibly. The image-to-video and text-to-video models are pre-trained on the same data, so the model captures rich visual and semantic information while preserving ultra-realistic quality, smooth rendering of large-scale motion, and native camera transitions. It accepts multiple input conditions, including images, text, audio, and poses, to achieve multi-dimensional control over the generated video.

Since its open-source release, HunYuan's video generation model has attracted significant attention. In December last year it topped Hugging Face's overall trending list, and it currently has over 8.9K stars on GitHub. Community developers have independently built plugins and derivative models on top of HunyuanVideo, producing more than 900 derivative versions. The earlier open-sourced HunYuan DiT text-to-image model has likewise inspired over 1,600 derivative models worldwide.

To date, HunYuan's open-source model series has comprehensively covered multiple modalities, including text, image, video, and 3D generation, attracting over 23,000 stars and developer engagements on GitHub.