Tencent has officially launched its latest AI video generation model, Hunyuan, which at 13 billion parameters is the largest open-source video generation model to date. To reach a wider range of developers, Tencent has fully open-sourced the model, publishing its weights, inference code, algorithms, and other key resources on GitHub and Hugging Face without restrictions.
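For readers who want to try the open weights directly, a minimal sketch of loading them through the Hugging Face diffusers integration might look like the following. The repository id shown is the community diffusers-format port, an assumption on our part; Tencent's official repository ships its own inference code, and generation settings here are illustrative.

```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

# Assumed repository id: the diffusers-format port of the open weights.
model_id = "hunyuanvideo-community/HunyuanVideo"

# Load the 13B transformer in bfloat16 to keep memory within reach of a single GPU.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()  # decode video latents in tiles to reduce peak VRAM
pipe.to("cuda")

frames = pipe(
    prompt="A cat walks on the grass, realistic style.",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(frames, "output.mp4", fps=15)
```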
The model is currently available in the Tencent Yuanyou APP, where users can apply for a trial in the "AI Video" section under AI Applications to experience the technology firsthand. Tencent has also opened API testing interfaces, so developers can integrate the model through Tencent Cloud and extend it to further application scenarios.
Tencent's Hunyuan video generation model sets a new direction for video creation technology with four standout features. First, its ultra-realistic image quality delivers high-definition, lifelike visuals, meeting industrial-grade commercial needs such as advertising and creative video production. Second, its high semantic consistency lets users specify generated content in detail; the model accurately interprets the intent of the text, giving creators substantial flexibility. Third and fourth, its smooth motion rendering and native camera-transition capabilities strengthen the narrative and aesthetic appeal of the resulting videos.
To help developers use the model effectively, Tencent provides comprehensive prompt-writing tips. Users can combine prompt elements to match their creative needs, from a simple "subject + scene + motion" to the fuller "subject (description) + scene (description) + motion (description) + (camera language) + (ambiance description) + (style expression)", producing a wide variety of video results, as sketched below.
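As an illustration of that recipe, a small helper function (hypothetical, not part of any Tencent SDK) could assemble the required and optional fields into a single prompt string:

```python
def build_prompt(subject, scene, motion,
                 camera=None, ambiance=None, style=None):
    """Assemble a Hunyuan-style prompt: required subject/scene/motion,
    plus optional camera language, ambiance, and style expression."""
    parts = [subject, scene, motion, camera, ambiance, style]
    # Drop any optional fields that were not supplied.
    return ", ".join(p for p in parts if p)

# Example: the full six-part combination from the tips above.
prompt = build_prompt(
    subject="a white-haired elderly fisherman",
    scene="on a small wooden boat in a misty lake at dawn",
    motion="slowly casting a fishing net into the water",
    camera="slow push-in shot",
    ambiance="serene and melancholic",
    style="cinematic, hyper-realistic",
)
print(prompt)
```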
From a technical standpoint, Tencent's Hunyuan video generation model also excels. Official evaluations indicate that it leads in text-video consistency, motion quality, and image clarity. The model introduces three key technical innovations: first, the text encoder has been upgraded to a new-generation multimodal large language model, strengthening its ability to follow the semantics of a prompt; second, the visual encoder supports mixed image/video training, markedly improving compression and reconstruction performance; third, a unified full-attention mechanism makes generated video smoother and keeps subjects consistent across multiple camera angles.
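To make the third point concrete, here is a toy sketch of "unified full attention" over a video; it illustrates the general idea only, as Tencent's actual architecture and dimensions are not described in this article. The spatio-temporal latent grid is flattened into one token sequence so every frame attends to every other frame in a single pass, rather than alternating separate spatial-only and temporal-only attention layers:

```python
import torch
import torch.nn as nn

# Toy sizes, made up for the example.
batch, frames, height, width, dim = 1, 8, 16, 16, 64

video_latents = torch.randn(batch, frames, height, width, dim)

# Flatten (frames, height, width) into a single token axis so one
# self-attention pass mixes information across space AND time.
tokens = video_latents.reshape(batch, frames * height * width, dim)

attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
out, _ = attn(tokens, tokens, tokens)  # every token attends to every other

# Because distant frames interact directly (unlike factorized designs),
# a subject can stay consistent even across a camera cut.
out = out.reshape(batch, frames, height, width, dim)
print(out.shape)  # torch.Size([1, 8, 16, 16, 64])
```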
For more sample videos from Tencent's Hunyuan model, along with comparisons against Sora on the same prompts, readers can turn to platforms such as Quantum Bit during the beta testing phase.