Tencent Releases New Generation Fast-Thinking Model HunYuan Turbo S

2025-02-28

Tencent has recently unveiled its latest fast-thinking model, HunYuan Turbo S. Unlike slow-thinking models such as DeepSeek R1 and HunYuan T1, the new model is built for instant responsiveness, delivering "immediate replies" with doubled word output speed and a 44% reduction in first-character latency.

In areas like knowledge, reasoning, and content creation, HunYuan Turbo S demonstrates strong performance. Research suggests that 90% to 95% of everyday human decisions rely on intuition. Fast thinking, much like that intuition, lets a large model respond quickly in general scenarios, whereas slow thinking emphasizes rational analysis, breaking a problem down logically before producing a solution.

By combining the strengths of both fast and slow thinking, large models can solve various problems more intelligently and efficiently. HunYuan Turbo S achieves significant overall performance improvements by integrating short and long thought chains. While maintaining rapid responses for humanities-related queries, it leverages proprietary data from the HunYuan T1 slow-thinking model's extended thought chains to notably enhance reasoning capabilities in scientific fields.

In multiple industry-standard public benchmark tests, HunYuan Turbo S performs comparably to leading models such as DeepSeek V3, GPT-4o, and Claude across knowledge, mathematics, and reasoning domains.

Architecturally, HunYuan Turbo S adopts a novel Hybrid-Mamba-Transformer fusion design, reducing the computational complexity of the traditional Transformer structure and shrinking KV-Cache memory usage, which lowers both training and inference costs. The fusion approach capitalizes on Mamba's efficiency in handling long sequences while retaining the Transformer's strength in capturing complex contextual relationships, yielding a hybrid architecture optimized for memory and compute.
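To make the idea concrete, the sketch below shows what interleaving the two block types can look like in practice. It is a minimal, hypothetical PyTorch illustration, not Tencent's implementation: the block names, layer sizes, and the simplified gated recurrence standing in for Mamba's selective state-space scan are all assumptions chosen for clarity. The recurrent block runs in linear time over the sequence and carries a fixed-size state instead of a growing KV cache, while the attention block keeps content-based, global context mixing.

```python
# Illustrative sketch only: a toy hybrid stack interleaving a simplified
# state-space (Mamba-style) block with a standard Transformer block.
import torch
import torch.nn as nn


class ToySSMBlock(nn.Module):
    """Gated linear recurrence: O(L) in sequence length, fixed-size state,
    so no KV cache is needed at inference time."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.decay = nn.Parameter(torch.rand(d_model))   # per-channel decay
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                                # x: (batch, seq, d_model)
        u, gate = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        a = torch.sigmoid(self.decay)                     # keep decay in (0, 1)
        state = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):                        # recurrent scan over time
            state = a * state + (1 - a) * u[:, t]
            outs.append(state)
        h = torch.stack(outs, dim=1) * torch.sigmoid(gate)
        return x + self.out_proj(h)                       # residual connection


class TransformerBlock(nn.Module):
    """Standard self-attention block: quadratic in sequence length, but strong
    at capturing content-based, long-range dependencies."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.ff(x)


class HybridStack(nn.Module):
    """Alternate cheap SSM blocks with occasional attention blocks."""

    def __init__(self, d_model: int = 256, n_pairs: int = 2):
        super().__init__()
        layers = []
        for _ in range(n_pairs):
            layers += [ToySSMBlock(d_model), TransformerBlock(d_model)]
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    model = HybridStack()
    tokens = torch.randn(1, 128, 256)       # (batch, seq_len, d_model)
    print(model(tokens).shape)              # torch.Size([1, 128, 256])
```

The design intuition is that most layers can be the cheap recurrent kind, with a smaller number of attention layers interspersed to preserve global context modelling; the real HunYuan Turbo S layout and block internals are not public, so the ratio and recurrence above are purely illustrative.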

Notably, this marks the first successful lossless application of the Mamba architecture in an ultra-large-scale Mixture-of-Experts (MoE) model. Thanks to these architectural innovations, the deployment cost of HunYuan Turbo S has been reduced significantly, lowering the barrier to adopting large models.

As the flagship model in Tencent's HunYuan series, HunYuan Turbo S will serve as the core foundation for derivative models, providing essential capabilities for models focused on reasoning, long-form text, coding, and more. Based on HunYuan Turbo S, Tencent has also launched T1, a reasoning model with deep-thinking capabilities, now available to all users on the Tencent Yuanbao platform.

Currently, developers and enterprise users can access the HunYuan Turbo S model via Tencent Cloud’s API, with a free trial available for one week starting today. Pricing-wise, input costs are set at 0.8 yuan per million tokens, while output costs are 2 yuan per million tokens, representing a substantial price drop compared to previous models. Tencent Yuanbao will gradually roll out HunYuan Turbo S, allowing users to experience it by selecting the “HunYuan” model and disabling the deep-thinking feature within the Yuanbao interface.
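For a rough sense of what that pricing means in practice, here is a small back-of-the-envelope calculation; the token counts in the example are made-up figures, not real usage data.

```python
# Back-of-the-envelope cost check using the published per-token pricing.
# The token counts in the example below are hypothetical.
INPUT_PRICE_PER_M_TOKENS = 0.8   # yuan per million input tokens
OUTPUT_PRICE_PER_M_TOKENS = 2.0  # yuan per million output tokens

def cost_yuan(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M_TOKENS + (
        output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M_TOKENS

# Example: a workload of 5 million input tokens and 1 million output tokens
print(f"{cost_yuan(5_000_000, 1_000_000):.2f} yuan")  # -> 6.00 yuan
```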