Meituan Open-Sources LongCat-Flash-Chat Large Model: 560B Parameters, Superior Performance in Agent Tasks

2025-09-01

Meituan today officially released and open-sourced LongCat-Flash-Chat.

LongCat-Flash features an innovative Mixture-of-Experts (MoE) architecture with 560B total parameters, of which only 18.6B–31.3B (27B on average) are activated per token, optimizing for both computational efficiency and performance.

At the architectural level, LongCat-Flash introduces a "Zero-Computation Experts" mechanism. While maintaining a total parameter count of 560B, each token dynamically activates only 18.6B–31.3B parameters depending on contextual demands, enabling on-demand allocation and efficient use of computational resources. To keep total computational cost under control, a PID controller fine-tunes expert biases in real time during training, holding the average activation per token at around 27B parameters.
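Below is a minimal, illustrative sketch of these two ideas: a router whose expert pool includes identity ("zero-computation") experts that simply pass the token through, plus a PID controller that adjusts a routing bias so the average number of FLOP-bearing experts selected per token stays near a budget. All sizes, gain values, and names here are our own assumptions, not the released implementation.

```python
import torch

NUM_REAL, NUM_ZERO, TOP_K = 8, 4, 2   # hypothetical expert-pool sizes
TARGET_REAL_PER_TOKEN = 1.35          # toy stand-in for the ~27B average budget

class PIDBiasController:
    """PID loop whose output is the routing bias added to zero-expert logits."""
    def __init__(self, kp=0.5, ki=0.05, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, measured_real_per_token):
        err = measured_real_per_token - TARGET_REAL_PER_TOKEN
        self.integral += err
        deriv = err - self.prev_err
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def route(hidden, gate_weight, zero_bias):
    # gate_weight: [NUM_REAL + NUM_ZERO, d_model]; the last NUM_ZERO rows score
    # identity experts that return their input unchanged (zero extra FLOPs).
    logits = hidden @ gate_weight.T
    logits[:, NUM_REAL:] += zero_bias            # controller-adjusted bias
    topk = logits.topk(TOP_K, dim=-1).indices    # experts chosen per token
    real_per_token = (topk < NUM_REAL).float().sum(-1).mean().item()
    return topk, real_per_token

ctrl = PIDBiasController()
gate = torch.randn(NUM_REAL + NUM_ZERO, 64)
bias = 0.0
for step in range(100):
    tokens = torch.randn(256, 64)                # one toy batch of hidden states
    _, measured = route(tokens, gate, bias)
    bias = ctrl.update(measured)                 # keep real-expert load on budget
```

The sign convention does the work: when too many real experts are being selected, the error is positive and the bias shifts routing mass toward the zero-computation experts, and vice versa.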

Additionally, cross-layer channels in the model architecture allow MoE communication to overlap substantially with computation, improving both training and inference efficiency. Combined with customized low-level optimizations, LongCat-Flash completed training within 30 days and achieves an inference speed of over 100 tokens per second per user on H800 hardware. The model also refines common large-model components and training methods, using hyperparameter transfer and model layer stacking, among other techniques, to keep training stable.
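Of these stability techniques, layer stacking is the easiest to illustrate. The sketch below shows a generic model-growth recipe (Meituan's actual training procedure is not public, so the function and module choices are assumptions): a deeper model is seeded by deep-copying an already-trained shallower stack.

```python
import copy
import torch.nn as nn

def stack_layers(trained_layers: nn.ModuleList, growth_factor: int = 2) -> nn.ModuleList:
    """Seed a deeper model from a trained shallower stack of transformer blocks.

    With growth_factor=2, an N-layer trained stack initializes a 2N-layer model
    as [L1..LN, L1..LN]; deep copies let the two halves diverge during training.
    """
    grown = nn.ModuleList()
    for _ in range(growth_factor):
        for layer in trained_layers:
            grown.append(copy.deepcopy(layer))
    return grown

# Usage: pre-train a shallow stack, grow it, then continue training the deep one.
shallow = nn.ModuleList(nn.TransformerEncoderLayer(d_model=64, nhead=4) for _ in range(4))
deep = stack_layers(shallow)   # 8 layers, each initialized from trained weights
```

The payoff is a better-than-random initialization for the deeper model, which generally smooths the early stage of continued training.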

For agentic capabilities, the LongCat-Flash team built a proprietary agentic evaluation dataset to guide its data strategy and applied optimizations throughout the training pipeline, including multi-agent methods for generating diverse, high-quality trajectory data, yielding superior agentic performance.

Through joint algorithm and engineering design, LongCat-Flash significantly outperforms industry models of similar or even smaller scale in theoretical cost and speed. With system-level optimizations, it sustains a generation speed of 100 tokens per second on H800 hardware while keeping output cost as low as 5 RMB per million tokens.
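For a sense of scale, those two figures combine straightforwardly (the inputs are the announcement's numbers; the derived values below are simple arithmetic):

```python
# Back-of-the-envelope check on the quoted speed and price.
tokens_per_second = 100        # per-user generation speed on H800 (quoted)
cost_per_million_rmb = 5.0     # output price in RMB per 1M tokens (quoted)

hours_for_1m = 1_000_000 / tokens_per_second / 3600
print(f"1M tokens at {tokens_per_second} tok/s ≈ {hours_for_1m:.1f} h per stream")
print(f"Output cost ≈ {cost_per_million_rmb / 1000:.3f} RMB per 1K tokens")
# -> ≈ 2.8 h per single-user stream; 0.005 RMB per 1K output tokens
```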

As a foundational model not specifically optimized for reasoning tasks, LongCat-Flash-Chat performs on par with leading mainstream models despite activating only a small fraction of its parameters. It is particularly strong on agentic tasks, and its efficiency-oriented design gives it markedly faster inference, making it well suited to long-running, complex agentic applications.

In general domain knowledge, LongCat-Flash scored 86.50 on the ArenaHard-V2 benchmark, ranking second among all evaluated models. It achieved an MMLU score of 89.71 and a CEval score of 90.44, placing it among the top-performing models in China while operating with fewer parameters than models such as DeepSeek-V3.1 and Kimi-K2.

In agentic tool use, LongCat-Flash shows a clear advantage: it outperforms even larger-parameter models on τ2-Bench (an agentic tool-use benchmark), and in high-complexity scenarios it leads VitaBench (a complex-scenario agentic benchmark) with a score of 24.30.

In programming, LongCat-Flash ranks second on TerminalBench (a command-line task benchmark) with a score of 39.51 and scores 60.4 on SWE-Bench-Verified (the human-verified subset of the SWE-Bench software-engineering benchmark).

In instruction following, LongCat-Flash leads IFEval (an instruction-following evaluation benchmark) with a score of 89.65. It also posted top scores of 57.10 on COLLIE (a constraint-based instruction-following benchmark) and 43.03 on Meeseeks-zh (a Chinese multi-scenario instruction benchmark), underscoring its handling of high-difficulty instruction sets in both Chinese and English.

LongCat-Flash-Chat is now open-sourced on both GitHub and Hugging Face.