MiniMax Releases M2.1 AI Model to Broaden Multi-Language Programming Support

2025-12-24


Chinese AI startup MiniMax has unveiled M2.1, a significant upgrade that improves performance and agent capabilities across complex real-world tasks, multiple programming languages, and office-centric scenarios.


The latest iteration showcases substantially improved coding proficiency across diverse programming languages, including Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, and JavaScript. Additionally, M2.1 demonstrates enhanced aesthetic design comprehension and interpretation of user interfaces for web, Android, and iOS platforms.


Advancements in systematic problem-solving enable the model not only to produce technically accurate code but also to follow intricate, layered instructions. According to the company, this refinement increases practical usability in enterprise environments, where seemingly simple rules often involve nuanced complexities.


To achieve broader functionality, MiniMax emphasized improvements beyond coding—extending to conversational fluency and written expression. The model excels in everyday dialogue, technical documentation, and structured content generation, delivering well-organized and context-aware responses.


"Our users have come to rely on cutting-edge coding assistance from MiniMax at a fraction of the cost. Early testing indicates M2.1 performs exceptionally across the development lifecycle—from architectural planning and orchestration to code review and deployment," said Scott Breitenother, Co-Founder and CEO of Kilo Code Inc., an open-source AI agent platform for automated coding.


MiniMax M2 was initially launched in late October. The updated M2.1 shows marked improvement on multi-language coding tasks, outperforming Anthropic PBC's Claude Sonnet 4.5 and approaching the performance of the more powerful Claude Opus 4.5.



As part of its evaluation framework, MiniMax introduced VIBE (Visual and Interactive Benchmark for Execution), a novel benchmark suite assessing five core competencies: web applications, simulation environments, Android, iOS, and backend development. Unlike conventional benchmarks, VIBE employs language-guided agents as evaluators, enabling assessment of both interactive logic and visual design quality in generated applications.
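
MiniMax has not published VIBE's evaluator internals, but the "language-guided agent as evaluator" idea can be sketched in a few lines. The snippet below is a hypothetical illustration only: the rubric, the two scoring axes, the `judge` helper, and the judge model name are all assumptions made for this example, not details of VIBE itself.

```python
# Hypothetical sketch of VIBE-style "agent as evaluator" scoring.
# The rubric, scoring axes, and model name are illustrative assumptions,
# not MiniMax's actual implementation.
import json
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment

RUBRIC = (
    "You are grading a generated application. Return JSON with two "
    "integer scores from 0 to 100: 'interaction' for whether the observed "
    "behavior matches the spec, and 'visual' for layout and aesthetic "
    'quality. Example: {"interaction": 90, "visual": 85}'
)

def judge(spec: str, observed_behavior: str, model: str = "gpt-4o-mini") -> dict:
    """Ask a language-model judge to score one generated app."""
    response = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user",
             "content": f"Spec:\n{spec}\n\nObserved behavior:\n{observed_behavior}"},
        ],
    )
    return json.loads(response.choices[0].message.content)

print(judge("A to-do list with drag-and-drop reordering.",
            "Items render as cards; dragging reorders them and the order persists."))
```

Scoring interactive logic and visual quality separately, as in this sketch, is what distinguishes agent-based judging from pass/fail test suites.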


M2.1 achieved what the company describes as "outstanding results" on the VIBE benchmark, with an average score of 88.6. It particularly excelled in the VIBE-Web and VIBE-Android subsets, scoring 91.5 and 89.7 respectively.


The model was also tested against leading models from major providers such as Anthropic, Google LLC, OpenAI Group PBC, and DeepSeek across comprehensive industry benchmarks covering coding proficiency and general knowledge, including MMLU-Pro, Humanity's Last Exam, and Toolathlon, a benchmark designed specifically for evaluating AI agents.


M2.1 demonstrated consistently high performance in tool use by AI agents, real-world knowledge application, and multi-step reasoning. It scored 22.0 on Humanity's Last Exam without tools, a rigorous academic benchmark featuring thousands of graduate-level, cross-disciplinary multimodal problems. On MMLU, a broad measure of academic knowledge, it achieved a score of 88, on par with or close to top-tier frontier models.


The model is available via MiniMax’s application programming interface and will be accessible through Hugging Face with open weights. While the download page is not yet live, the company’s flagship offering, MiniMax Agent, is now powered by the upgraded MiniMax M2.1.
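
For developers who want to try the interface, a minimal call might look like the sketch below, assuming the OpenAI-compatible endpoint MiniMax has offered for earlier M-series models. Both the base URL and the `MiniMax-M2.1` model identifier are assumptions based on the company's published naming, so verify them against the official API documentation.

```python
# Minimal sketch of calling M2.1 through an OpenAI-compatible client.
# The base URL and model identifier are assumptions based on MiniMax's
# published naming; confirm both in the official API documentation.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MINIMAX_API_KEY"],   # your MiniMax key
    base_url="https://api.minimax.io/v1",    # assumed endpoint
)

completion = client.chat.completions.create(
    model="MiniMax-M2.1",                    # assumed model ID
    messages=[
        {"role": "user",
         "content": "Write a Rust function that reverses a singly linked list."},
    ],
)
print(completion.choices[0].message.content)
```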