Mistral Returns to the AI Forefront, Taking on DeepSeek Head-to-Head

2025-12-04

French AI startup Mistral, often seen as Europe’s underdog in a field dominated by U.S. giants and rising Chinese firms, has made a bold leap forward: on Tuesday, it unveiled its most ambitious release yet—one that puts pressure on open-source rivals.

The new lineup includes four model variants, ranging from compact assistants to a cutting-edge system with 675 billion parameters—all released under the permissive Apache 2.0 open-source license. These models are publicly downloadable, enabling anyone with suitable hardware to run, modify, fine-tune, or build applications on top of them locally.

The flagship model, Mistral Large 3, employs a sparse mixture-of-experts architecture, activating only 41 billion of its total 675 billion parameters per token. This design choice allows it to compete with heavyweight frontier models while maintaining inference compute requirements closer to those of a 40-billion-parameter model.
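Mistral's announcement doesn't spell out the routing details, but the general mechanism of sparse mixture-of-experts is well established. As a toy illustration only (the dimensions below are made up and far smaller than Large 3's), a top-k routed expert layer in PyTorch might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy sparse MoE layer: each token is routed to only top_k of
    num_experts feed-forward blocks, so per-token compute scales with
    the active experts, not the total parameter count."""
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e           # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(10, 64)
print(ToyMoELayer()(x).shape)  # torch.Size([10, 64])
```

Each token touches only two of the eight expert blocks here; scaled up, the same principle is how a 675-billion-parameter model can run inference at roughly the cost of its 41 billion active parameters.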

Mistral Large 3 was trained from scratch on 3,000 NVIDIA H200 GPUs and debuted on the LMArena leaderboard, securing second place among open-weight non-reasoning models.

Benchmark comparisons with DeepSeek tell a nuanced story. According to Mistral’s own evaluations, its top model outperforms DeepSeek V3.1 across several metrics—but trails the newer V3.2 by a few points on LMArena.

The Mistral series shines in general knowledge and expert-level reasoning tasks. DeepSeek keeps the edge in coding and mathematics, a predictable outcome: this release includes no reasoning-optimized models, so extended chain-of-thought is not built into their training.

The smaller “Ministral” models hold particular appeal for developers. Available in three sizes—3B, 8B, and 14B parameters—each comes in both base and instruction-tuned variants. Notably, all natively support visual input. The 3B version caught the attention of AI researcher Simon Willison, who highlighted its ability to run entirely in-browser via WebGPU.

For hands-on experimentation, a Hugging Face Space loads the model directly in the browser and lets you interact with it using a webcam as input.
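Beyond the browser demo, the checkpoints can also be loaded locally. A minimal sketch with Hugging Face transformers follows; the model id is a placeholder rather than a confirmed name, and the image path is illustrative:

```python
# Hedged local-inference sketch for a small vision-capable model.
# "mistralai/Ministral-3B-Instruct" is a placeholder id; check Mistral's
# Hugging Face organization for the actual checkpoint name.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="mistralai/Ministral-3B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "photo.jpg"},        # illustrative path
        {"type": "text", "text": "Describe this image."},
    ],
}]
out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"])
```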

A vision-capable AI model weighing just ~3GB opens new possibilities for efficiency-conscious developers and hobbyists alike—applications range from drones and robotics to offline-capable laptops and embedded systems in vehicles.

Early testing reveals a split personality across the series. In quick trials, Mistral Large 3 demonstrated strong conversational fluency—sometimes echoing stylistic traits reminiscent of GPT-5 (including similar phrasing and emoji preferences), yet with a more natural pacing.

Mistral Large 3 also takes a relatively lenient moderation stance, making it a better fit for role-play scenarios than ChatGPT, Claude, or Gemini.

For natural language tasks, creative writing, and role-play, users found the 14B instruction-tuned variant solid—though not exceptional. On Reddit’s r/LocalLLaMA, testers noted occasional repetition and reliance on fixed phrases inherited from training data, but praised its ability to generate long-form content, especially given its size.

Developers running local inference reported that the 3B and 8B models sometimes loop or produce formulaic outputs, particularly during creative tasks.

Nevertheless, the 3B model is so lightweight it can operate on modest hardware like smartphones and be fine-tuned for specific use cases. Currently, its only real competitor in this niche is the smallest variant of Google’s Gemma 3.
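As a rough sketch of what such fine-tuning can look like, parameter-efficient LoRA adapters via the peft library are a common route; the checkpoint name below is a placeholder, not a confirmed id:

```python
# Hedged sketch: LoRA fine-tuning setup with the peft library.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Ministral-3B-Instruct"  # placeholder id
)
config = LoraConfig(
    r=16,                                  # low-rank adapter dimension
    lora_alpha=32,                         # adapter scaling factor
    target_modules=["q_proj", "v_proj"],   # adapt attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of weights train
```

Because only the small adapter matrices are updated, a 3B base model can be specialized on consumer-grade hardware, which is exactly the niche these models target.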

Enterprise adoption is already underway. HSBC announced a multi-year partnership with Mistral on Monday, planning to deploy generative AI across its operations. The bank will self-host models on its own infrastructure, combining internal technical capabilities with Mistral’s expertise—an arrangement especially attractive for financial institutions handling sensitive customer data under GDPR, given Mistral’s EU headquarters and open-weight licensing.

In collaboration with NVIDIA, Mistral developed an NVFP4-compressed checkpoint that lets Large 3 run on a single node of eight of NVIDIA's top-tier GPUs. NVIDIA also reports that Ministral 3B processes roughly 385 tokens per second on an RTX 5090 and over 50 tokens per second on the robotics-oriented Jetson Thor, underscoring the lineup's inference efficiency without compromising output quality.
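A back-of-the-envelope calculation suggests why 4-bit weights make single-node serving plausible in the first place:

```python
# Weight-memory budget for a 675B-parameter model at common precisions.
# Ignores activations, KV cache, and NVFP4's small per-block scale factors.
params = 675e9
for fmt, bytes_per_param in {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}.items():
    print(f"{fmt}: {params * bytes_per_param / 1e12:.2f} TB of weights")
# BF16: 1.35 TB, FP8: 0.68 TB, NVFP4: 0.34 TB -- only the 4-bit
# checkpoint fits comfortably in the aggregate memory of one 8-GPU node.
```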

A reasoning-optimized version of Large 3 is expected soon. Until then, models like DeepSeek R1, GLM, and Qwen Thinking retain an advantage on explicit reasoning tasks. But for enterprises that want state-of-the-art performance, open weights, strong multilingual support across European languages, and freedom from both U.S. and Chinese national security laws, the list of options has just gone from zero to one.