Chinese tech giant Baidu has unveiled ERNIE-4.5-VL-28B-A3B-Thinking, a new multimodal AI model capable of incorporating image processing directly into its reasoning workflow.
The company asserts that the model outperforms leading commercial systems such as Google's Gemini 2.5 Pro and OpenAI's GPT-5 High across multiple multimodal benchmarks. Thanks to its routed mixture-of-experts architecture, only 3 billion of the model's 28 billion total parameters are active per token, so it delivers robust performance while still running on a single 80 GB GPU such as the Nvidia A100.
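The active-versus-total parameter gap comes from routing: every expert stays in memory, but each token passes through only a few of them. A minimal sketch of top-k expert routing (the sizes and router here are illustrative assumptions, not ERNIE's actual configuration):

```python
import numpy as np

# Illustrative routed mixture-of-experts step. All experts exist in memory,
# but each token is processed by only top_k of them, so the "active"
# parameter count is far below the total. Sizes are hypothetical.

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route one token vector through its top-k experts."""
    logits = x @ router                      # router score per expert
    chosen = np.argsort(logits)[-top_k:]     # indices of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                 # softmax over the chosen experts
    # Only the chosen experts' weight matrices are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen)), chosen

out, used = moe_forward(rng.standard_normal(d_model))

total_params = n_experts * d_model * d_model
active_params = top_k * d_model * d_model
print(f"experts used for this token: {sorted(used.tolist())}")
print(f"active/total expert params: {active_params}/{total_params}")
```

With these toy numbers, only a quarter of the expert parameters do work per token; the same principle, at scale, is how a 28B-parameter model can behave like a 3B-parameter one at inference time.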
Released under the Apache 2.0 license, ERNIE-4.5-VL-28B-A3B-Thinking is freely available for commercial use, although its reported capabilities have not yet been independently verified.
A standout feature of the model is its “thinking with images” capability, which enables dynamic image cropping to focus on critical visual details. In one demonstration, the system automatically zoomed in on a blue sign and accurately extracted its text content.
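The crop-and-zoom step in such a demo is conceptually simple: the model proposes a region of interest, and the pipeline crops and enlarges it for a second, closer look. A self-contained sketch using NumPy (the helper, box format, and synthetic image are assumptions for illustration, not Baidu's API):

```python
import numpy as np

def zoom_on_region(image, box, scale=4):
    """Crop a region of interest and upsample it for a closer 'look'.

    image: H x W x 3 uint8 array; box: (left, top, right, bottom) pixels --
    the kind of region a visual-reasoning model might emit before re-reading
    fine detail. Nearest-neighbour repetition keeps this dependency-free.
    """
    left, top, right, bottom = box
    crop = image[top:bottom, left:right]
    return crop.repeat(scale, axis=0).repeat(scale, axis=1)

# Synthetic 480x640 "photo" containing a small 40x20 blue "sign".
img = np.full((480, 640, 3), 255, dtype=np.uint8)
img[200:220, 300:340] = (0, 0, 255)

zoomed = zoom_on_region(img, box=(300, 200, 340, 220), scale=4)
print(zoomed.shape)  # (80, 160, 3)
```

After the zoom, the enlarged crop would be fed back to the model, which can then read text that was too small in the full frame.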
Additional evaluations show that the model can precisely locate individuals within images and return their bounding box coordinates, solve mathematical problems by analyzing circuit diagrams, and recommend optimal visiting times based on data visualizations. For video inputs, it extracts subtitles and aligns scenes with specific timestamps. Moreover, it can leverage external tools like web-based image search to identify unfamiliar objects.
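When a model returns bounding boxes as part of a text reply, downstream code has to pull the coordinates back out. A hedged sketch of that parsing step (the reply format shown is an assumption; many vision-language models emit boxes as bracketed integer quadruples, but ERNIE's exact format may differ):

```python
import re

# Assumed output convention: boxes appear as "[x1, y1, x2, y2]" in the reply.
BOX_RE = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")

def parse_boxes(reply: str):
    """Return every (x1, y1, x2, y2) box found in a model's text reply."""
    return [tuple(map(int, m.groups())) for m in BOX_RE.finditer(reply)]

# Hypothetical reply for a "locate the people" prompt.
reply = "Person 1: [120, 40, 260, 400]. Person 2: [300, 52, 430, 396]."
print(parse_boxes(reply))  # [(120, 40, 260, 400), (300, 52, 430, 396)]
```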
While Baidu highlights the model's ability to crop and process images during inference, the approach is not entirely novel. In April 2025, OpenAI introduced similar functionality in its o3 and o4-mini models, which integrate images natively into their internal reasoning chains and apply built-in operations such as zooming, cropping, and rotating to visual tasks, setting a new standard for agent-like visual reasoning and problem-solving.
What is particularly noteworthy is the pace: advanced visual reasoning capabilities that were exclusive to proprietary Western models only months ago are now appearing in open-source Chinese alternatives.