IBM Releases Compact Open-Source Granite 4 Model for Mobile Devices and Browsers

2025-10-30

IBM today unveiled Granite 4 Nano, a new family of ultra-compact generative AI models engineered specifically for on-device, edge, or browser-based deployment.

According to the company, these models deliver exceptional performance relative to their size and are IBM’s smallest large language models (LLMs) to date.

The Granite 4.0 Nano series comprises four instruction-tuned models along with their base counterparts, ranging from 350 million to 1.5 billion parameters. Parameters are internal values learned during training that enable LLMs to interpret the context of user queries and generate coherent responses.

Larger LLMs demand substantial computational resources and energy, driving up operational costs and necessitating specialized hardware such as high-end GPUs and extensive memory. In contrast, smaller models like those in the Nano series require significantly less compute and memory, enabling them to run efficiently on everyday consumer devices—including laptops, desktops, and smartphones.
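The memory side of that tradeoff can be sketched with back-of-envelope arithmetic: the storage needed just to hold a model's weights is its parameter count times the bytes per parameter at a given precision. The figures below are illustrative estimates, not IBM's published hardware requirements.

```python
# Illustrative sketch: rough memory needed just to hold model weights
# at common numeric precisions. These are back-of-envelope estimates,
# not IBM's published requirements for Granite 4.0 Nano.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for params, name in [(350e6, "350M"), (1.5e9, "1.5B")]:
    for prec in ("fp16", "int4"):
        print(f"{name} @ {prec}: ~{weight_memory_gb(params, prec):.2f} GB")
```

At half precision, a 1.5-billion-parameter model needs only about 3 GB for its weights, which is why models in this size class fit comfortably in laptop or smartphone memory, while models with tens of billions of parameters do not.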

While reducing model size can sometimes compromise accuracy or contextual knowledge, IBM leverages advanced compression techniques to retain robust capabilities within a compact footprint.

Ultra-small LLMs enhance privacy and security by enabling offline inference, offering full control over deployment, and supporting extensive customization. By keeping sensitive data local and avoiding cloud transmission, these models also eliminate recurring cloud expenses, improving cost efficiency.

The lineup includes Granite 4.0 H 1B and Granite 4.0 H 350M, models of roughly 1.5 billion and 350 million parameters, respectively, featuring a hybrid architecture unique to the family, as well as two conventional transformer-based variants designed for compatibility in environments where the hybrid architecture may lack optimized support.

Granite 4 models incorporate a specialized hybrid architecture developed by IBM that augments the standard transformer design. While transformers rely on attention mechanisms, whose cost grows quadratically with input length, to focus on the most relevant parts of the text, IBM interleaves layers based on the Mamba state-space architecture, which processes sequences in linear time and offers superior hardware efficiency compared to traditional transformers.
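The efficiency argument can be illustrated with a toy comparison. The sketch below is not IBM's actual implementation; it only contrasts the pairwise comparisons attention performs against a state-space-style layer that carries a fixed-size state through the sequence in one pass.

```python
# Toy illustration of the efficiency argument, not IBM's actual layers.
# Self-attention compares every token with every other token (O(n^2) work),
# while a Mamba-style state-space layer sweeps the sequence once,
# updating a fixed-size state (O(n) work, O(1) extra memory).

def attention_ops(seq_len: int) -> int:
    """Pairwise token comparisons in one self-attention layer."""
    return seq_len * seq_len

def recurrent_scan(xs: list[float], decay: float = 0.9) -> list[float]:
    """Linear-time scan: state_t = decay * state_{t-1} + x_t."""
    state, out = 0.0, []
    for x in xs:  # single pass over the sequence
        state = decay * state + x
        out.append(state)
    return out

print(attention_ops(8192))             # 67108864 comparisons at 8K context
print(recurrent_scan([1.0, 1.0, 1.0]))
```

At an 8K-token context, attention performs over 67 million pairwise comparisons per layer, while the scan touches each token once, which is the rough intuition behind the hardware-efficiency claim for hybrid designs.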

The sub-billion to near-one-billion parameter model segment is highly competitive, with developers prioritizing performance and functionality. Key rivals include Alibaba Group’s Qwen series, Liquid AI Inc.’s Liquid Foundation Models, and Google’s Gemma models.

IBM reports that Granite Nano models outperform several similarly sized competitors across multiple benchmarks in general knowledge, mathematics, coding, and security. They also excel in agent workflows, particularly in instruction-following and tool-use tasks, as measured by IFEval (Instruction-Following Evaluation) and the Berkeley Function Calling Leaderboard v3.

Specifically, Granite 4.0 H 1B achieved a top score of 78.5 on IFEval, surpassing Qwen3 1.7B at 73.1 and Gemma 3 1B at 59.3. In tool-calling performance on the Berkeley leaderboard, the same model scored 54.8, ahead of Qwen3’s 52.2 and Gemma 3’s 16.3.

All Granite 4 Nano models are released under the permissive Apache 2.0 open-source license, which allows unrestricted commercial use and makes the models freely available for academic and research applications.