Microsoft has released its BitNet b1.58 2B4T model on Hugging Face, but the model does not run on GPUs and requires Microsoft's specialized bitnet.cpp framework.
Researchers from Microsoft claim to have developed the first native 1-bit large language model trained at the 2-billion-parameter scale. The model, BitNet b1.58 2B4T, can run on consumer CPUs such as Apple's M2 chip.
"The model was trained on a corpus of 4 trillion tokens, demonstrating how native 1-bit LLMs can match the performance of leading open-weight full-precision models of similar size while offering significant advantages in computational efficiency (memory, energy, latency)," Microsoft wrote in the project's Hugging Face repository.
What sets the BitNet model apart?
BitNet, short for 1-bit LLM, is a compressed form of large language model. The 2-billion-parameter model, trained on 4 trillion tokens, has been reduced to a version with dramatically lower memory requirements. Every weight is represented as one of three values: -1, 0, or 1 (roughly 1.58 bits of information per weight, hence the "b1.58" in the name), whereas other LLMs typically use 32-bit or 16-bit floating-point formats.
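To make the ternary representation concrete, here is a minimal sketch of the "absmean" quantization scheme described in the BitNet b1.58 papers, which maps each weight to -1, 0, or 1 by scaling with the mean absolute value. This is an illustration only, not Microsoft's training code (the released model learns ternary weights natively during training rather than converting them afterward):

```python
import numpy as np

def absmean_ternary(weights: np.ndarray, eps: float = 1e-8):
    """Quantize a float weight matrix to ternary values {-1, 0, 1}.

    Follows the absmean scheme from the BitNet b1.58 paper: scale by the
    mean absolute value, round, then clip to [-1, 1]. Illustrative only.
    """
    gamma = np.abs(weights).mean() + eps               # per-tensor scale factor
    quantized = np.clip(np.round(weights / gamma), -1.0, 1.0)
    return quantized.astype(np.int8), gamma            # gamma is kept to rescale outputs

# Example: a small random weight matrix collapses to three values
w = np.random.randn(4, 4).astype(np.float32)
w_q, scale = absmean_ternary(w)
print(w_q)     # every entry is -1, 0, or 1
print(scale)   # full-precision scaling factor retained for inference
```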
In their research paper, published as a work in progress on arXiv, the researchers detail how they built BitNet. While other teams have explored 1-bit models before, most efforts either applied post-training quantization (PTQ) to pre-trained full-precision models or trained native 1-bit models from scratch only at smaller scales. BitNet b1.58 2B4T represents a large-scale training run of a native 1-bit LLM; it occupies only 400MB, compared with other "small models" that can reach 4.8GB.
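The footprint claim follows from simple arithmetic: storing 2 billion weights at roughly 1.58 bits each comes to about 0.4GB, versus 4GB for 16-bit weights alone. A back-of-envelope check:

```python
# Back-of-envelope memory footprint for 2 billion weights at various precisions.
PARAMS = 2_000_000_000

for name, bits in [("fp32", 32), ("fp16", 16), ("ternary", 1.58)]:
    gigabytes = PARAMS * bits / 8 / 1e9
    print(f"{name:>8}: {gigabytes:5.2f} GB")

# ternary works out to ~0.40 GB, in line with the 400MB figure above;
# 16-bit weights alone would already need ~4 GB.
```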
Performance, purpose, and limitations of the BitNet b1.58 2B4T model
Performance comparison with other AI models
According to Microsoft, BitNet b1.58 2B4T outperforms other 1-bit models. It has a maximum sequence length of 4,096 tokens, and Microsoft claims it surpasses small models such as Meta's Llama 3.2 1B and Google's Gemma 3 1B.
Researchers' goals for BitNet
Microsoft aims to make LLMs accessible to a broader audience by creating versions that can run on edge devices, in resource-constrained environments, and in real-time applications.
However, BitNet b1.58 2B4T is not simple to run: it requires hardware compatible with Microsoft's bitnet.cpp framework. Running it with the standard transformers library will not deliver any of the promised benefits in speed, latency, or energy consumption. And unlike most AI models, BitNet b1.58 2B4T does not run on GPUs.
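For reference, the model can still be loaded through the standard Hugging Face transformers API, as in the hedged sketch below. The repository id matches the Hugging Face listing, but the exact library-version requirements are an assumption here, and this path yields none of the efficiency gains, which come only from bitnet.cpp:

```python
# Illustrative only: loading BitNet b1.58 2B4T with the standard transformers
# API. This produces outputs but none of the speed/memory/energy benefits,
# which require Microsoft's bitnet.cpp framework. The repo id comes from
# Hugging Face; exact library-version requirements are an assumption here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("What makes a 1-bit LLM different?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```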
What’s next?
Microsoft researchers plan to explore training larger native 1-bit models (7B, 13B parameters, and more). They noted that most current AI infrastructure lacks suitable hardware for 1-bit models, so they intend to explore "co-designing future hardware accelerators" specifically tailored for compressed AI. Researchers are also focusing on:
- Increasing context length.
- Improving performance on long-context chain-of-thought reasoning tasks.
- Adding support for multiple languages beyond English.
- Integrating 1-bit models into multimodal architectures.
- Better understanding the theoretical reasons behind the efficiency gains achieved through large-scale 1-bit training.