Microsoft has released its BitNet b1.58 2B4T model on Hugging Face, but the model does not run on GPUs and requires Microsoft's specialized bitnet.cpp framework.
Researchers from Microsoft claim to have developed the first native 1-bit large language model trained at the 2-billion-parameter scale. The model, BitNet b1.58 2B4T, can run on consumer CPUs such as Apple's M2 chip.
"The model was trained on a corpus of 4 trillion tokens, demonstrating how native 1-bit LLMs can match the performance of leading open-weight full-precision models of similar size while offering significant advantages in computational efficiency (memory, energy, latency)," Microsoft wrote in the project's Hugging Face repository.
What sets the BitNet model apart?
BitNet, short for 1-bit LLM, is a compressed form of large language model. The 2-billion-parameter model, trained on 4 trillion tokens, has been reduced to a version with dramatically lower memory requirements. Every weight is represented as one of three values: -1, 0, or 1 (roughly 1.58 bits of information per weight, hence the "b1.58" in the name), whereas other LLMs typically use 32-bit or 16-bit floating-point formats.
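To make the ternary representation concrete, here is a minimal sketch of the "absmean" quantization scheme described in the BitNet b1.58 papers, which maps each weight to -1, 0, or 1 by scaling with the mean absolute value. This is an illustration only, not Microsoft's training code (the released model learns ternary weights natively during training rather than converting them afterward):

```python
import numpy as np

def absmean_ternary(weights: np.ndarray, eps: float = 1e-8):
    """Quantize a float weight matrix to ternary values {-1, 0, 1}.

    Follows the absmean scheme from the BitNet b1.58 paper: scale by the
    mean absolute value, round, then clip to [-1, 1]. Illustrative only.
    """
    gamma = np.abs(weights).mean() + eps               # per-tensor scale factor
    quantized = np.clip(np.round(weights / gamma), -1.0, 1.0)
    return quantized.astype(np.int8), gamma            # gamma is kept to rescale outputs

# Example: a small random weight matrix collapses to three values
w = np.random.randn(4, 4).astype(np.float32)
w_q, scale = absmean_ternary(w)
print(w_q)     # every entry is -1, 0, or 1
print(scale)   # full-precision scaling factor retained for inference
```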
In their research paper, published as a work in progress on arXiv, the researchers detail how they built BitNet. While other teams have explored 1-bit models before, most efforts either applied post-training quantization (PTQ) to pre-trained full-precision models or trained native 1-bit models from scratch only at smaller scales. BitNet b1.58 2B4T represents a large-scale training run of a native 1-bit LLM; it occupies only 400MB, compared with other "small models" that can reach 4.8GB.
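The footprint claim follows from simple arithmetic: storing 2 billion weights at roughly 1.58 bits each comes to about 0.4GB, versus 4GB for 16-bit weights alone. A back-of-envelope check:

```python
# Back-of-envelope memory footprint for 2 billion weights at various precisions.
PARAMS = 2_000_000_000

for name, bits in [("fp32", 32), ("fp16", 16), ("ternary", 1.58)]:
    gigabytes = PARAMS * bits / 8 / 1e9
    print(f"{name:>8}: {gigabytes:5.2f} GB")

# ternary works out to ~0.40 GB, in line with the 400MB figure above;
# 16-bit weights alone would already need ~4 GB.
```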
Performance, purpose, and limitations of the BitNet b1.58 2B4T model
Performance comparison with other AI models
According to Microsoft, BitNet b1.58 2B4T outperforms other 1-bit models. It has a maximum sequence length of 4,096 tokens, and Microsoft claims it surpasses small models such as Meta's Llama 3.2 1B and Google's Gemma 3 1B.
Researchers' goals for BitNet
Microsoft aims to make LLMs accessible to a broader audience by creating versions that can run on edge devices, in resource-constrained environments, and in real-time applications.
However, BitNet b1.58 2B4T is not simple to run: it requires hardware compatible with Microsoft's bitnet.cpp framework. Running it with the standard transformers library will not deliver any of the promised benefits in speed, latency, or energy consumption. And unlike most AI models, BitNet b1.58 2B4T does not run on GPUs.
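For reference, the model can still be loaded through the standard Hugging Face transformers API, as in the hedged sketch below. The repository id matches the Hugging Face listing, but the exact library-version requirements are an assumption here, and this path yields none of the efficiency gains, which come only from bitnet.cpp:

```python
# Illustrative only: loading BitNet b1.58 2B4T with the standard transformers
# API. This produces outputs but none of the speed/memory/energy benefits,
# which require Microsoft's bitnet.cpp framework. The repo id comes from
# Hugging Face; exact library-version requirements are an assumption here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("What makes a 1-bit LLM different?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```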
What’s next?
Microsoft researchers plan to explore training larger native 1-bit models (7B, 13B parameters, and more). They noted that most current AI infrastructure lacks suitable hardware for 1-bit models, so they intend to explore "co-designing future hardware accelerators" specifically tailored for compressed AI. Researchers are also focusing on:
- Increasing context length.
- Improving performance on long-context chain-of-thought reasoning tasks.
- Adding support for multiple languages beyond English.
- Integrating 1-bit models into multimodal architectures.
- Better understanding the theoretical reasons behind the efficiency gains achieved through large-scale 1-bit training.