Curated by THEOUTPOST
On Thu, 17 Apr, 4:02 PM UTC
4 Sources
[1]
Microsoft researchers say they've developed a hyper-efficient AI model that can run on CPUs | TechCrunch
Microsoft researchers claim they've developed the largest-scale 1-bit AI model, also known as a "bitnet," to date. Called BitNet b1.58 2B4T, it's openly available under an MIT license and can run on CPUs, including Apple's M2.

Bitnets are essentially compressed models designed to run on lightweight hardware. In standard models, weights -- the values that define the internal structure of a model -- are often quantized so the models perform well on a wide range of machines. Quantizing the weights lowers the number of bits -- the smallest units a computer can process -- needed to represent them, enabling models to run faster on chips with less memory. Bitnets quantize weights into just three values: -1, 0, and 1. In theory, that makes them far more memory- and computing-efficient than most models today.

The Microsoft researchers say that BitNet b1.58 2B4T is the first bitnet with 2 billion parameters, "parameters" being largely synonymous with "weights." Trained on a data set of 4 trillion tokens -- equivalent to about 33 million books, by one estimate -- BitNet b1.58 2B4T outperforms traditional models of similar sizes, the researchers claim.

To be clear, BitNet b1.58 2B4T doesn't sweep the floor with rival 2-billion-parameter models, but it seemingly holds its own. According to the researchers' testing, the model surpasses Meta's Llama 3.2 1B, Google's Gemma 3 1B, and Alibaba's Qwen 2.5 1.5B on benchmarks including GSM8K (a collection of grade-school-level math problems) and PIQA (which tests physical commonsense reasoning skills). Perhaps more impressively, BitNet b1.58 2B4T is speedier than other models of its size -- in some cases, twice the speed -- while using a fraction of the memory.

There is a catch, however. Achieving that performance requires using Microsoft's custom framework, bitnet.cpp, which only works with certain hardware at the moment. Absent from the list of supported chips are GPUs, which dominate the AI infrastructure landscape.
That's all to say that bitnets may hold promise, particularly for resource-constrained devices. But compatibility is -- and will likely remain -- a big sticking point.
[2]
Microsoft's BitNet shows what AI can do with just 400MB and no GPU
What just happened? Microsoft has introduced BitNet b1.58 2B4T, a new type of large language model engineered for exceptional efficiency. Unlike conventional AI models that rely on 16- or 32-bit floating-point numbers to represent each weight, BitNet uses only three discrete values: -1, 0, or +1. This approach, known as ternary quantization, allows each weight to be stored in just 1.58 bits. The result is a model that dramatically reduces memory usage and can run far more easily on standard hardware, without requiring the high-end GPUs typically needed for large-scale AI.

The BitNet b1.58 2B4T model was developed by Microsoft's General Artificial Intelligence group and contains two billion parameters -- internal values that enable the model to understand and generate language. To compensate for its low-precision weights, the model was trained on a massive dataset of four trillion tokens, roughly equivalent to the contents of 33 million books. This extensive training allows BitNet to perform on par with -- or in some cases, better than -- other leading models of similar size, such as Meta's Llama 3.2 1B, Google's Gemma 3 1B, and Alibaba's Qwen 2.5 1.5B.

In benchmark tests, BitNet b1.58 2B4T demonstrated strong performance across a variety of tasks, including grade-school math problems and questions requiring common-sense reasoning. In certain evaluations, it even outperformed its competitors.

What truly sets BitNet apart is its memory efficiency. The model requires just 400MB of memory, less than a third of what comparable models typically need. As a result, it can run smoothly on standard CPUs, including Apple's M2 chip, without relying on high-end GPUs or specialized AI hardware. This level of efficiency is made possible by a custom software framework called bitnet.cpp, which is optimized to take full advantage of the model's ternary weights.
The framework ensures fast and lightweight performance on everyday computing devices. Standard AI libraries like Hugging Face's Transformers don't offer the same performance advantages, making the custom bitnet.cpp framework essential. Available on GitHub, the framework is currently optimized for CPUs, but support for other processor types is planned in future updates.

The idea of reducing model precision to save memory isn't new; researchers have long explored model compression. However, most past attempts involved converting full-precision models after training, often at the cost of accuracy. BitNet b1.58 2B4T takes a different approach: it is trained from the ground up using only three weight values (-1, 0, and +1), allowing it to avoid many of the performance losses seen in earlier methods.

This shift has significant implications. Running large AI models typically demands powerful hardware and considerable energy, factors that drive up costs and environmental impact. Because BitNet relies on extremely simple computations -- mostly additions instead of multiplications -- it consumes far less energy. Microsoft researchers estimate it uses 85 to 96 percent less energy than comparable full-precision models. This could open the door to running advanced AI directly on personal devices, without the need for cloud-based supercomputers.

That said, BitNet b1.58 2B4T does have some limitations. It currently supports only specific hardware and requires the custom bitnet.cpp framework. Its context window -- the amount of text it can process at once -- is smaller than that of the most advanced models. Researchers are still investigating why the model performs so well with such a simplified architecture. Future work aims to expand its capabilities, including support for more languages and longer text inputs.
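As a back-of-the-envelope check on the figures above, a weight restricted to three values carries log2(3) ≈ 1.58 bits of information, and two billion such weights fit in roughly 400MB. The following sketch simply redoes that arithmetic; it is not code from BitNet or bitnet.cpp:

```python
import math

# Three discrete weight values {-1, 0, +1} carry log2(3) bits of
# information each -- the source of the "1.58-bit" figure.
bits_per_weight = math.log2(3)  # ~1.585

# Rough memory footprint for 2 billion ternary weights versus
# conventional 16-bit floating-point weights.
params = 2_000_000_000
ternary_mb = params * bits_per_weight / 8 / 1e6  # ~396 MB
fp16_mb = params * 16 / 8 / 1e6                  # ~4000 MB

print(f"{bits_per_weight:.2f} bits per weight")
print(f"ternary: ~{ternary_mb:.0f} MB vs fp16: ~{fp16_mb:.0f} MB")
```

The ternary total lands just under the 400MB reported for the model, about a tenth of the fp16 figure, which is consistent with the "less than a third of what comparable models typically need" claim once smaller competing models are taken into account.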
[3]
Microsoft introduces an AI model that runs on regular CPUs
A group of computer scientists at Microsoft Research, working with a colleague from the University of Chinese Academy of Sciences, has introduced a new Microsoft AI model that runs on a regular CPU instead of a GPU. The researchers have posted a paper on the arXiv preprint server outlining how the new model was built, its characteristics and how well it has done thus far in testing.

Over the past several years, LLMs have become all the rage. Models such as ChatGPT have been made available to users around the globe, popularizing the idea of intelligent chatbots. One thing most of them have in common is that they are trained and run on GPU chips, because of the massive amount of computing power they need when trained on massive amounts of data. More recently, concerns have been raised about the huge amounts of energy being used by data centers to support all the chatbots in use. In this new effort, the team has found what it describes as a smarter way to process this data, and it has built a model to prove it.

One of the most energy-intensive parts of running AI models is the way weights are stored and used -- typically as 8- or 16-bit floating-point numbers. That approach consumes a lot of memory and CPU processing, which in turn requires a lot of energy. In their new approach, the researchers have done away with floating-point numbers altogether and instead propose what they describe as a 1-bit architecture, in which weights are stored and processed using only three values: -1, 0 and 1. This allows processing to use nothing more than simple addition and subtraction -- operations easily done on a CPU-based computer. Testing showed the new model able to hold its own against GPU-based models in its size class, and even to outperform some of them -- all while using far less memory and, in the end, much less energy.
To run such a model, the team created a runtime environment for it. The new environment, called bitnet.cpp, was designed to make the best use of the 1-bit architecture.

If the claims made by the team hold up, the development of BitNet b1.58 2B4T could be a game-changer. Instead of relying on massive data farms, users could soon run a chatbot on their computer or perhaps their phone. In addition to reducing energy demands, localizing LLM processing would greatly improve privacy and allow for working without an Internet connection.
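The "addition and subtraction only" point can be illustrated with a toy dot product, the core operation inside a linear layer. This is a sketch of the arithmetic idea, not code from bitnet.cpp; the function name is my own:

```python
def ternary_dot(weights, activations):
    """Dot product with weights restricted to {-1, 0, +1}.

    Because each weight is -1, 0, or +1, no multiplications are
    needed: +1 adds the activation, -1 subtracts it, 0 skips it.
    """
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x
        elif w == -1:
            total -= x
        # w == 0 contributes nothing
    return total

# Same result as a conventional multiply-accumulate:
w = [1, 0, -1, 1]
x = [2.0, 5.0, 3.0, 1.0]
print(ternary_dot(w, x))  # 2.0 - 3.0 + 1.0 = 0.0
```

Replacing every multiply-accumulate with an add or subtract is what makes the workload friendly to ordinary CPUs, which handle integer addition far more cheaply than floating-point multiplication at scale.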
[4]
Microsoft Unveils 1-Bit Compact LLM that Runs on CPUs | AIM Media House
Microsoft Research has introduced BitNet b1.58 2B4T, a new 2-billion-parameter language model that uses only 1.58 bits per weight instead of the usual 16 or 32. Despite its compact size, it matches the performance of full-precision models and runs efficiently on both GPUs and CPUs. The model was trained on a large dataset containing 4 trillion tokens and performs well across a wide range of tasks, including language understanding, math, coding, and conversation. Microsoft has released the model weights on Hugging Face, along with open-source code for running it.

In the technical report, Microsoft said that "BitNet b1.58 2B4T achieves performance on par with leading open-weight, full-precision LLMs of similar size, while offering significant advantages in computational efficiency, including substantially reduced memory footprint, energy consumption, and decoding latency."

The model's architecture is "derived from the standard Transformer model... incorporating significant modifications based on the BitNet framework". The central innovation is "replacing the standard full-precision linear layers with custom BitLinear layers", where "model weights are quantised to 1.58 bits during the forward pass". This quantisation uses an "absolute mean (absmean) quantisation scheme, which maps weights to ternary values {-1, 0, +1}." Activations are quantised to 8-bit integers with an "absolute maximum (absmax) quantisation strategy, applied per token". SubLN normalisation is incorporated to further enhance training stability. The feed-forward network (FFN) sub-layers employ squared ReLU (ReLU²) activation, and Rotary Position Embeddings (RoPE) inject positional information. Consistent with architectures like LLaMA, all bias terms are removed from the linear and normalisation layers.
The model uses the tokeniser developed for LLaMA 3, which implements a byte-level Byte-Pair Encoding (BPE) scheme with a vocabulary of 128,256 tokens. The training process for BitNet b1.58 2B4T consists of three phases: pre-training, supervised fine-tuning (SFT), and direct preference optimisation (DPO).

BitNet b1.58 2B4T demonstrates that it is possible to dramatically reduce the computational requirements of large language models without giving up performance. With its compact architecture and competitive results, it represents a meaningful step forward in making AI models more efficient and accessible.
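The quantisation schemes quoted above can be sketched in a few lines of Python. This is an illustrative reading of the report's description, not Microsoft's implementation, and the function names are my own:

```python
def absmean_quantize(weights):
    """Sketch of absmean ternary quantisation: scale by the mean
    absolute value, then round each weight to the nearest of
    {-1, 0, +1}."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

def absmax_quantize_int8(activations):
    """Sketch of per-token absmax activation quantisation: scale by
    the largest absolute value so everything fits in signed 8 bits."""
    scale = max(abs(a) for a in activations) or 1.0
    return [round(a / scale * 127) for a in activations], scale

w_q, w_scale = absmean_quantize([0.4, -1.2, 0.05, 0.9])
a_q, a_scale = absmax_quantize_int8([0.5, -2.0, 1.0])
print(w_q)  # ternary weights in {-1, 0, +1}
print(a_q)  # int8 activations in [-127, 127]
```

The scales are kept alongside the quantised values so the layer's output can be rescaled back to the original magnitude after the cheap integer arithmetic.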
Microsoft researchers have developed BitNet b1.58 2B4T, a highly efficient AI model that can run on CPUs, challenging the GPU-dominated AI landscape with its innovative 1-bit architecture.
Microsoft researchers have unveiled BitNet b1.58 2B4T, a groundbreaking AI model that challenges the status quo of GPU-dependent large language models (LLMs). This innovative 2-billion-parameter model uses a mere 1.58 bits per weight, compared to the standard 16 or 32 bits, while maintaining performance comparable to full-precision models of similar size [1][2].
BitNet's architecture employs a ternary quantization approach, using only three discrete values (-1, 0, and +1) to represent weights. This radical simplification allows the model to operate with exceptional memory and compute efficiency [1][2].
The model's efficiency is further enhanced by a custom software framework, bitnet.cpp, which optimizes performance on everyday computing devices [2].
Despite its compact design, BitNet b1.58 2B4T holds its own against rival models of similar size, outperforming them on some benchmarks while running faster and using a fraction of the memory [1][2].
The development of BitNet could have far-reaching implications for the AI industry, from sharply lower energy consumption to running advanced AI directly on personal devices [2][3].
However, challenges remain, including limited hardware support and a smaller context window compared to cutting-edge models [2].
Microsoft has made BitNet b1.58 2B4T openly available under an MIT license, with model weights released on Hugging Face and open-source code for implementation [1][4].
As researchers continue to investigate the model's effectiveness and expand its capabilities, BitNet represents a significant step towards more efficient and accessible AI technology. Its success could pave the way for a new generation of resource-conscious AI models that can operate effectively on a wider range of devices.