Curated by THEOUTPOST
On Wed, 9 Oct, 4:02 PM UTC
5 Sources
[1]
Integer addition algorithm could reduce energy needs of AI by 95%
A team of engineers at AI inference technology company BitEnergy AI reports a method to reduce the energy needs of AI applications by 95%. The group has published a paper describing the new technique on the arXiv preprint server.

As AI applications have gone mainstream, their use has risen dramatically, driving a notable rise in energy needs and costs. LLMs such as ChatGPT require a lot of computing power, which in turn means a lot of electricity. As just one example, ChatGPT now requires roughly 564 MWh daily, enough to power 18,000 American homes. Critics have suggested that, as such apps grow more popular, AI applications could be using around 100 TWh annually within a few years, on par with Bitcoin mining operations.

In this new effort, the team at BitEnergy AI claims to have found a way to dramatically reduce the amount of computing required to run AI apps without reducing performance. The new technique is simple: instead of using complex floating-point multiplication (FPM), the method uses integer addition. Applications use FPM to handle extremely large or small numbers with high precision; it is also the most energy-intensive part of AI number crunching. The researchers call their new method Linear-Complexity Multiplication (L-Mul); it works by approximating FPMs using integer addition. They claim testing so far has shown the new approach reduces electricity demand by 95%.

The one drawback is that it requires different hardware from what is currently in use, though the research team notes that the new type of hardware has already been designed, built, and tested. How such hardware would be licensed is still unclear: GPU maker Nvidia currently dominates the AI hardware market, and how it responds to this technology could have a major impact on the pace of adoption, assuming the company's claims are verified.
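To make the core idea concrete, here is a minimal Python sketch of one way to approximate a floating-point multiply with a single integer addition on the raw bit patterns. This is the classic Mitchell-style logarithmic approximation, shown purely for illustration; it is not the paper's exact L-Mul kernel, which adds a small correction term and targets custom tensor hardware.

import struct

def f32_bits(x: float) -> int:
    # Reinterpret a float32 as its raw 32-bit integer pattern.
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bits_f32(b: int) -> float:
    # Reinterpret a 32-bit integer pattern as a float32.
    return struct.unpack('<f', struct.pack('<I', b & 0xFFFFFFFF))[0]

def approx_mul(x: float, y: float) -> float:
    # Approximate x * y with one integer addition: the exponent fields
    # add exactly, and adding the mantissa fields approximates the
    # mantissa product (Mitchell's logarithmic trick), with any carry
    # spilling naturally into the exponent field. Worst-case relative
    # error is about 11%. Zeros, subnormals, and overflow are not
    # handled in this sketch.
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    b = f32_bits(abs(x)) + f32_bits(abs(y)) - (127 << 23)  # drop one bias
    return sign * bits_f32(b)

print(123.45 * 67.89)             # 8381.0205 (exact)
print(approx_mul(123.45, 67.89))  # ~8149.8, within a few percent here

On a commodity CPU this only demonstrates the arithmetic trick; the claimed energy savings come from replacing multiplier circuits with adders in dedicated hardware.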
[2]
This New Technique Slashes AI Energy Use by 95% - Decrypt
A new technique could put AI models on a strict energy diet, potentially cutting power consumption by up to 95% without compromising quality. Researchers at BitEnergy AI, Inc. have developed Linear-Complexity Multiplication (L-Mul), a method that replaces energy-intensive floating-point multiplications with simpler integer additions in AI computations.

For those unfamiliar with the term, floating-point is a numeric format that allows computers to handle very large and very small numbers efficiently by adjusting the placement of the point. You can think of it like scientific notation, in binary. Floating-point numbers are essential for many calculations in AI models, but they require a lot of energy and computing power. The more bits of precision, the more accurate the arithmetic, and the more computing power it requires: fp32 is generally treated as full precision, with developers reducing to fp16, fp8, and even fp4 so their models can run on local hardware.

AI's voracious appetite for electricity has become a growing concern. ChatGPT alone gobbles up 564 MWh daily, enough to power 18,000 American homes. The overall AI industry is expected to consume 85-134 TWh annually by 2027, roughly the same as Bitcoin mining operations, according to estimates shared by the Cambridge Centre for Alternative Finance.

L-Mul tackles the AI energy problem head-on by reimagining how AI models handle calculations. Instead of performing complex floating-point multiplications, L-Mul approximates these operations with integer additions: rather than multiplying 123.45 by 67.89 directly, it breaks the operation into smaller, easier steps built on addition. This makes the calculations faster and less energy-hungry while still maintaining accuracy.

The results seem promising. "Applying the L-Mul operation in tensor processing hardware can potentially reduce 95% energy cost by element wise floating point tensor multiplications and 80% energy cost of dot products," the researchers claim. In plain terms: a model using this technique would spend 95% less energy on element-wise tensor multiplications and 80% less on dot products.

The algorithm's impact extends beyond energy savings. L-Mul outperforms current 8-bit standards in some cases, achieving higher precision while using significantly less bit-level computation. Tests across natural language processing, vision tasks, and symbolic reasoning showed an average performance drop of just 0.07%, a negligible tradeoff for the potential energy savings.

Transformer-based models, the backbone of large language models like GPT, could benefit greatly from L-Mul. The algorithm integrates directly into the attention mechanism, a computationally intensive part of these models. Tests on popular models such as Llama, Mistral, and Gemma even revealed accuracy gains on certain vision tasks.

At an operational level, L-Mul's advantages become even clearer. The research shows that multiplying two float8 numbers (the way AI models would operate today) requires 325 operations, while L-Mul uses only 157, less than half. "To summarize the error and complexity analysis, L-Mul is both more efficient and more accurate than fp8 multiplication," the study concludes.

But nothing is perfect, and the technique has an Achilles' heel: it requires a special type of hardware, and current chips are not optimized to take full advantage of it.
Plans for specialized hardware that natively supports L-Mul calculations may already be in motion. "To unlock the full potential of our proposed method, we will implement the L-Mul and L-Matmul kernel algorithms on hardware level and develop programming APIs for high-level model design," the researchers say. This could lead to a new generation of AI models that are fast, accurate, and cheap to run, making energy-efficient AI a real possibility.
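In equation form, the trick is to drop the one term that needs a multiplier. Writing positive floats as $x = (1 + x_m)\cdot 2^{x_e}$ with fractional mantissa $x_m \in [0, 1)$, a hedged rendering of the construction (based on our reading of the preprint, not a verbatim reproduction) is:

$$x \cdot y = (1 + x_m + y_m + x_m y_m)\cdot 2^{x_e + y_e} \approx (1 + x_m + y_m + 2^{-l(m)})\cdot 2^{x_e + y_e}$$

The cross term $x_m y_m$ is the only piece that requires mantissa multiplication; replacing it with a constant offset $2^{-l(m)}$, which depends only on the mantissa bit-width $m$, leaves nothing but exponent and mantissa additions.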
[3]
New AI algorithm promises to slash AI power consumption by 95 percent
A hot potato: As more companies jump on the AI bandwagon, the energy consumption of AI models is becoming an urgent concern. While the most prominent players - Nvidia, Microsoft, and OpenAI - have downplayed the situation, one company claims it has come up with a solution.

Researchers at BitEnergy AI have developed a technique that could dramatically reduce AI power consumption without sacrificing much accuracy or speed. The study claims the method could cut energy usage by up to 95 percent. The team calls the breakthrough Linear-Complexity Multiplication, or L-Mul for short. The computational process uses integer additions, which require much less energy and fewer steps than floating-point multiplications for AI-related tasks.

Floating-point numbers are used extensively in AI computations to handle very large or very small numbers. They resemble scientific notation in binary form and allow AI systems to execute complex calculations precisely. However, this precision comes at a cost. The energy demands of the AI boom have reached concerning levels, with some models requiring vast amounts of electricity. ChatGPT, for example, uses electricity equivalent to 18,000 US homes (564 MWh daily). Analysts at the Cambridge Centre for Alternative Finance estimate that the AI industry could consume between 85 and 134 TWh annually by 2027.

The L-Mul algorithm addresses this energy waste by approximating complex floating-point multiplications with simpler integer additions. In testing, AI models maintained accuracy while reducing energy consumption by 95 percent for tensor multiplications and 80 percent for dot products.

Performance holds up as well. The algorithm exceeds current 8-bit computational standards, achieving higher precision with fewer bit-level calculations. Tests covering various AI tasks, including natural language processing and machine vision, showed only a 0.07-percent performance decrease, a small tradeoff against the energy savings. Transformer-based models, like GPT, stand to benefit the most, as the algorithm integrates seamlessly into the attention mechanism, a crucial yet energy-intensive component of these systems. Tests on popular AI models, such as Llama and Mistral, have even shown improved accuracy on some tasks.

There is good news and bad news. The bad news is that L-Mul currently requires specialized hardware; contemporary AI processors are not optimized to take advantage of the technique. The good news is that plans for specialized hardware and programming APIs are in the works, paving the way for more energy-efficient AI within a reasonable timeframe. The other obstacle would be companies, notably Nvidia, hampering adoption, which is a genuine possibility: the GPU maker has built its reputation as the go-to hardware developer for AI applications, and it is unlikely to cede ground to more energy-efficient hardware while it holds the lion's share of the market.

For those who live for complex mathematical solutions, a preprint version of the study is posted on the arXiv server.
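To see the "scientific notation in binary" idea concretely, here is a tiny Python sketch that splits a float32 into the sign, exponent, and mantissa fields the format actually stores (1, 8, and 23 bits respectively):

import struct

def decompose(x):
    # Split a float32 into its sign bit, unbiased exponent, and
    # fractional mantissa: x == (-1)**sign * (1 + frac) * 2**exp.
    b = struct.unpack('<I', struct.pack('<f', x))[0]
    sign = b >> 31
    exp = ((b >> 23) & 0xFF) - 127   # remove the 127 bias
    frac = (b & 0x7FFFFF) / 2**23    # 23 stored mantissa bits
    return sign, exp, frac

s, e, f = decompose(123.45)
print(s, e, f)                       # 0 6 0.928906...
print((1 + f) * 2**e)                # ~123.45, rebuilt from the fields

Multiplying two such numbers exactly means adding the exponents and multiplying the mantissas; it is that mantissa multiplication, the expensive part, that L-Mul replaces with addition.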
[4]
L-Mul algorithm breakthrough slashes AI energy consumption by 95%
L-Mul aims to make AI calculations simpler and more efficient. To help ensure that the widespread adoption of AI is sustainable, a research team at BitEnergy AI has made a big stride: a promising technique to drastically reduce AI's energy use. The use of artificial intelligence is expanding quickly, and industry experts are worried because AI systems are so energy-intensive. Many big companies are even considering harnessing fusion energy to meet their AI needs, and most recently, Microsoft struck a deal to restart Three Mile Island to power its AI workloads.
[5]
95% Less Energy Consumption in Neural Networks Can be Achieved. Here's How
Researchers have proposed a new technique called L-Mul, which tackles the problem of energy-intensive floating-point multiplications in LLMs. AI is booming, and so is energy consumption. According to reports, ChatGPT likely uses more than half a million kilowatt-hours of electricity to respond to some 200 million requests a day; in other words, ChatGPT consumes energy equivalent to powering 17,000 US households daily.

A research paper titled "Addition is All You Need for Energy-Efficient Language Models" notes that multiplying floating-point numbers consumes significantly more energy than integer operations. The paper states that multiplying two 32-bit floating point numbers (fp32) costs four times more energy than adding two fp32 numbers and 37 times more than adding two 32-bit integers.

The researchers propose linear-complexity multiplication (L-Mul) to solve the problem of energy-intensive floating-point multiplications in large neural networks. Before L-Mul, neural networks typically performed computations using standard floating-point multiplication, which is computationally expensive and energy-intensive, especially for LLMs, which typically run over billions of parameters. These operations consume significant computational resources and energy, particularly in attention mechanisms and matrix multiplications.

The best part of this approach is that it is not dependent on any specific architecture. The researchers tested real-world models such as Llama 3.1 8B, Mistral-7B, and Gemma2-2B to back these numbers, and concluded that the proposed method can replace different modules in Transformer layers under fine-tuning or training-free settings. Because the approach is not limited to neural networks, L-Mul implementations need not stop at LLMs; the technique can also be extended into hardware to achieve energy efficiency across a larger spectrum of workloads.

L-Mul approximates floating-point multiplication using only simple integer additions. This makes it faster because its cost grows directly with the size of the numbers (linear complexity), unlike mantissa multiplication, which gets much slower as numbers get bigger (quadratic complexity); see the sketch at the end of this piece. L-Mul uses straightforward bit operations and additions to avoid the costly multiplication of the numbers' mantissas and tricky rounding steps. The approach not only reduces computational cost but can also decrease energy consumption by up to 95% for element-wise floating-point tensor multiplications and 80% for dot products, while maintaining comparable or even superior precision to 8-bit floating-point operations in many cases. The cost of floating-point arithmetic is also why Google developed bfloat16, a truncated floating-point format for machine learning, and why NVIDIA created TensorFloat-32 specifically for AI applications on its GPUs.

The energy story goes beyond LLMs. A Reddit user speculated that this research paper could lead CPU manufacturers to relegate normal 8-bit float multiplication routines to a legacy/compatibility mode, with any FP8 multiplication natively performed using the L-Mul algorithm in future hardware: hypothetically, 6090-series GPUs, CPUs beyond the 9000 series, or Apple's M5 chips. "It might force companies like Intel, AMD, NVIDIA, and Apple to quickly and substantially widen the memory buses across their entire hardware lines.
If they don't adapt, they risk being outpaced by alternatives. For instance, inexpensive RISC-V derivatives with extensive high-bandwidth memory (HBM) or even standard FPGAs with sufficient SRAM could potentially outperform NVIDIA's top-end GB200 cards. This disruption could occur before these established companies have time to develop and release competitive products, potentially reshaping the market dynamics within just a few months," he added. He further suggested that this research could fundamentally change how hardware is built for neural networks.

While the approach sounds promising, users have raised some concerns. One Reddit user noted that integer addition might require more than a single clock cycle on modern GPUs, especially if implemented through bit-level manipulations and approximations, and that converting back and forth between floating-point and integer representations might introduce additional overhead. In terms of speed, the proposed approach might not deliver a speed-up on current GPU architectures, which are optimized for native floating-point operations; the approximation might involve multiple steps or more complex handling of integer operations, offsetting potential gains.

The paper hints that specialized hardware designed to implement the L-Mul algorithm could deliver both speed and energy-efficiency gains. On current GPU architectures designed for traditional floating-point operations, however, the method is more likely to achieve energy-efficiency improvements than speed-ups. "If energy efficiency is the primary concern, then this method could be very valuable in reducing costs. For performance improvements (speed), the gains might be minimal without hardware specifically optimised for the new method," he added, suggesting that specialized hardware is essential for speed-ups.

L-Mul performs on par with current standards while saving a large amount of power. So even if we don't achieve better speeds, L-Mul should still be considered a strong technique for reducing the energy consumption of neural networks.
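To illustrate the linear-versus-quadratic gap mentioned above, here is a small Python sketch. It counts single-bit full-adder steps in a textbook ripple-carry adder and a shift-and-add multiplier; these are illustrative gate-level cost models, not the operation counts reported in the paper.

def int_add_ops(a, b, n=32):
    # Ripple-carry addition: one full-adder step per bit -> O(n).
    out, carry, ops = 0, 0, 0
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        out |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
        ops += 1
    return out, ops

def int_mul_ops(a, b, n=32):
    # Shift-and-add multiplication: up to n additions of shifted
    # partial products -> O(n * n) single-bit operations.
    acc, ops = 0, 0
    for i in range(n):
        if (b >> i) & 1:
            acc, step = int_add_ops(a << i, acc, 2 * n)
            ops += step
    return acc, ops

print(int_add_ops(12345, 6789))  # (19134, 32): an add costs ~n bit-ops
print(int_mul_ops(12345, 6789))  # (83810205, 384): a multiply costs far more

The same asymmetry in hardware, adders being far cheaper than multiplier arrays, is what makes replacing mantissa multiplication with addition attractive for energy.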
Researchers at BitEnergy AI have developed a groundbreaking algorithm that could potentially slash AI energy consumption by up to 95%. The new technique, called Linear-Complexity Multiplication (L-Mul), addresses growing concerns about the escalating energy demands of artificial intelligence applications 1.
As AI applications have become mainstream, their energy requirements have skyrocketed. For instance, ChatGPT alone consumes approximately 564 MWh daily, equivalent to powering 18,000 American homes. Industry projections suggest that AI could consume between 85 and 134 TWh annually by 2027, rivaling the energy consumption of Bitcoin mining operations 2.
The L-Mul algorithm tackles this energy challenge by reimagining how AI models handle calculations: instead of performing energy-intensive floating-point multiplications, it approximates them with simpler integer additions, cutting the cost of the arithmetic at the heart of model inference.
Initial tests of the L-Mul algorithm have shown promising results: up to 95% less energy for element-wise floating-point tensor multiplications, 80% less for dot products, and an average performance drop of just 0.07% across natural language, vision, and symbolic reasoning benchmarks.
The L-Mul technique could have far-reaching implications for various AI applications: it integrates directly into the attention mechanism of transformer-based models, and tests on Llama, Mistral, and Gemma even showed accuracy gains on certain vision tasks.
While L-Mul shows great promise, there are some challenges to overcome: the method needs specialized hardware to realize its full savings, and today's GPUs are not optimized for it.
The introduction of L-Mul could potentially disrupt the AI hardware market: the researchers plan hardware-level kernel implementations and programming APIs, while Nvidia's dominant position will heavily influence how quickly the technique is adopted.
Researchers develop innovative methods to significantly reduce AI's energy consumption, potentially revolutionizing the industry's environmental impact and operational costs.
2 Sources
Researchers at the University of Michigan have developed Perseus, a software tool that can reduce energy consumption in AI training by up to 30% without compromising speed or performance, potentially saving enough energy to power 1.1 million U.S. homes by 2026.
3 Sources
Chinese startup DeepSeek claims to have created an AI model that matches the performance of established rivals at a fraction of the cost and carbon footprint. However, experts warn that increased efficiency might lead to higher overall energy consumption due to the Jevons paradox.
5 Sources
The rapid growth of artificial intelligence is causing a surge in energy consumption by data centers, challenging sustainability goals and straining power grids. This trend is raising concerns about the environmental impact of AI and the tech industry's ability to balance innovation with eco-friendly practices.
8 Sources
As artificial intelligence continues to advance, concerns grow about its energy consumption and environmental impact. This story explores the challenges and potential solutions in managing AI's carbon footprint.
5 Sources