

2 Sources
[1]

Analog AI Startup Aims to Lower Gen AI's Power Needs
Machine learning chips that use analog circuits instead of digital ones have long promised huge energy savings. But in practice they've mostly delivered modest savings, and only for modest-sized neural networks. Silicon Valley startup Sagence says it has the technology to bring the promised power savings to tasks suited for massive generative AI models. The startup claims that its systems will be able to run the large language model Llama 2-70B at one-tenth the power of an Nvidia H100 GPU-based system, at one-twentieth the cost and in one-twentieth the space.
"My vision was to create a technology that was very differentiated from what was being done for AI," says Sagence CEO and founder Vishal Sarin. Even back when the company was founded in 2018, he "realized power consumption would be a key impediment to the mass adoption of AI.... The problem has become many, many orders of magnitude worse as generative AI has caused the models to balloon in size."
The core power-savings prowess for analog AI comes from two fundamental advantages: It doesn't have to move data around and it uses some basic physics to do machine learning's most important math. That math problem is multiplying vectors and then adding up the result, called multiply and accumulate.
Early on, engineers realized that two foundational rules of electrical engineering did the same thing, more or less instantly. Ohm's Law -- voltage multiplied by conductance equals current -- does the multiplication if you use the neural network's "weight" parameters as the conductances. Kirchhoff's Current Law -- the sum of the currents entering and exiting a point is zero -- means you can easily add up all those multiplications just by connecting them to the same wire. And finally, in analog AI, the neural network parameters don't need to be moved from memory to the computing circuits -- usually a bigger energy cost than computing itself -- because they are already embedded within the computing circuits.
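As a rough illustration of the multiply-and-accumulate idea described above, the following Python sketch models one column of an analog array: each input is a voltage, each weight is a conductance, Ohm's law gives the current through each cell, and Kirchhoff's current law sums those currents on a shared wire. The function names and numeric values are hypothetical, chosen only for illustration; this is an idealized model, not Sagence's circuit.

```python
def cell_current(voltage, conductance):
    """Ohm's law: current (A) = voltage (V) * conductance (S)."""
    return voltage * conductance


def column_current(voltages, conductances):
    """Kirchhoff's current law: currents flowing into a shared wire add up."""
    if len(voltages) != len(conductances):
        raise ValueError("need one conductance per input voltage")
    return sum(cell_current(v, g) for v, g in zip(voltages, conductances))


# Hypothetical inputs (volts) and weights stored as conductances (siemens).
inputs = [0.10, 0.20, 0.05]
weights = [2e-6, 1e-6, 4e-6]

# The summed current is the dot product of inputs and weights.
total = column_current(inputs, weights)
print(f"summed column current: {total:.3e} A")
```

In this idealized picture the dot product appears as a single current on the wire, with no separate step to fetch the weights from memory.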
Sagence uses flash memory cells as the conductance values. The kind of flash cell typically used in data storage is a single transistor that can hold 3 or 4 bits, but Sagence has developed algorithms that let cells embedded in their chips hold 8 bits, which is the key level of precision for LLMs and other so-called transformer models. Storing an 8-bit number in a single transistor instead of the 48 transistors it would take in a typical digital memory cell is an important cost, area, and energy savings, says Sarin, who has been working on storing multiple bits in flash for 30 years.
Adding to the power savings is that the flash cells are operated in a state called "deep subthreshold." That is, they are working in a state where they are barely on at all, producing very little current. That wouldn't do in a digital circuit, because it would slow computation to a crawl. But because the analog computation is done all at once, it doesn't hinder the speed.
If all this sounds vaguely familiar, it should. Back in 2018 a trio of startups went after a version of flash-based analog AI. Syntiant eventually abandoned the analog approach for a digital scheme that's put six chips in mass production so far. Mythic struggled but stuck with it, as has Anaflash. Others, particularly IBM Research, have developed chips that rely on nonvolatile memories other than flash, such as phase-change memory or resistive RAM.
Generally, analog AI has struggled to meet its potential, particularly when scaled up to a size that might be useful in datacenters. Among its main difficulties is the natural variation in the conductance cells; that might mean the same number stored in two different cells will result in two different conductances. Worse still, these conductances can drift over time and shift with temperature. This noise drowns out the signal representing the result, and the noise can be compounded stage after stage through the many layers of a deep neural network.
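To make the compounding-noise point concrete, here is a toy Monte Carlo sketch in Python. It assumes a chain of stages that should each multiply the signal by exactly 1.0, with every stored weight off by an independent Gaussian relative error. The 2% error level, the unit weights, and the function name are assumptions made for illustration; real networks have many inputs per stage and nonlinearities, so this only shows the trend, not actual chip behavior.

```python
import math
import random


def rms_error_after(depth, sigma, trials=2000, seed=0):
    """RMS relative output error of a chain of `depth` noisy unit-weight stages."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    squared = 0.0
    for _ in range(trials):
        signal = 1.0
        for _ in range(depth):
            # Each stage should multiply by exactly 1.0; cell-to-cell
            # variation makes the stored weight slightly off (assumed Gaussian).
            signal *= 1.0 + rng.gauss(0.0, sigma)
        squared += (signal - 1.0) ** 2
    return math.sqrt(squared / trials)


# Print the estimated error for a few network depths.
for depth in (1, 4, 16, 64):
    print(depth, round(rms_error_after(depth, sigma=0.02), 4))
```

Under these assumptions the error grows with depth, which is why uncorrected cell variation becomes harder to tolerate in deeper networks.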
Sagence's solution, Sarin explains, is a set of reference cells on the chip and a proprietary algorithm that uses them to calibrate the other cells and track temperature-related changes.
Another source of frustration for those developing analog AI has been the need to digitize the result of the multiply and accumulate process in order to deliver it to the next layer of the neural network where it must then be turned back into an analog voltage signal. Each of those steps requires analog-to-digital and digital-to-analog converters, which take up area on the chip and soak up power. According to Sarin, Sagence has developed low-power versions of both circuits. The power demands of the digital-to-analog converter are helped by the fact that the circuit needs to deliver a very narrow range of voltages in order to operate the flash memory in deep subthreshold mode.
Sagence's first product, to launch in 2025, will be geared toward vision systems, which are a considerably lighter lift than server-based LLMs. "That is a leapfrog product for us, to be followed very quickly [by] generative AI," says Sarin.
The generative AI product would be scaled up from the vision chip mainly by vertically stacking analog AI chiplets atop a communications die. These stacks would be linked to a CPU die and to high-bandwidth memory DRAM in a single package called Delphi. In simulations, a system made up of Delphis would run Llama 2-70B at 666,000 tokens per second consuming 59 kilowatts, versus 624 kW for an Nvidia H100-based system, Sagence claims.
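The simulation figures quoted above can be turned into energy per token with simple arithmetic. The Python sketch below does that conversion. It assumes the H100-based system is being compared at the same 666,000 tokens per second, which the text implies but does not state outright; the constant and function names are illustrative.

```python
TOKENS_PER_SECOND = 666_000   # claimed Llama 2-70B throughput (simulation)
DELPHI_KW = 59.0              # claimed power of the Delphi-based system
H100_KW = 624.0               # claimed power of the H100-based system


def joules_per_token(power_kw, tokens_per_second):
    """Energy per token in joules: power in watts divided by tokens per second."""
    return power_kw * 1000.0 / tokens_per_second


delphi = joules_per_token(DELPHI_KW, TOKENS_PER_SECOND)
# Assumption: the H100 system is compared at the same throughput.
h100 = joules_per_token(H100_KW, TOKENS_PER_SECOND)

print(f"Delphi: {delphi:.3f} J/token")
print(f"H100:   {h100:.3f} J/token")
print(f"ratio:  {H100_KW / DELPHI_KW:.1f}x")
```

On these claimed numbers the Delphi system would use roughly 0.09 J per token against roughly 0.94 J for the H100 system, a ratio of about 10.6, which is consistent with the "one-tenth the power" claim.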
[2]

Sagence is building analog chips to run AI
Graphics processing units (GPUs), the chips on which most AI models run, are energy-hungry beasts. As a consequence of the accelerating incorporation of GPUs in data centers, AI will drive a 160% uptick in electricity demand by 2030, Goldman Sachs estimates. The trend isn't sustainable, argues Vishal Sarin, an analog and memory circuit designer. After working in the chip industry for over a decade, Sarin launched Sagence AI (it previously went by the name Analog Inference) to design energy-efficient alternatives to GPUs. "The applications that could make practical AI computing truly pervasive are restricted because the devices and systems processing the data cannot achieve the required performance," Sarin said. "Our mission is to break the performance and economics limitations, and in an environmentally responsible way." Sagence develops chips and systems for running AI models, as well as the software to program these chips. While there's no shortage of companies creating custom AI hardware, Sagence is somewhat unique in that its chips are analog, not digital. Most chips, including GPUs, store information digitally, as binary strings of ones and zeros. In contrast, analog chips can represent data using a range of different values. Analog chips aren't a new concept. They had their heyday from about 1935 to 1980, helping model the North American electrical grid, among other engineering feats. But the drawbacks of digital chips are making analog attractive once again. For one, digital chips require hundreds of components to perform certain calculations that analog chips can achieve with just a few modules. Digital chips also usually have to shuttle data back and forth from memory to processors, causing bottlenecks. "All the leading legacy suppliers of AI silicon use this old architectural approach, and this is blocking the progress of AI adoption," Sarin said. 
Analog chips like Sagence's, which are "in-memory" chips, don't transfer data from memory to processors, potentially enabling them to complete tasks faster. And, thanks to their ability to use a range of values to store data, analog chips can have higher data-density than their digital counterparts. Analog tech has its downsides, however. For example, it can be harder to achieve high precision with analog chips because they require more accurate manufacturing. They also tend to be tougher to program. But Sarin sees Sagence's chips complementing -- not replacing -- digital chips, for example, to accelerate specialized applications in servers and mobile devices. "Sagence products are designed to eliminate the power, cost and latency issues inherent in GPU hardware, while delivering high performance for AI applications," he said. Sagence, which plans to bring its chips to market in 2025, is engaged with "multiple" customers as it looks to compete with other AI analog chip ventures like EnCharge and Mythic, Sarin said. "We're currently packaging our core technology into system-level products and ensuring that we fit into existing infrastructure and deployment scenarios," he added. Sagence has secured investments from backers including Vinod Khosla, TDK Ventures, Cambium Capital, Blue Ivy Ventures, Aramco Ventures and New Science Ventures, raising a total of $58 million in the six years since its founding. Now, the startup is planning to raise capital again to expand its 75-person team. "Our cost structure is favorable because we're not chasing the performance goals by migrating to the newest [manufacturing processes] for our chips," Sarin said. "That's a big factor for us." The timing might just work in Sagence's favor. Per Crunchbase, funding to semiconductor startups appears to be bouncing back after a lackluster 2023. 
From January to July, VC-backed chip startups raised nearly $5.3 billion -- a number well ahead of last year, when such firms saw less than $8.8 billion raised in total. This being the case, chipmaking is an expensive proposition -- made all the more challenging by international sanctions and tariffs promised by the incoming Trump administration. Winning customers who've become "locked in" to ecosystems like Nvidia's is another uphill climb. Last year, AI chipmaker Graphcore, which raised nearly $700 million and was once valued at close to $3 billion, filed for insolvency after struggling to gain a strong foothold in the market. To have any chance at success, Sagence will have to prove that its chips do, indeed, draw dramatically less power and deliver higher efficiency than alternatives -- and raise enough venture funding to fabricate at scale.
Sagence, a Silicon Valley startup, is developing analog AI chips that could significantly reduce power consumption for large language models, potentially revolutionizing the AI hardware landscape.

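Sagence, a Silicon Valley startup founded in 2018, is developing analog AI chips. The company claims its systems can run the large language model Llama 2-70B at one-tenth the power, one-twentieth the cost, and one-twentieth the space of an Nvidia H100 GPU-based system [1][2].

Analog AI chips have long held the promise of significant energy savings over their digital counterparts. Sagence's approach leverages two fundamental advantages: the network's weights stay inside the computing circuits, so data does not have to move between memory and processor, and basic circuit physics (Ohm's law and Kirchhoff's current law) performs the multiply-and-accumulate operation [1].

The company has developed several key technologies to make analog AI viable for large-scale applications: flash memory cells that each hold 8 bits, operation of those cells in a "deep subthreshold" state that draws very little current, on-chip reference cells with a proprietary algorithm that calibrates the other cells and tracks temperature-related changes, and low-power analog-to-digital and digital-to-analog converters [1].

The energy demands of AI are a growing concern. Goldman Sachs estimates that AI will drive a 160% increase in electricity demand by 2030 [2]. Sagence's technology could potentially alleviate this issue: in simulations, the company says, a system built from its Delphi packages would run Llama 2-70B at 666,000 tokens per second while consuming 59 kW, versus 624 kW for an Nvidia H100-based system [1].

Sagence plans to launch its first product, focused on vision systems, in 2025. This will be followed by solutions for generative AI, utilizing a chiplet-based approach for scalability [1][2].

However, the company faces several challenges: analog chips are harder to make precise and tougher to program, customers are locked in to ecosystems like Nvidia's, and chipmaking is expensive, so Sagence must raise enough funding to fabricate at scale [2].

The semiconductor startup landscape is showing signs of recovery, with VC-backed chip startups raising nearly $5.3 billion from January to July 2024 [2]. Sagence has secured $58 million in funding over six years and is planning to raise additional capital to expand its 75-person team [2].

As the AI industry grapples with increasing power demands, Sagence's analog AI chips could represent a significant breakthrough in energy-efficient computing for large language models and other AI applications.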
Summarized by

Navi