3 Sources
[1]
Deepseek research touts memory breakthrough, decoupling compute power and RAM pools to bypass GPU & HBM constraints -- Engram conditional memory module commits static knowledge to system RAM
DeepSeek has released a new technical paper detailing how future AI models might rely on a queryable database of information committed to system memory. Named "Engram", the conditional memory technique achieves demonstrably higher performance in long-context queries by committing sequences of data to static memory. This eases the model's reliance on reasoning, leaving the GPU to handle only the more complex tasks, which increases performance and reduces dependence on high-bandwidth memory (HBM). The paper details how N-grams, statistical sequences of words, are integrated into the model's neural networks, allowing them to be placed into a queryable memory bank. Engram allows models to remember facts rather than having to reason them out, which is more computationally expensive.

Released on the company's GitHub page, Engram aims to curb the reliance on more complex memory types and instead commit a knowledge library to a more common system memory standard, such as CXL. The ongoing reliance on high-bandwidth memory for AI accelerators is something that even Chinese silicon, such as Huawei's Ascend series, cannot escape. Each stack of HBM uses more memory dies, and with demand skyrocketing, easing any AI model's reliance on the GPU's direct high-bandwidth memory would be significant, especially considering the ongoing memory supply squeeze. Engram would enable static memory to be held separately from an LLM's compute, allowing the GPU's rapid HBM to dedicate itself to reasoning and therefore enabling more performant Engram-based AI models compared to a standard Mixture of Experts (MoE) model.

As detailed in the paper, an Engram-based model scaled to nearly 27 billion parameters can beat a standard MoE model in long-context training and eliminates the computational waste of having to reason out facts, by allowing them to be stored externally. A standard MoE model relies on conditional computation: it must reconstruct these pieces of data every time they are referenced in a query, calling on its expert parameters to assemble and reason over the data even when the query only activates certain parts or experts (sparse computation). The Engram paper adds that conditional memory would let the model simply ask, "Do I already have this data?", rather than having to access the parts of the model that deal with reasoning. "This process essentially amounts to an expensive runtime reconstruction of a static lookup table, wasting valuable sequential depth on trivial operations that could otherwise be allocated to higher-level reasoning," the paper reads. Engram takes static patterns and lists its knowledge index as a parsable piece of conditional memory with a store of information, relieving the AI model of the burden of having to reason through context repeatedly.

While Nvidia's KVCache, announced at CES 2026, offloads context data to NVMe memory with BlueField-4, it acts as more of a short-term solution: it lets the model remember things you have recently said or added within context, and is, for all intents and purposes, disposable once you move on to the next query or conversation. KVCache, while persistent within the history of your conversations or queries, does not draw on an existing base of pre-calculated data, and is not persistent in the same way that Engram-based LLMs could be, if the paper is to be believed.
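To make the lookup-versus-recompute distinction concrete, here is a minimal Python sketch of the "Do I already have this data?" idea described above. The table contents, dimensions, and function names are illustrative assumptions, not DeepSeek's actual Engram implementation.

```python
# Toy sketch of conditional memory as an O(1) n-gram lookup (illustrative only;
# not DeepSeek's released code). The dictionary stands in for a store of
# precomputed static knowledge held in system RAM rather than GPU HBM.
import numpy as np

EMBED_DIM = 64

# Hypothetical precomputed entries keyed by short n-grams.
ngram_memory = {
    ("universal", "studios"): np.random.rand(EMBED_DIM),
    ("princess", "of", "wales"): np.random.rand(EMBED_DIM),
}

def recompute_with_model(tokens):
    # Stand-in for the expensive path: running attention/FFN layers on the GPU
    # to reconstruct a static fact from scratch.
    return np.random.rand(EMBED_DIM)

def retrieve_or_compute(tokens):
    """Ask 'Do I already have this data?' before falling back to reasoning."""
    key = tuple(t.lower() for t in tokens)
    cached = ngram_memory.get(key)          # constant-time hash-table read
    if cached is not None:
        return cached, "memory"
    return recompute_with_model(tokens), "compute"

_, path = retrieve_or_compute(["Universal", "Studios"])
print(path)  # -> "memory"
```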
To put it simply, KVCache can be likened to storing your handwritten notes, whereas Engram is a record of the whole encyclopedia. This is enabled through tokenizer compression, which treats equivalent tokens (such as the same word with different capitalization) as the same canonical concept. This allowed DeepSeek to reduce the vocabulary size for the conditional memory module by 23% and allows for rapid parsing of information in context. Because there is an impossibly large number of phrases or word combinations within a given context, the team employs hashing, which assigns a number to a series of words. Engram builds on this with what it calls Multi-Head Hashing, in which a single phrase is mapped to several hash values to avoid erroneously attaching the wrong context. For example, "Universal" might be a single entry, distinct from "Universal Studios", with Multi-Head Hashing employed to ensure no mistakes or database errors. The result is then passed to Engram's context-aware gating, which confirms that the term matches the context of the sentence it's being used in before it's deployed into an output.

To examine how Engram-based LLMs might work in large-scale deployments, DeepSeek detailed how it might achieve the best allocation between Engram embeddings and MoE parameters within an AI model. The outcome was a U-curve, which showed that memory and compute (or reasoning) can be considered mathematically distinct forms of intelligence within AI models, and revealed a sweet spot between MoE and Engram embeddings. "Remarkably, the Engram model achieves comparable performance to the pure MoE baseline (π = 100%) even when the MoE allocation is reduced to just π ≈ 40% (i.e., a total of 46 experts for the 5.7B model and 43 experts for the 9.9B model). Furthermore, the pure MoE baseline proves suboptimal: reallocating roughly 20%-25% of the sparse parameter budget to Engram yields the best performance." DeepSeek itself remarks that both Engram-dominated and MoE-dominated models falter, whereas a ratio that gives 20-25% of the model's overall parameter budget to Engram achieves the best results.

DeepSeek ran another experiment in parallel, which it names the "Infinite Memory Regime." This keeps the computational budget fixed, so the model doesn't get more expensive to run, while attaching a near-infinite number of conditional memory parameters deployed through Engram. Because Engram is distinct from the overall compute budget (it's effectively a long-term storage bank that the model taps into), DeepSeek found that performance scales linearly with memory size: if a model keeps adding to its conditional memory banks, its performance continues to improve without increasing the overall compute budget. This could have significant implications for the wider AI industry if performance is not bound solely by compute, but also by long-term "Engram" memory banks. If the benefits are as good as the paper outlines, the memory squeeze would no longer rest solely on the deployment of HBM, but on all forms of memory that could be deployed within data centers, whether through CXL or other methods of interconnection.
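As a rough illustration of how multi-head hashing and context-aware gating could fit together as described above, here is a short Python sketch. The table size, number of hash heads, and the cosine-similarity gate are assumptions made for the example, not DeepSeek's published design.

```python
# Illustrative sketch of multi-head hashing plus a context gate. Table size,
# hash-head count, and the gate form are assumptions, not DeepSeek's values.
import hashlib
import numpy as np

TABLE_SIZE = 1_000_003   # hypothetical embedding-table size
NUM_HEADS = 4            # hypothetical number of hash heads
EMBED_DIM = 64

rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((TABLE_SIZE, EMBED_DIM)).astype(np.float32)

def canonicalize(token):
    # Tokenizer compression in spirit: fold case so "Universal" and "universal"
    # map to the same canonical entry.
    return token.lower()

def multi_head_hash(ngram):
    # One table index per head; a phrase is identified by the set of its
    # indices, so a single colliding slot is unlikely to pull in a wrong entry.
    key = " ".join(canonicalize(t) for t in ngram)
    return [
        int(hashlib.sha256(f"{h}:{key}".encode()).hexdigest(), 16) % TABLE_SIZE
        for h in range(NUM_HEADS)
    ]

def retrieve(ngram, context_state):
    slots = multi_head_hash(ngram)
    memory_vec = embedding_table[slots].mean(axis=0)   # combine the heads
    # Context-aware gating: keep the retrieved vector only if it agrees with
    # the model's current hidden state (cosine similarity as a toy gate).
    sim = memory_vec @ context_state / (
        np.linalg.norm(memory_vec) * np.linalg.norm(context_state) + 1e-8
    )
    gate = 1.0 / (1.0 + np.exp(-sim))                  # squash to (0, 1)
    return gate * memory_vec

context = rng.standard_normal(EMBED_DIM).astype(np.float32)
out = retrieve(("Universal", "Studios"), context)
```

A real implementation would learn the gate jointly with the model; the point here is only that several hash heads plus a context check let a phrase be retrieved without colliding entries overriding the surrounding sentence.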
DeepSeek deployed a 27B-parameter Engram model and a standard 27B MoE model in parallel to determine the performance benefits of conditional memory within AI models, and the results were exemplary. On knowledge-intensive tasks, Engram was 3.4 to 4 points better than its MoE equivalent, and it was even better at reasoning, with a 3.7 to 5 point uplift over its "reasoning-only" MoE sibling. Similar results were achieved in coding and mathematics-based tests. The big win for Engram, however, was in long-context tasks, lifting accuracy on the NIAH (Needle in a Haystack) benchmark to 97%, a leap from the MoE model's score of 84.2%. This is a large difference in reliability between the models, and could point toward AI's long-context and coherence issues eventually becoming a thing of the past if Engram were deployed in a commercial AI model, especially as demand for long-context AI queries increases.

Engram has significant implications for the AI industry, especially as the paper details how this methodology is no longer bound by HBM but by longer-term storage. System DRAM could be utilized to significantly improve the quality of Engram-based LLM outputs, meaning the much more expensive HBM would only be used for computationally heavy queries. Of course, if Engram were to take off, it could worsen the ongoing DRAM supply crisis, as AI hyperscalers adopting the methodology would flock to system DRAM instead of focusing solely on putting all of their memory ICs into HBM for GPUs.

"We envision conditional memory functions as an indispensable modeling primitive for next-generation sparse models," DeepSeek said, hinting at a possible V4 deploying Engram in a new AI model. With the company rumored to announce a new AI model within the next few weeks, don't be surprised if it implements Engram. While the results are impressive on paper, Engram's impact has yet to be determined in real-world deployment. But if everything the paper says holds in a real-world context, the company could be onto a new 'DeepSeek moment.'
[2]
DeepSeek's conditional memory fixes silent LLM waste: GPU cycles lost to static lookups
When an enterprise LLM retrieves a product name, technical specification, or standard contract clause, it's using expensive GPU computation designed for complex reasoning -- just to access static information. This happens millions of times per day. Each lookup wastes cycles and inflates infrastructure costs. DeepSeek's newly released research on "conditional memory" addresses this architectural limitation directly. The work introduces Engram, a module that separates static pattern retrieval from dynamic reasoning. It delivers results that challenge assumptions about what memory is actually for in neural networks. The paper was co-authored by DeepSeek founder Liang Wenfeng.

Through systematic experiments, DeepSeek found the optimal balance between computation and memory: 75% of sparse model capacity allocated to dynamic reasoning and 25% to static lookups. This memory system improved reasoning more than knowledge retrieval. Complex reasoning benchmarks jumped from 70% to 74% accuracy, while knowledge-focused tests improved from 57% to 61%. These improvements came from tests including Big-Bench Hard, ARC-Challenge, and MMLU. The research arrives as enterprises face mounting pressure to deploy more capable AI systems while navigating GPU memory constraints and infrastructure costs. DeepSeek's approach offers a potential path forward by fundamentally rethinking how models should be structured.

How conditional memory solves a different issue than agentic memory and RAG

Agentic memory systems, sometimes referred to as contextual memory -- like Hindsight, MemOS, or Memp -- focus on episodic memory. They store records of past conversations, user preferences, and interaction history. These systems help agents maintain context across sessions and learn from experience. But they're external to the model's forward pass and don't optimize how the model internally processes static linguistic patterns.

For Chris Latimer, founder and CEO of Vectorize, which developed Hindsight, the conditional memory approach used in Engram solves a different problem than agentic AI memory. "It's not solving the problem of connecting agents to external memory like conversation histories and knowledge stores," Latimer told VentureBeat. "It's more geared towards squeezing performance out of smaller models and getting more mileage out of scarce GPU resources."

Conditional memory tackles a fundamental issue: Transformers lack a native knowledge lookup primitive. When processing text, they must simulate retrieval of static patterns through expensive neural computation across multiple layers. These patterns include named entities, technical terminology, and common phrases. The DeepSeek paper illustrates this with a concrete example. Recognizing "Diana, Princess of Wales" requires consuming multiple layers of attention and feed-forward networks to progressively compose features. The model essentially uses deep, dynamic logic circuits to perform what should be a simple hash table lookup. It's like using a calculator to remember your phone number rather than just looking it up. "The problem is that Transformer lacks a 'native knowledge lookup' ability," the researchers write. "Many tasks that should be solved in O(1) time like retrieval have to be 'simulated for retrieval' through a large amount of computation, which is very inefficient."

How conditional memory works

Engram introduces "conditional memory" to work alongside MoE's conditional computation. The mechanism is straightforward.
The module takes sequences of two to three tokens and uses hash functions to look them up in a massive embedding table. Retrieval happens in constant time, regardless of table size. But retrieved patterns need filtering. A hash lookup for "Apple" might collide with unrelated content, or the word might mean the fruit rather than the company. Engram solves this with a gating mechanism. The model's current understanding of context (accumulated through earlier attention layers) acts as a filter. If retrieved memory contradicts the current context, the gate suppresses it. If it fits, the gate lets it through. The module isn't applied at every layer. Strategic placement balances performance gains against system latency.

This dual-system design raises a critical question: How much capacity should each get? DeepSeek's key finding: the optimal split is 75-80% for computation and 20-25% for memory. Testing found pure MoE (100% computation) proved suboptimal. Too much computation wastes depth reconstructing static patterns; too much memory loses reasoning capacity.

Infrastructure efficiency: the GPU memory bypass

Perhaps Engram's most pragmatic contribution is its infrastructure-aware design. Unlike MoE's dynamic routing, which depends on runtime hidden states, Engram's retrieval indices depend solely on input token sequences. This deterministic nature enables a prefetch-and-overlap strategy.

"The challenge is that GPU memory is limited and expensive, so using bigger models gets costly and harder to deploy," Latimer said. "The clever idea behind Engram is to keep the main model on the GPU, but offload a big chunk of the model's stored information into a separate memory on regular RAM, which the model can use on a just-in-time basis."

During inference, the system can asynchronously retrieve embeddings from host CPU memory via PCIe. This happens while the GPU computes preceding transformer blocks. Strategic layer placement leverages the computation of early layers as a buffer to mask communication latency. The researchers demonstrated this with a 100B-parameter embedding table entirely offloaded to host DRAM. They achieved throughput penalties below 3%. This decoupling of storage from compute addresses a critical enterprise constraint, as GPU high-bandwidth memory remains expensive and scarce.

What this means for enterprise AI deployment

For enterprises evaluating AI infrastructure strategies, DeepSeek's findings suggest several actionable insights:

1. Hybrid architectures outperform pure approaches. The 75/25 allocation law indicates that optimal models should split sparse capacity between computation and memory.

2. Infrastructure costs may shift from GPU to memory. If Engram-style architectures prove viable in production, infrastructure investment patterns could change. The ability to store 100B+ parameters in CPU memory with minimal overhead suggests that memory-rich, compute-moderate configurations may offer better performance-per-dollar than pure GPU scaling.

3. Reasoning improvements exceed knowledge gains. The surprising finding that reasoning benefits more than knowledge retrieval suggests that memory's value extends beyond obvious use cases.

For enterprises leading AI adoption, Engram demonstrates that the next frontier may not be simply bigger models. It's smarter architectural choices that respect the fundamental distinction between static knowledge and dynamic reasoning. The research suggests that optimal AI systems will increasingly resemble hybrid architectures.
Organizations waiting to adopt AI later in the cycle should monitor whether major model providers incorporate conditional memory principles into their architectures. If the 75/25 allocation law holds across scales and domains, the next generation of foundation models may deliver substantially better reasoning performance at lower infrastructure costs.
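A brief PyTorch sketch may help make the prefetch-and-overlap strategy described above concrete. The stream handling, staging buffer, and layer split are assumptions for illustration; only the general idea (indices that depend solely on input tokens let the host-to-GPU copy overlap with earlier layers) follows the article's description.

```python
# Sketch of prefetch-and-overlap with a host-resident embedding table
# (illustrative assumptions throughout; not DeepSeek's released code).
import torch

EMBED_DIM, TABLE_ROWS, MAX_LOOKUPS = 1024, 1_000_000, 512

# Large embedding table kept in pinned host RAM instead of GPU HBM.
host_table = torch.randn(TABLE_ROWS, EMBED_DIM).pin_memory()
staging = torch.empty(MAX_LOOKUPS, EMBED_DIM).pin_memory()
copy_stream = torch.cuda.Stream()

def prefetch_rows(indices: torch.Tensor) -> torch.Tensor:
    """Kick off an async PCIe copy of the rows the memory lookup will need."""
    n = indices.numel()
    # Gather into a pinned staging buffer so the device copy can be async.
    torch.index_select(host_table, 0, indices, out=staging[:n])
    with torch.cuda.stream(copy_stream):
        return staging[:n].to("cuda", non_blocking=True)

def forward(tokens, memory_indices, early_layers, late_layers):
    # Indices depend only on the input tokens, so the copy can start before
    # any transformer block has run.
    memory_rows = prefetch_rows(memory_indices)

    # Early layers run on the default stream, overlapping the PCIe transfer.
    hidden = early_layers(tokens)

    # Block only at the point where the retrieved memory is consumed.
    torch.cuda.current_stream().wait_stream(copy_stream)
    return late_layers(hidden, memory_rows)
```

In DeepSeek's reported setup, this kind of overlap is said to keep throughput penalties below 3% even with a 100B-parameter table offloaded to host DRAM.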
[3]
Decoding DeepSeek's Solution to China's Compute Shortage | AIM
DeepSeek's new research enables retrieval through conditional memory rather than neural computation, freeing up GPUs.

Ahead of the highly anticipated launch of its v4 model, DeepSeek has published research that could fundamentally reshape how large language models handle knowledge, and potentially sidestep the hardware constraints hampering Chinese AI development. In a paper co-authored by DeepSeek CEO Liang Wenfeng, the research introduces "Engram", a method that allows language models to retrieve knowledge through direct lookup rather than wasteful computation. DeepSeek's work matters because Chinese AI labs are looking for algorithmic efficiency, as they are running out of room to scale using brute-force compute due to US export controls on GPUs. The paper explains that much of the GPU budget is spent reconstructing information that could be retrieved directly from memory or caches. As the authors put it, "Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation." While Mixture-of-Experts (MoE) models
DeepSeek released research on Engram, a conditional memory module that separates static information retrieval from dynamic reasoning. The breakthrough enables AI models to commit static knowledge to system RAM instead of relying on expensive GPU computation, addressing China's compute shortage while reducing infrastructure costs; tokenizer compression also shrinks the conditional memory module's vocabulary by 23%.
DeepSeek has released groundbreaking research detailing Engram, a conditional memory module that fundamentally changes how large language models handle knowledge retrieval [1][2]. Co-authored by DeepSeek CEO Liang Wenfeng, the paper addresses a critical inefficiency: enterprise LLMs waste expensive GPU cycles retrieving static information like product names and technical specifications millions of times daily [2]. The conditional memory approach allows models to commit static knowledge to system RAM, fundamentally decoupling compute power and RAM pools to bypass GPU constraints that have plagued AI infrastructure [1].
Source: Tom's Hardware
The research arrives as Chinese AI labs face mounting pressure from US export controls on GPUs, forcing them to seek algorithmic efficiency rather than brute-force compute scaling [3]. By separating static information retrieval from dynamic reasoning, Engram offers a practical path forward for organizations navigating China's compute shortage while simultaneously addressing broader infrastructure cost concerns [2][3].

Transformer models lack a native knowledge lookup primitive, forcing them to simulate retrieval of static patterns through expensive neural computation across multiple layers [2]. Recognizing "Diana, Princess of Wales" requires consuming multiple attention and feed-forward network layers to progressively compose features, essentially using deep, dynamic logic circuits for what should be a simple hash table lookup [2]. DeepSeek's paper explains this as an "expensive runtime reconstruction of a static lookup table, wasting valuable sequential depth on trivial operations that could otherwise be allocated to higher-level reasoning" [1].
Source: VentureBeat
Engram integrates N-grams, statistical sequences of words, into queryable memory banks, allowing models to remember facts rather than reason them out [1]. The module takes sequences of two to three tokens and uses hash functions to look them up in a massive embedding table, with retrieval happening in constant time regardless of table size [2]. This approach employs Multi-Head Hashing to distinguish between similar terms, ensuring "Universal" remains distinct from "Universal Studios", before passing through context-aware gating mechanisms that confirm terms match sentence context [1].

Through systematic experiments, DeepSeek found the optimal balance allocates 75-80% of sparse model capacity to dynamic reasoning and 20-25% to static lookups [2]. An Engram-based model scaled to nearly 27 billion parameters outperformed standard Mixture of Experts (MoE) models in long-context query performance while eliminating computational waste [1]. Complex reasoning benchmarks jumped from 70% to 74% accuracy, while knowledge-focused tests improved from 57% to 61% across Big-Bench Hard, ARC-Challenge, and MMLU evaluations [2]. Tokenizer compression reduced vocabulary size for the conditional memory module by 23%, compressing equivalent tokens with different capitalizations as the same canonical concept [1]. Standard MoE models must reconstruct data every time it's referenced through conditional computation, calling on expert parameters repeatedly even for identical queries [1]. Engram simply asks, "Do I already have this data?", before accessing reasoning-focused model components [1].

Engram's most practical contribution lies in its infrastructure-aware design, which enables a GPU memory bypass strategy [2]. Unlike Nvidia's KVCache announced at CES 2026, which offloads context data to NVMe memory with BlueField-4 as a short-term solution for recent conversations, Engram maintains persistent pre-calculated data [1]. As one analogy puts it: KVCache stores handwritten notes, while Engram records the whole encyclopedia [1]. Chris Latimer, founder and CEO of Vectorize, notes that conditional memory solves a different problem than Retrieval-Augmented Generation (RAG) or agentic memory systems like Hindsight: "It's more geared towards squeezing performance out of smaller models and getting more mileage out of scarce GPU resources" [2]. The deterministic nature of Engram's retrieval, dependent solely on input token sequences rather than runtime hidden states, enables prefetch-and-overlap strategies that standard MoE dynamic routing cannot achieve [2].

By committing knowledge libraries to common system DRAM standards like CXL, Engram addresses the ongoing reliance on high-bandwidth memory (HBM) that constrains AI infrastructure even for Chinese silicon like Huawei's Ascend series [1]. This separation allows GPU HBM to dedicate itself to reasoning while static memory operates independently, enabling more performant models as organizations face memory supply squeezes and mounting infrastructure costs [2].