10 Sources
[1]
DeepSeek tests "sparse attention" to slash AI processing costs
Ever wonder why ChatGPT slows down during long conversations? The culprit is a fundamental mathematical challenge: processing long sequences of text requires massive computational resources, even with the efficiency tricks that companies have already deployed. While US tech giants can afford to throw more hardware at the problem, Chinese AI company DeepSeek, which is cut off from a steady supply of some advanced AI chips by export restrictions, has extra motivation to squeeze more performance from less silicon.

On Monday, DeepSeek released an experimental version of its latest simulated reasoning language model, DeepSeek-V3.2-Exp, which introduces what it calls "DeepSeek Sparse Attention" (DSA). It's the company's implementation of a computational technique likely already used in some of the world's most prominent AI models. OpenAI pioneered sparse transformers in 2019 and used the technique to build GPT-3, while Google Research published work on "Reformer" models using similar concepts in 2020. (The full extent to which Western AI companies currently use sparse attention in their latest models remains undisclosed.) Despite sparse attention being a known approach for years, DeepSeek claims its version achieves "fine-grained sparse attention for the first time" and has cut API prices by 50 percent to demonstrate the efficiency gains.

But to understand more about what makes DeepSeek V3.2 notable, it's useful to refresh yourself on a little AI history. DeepSeek made waves in January when its R1 simulated reasoning model reportedly matched OpenAI's o1 performance while costing only $6 million to train, and its chat app briefly topped the iPhone App Store, surpassing ChatGPT. All eyes are on the company that has given some of America's leading AI labs a run for their money.

The attention bottleneck

In AI, "attention" is a term for a software technique that determines which words in a text are most relevant to understanding each other. Those relationships map out context, and context builds meaning in language. For example, in the sentence "The bank raised interest rates," attention helps the model establish that "bank" relates to "interest rates" in a financial context, not a riverbank context. Through attention, conceptual relationships become quantified as numbers stored in a neural network. Attention also governs how AI language models choose what information "matters most" when generating each word of their response.

Calculating context with a machine is tricky, and it wasn't practical at scale until chips like GPUs, which can calculate these relationships in parallel, reached a certain level of capability. Even so, the original Transformer architecture from 2017 checked the relationship of each word in a prompt with every other word in a brute-force way. So if you fed 1,000 words of a prompt into the AI model, it resulted in 1,000 x 1,000 comparisons, or 1 million relationships to compute. With 10,000 words, that becomes 100 million relationships. The cost grows quadratically, which created a fundamental bottleneck for processing long conversations. Although it's likely that OpenAI uses some sparse attention techniques in GPT-5, long conversations still suffer performance penalties. Every time you submit a new response to ChatGPT, the AI model at its heart processes context comparisons for the entire conversation history all over again.
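To make that arithmetic concrete, here is a minimal illustration (not DeepSeek's or OpenAI's code) of how the number of pairwise comparisons in dense attention grows with prompt length; the function name and token counts are chosen purely for demonstration.

```python
# Illustration only: count the pairwise token comparisons that dense (full)
# attention performs, showing the quadratic growth described above.

def dense_attention_comparisons(num_tokens: int) -> int:
    # Every token is compared against every token, including itself.
    return num_tokens * num_tokens

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens -> {dense_attention_comparisons(n):>18,} comparisons")

# Output:
#   1,000 tokens ->          1,000,000 comparisons
#  10,000 tokens ->        100,000,000 comparisons
# 100,000 tokens ->     10,000,000,000 comparisons
```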
Of course, the researchers behind the original Transformer model designed it for machine translation with relatively short sequences (maybe a few hundred tokens, which are chunks of data that represent words), where quadratic attention was manageable. It's when people started scaling to thousands or tens of thousands of tokens that the quadratic cost became prohibitive.

Sparse attention works differently. Instead of checking every word against every word, it only examines a subset of word relationships that the model determines are most relevant. For example, when processing word number 5,000 in a document, the model might only check its relationship with 100 carefully selected earlier words rather than all 4,999 preceding words.

DeepSeek's model gains the ability to determine which relationships to prioritize through training, using what DeepSeek calls a "lightning indexer." As laid out in DeepSeek's paper on the new model, this small neural network component scores the relevance between word pairs and selects the top 2,048 most important connections for each word, though the paper doesn't fully explain how this indexer makes its decisions. DeepSeek claims its implementation can identify which connections to skip without degrading the model's understanding of the overall text.

Early benchmarks show promise

DeepSeek-V3.2-Exp builds on the company's previous V3.1-Terminus model but incorporates DeepSeek Sparse Attention. According to the company's benchmarks, the experimental model performs comparably to its predecessor even while using sparse attention. Notably, unlike OpenAI and Anthropic's high-end AI models, the release includes open source components under the MIT License and open weights, allowing other researchers to build on the work.

TechCrunch reports that preliminary testing by DeepSeek found that API costs could be reduced by as much as half in long-context situations. However, the benchmarks come from DeepSeek's own testing, and third-party researchers haven't had time to independently verify the performance claims or validate the efficiency improvements. But if the research pans out, improvements to the sparse attention technique could dramatically reduce AI inference costs over time.
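Neither DeepSeek's paper nor the coverage above includes reference code, but the selection step described in the article can be pictured with a short sketch. The example below is a simplified stand-in for the "lightning indexer" idea: a random projection plays the role of the learned relevance scorer, and NumPy's top-k selection keeps the 2,048 highest-scoring earlier tokens. The shapes, names, and scoring function are illustrative assumptions, not DeepSeek's implementation.

```python
import numpy as np

# Toy sketch of indexer-style sparse attention (illustrative only).
# A learned indexer would score how relevant each earlier token is to the
# current one; here a random projection stands in for that learned scorer.

rng = np.random.default_rng(0)

seq_len, dim, top_k = 5_000, 64, 2_048       # top_k mirrors the 2,048 figure above
hidden = rng.normal(size=(seq_len, dim))     # made-up token representations
indexer_proj = rng.normal(size=(dim, dim))   # stand-in for the learned indexer weights

query_pos = 4_999                            # the token currently being processed
scores = (hidden[query_pos] @ indexer_proj) @ hidden[:query_pos].T  # relevance scores

# Keep only the top_k highest-scoring earlier tokens instead of all 4,999.
keep = np.argpartition(scores, -top_k)[-top_k:]

print(f"attending to {keep.size} of {query_pos} earlier tokens "
      f"({keep.size / query_pos:.0%} of the full set)")
```

Full attention at this position would touch all 4,999 earlier tokens; the sketch keeps roughly 41 percent of them, and the savings grow as the context gets longer while the selection budget stays fixed at 2,048.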
[2]
DeepSeek releases 'sparse attention' model that cuts API costs in half | TechCrunch
Researchers at DeepSeek on Monday released a new experimental model called V3.2-exp, designed to have dramatically lower inference costs when used in long-context operations. DeepSeek announced the model with a post on Hugging Face, also posting a linked academic paper on GitHub.

The most important feature of the new model is called DeepSeek Sparse Attention, an intricate system described in detail in the company's accompanying paper. In essence, the system uses a module called a "lightning indexer" to prioritize specific excerpts from the context window. After that, a separate system called a "fine-grained token selection system" chooses specific tokens from within those excerpts to load into the module's limited attention window. Taken together, they allow the Sparse Attention models to operate over long portions of context with comparatively small server loads.

For long-context operations, the benefits of the system are significant. Preliminary testing by DeepSeek found that the price of a simple API call could be reduced by as much as half in long-context situations. Further testing will be required to build a more robust assessment, but because the model is open-weight and freely available on Hugging Face, it won't be long before third-party tests can assess the claims made in the paper.

DeepSeek's new model is one of a string of recent breakthroughs tackling the problem of inference costs -- essentially, the server costs of operating a pre-trained AI model, as distinct from the cost of training it. In DeepSeek's case, the researchers were looking for ways to make the fundamental transformer architecture operate more efficiently -- and finding that there are significant improvements to be made.

Based in China, DeepSeek has been an unusual figure in the AI boom, particularly for those who view AI research as a nationalist struggle between the U.S. and China. The company made waves at the beginning of the year with its R1 model, trained using primarily reinforcement learning at a far lower cost than its American competitors. But the model has not sparked a wholesale revolution in AI training, as some predicted, and the company has receded from the spotlight in the months since. The new "sparse attention" approach is unlikely to produce the same uproar as R1 -- but it could still teach U.S. providers some much-needed tricks to help keep inference costs low.
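TechCrunch describes a two-stage structure: an indexer that ranks excerpts of the context, followed by token-level selection inside the winning excerpts. The rough sketch below mirrors that coarse-to-fine shape; the block size, scoring functions, and budgets are invented for illustration and are not taken from DeepSeek's paper.

```python
import numpy as np

# Rough coarse-to-fine selection sketch: rank fixed-size blocks ("excerpts")
# of the context, then pick individual tokens from the best blocks.
# Purely illustrative; random scores stand in for learned ones.

rng = np.random.default_rng(1)
context_len, block_size = 32_768, 128
num_blocks = context_len // block_size

block_scores = rng.normal(size=num_blocks)    # stand-in indexer scores per block
token_scores = rng.normal(size=context_len)   # stand-in fine-grained token scores

top_blocks = np.argsort(block_scores)[-16:]   # stage 1: keep the 16 best excerpts
candidates = np.concatenate(
    [np.arange(b * block_size, (b + 1) * block_size) for b in top_blocks]
)
best = np.argsort(token_scores[candidates])[-1_024:]  # stage 2: keep 1,024 tokens
selected = candidates[best]

print(f"{selected.size} tokens selected from a {context_len:,}-token context")
```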
[3]
DeepSeek Debuts 'Sparse Attention' Method in Next-Gen AI Model
DeepSeek updated an experimental AI model Monday in what it called a step toward next-generation artificial intelligence. The secretive Chinese startup outlined the DeepSeek-V3.2-Exp platform, explaining it uses a new technique it calls DeepSeek Sparse Attention or DSA, according to a post on its Hugging Face page. The latest version marked "an intermediate step toward our next-generation architecture," the Hangzhou-based startup said, also indicating it was working with Chinese chipmakers on the model.
[4]
DeepSeek releases model it calls 'intermediate step' towards 'next-generation architecture'
BEIJING, Sept 29 (Reuters) - Chinese AI developer DeepSeek has released its latest model which it said was an "experimental release" that was more efficient to train and better at processing long sequences of text than previous iterations. The Hangzhou-based company called DeepSeek-V3.2-Exp an "intermediate step toward our next-generation architecture" in a post on developer forum Hugging Face. Reporting by Beijing Newsroom; Editing by Toby Chopra
[5]
China's DeepSeek launches next-gen AI model. Here's what makes it different
"This is DeepSeek's value prop all over: efficiency is becoming as important as raw power," according to Nick Patience, VP and Practice Lead for AI at The Futurum Group. Chinese startup DeepSeek's latest experimental model promises to increase efficiency and improve AI's ability to handle a lot of information at a fraction of the cost, but questions remain over how effective and safe the architecture is. DeepSeek sent Silicon Valley into a frenzy when it launched its first model R1 out of nowhere last year, showing that it's possible to train large language models (LLMs) quickly, on less powerful chips, using fewer resources. The company released DeepSeek-V3.2-Exp on Monday, an experimental version of its current model DeepSeek-V3.1-Terminus, which builds further on its mission to increase efficiency in AI systems, according to a post on the AI forum Hugging Face. "DeepSeek V3.2 continues the focus on efficiency, cost reduction, and open-source sharing," Adina Yakefu, Chinese community lead at Hugging Face, told CNBC. "The big improvement is a new feature called DSA (DeepSeek Sparse Attention), which makes the AI better at handling long documents and conversations. It also cuts the cost of running the AI in half compared to the previous version." "It's significant because it should make the model faster and more cost-effective to use without a noticeable drop in performance," said Nick Patience, vice president and practice lead for AI at The Futurum Group. "This makes powerful AI more accessible to developers, researchers, and smaller companies, potentially leading to a wave of new and innovative applications."
[6]
DeepSeek's new V3.2-Exp model cuts API pricing in half to less than 3 cents per 1M input tokens
DeepSeek continues to push the frontier of generative AI...in this case, in terms of affordability. The company has unveiled its latest experimental large language model (LLM), DeepSeek-V3.2-Exp, that mostly matches or slightly improves the benchmarks of its predecessor DeepSeek-V3.1-Terminus, but more importantly, comes at a 50 percent reduced cost through DeepSeek's application programming interface (API), down to just $0.028 per million input tokens -- and can keep costs down even when approaching the context limit of 128,000 tokens (about 300-400 pages worth of information). It's available through DeepSeek's first-party API, as well as the code downloadable under an open-source, enterprise-friendly MIT License on Hugging Face and GitHub. How did the company do it? Read on to find out.

API Costs Reduced

As previously mentioned, DeepSeek announced significant reductions in API pricing. For one million tokens, input cache hits now cost $0.028, cache misses $0.28, and outputs $0.42. This compares to $0.07, $0.56, and $1.68, respectively, under the earlier V3.1-Terminus pricing. DeepSeek has kept Terminus temporarily available via a separate API until October 15, allowing developers to directly compare the two models, but Terminus will be deprecated after that -- a short-lived model that was released just one week ago. Still, DeepSeek V3.2-Exp appears to be among the cheapest options for developers through the API, though OpenAI's GPT-5 Nano still easily takes the crown for most affordable.

New Sparse Attention Design

At the heart of V3.2-Exp is DeepSeek Sparse Attention, or DSA, described in a technical report also released by the company today on GitHub. Traditional dense attention mechanisms, which calculate interactions between every token and every other token in a sequence, scale quadratically with sequence length. As the number of tokens grows, this results in rapidly increasing memory use and compute requirements, leading to high costs and slow inference.

Most large language models use a "dense" self-attention mechanism, which compares every token in the input to every other token. So if your prompt doubles in length, the model does more than double the work to handle all those cross-token interactions. This drives up GPU time and energy cost, which is reflected in the per-million-token pricing for APIs. During prefill, the amount of computation grows roughly with the square of the context length, and at least linearly during decoding. As a result, longer sequences -- tens of thousands or even over 100,000 tokens -- cause costs to rise much faster than the token count alone would suggest.

DSA addresses this by using a "lightning indexer" to select only the most relevant tokens for attention. This reduces the computational load while preserving nearly the same quality of responses. By reducing the compute burden per token at large context lengths, V3.2-Exp keeps the cost curve flatter and much lower. This makes it far more practical and affordable to run long-context workloads such as document-scale summarization, multi-turn chat with long histories, or code analysis without facing a runaway increase in inference costs.

Post-Training and Reinforcement Learning Advances

Beyond its architectural changes, DeepSeek-V3.2-Exp introduces refinements in the post-training process. The company employs a two-step approach: specialist distillation and reinforcement learning.
Specialist distillation begins with training separate models for mathematics, competitive programming, logical reasoning, agentic coding, and agentic search. These specialists, fine-tuned from the same base checkpoint, are reinforced with large-scale training to generate domain-specific data. That data is then distilled back into the final checkpoint, ensuring the consolidated model benefits from specialist knowledge while remaining general-purpose.

The reinforcement learning phase marks a significant shift. Instead of the multi-stage approach used in previous DeepSeek models, reasoning, agent, and human alignment training are merged into a single RL stage using Group Relative Policy Optimization (GRPO). This unified process balances performance across domains while avoiding the "catastrophic forgetting" issues often associated with multi-stage pipelines. The reward design blends rule-based outcome signals, length penalties, and language consistency checks with a generative reward model guided by task-specific rubrics. Experimental results show that the distilled and reinforced model performs nearly on par with domain-specific specialists, with the gap effectively closed after RL training. (A brief illustrative sketch of the group-relative reward signal appears at the end of this article.)

Benchmarks Steady

Benchmarking confirms the trade-off works as intended. On widely used public evaluations, V3.2-Exp performs on par with V3.1-Terminus, showing negligible differences in areas such as reasoning, coding, and question answering. While scores dipped slightly in some reasoning-heavy tasks such as GPQA-Diamond and Humanity's Last Exam, the model's efficiency gains and consistent performance elsewhere suggest the sparse approach does not substantially compromise capability. MMLU-Pro holds steady at 85.0, AIME 2025 slightly improves to 89.3, while GPQA-Diamond dips from 80.7 to 79.9. Coding and agent benchmarks tell a similar story, with Codeforces ratings rising from 2046 to 2121 and BrowseComp improving from 38.5 to 40.1.

This balance reflects the design trade-off. By selecting only a fraction of possible tokens for attention, DSA reduces computational costs significantly. Inference cost comparisons show V3.2-Exp requires less than half the cost per million tokens of V3.1-Terminus when running on long contexts.

Open-Source Access and Deployment Options

In keeping with the company's open approach, DeepSeek has released the V3.2-Exp model weights on Hugging Face under the MIT License. Researchers and enterprises can freely download, modify, and deploy the model for commercial use. The release is accompanied by open-source kernels: TileLang for research prototyping and CUDA/FlashMLA kernels for high-performance inference. LMSYS Org, the team behind SGLang, also announced that its framework now officially supports V3.2 with optimized sparse attention kernels, dynamic key-value caching, and scaling to 128,000 tokens. vLLM provides day-one support as well. For local deployment, DeepSeek has provided updated demo code, along with Docker images compatible with NVIDIA H200s, AMD MI350s, and NPUs. The model, at 685 billion parameters, supports multiple tensor types including BF16, FP8, and FP32.

Background: DeepSeek's Iterative Push

The launch of V3.2-Exp comes just one week after DeepSeek released V3.1-Terminus, a refinement of its V3.1 model. Terminus was designed to address user feedback, improving tool-based reasoning and reducing language-mixing errors, such as inserting Chinese words into English responses.
According to reporting from VentureBeat, Terminus builds on the V3 family introduced in December 2024, which positioned DeepSeek's models as versatile, cost-efficient alternatives to its more reasoning-heavy R1 series. While R1 excels in structured logic, math, and multi-step reasoning, it is slower and more expensive. V3 models, by contrast, are built for general-purpose applications such as writing, summarization, customer-facing chat, and basic coding. With V3.2-Exp, DeepSeek is layering in architectural innovation through sparse attention while keeping the MIT License and open-source release model intact.

Considerations for Enterprise Decision-Makers

For enterprises -- especially those in the U.S. -- the cost savings offered by DeepSeek's API are compelling, but there are additional considerations before adoption.

* Data security and compliance: Using DeepSeek's hosted API means data flows through servers operated by a China-based company. Enterprises with sensitive customer data, regulated industries, or strict compliance frameworks (e.g., healthcare, finance, defense) will need to carefully assess legal and governance implications. Self-hosting the open-source weights may mitigate these risks, though it shifts infrastructure and maintenance responsibilities in-house.

* Performance versus control: The API offers immediate access with predictable costs and scaling. Self-hosting provides maximum control -- especially over data residency and latency -- but requires significant engineering resources and GPU availability. Decision makers must weigh speed of adoption against operational overhead.

* Vendor diversification: With many U.S.-based enterprises already reliant on OpenAI, Anthropic, or Google, DeepSeek's open-source approach offers a hedge against vendor lock-in. However, integrating models from a Chinese provider may raise questions from boards or security officers.

* Total cost of ownership: While the API is cheaper per token, enterprises with steady high-volume workloads may find long-term savings by running the open-source model on their own infrastructure or through trusted third-party hosts. However, based on the model architecture, even those running the new DeepSeek V3.2-Exp should still see considerably lower costs for longer token-count inputs on their own servers and hardware.

The choice comes down to scale, workload predictability, and appetite for internal operations. For U.S. decision-makers evaluating DeepSeek, the calculus isn't just about API pricing -- it's about aligning affordability with risk tolerance, regulatory requirements, and infrastructure strategy.

What's Next for DeepSeek?

DeepSeek-V3.2-Exp demonstrates how an open-source player can push frontier-scale models while also addressing the practical challenges of cost and deployment. By introducing sparse attention, cutting API prices, merging reinforcement learning into a unified stage, and maintaining full transparency through Hugging Face and GitHub releases, DeepSeek is offering both a research testbed and a viable enterprise option. The addition of frameworks like SGLang and vLLM in the official release ecosystem reinforces that DeepSeek is cultivating broad community integration rather than locking down distribution. At the same time, the experimental nature of V3.2-Exp leaves room for iteration. Internal evaluations show promising results, but DeepSeek acknowledges it is actively testing the architecture in real-world scenarios to uncover any limitations.
Whether this experimental architecture becomes the foundation for a broader V3.3 or V4 release remains to be seen. But for now, the launch of V3.2-Exp signals DeepSeek's determination to stay visible and competitive in the global AI landscape.
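For readers unfamiliar with GRPO, the group-relative reward signal mentioned in the post-training section above can be sketched in a few lines. This follows the publicly described GRPO recipe only in broad strokes; the rewards and group size are made up, and this is not DeepSeek's training code.

```python
import numpy as np

# Minimal sketch of the Group Relative Policy Optimization (GRPO) advantage:
# several responses are sampled for the same prompt, and each response's
# advantage is its reward relative to the group mean, scaled by the group's
# standard deviation. The reward values below are invented.

group_rewards = np.array([0.2, 0.9, 0.4, 0.7])   # rewards for 4 sampled responses
advantages = (group_rewards - group_rewards.mean()) / (group_rewards.std() + 1e-8)

for reward, adv in zip(group_rewards, advantages):
    print(f"reward={reward:.1f} -> advantage={adv:+.2f}")

# Responses scoring above the group average get positive advantages and are
# reinforced; those below it get negative advantages and are discouraged.
```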
[7]
Here's what we know about DeepSeek's latest AI offering
The announcement comes as China pressures its tech companies to break their reliance on foreign chip makers so it can compete in the AI race. The Chinese artificial intelligence (AI) company DeepSeek has launched its latest experimental model, which it claims can handle a large amount of data and costs less to run than its previous models.

The company sparked a frenzy in January when it came onto the scene with R1, an AI model and chatbot that the company claimed was cheaper and performed just as well as OpenAI's rival ChatGPT model. However, some countries banned government agencies from using DeepSeek, including Italy, the United States, and South Korea, citing national security concerns.

On Monday, the company unveiled DeepSeek-V3.2-Exp, which is an experimental version of its current model, DeepSeek-V3.1-Terminus. It aims to make AI systems more "efficient," according to a company post on the AI forum Hugging Face. It has been open sourced on the developer platform Hugging Face. DeepSeek said it cuts the cost of running the AI in half compared to the earlier version.

"The big improvement is a new feature called DSA (DeepSeek Sparse Attention), which makes the AI better at handling long documents and conversations," Adina Yakefu, Chinese community lead at Hugging Face, told CNBC. Sparse attention is a technique that enhances model efficiency by reducing the computational cost needed to examine a text. "This experimental release represents our ongoing research into more efficient transformer architectures," the post on Hugging Face said.

DeepSeek's V3.1-Terminus does not rank as strongly on metrics such as intelligence as OpenAI's GPT-5 or other leading AI models such as Grok and Anthropic's Claude, but it is tied with OpenAI's open source model gpt-oss-120b, according to AI benchmarking firm Artificial Analysis.

However, the technology industry is paying attention to DeepSeek after the company said it would tailor its models for AI chips that are developed in China. China is pressuring its tech companies to break their reliance on foreign chip makers so it can compete in the AI race. US company Nvidia has faced increased restrictions on its chip exports to China under both former US president Joe Biden and current President Donald Trump. The US banned Nvidia from selling its most powerful chips, the Blackwell chip, to China in April, arguing it was necessary to safeguard US national and economic security as the AI global race gained pace. But it was allowed to sell less advanced chips. The Financial Times reported earlier in September that China's internet regulator had banned local companies from buying Nvidia's RTX Pro 6000 chips, as Beijing tries to reduce dependence on foreign semiconductors.
[8]
DeepSeek Has 'Cracked' Cheap Long Context for LLMs With Its New Model | AIM
DeepSeek-V3.2-Exp is claimed to achieve 'significant efficiency improvements in both training and inference'. DeepSeek, the China-based AI lab, has released DeepSeek-V3.2-Exp, an experimental AI model, on September 29. The model is claimed to achieve 'significant efficiency improvements in both training and inference'. It is built upon DeepSeek-V3.1-Terminus, which is itself an upgraded version of the DeepSeek-V3.1 model.

It introduces what is called 'DeepSeek Sparse Attention (DSA)', a sparse attention mechanism designed to explore and validate optimisations for training and inference efficiency in long-context scenarios, according to the company. Despite using a much simpler and faster attention method that processes far fewer tokens during long-context tasks, DeepSeek revealed that it performs on par with V3.1-Terminus.

For context, this model scored 58 on the Artificial Analysis Intelligence Index, which incorporates the performance of an AI model across 10 benchmarks in diverse domains. Anthropic's Claude 4.1 Opus model scores 59, Gemini 2.5 Pro scores 60, and OpenAI's GPT-5 (high) scores 68. For more details on the architecture, refer to the technical report on GitHub.

"The DeepSeek team cracked cheap long context for LLMs: a ~3.5x cheaper prefill and ~10x cheaper decode at 128k context at inference with the same quality," said Deedy Das, partner at Menlo Ventures, reacting to the announcement on X.

The model is available on the DeepSeek app, web and API. The model's weights are available on Hugging Face. The company also announced that API pricing has been cut by 50%. DeepSeek has reduced input costs from $0.07 to $0.028 per 1M tokens for cache hits and from $0.56 to $0.28 for cache misses, while output costs have dropped from $1.68 to $0.42.

"This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences," said DeepSeek in the blog post.
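Using the per-million-token prices quoted above, a back-of-the-envelope comparison shows how the cut compounds on a long-context request. Only the prices come from the announcement; the token counts in this example are arbitrary.

```python
# Back-of-the-envelope API cost comparison using the prices quoted above
# (USD per 1M tokens). The example call sizes are arbitrary illustrations.

OLD = {"cache_hit": 0.07, "cache_miss": 0.56, "output": 1.68}    # V3.1-Terminus
NEW = {"cache_hit": 0.028, "cache_miss": 0.28, "output": 0.42}   # V3.2-Exp

def call_cost(prices, hit_tokens, miss_tokens, output_tokens):
    """Cost in dollars for one API call at the given per-1M-token prices."""
    return (hit_tokens * prices["cache_hit"]
            + miss_tokens * prices["cache_miss"]
            + output_tokens * prices["output"]) / 1_000_000

# Hypothetical long-context call: 100k cached input, 20k fresh input, 4k output.
old = call_cost(OLD, 100_000, 20_000, 4_000)
new = call_cost(NEW, 100_000, 20_000, 4_000)
print(f"V3.1-Terminus: ${old:.4f}   V3.2-Exp: ${new:.4f}   ({old / new:.1f}x cheaper)")
# V3.1-Terminus: $0.0249   V3.2-Exp: $0.0101   (2.5x cheaper)
```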
[9]
Deepseek 3.2: New AI Model is Faster, Cheaper and Smarter
What if artificial intelligence could process information faster, cost less, and still deliver unparalleled accuracy? With the release of Deepseek 3.2 Experimental, that vision is no longer hypothetical. Building on the foundation of its predecessor, Deepseek 3.1 Terminus, this iteration introduces advancements like sparse attention technology, a breakthrough that optimizes computational efficiency by focusing only on the most relevant data. The result? A model that's not only faster and cheaper but also more accessible to developers and researchers alike. Deepseek 3.2 doesn't just aim to refine AI, it's designed to redefine it, balancing innovation with practicality in a way that feels both ambitious and grounded.

In this exploration of Deepseek 3.2, World of AI uncovers how its features, such as a 50% reduction in API costs and enhanced performance for creative and technical tasks, are setting new benchmarks in AI development. From generating sleek SVG animations to solving complex quantitative problems, this model's versatility opens doors to applications that were once out of reach for many. But it's not without its challenges. By examining both its strengths and limitations, we'll see how Deepseek 3.2 serves as a pivotal stepping stone toward the future of AI, paving the way for the highly anticipated Deepseek R2.

Deepseek 3.2's defining feature is its implementation of sparse attention technology, which optimizes computational resources by selectively focusing on the most relevant data. This approach minimizes the processing of less critical information, resulting in faster performance and lower costs without compromising accuracy or functionality. By addressing the dual challenges of cost and performance, Deepseek 3.2 ensures that AI technology remains accessible to a broader audience while maintaining its utility for advanced applications.

Deepseek 3.2 not only matches the robust performance benchmarks set by its predecessor but also introduces new capabilities that expand its range of applications. Its versatility is evident in its ability to handle both creative and technical tasks with precision and reliability. These features make Deepseek 3.2 a powerful resource for developers working on diverse projects, from designing user interfaces to tackling complex computational challenges. Its adaptability ensures that it can meet the needs of both individual users and large-scale enterprises.

Accessibility and flexibility are central to the design of Deepseek 3.2. Users can interact with the model for free through chatbot interfaces or API platforms, providing an easy entry point for experimentation and exploration. For those requiring localized solutions, the model supports hosting on various platforms, including Ollama, LM Studio, and Kilo Code. This flexibility allows developers and organizations to integrate Deepseek 3.2 into their workflows seamlessly, regardless of their specific hosting requirements. By offering multiple options for deployment, the model ensures that it can cater to a wide range of use cases, from small-scale projects to enterprise-level applications.

Despite its many strengths, Deepseek 3.2 is not without limitations.
The sparse attention mechanism, while highly efficient, faces challenges in tasks that require extensive long-context handling. This limitation can impact its performance in scenarios where detailed contextual analysis is critical. Additionally, while the model demonstrates strong capabilities in generating visual elements, it occasionally struggles with creating intricate icons for browser-based operating systems. These challenges highlight areas where future iterations, such as Deepseek R2, could focus on refinement and enhancement.

Deepseek 3.2 Experimental serves as a crucial stepping stone toward the development of Deepseek R2. By addressing the current model's limitations and building on its strengths, Deepseek R2 aims to push the boundaries of AI performance and efficiency even further. Planned improvements include refining the sparse attention mechanism and expanding the model's capabilities to handle more complex tasks with greater precision. This forward-thinking approach underscores the experimental nature of Deepseek 3.2, positioning it as a vital milestone in the ongoing evolution of AI technology. By bridging the gap between innovation and practical application, it paves the way for future advancements that promise to redefine the possibilities of artificial intelligence.
[10]
China's DeepSeek releases 'intermediate' AI model on route to next generation
BEIJING -- Chinese AI developer DeepSeek has released its latest "experimental" model, which it said was more efficient to train and better at processing long sequences of text than previous iterations of its large language models. The Hangzhou-based company called DeepSeek-V3.2-Exp an "intermediate step toward our next-generation architecture" in a post on developer forum Hugging Face. That architecture will likely be DeepSeek's most important product release since V3 and R1 shocked Silicon Valley and tech investors outside China. The V3.2-Exp model includes a mechanism called DeepSeek Sparse Attention, which the Chinese firm says can cut computing costs and boost some types of model performance. DeepSeek said in a post on X on Monday that it is cutting API prices by "50%+." While DeepSeek's next-generation architecture is unlikely to roil markets as previous versions did in January, it could still put significant pressure on domestic rivals like Alibaba's Qwen and U.S. counterparts like OpenAI if it can repeat the success of DeepSeek R1 and V3. That would require it to demonstrate high capability for a fraction of what competitors charge and spend in model training.
Chinese AI company DeepSeek has released an experimental version of its latest language model, DeepSeek-V3.2-Exp, featuring a new 'sparse attention' technique that promises to significantly reduce processing costs for long-context AI operations.
Chinese AI company DeepSeek has made waves in the artificial intelligence community with the release of its experimental model, DeepSeek-V3.2-Exp. This latest iteration introduces a novel technique called 'DeepSeek Sparse Attention' (DSA), which promises to dramatically reduce processing costs for long-context AI operations [1].
AI language models have long grappled with the computational challenges of processing extensive sequences of text. The traditional 'attention' mechanism, which helps models understand context by relating each word to every other word in a sequence, becomes increasingly resource-intensive as the text length grows. This quadratic growth in computational requirements has been a significant bottleneck for AI performance in long conversations [1]. DeepSeek's sparse attention approach tackles this issue by selectively processing only the most relevant word relationships, rather than examining every possible connection. The model employs a 'lightning indexer' to identify the top 2,048 most important connections for each word, significantly reducing the computational load without compromising understanding [1][2].
The efficiency gains from this new architecture are substantial. DeepSeek claims that its sparse attention technique has enabled it to cut API prices by 50% for long-context operations [2]. This dramatic reduction in processing costs could make powerful AI more accessible to developers, researchers, and smaller companies, potentially spurring a new wave of innovative applications [5].
This latest release builds on DeepSeek's reputation for efficiency-focused AI development. The company previously garnered attention with its R1 model, which reportedly matched OpenAI's o1 performance while costing only $6 million to train [1]. DeepSeek's approach to AI development, emphasizing cost-effectiveness and efficiency, has positioned it as a notable player in the global AI landscape [5].
The release of DeepSeek-V3.2-Exp is seen as an intermediate step towards the company's next-generation AI architecture [3][4]. As an open-weight model available on Hugging Face, it invites third-party testing and validation of its performance claims [2]. While the full impact of DeepSeek's sparse attention technique remains to be seen, it represents a significant step forward in addressing the crucial challenge of inference costs in AI. As the industry continues to evolve, innovations like these could play a pivotal role in shaping the future of AI technology and its applications across various sectors.
Summarized by Navi