5 Sources
[1]
DeepSeek Touts New Training Method as China Pushes AI Efficiency
DeepSeek published a paper outlining a more efficient approach to developing AI, illustrating the Chinese artificial intelligence industry's effort to compete with the likes of OpenAI despite a lack of free access to Nvidia Corp. chips. The document, co-authored by founder Liang Wenfeng, introduces a framework it called Manifold-Constrained Hyper-Connections. It's designed to improve scalability while reducing the computational and energy demands of training advanced AI systems, according to the authors.

Such publications from DeepSeek have foreshadowed the release of major models in the past. The Hangzhou-based startup stunned the industry with the R1 reasoning model a year ago, developed at a fraction of the cost of its Silicon Valley rivals. DeepSeek has since released several smaller platforms, but anticipation is mounting for its next flagship system, widely dubbed the R2, expected around the Spring Festival in February.

Chinese startups continue to operate under significant constraints, with the US preventing access to the most advanced semiconductors essential to developing and running AI. Those restrictions have forced researchers to pursue unconventional methods and architectures.

What Bloomberg Intelligence Says
"DeepSeek's forthcoming R2 model -- which could launch in the next few months -- has potential to upend the global AI sector again, despite Google's recent gains. Google's Gemini 3 model overtook OpenAI in November to claim a top-3 slot in LiveBench's ranking of global large language model (LLM) performance. China's low-cost models, which are developed at a fraction of the cost of competitors, claimed two slots in the top-15."
- Robert Lea and Jasmine Lyu, analysts

DeepSeek, known for its unorthodox innovations, published its latest paper this week through the open repository arXiv and open-source platform Hugging Face. The paper lists 19 authors, with Liang's name appearing last. The founder, who's consistently steered DeepSeek's research agenda, has pushed his team to rethink how large-scale AI systems are conceived and built. The latest research addresses challenges such as training instability and limited scalability, noting that the new method incorporates "rigorous infrastructure optimization to ensure efficiency." Tests were conducted on models ranging from 3 billion to 27 billion parameters, building on ByteDance Ltd.'s 2024 research into hyper-connection architectures. The technique holds promise "for the evolution of foundational models," the authors said.
[2]
DeepSeek develops mHC AI architecture to boost model performance - SiliconANGLE
DeepSeek develops mHC AI architecture to boost model performance DeepSeek researchers have developed a technology called Manifold-Constrained Hyper-Connections, or mHC, that can improve the performance of artificial intelligence models. The Chinese AI lab debuted the software in a paper published on Wednesday. DeepSeek created mHC to enhance the so-called residual connection mechanism that large language models use to learn new information. The mechanism, which was invented in 2015, also ships with many vision models. DeepSeek is not the first market player to have tried to improve upon residual connections, but earlier attempts had mixed results. An AI model comprises numerous software components called layers. When a user enters a prompt, the text enters the first layer, which performs a small portion of the calculations necessary to generate a prompt response. The first layer sends the results of its calculations to the second layer, which completes another portion of the work, passes its results to the third layer and so forth. The last layer outputs an answer to the user's question. The last layer plays a key role in the AI training process. If a model outputs an incorrect prompt response, the last layer receives a so-called gradient. A gradient is a signal that indicates the AI made a mistake. It also contains information on how the model can improve. The gradient enters the last layer and travels backwards through the rest of the AI's structure until it reaches the first layer. In 2015, researchers invented a gradient management mechanism known as a residual connection. It's a shortcut that enables the gradient to directly travel between two distant AI layers without having to go through all the layers in between. Residual connections mitigate several common AI training errors, which is the reason they're widely used in LLMs and vision models. Last September, researchers debuted an alternative to residual connections called Hyper-Connections. It addresses several of the mechanism's shortcomings but comes with limitations of its own. The mHC architecture introduced by DeepSeek this week is an enhanced implementation of Hyper-Connections. It avoids several of the technical challenges associated with the latter mechanism, which makes it more suitable for production use. The primary innovation in mHC is that it incorporates a so-called manifold. Manifolds are a broad family of mathematical objects that vary significantly in complexity. Some manifolds are simple geometric shapes such as circles, while others span more than three dimensions. DeepSeek says that mHC uses a manifold to maintain the stability of gradients while they travel between an AI model's layers. The company put the architecture to the test by using it to train 3 LLMs with 3 billion, 9 billion and 27 billion parameters. It then trained three other models with identical parameter counts using Hyper-Connections, the technology from which mHC is derived. According to DeepSeek, the mHC-powered LLMs performed better across eight different AI benchmarks. The company says that the architecture is also more hardware-efficient than Hyper-Connections. The latter mechanism significantly increases LLMs' memory requirements during training. In its internal tests, DeepSeek determined that mHC incurs a hardware overhead of only 6.27%.
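To make the residual-connection mechanism described above concrete, here is a minimal, illustrative PyTorch sketch (not DeepSeek's code): the block's output is its input plus the layer's transformation, so during backpropagation the gradient has a direct path around the layer's weights.

```python
# Minimal sketch of a standard residual connection (illustrative only, not DeepSeek's code):
# the block adds its input to the layer's output, so gradients can flow directly
# to earlier layers through the identity shortcut.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # a small feed-forward sub-layer; real LLMs use attention/MLP blocks here
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # residual connection: output = input + layer(input)
        return x + self.ff(x)

x = torch.randn(2, 16, requires_grad=True)
loss = ResidualBlock(16)(x).sum()
loss.backward()  # gradient reaches x both through ff and through the shortcut
```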
[3]
New DeepSeek Research Shows Architectural Fix Can Boost Reasoning at Scale
DeepSeek's findings highlight a path to stronger AI reasoning without relying on ever-larger models. DeepSeek has released new research showing that a promising but fragile neural network design can be stabilised at scale, delivering measurable performance gains in large language models without significantly compromising efficiency. The paper, titled Manifold-Constrained Hyper-Connections, builds on an emerging architectural approach known as 'Hyper-Connections', which allows multiple residual pathways inside a model to mix dynamically rather than follow a single fixed route. The idea is to give models more internal flexibility, enabling stronger reasoning and more effective use of parameters as they scale. Earlier versions of this design, however, proved difficult to train at large sizes. Unconstrained mixing unintentionally amplified or suppressed signals across layers, leading to what the authors describe as "severe numerical instability" as models became deeper. In practice, this resulted in unstable gradients and sudden training failures at larger scales. DeepSeek's contribution is a constrained version of the architecture that limits residual mixing to redistributing information rather than amplifying it, ensuring what the paper calls "bounded signal propagation across depth." The constraint restores training stability while preserving the benefits of richer internal routing. Models using the approach trained reliably up to 27 billion parameters, a scale at which unconstrained Hyper-Connections failed. On BIG-Bench Hard, a benchmark focused on complex, multi-step reasoning, accuracy rose from 43.8% to 51.0%. Performance also improved on DROP, a benchmark testing numerical and logical reasoning over long passages, and on GSM8K, a standard test of mathematical reasoning. Crucially, these gains came with only a ~6-7% increase in training overhead, suggesting the approach could be viable for production-scale models. The company has published a technical report that provides an extensive account of the methodology and findings of the research. DeepSeek's work points to a broader implication. Meaningful performance improvements may increasingly come from architectural refinements, not just larger models or more data. The work also fits into a broader pattern in DeepSeek's research strategy. The lab was previously credited with developing Group Relative Policy Optimisation (GRPO), a reinforcement learning method used to train its reasoning-focused models, including DeepSeek-R1. That model drew widespread attention for delivering strong reasoning performance with significantly lower training compute, briefly unsettling assumptions across the AI industry and even rippling into public markets. Last month, DeepSeek launched two new reasoning-first AI models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, expanding its suite of systems for agents, tool-use and complex inference. The models introduce an expansion of DeepSeek's agent-training approach, supported by a new synthetic dataset spanning more than 1,800 environments and 85,000 complex instructions. The company stated that V3.2 is its first model to integrate thinking directly into tool use, allowing structured reasoning to operate both within and alongside external tools. In November, DeepSeek released DeepSeekMath-V2, becoming one of only three AI labs -- alongside OpenAI and Google DeepMind -- to achieve a gold-medal-level score on the International Mathematical Olympiad (IMO) 2025 benchmark.
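The paper's exact manifold construction is described in DeepSeek's technical report; purely as a hypothetical illustration of "redistributing information rather than amplifying it," the sketch below mixes several parallel residual streams with a row-stochastic (softmax-normalised) matrix, so each output stream is a convex combination of the inputs and the largest stream norm cannot grow. This is an assumed toy constraint for illustration, not DeepSeek's implementation.

```python
# Toy illustration (not the paper's construction): constrain the mixing matrix
# between parallel residual streams to be row-stochastic, so mixing redistributes
# information across streams but cannot amplify its magnitude.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConstrainedMix(nn.Module):
    def __init__(self, n_streams: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_streams, n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, dim)
        weights = F.softmax(self.logits, dim=-1)       # each row sums to 1 (convex mixing)
        return torch.einsum("ij,jbd->ibd", weights, streams)

mix = ConstrainedMix(4)
s = torch.randn(4, 2, 16)
out = mix(s)
# by the triangle inequality, the largest per-stream norm cannot grow under convex mixing
print(s.norm(dim=-1).max(), out.norm(dim=-1).max())
```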
[4]
DeepSeek's New Architecture Can Make AI Model Training More Efficient
DeepSeek, the Chinese artificial intelligence (AI) startup that took Silicon Valley by storm in early 2025 with its R1 AI model, has now revealed a new architecture that can help bring down the cost and time taken to train large language models (LLMs). A new research paper published by the company outlines a training architecture called Manifold-Constrained Hyper-Connections (mHC), aimed at improving the efficiency and reliability of large AI model training. It is focused on reducing instability during training runs, a challenge that can lead to wasted compute resources and interrupted training progress.

DeepSeek Brings New AI Training Architecture

In a paper published on arXiv and listed on Hugging Face, DeepSeek researchers introduced and detailed the new model training architecture. The mHC architecture is a structural tweak to neural network layers that constrains how information flows across the model during training. Existing frontier models often use pathways that let data bypass some processing steps to keep signals stable across multiple layers. However, expanding these shortcut paths without any constraints can introduce instability and make large models harder to train end-to-end. The new architecture proposes a change to fix this issue. With mHC, researchers project these connections onto a specific structured space called a manifold, which mathematically ensures the signals remain stable while passing through layers.

Simply put, large AI models use billions of parameters, or neural connections, each of which shapes the behaviour of the end result. This is why responses to the same query differ slightly across ChatGPT, Gemini, and Claude. Training a model essentially means adjusting every one of these parameters until the model produces the desired results. During this process, if signals (the data passing through different parameters) are amplified too strongly or vanish too quickly, the training can fail partway through, forcing developers to restart. This wastes time, money, and precious compute power. mHC's design tries to curb this behaviour by keeping the shortcuts in the model's computation predictable and well-behaved.

DeepSeek's research team tested the new architecture on multiple model sizes, including a 27 billion-parameter model trained on data proportional to its scale, as well as smaller variants. This was done to study how compute and dataset size interact with the architecture. The team found that mHC helps even large AI models maintain stability and scalability without excessive overhead.

The practical goal of mHC is not only to improve stability but also to reduce the wasted costs associated with interrupted training runs. Training large AI models can require substantial energy, specialised chips and long runtimes. DeepSeek's approach does not directly lower the power draw of hardware like GPUs or AI accelerators, but by reducing the frequency of training failures and the need to restart, it can lower the total compute consumed across a training lifecycle.

Since the architecture is not yet part of any market-ready AI models, it is difficult to gauge how it will behave when stress-tested in real-world scenarios. However, on paper, it offers an alternative to existing techniques and could prove a fundamentally better way to train AI models. We will have to wait until independent researchers incorporate the training architecture into their models and share results, or until the paper is peer-reviewed and scrutinised.
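As a toy numerical illustration of the exploding/vanishing behaviour described above (and not of mHC itself), the NumPy sketch below repeatedly applies an unconstrained random mixing matrix versus a norm-preserving orthogonal one; only the constrained version keeps the signal norm bounded with depth. The orthogonal constraint here is just one example of a stability-preserving structure, chosen for illustration.

```python
# Toy illustration of the failure mode (not mHC): unconstrained mixing makes the
# signal norm explode or vanish with depth, while a norm-preserving (orthogonal)
# mixing keeps it bounded. mHC's manifold constraint targets this kind of stability.
import numpy as np

rng = np.random.default_rng(0)
dim, depth = 8, 64
x = rng.standard_normal(dim)

free = rng.standard_normal((dim, dim)) * 0.5          # unconstrained mixing matrix
q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))  # orthogonal: preserves norms

x_free, x_orth = x.copy(), x.copy()
for _ in range(depth):
    x_free = free @ x_free
    x_orth = q @ x_orth

print(f"unconstrained norm after {depth} layers: {np.linalg.norm(x_free):.3e}")
print(f"orthogonal    norm after {depth} layers: {np.linalg.norm(x_orth):.3e}")
```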
[5]
DeepSeek touts new training method as China pushes AI efficiency
DeepSeek published a paper outlining a more efficient approach to developing AI, illustrating the Chinese artificial intelligence industry's effort to compete with the likes of OpenAI despite a lack of free access to Nvidia chips. The document, co-authored by founder Liang Wenfeng, introduces a framework it called Manifold-Constrained Hyper-Connections. It's designed to improve scalability while reducing the computational and energy demands of training advanced AI systems, according to the authors. Such publications from DeepSeek have foreshadowed the release of major models in the past. The Hangzhou-based startup stunned the industry with the R1 reasoning model a year ago, developed at a fraction of the cost of its Silicon Valley rivals. DeepSeek has since released several smaller platforms but anticipation is mounting for its next flagship system, widely dubbed the R2, expected around the Spring Festival in February. Chinese startups continue to operate under significant constraints, with the US preventing access to the most advanced semiconductors essential to developing and running AI. Those restrictions have forced researchers to pursue unconventional methods and architectures. DeepSeek's forthcoming R2 model -- which could launch in the next few months -- has potential to upend the global AI sector again, despite Google's recent gains. Google's Gemini 3 model overtook OpenAI in November to claim a top-3 slot in LiveBench's ranking of global large language model (LLM) performance. China's low-cost models, which are developed at a fraction of the cost of competitors, claimed two slots in the top-15. DeepSeek, known for its unorthodox innovations, published its latest paper this week through the open repository arXiv and open-source platform Hugging Face. The paper lists 19 authors, with Liang's name appearing last. The founder, who's consistently steered DeepSeek's research agenda, has pushed his team to rethink how large-scale AI systems are conceived and built. The latest research addresses challenges such as training instability and limited scalability, noting that the new method incorporates "rigorous infrastructure optimization to ensure efficiency." Tests were conducted on models ranging from 3 billion to 27 billion parameters, building on ByteDance's 2024 research into hyper-connection architectures. The technique holds promise "for the evolution of foundational models," the authors said.
DeepSeek has published research introducing Manifold-Constrained Hyper-Connections, a new architecture designed to make AI model training more efficient and stable. The breakthrough addresses training instability while reducing computational demands, potentially signaling the arrival of the anticipated R2 model. Tests on models up to 27 billion parameters show significant performance gains with minimal overhead.

DeepSeek has released a research paper detailing Manifold-Constrained Hyper-Connections (mHC), a new AI training method designed to enhance AI efficiency while addressing persistent challenges in large language model development [1]. The document, co-authored by founder Liang Wenfeng and 18 other researchers, was published this week through the open repository arXiv and the open-source platform Hugging Face [5]. The framework aims to improve scalability while reducing the computational and energy demands of training advanced AI systems, a critical concern as China continues to operate under US semiconductor restrictions that limit access to Nvidia chips [1].
The mHC architecture is an enhanced implementation of Hyper-Connections, a gradient management mechanism introduced in September 2024 as an alternative to the residual connections invented in 2015 [2]. When a user enters a prompt, an AI model processes it through numerous layers, with each layer performing part of the calculation and passing its results forward. During training, gradients (signals indicating errors and how the model can improve) travel backward through these layers. The primary innovation in mHC is its incorporation of a manifold, a mathematical object that keeps gradients stable as they travel between a model's layers [2]. This constraint ensures bounded signal propagation across depth, limiting residual mixing to redistributing information rather than amplifying it [3].
DeepSeek tested the new architecture on models ranging from 3 billion to 27 billion parameters, building on ByteDance's 2024 research into hyper-connection architectures [5]. The mHC-powered LLMs outperformed models trained with standard Hyper-Connections across eight different benchmarks [2]. On BIG-Bench Hard, a benchmark focused on complex reasoning, accuracy rose from 43.8% to 51.0% [3]. Performance also improved on DROP and GSM8K, which test numerical, logical, and mathematical reasoning. Critically, models using mHC trained reliably up to 27 billion parameters, a scale at which unconstrained Hyper-Connections failed due to severe numerical instability [3].
The architecture is also notably hardware-efficient, incurring only a 6.27% overhead during training [2]. That modest increase matters for production-scale models, since training large language models requires substantial energy, specialized chips, and long runtimes [4]. By reducing the frequency of training failures and the need to restart interrupted runs, mHC can lower the total computational resources consumed across a training lifecycle [4]. The research addresses challenges such as training instability and limited scalability, and incorporates "rigorous infrastructure optimization to ensure efficiency," according to the authors [1].
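As a hypothetical back-of-the-envelope calculation (the GPU-hour and restart figures below are invented; only the 6.27% overhead comes from the reporting above), a modest per-run overhead can pay for itself if it prevents even one partial restart of a long training run.

```python
# Back-of-the-envelope with hypothetical numbers (nothing here is from the paper
# except the 6.27% overhead): a fixed per-run overhead can still reduce total
# compute if it avoids one partial restart of a long training run.
BASE_GPU_HOURS = 1_000_000        # hypothetical cost of one full training run
OVERHEAD = 0.0627                 # reported mHC training overhead
RESTART_FRACTION = 0.30           # hypothetical: 30% of a run lost to one failure

without_mhc = BASE_GPU_HOURS * (1 + RESTART_FRACTION)   # run plus repeated portion
with_mhc = BASE_GPU_HOURS * (1 + OVERHEAD)               # stable run, no restart

print(f"without mHC (one restart): {without_mhc:,.0f} GPU-hours")
print(f"with mHC (6.27% overhead): {with_mhc:,.0f} GPU-hours")
```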
These publications from DeepSeek have historically foreshadowed the release of major models. The Hangzhou-based startup stunned the industry with the R1 reasoning model a year ago, developed at a fraction of the cost of Silicon Valley rivals such as OpenAI [1]. Anticipation is mounting for its next flagship system, widely dubbed R2, expected around the Spring Festival in February [1]. Bloomberg Intelligence analysts Robert Lea and Jasmine Lyu note that the forthcoming R2 model has the potential to upend the global AI sector again, despite Google's Gemini 3 model overtaking OpenAI in November to claim a top-3 slot in LiveBench's ranking of global LLM performance [1]. China's low-cost models claimed two slots in the top 15 of that ranking.

DeepSeek's work points to a broader shift in AI development strategy: meaningful performance improvements may increasingly come from architectural refinements and neural-network design innovations rather than simply from larger models or more data [3]. That approach is particularly valuable as US restrictions prevent Chinese startups from accessing the most advanced semiconductors essential for developing and running AI, forcing researchers to pursue unconventional methods [1]. The technique holds promise "for the evolution of foundational models," the authors stated [5]. While the architecture is not yet part of any market-ready AI model, its potential to boost performance while maintaining training stability positions it as a candidate for a fundamentally better way to train reasoning models at scale [4].