7 Sources
[1]
How DeepSeek's new way to train advanced AI models could disrupt everything - again
Just before the start of the new year, the AI world was introduced to a potentially game-changing new method for training advanced models. A team of researchers from Chinese AI firm DeepSeek released a paper on Wednesday outlining what it called Manifold-Constrained Hyper-Connections, or mHC for short, which may give engineers a pathway to build and scale large language models without the huge computational costs that are typically required.

DeepSeek leapt into the cultural spotlight one year ago with its release of R1, a model that rivaled the capabilities of OpenAI's o1 and was reportedly trained at a fraction of the cost. The release came as a shock to US-based tech developers because it showed that access to huge reserves of capital and computing resources wasn't necessarily required to train cutting-edge AI models.

The new mHC paper could turn out to be the technological framework for DeepSeek's forthcoming model, R2, which was expected in the middle of last year but was postponed, reportedly due to China's limited access to advanced AI chips and to concerns from the company's CEO, Liang Wenfeng, about the model's performance. Posted to arXiv, the popular preprint server where researchers share results that have yet to be peer-reviewed, the paper attempts to bridge a complex and important technical gap hindering the scalability of AI models.

LLMs are built upon neural networks, which are designed to conserve signals across many layers. The problem is that as more layers are added, the signal can become attenuated or degraded, and the risk grows that it turns into noise. It's a bit like playing a game of telephone: the more people are added, the higher the chances that the original message gets confused and altered. The core challenge, then, is to build models that conserve their signals across as many layers as possible -- or to "better optimize the trade-off between plasticity and stability," as the DeepSeek researchers put it in their new paper.

The authors of the new paper -- who include DeepSeek CEO Liang Wenfeng -- were building upon hyper-connections, or HCs, a framework introduced in 2024 by researchers from ByteDance that diversifies the channels through which layers of a neural network can share information with one another. HCs introduce the risk, however, that the original signal gets lost in translation. (Again, think of more and more people joining a game of telephone.) They also come with high memory costs, making them difficult to implement at scale.

The mHC architecture aims to solve this by constraining the hyper-connectivity within a model, preserving the informational complexity enabled by HCs while sidestepping the memory issue. This, in turn, could make training highly complex models practical and scalable even for smaller, more cash-strapped developers.

Just as with the January 2025 release of R1, the debut of the mHC framework could hint at a new direction for the evolution of AI. Thus far in the AI race, the prevailing wisdom has mostly been that only the biggest, most deep-pocketed companies can afford to build frontier models.
But DeepSeek has continually shown that workarounds are possible, and that breakthroughs can be achieved through clever engineering alone. Because the company has published its mHC research openly, the method could become widely embraced by smaller developers, particularly if it ends up powering the much-anticipated R2 model (whose release date has not officially been announced).
[2]
DeepSeek Touts New Training Method as China Pushes AI Efficiency
DeepSeek published a paper outlining a more efficient approach to developing AI, illustrating the Chinese artificial intelligence industry's effort to compete with the likes of OpenAI despite a lack of free access to Nvidia Corp. chips. The document, co-authored by founder Liang Wenfeng, introduces a framework it calls Manifold-Constrained Hyper-Connections. It's designed to improve scalability while reducing the computational and energy demands of training advanced AI systems, according to the authors.

Such publications from DeepSeek have foreshadowed the release of major models in the past. The Hangzhou-based startup stunned the industry with the R1 reasoning model a year ago, developed at a fraction of the cost of its Silicon Valley rivals. DeepSeek has since released several smaller platforms, but anticipation is mounting for its next flagship system, widely dubbed R2, expected around the Spring Festival in February.

Chinese startups continue to operate under significant constraints, with the US preventing access to the most advanced semiconductors essential to developing and running AI. Those restrictions have forced researchers to pursue unconventional methods and architectures.

What Bloomberg Intelligence Says: DeepSeek's forthcoming R2 model -- which could launch in the next few months -- has potential to upend the global AI sector again, despite Google's recent gains. Google's Gemini 3 model overtook OpenAI in November to claim a top-3 slot in LiveBench's ranking of global large language model (LLM) performance. China's low-cost models, which are developed at a fraction of the cost of competitors, claimed two slots in the top 15. - Robert Lea and Jasmine Lyu, analysts

DeepSeek, known for its unorthodox innovations, published its latest paper this week through the open repository arXiv and the open-source platform Hugging Face. The paper lists 19 authors, with Liang's name appearing last. The founder, who has consistently steered DeepSeek's research agenda, has pushed his team to rethink how large-scale AI systems are conceived and built. The latest research addresses challenges such as training instability and limited scalability, noting that the new method incorporates "rigorous infrastructure optimization to ensure efficiency." Tests were conducted on models ranging from 3 billion to 27 billion parameters, building on ByteDance Ltd.'s 2024 research into hyper-connection architectures. The technique holds promise "for the evolution of foundational models," the authors said.
[3]
DeepSeek develops mHC AI architecture to boost model performance - SiliconANGLE
DeepSeek develops mHC AI architecture to boost model performance

DeepSeek researchers have developed a technology called Manifold-Constrained Hyper-Connections, or mHC, that can improve the performance of artificial intelligence models. The Chinese AI lab debuted the technique in a paper published on Wednesday.

DeepSeek created mHC to enhance the so-called residual connection mechanism that large language models use to learn new information. The mechanism, which was invented in 2015, also ships with many vision models. DeepSeek is not the first market player to have tried to improve upon residual connections, but earlier attempts had mixed results.

An AI model comprises numerous software components called layers. When a user enters a prompt, the text enters the first layer, which performs a small portion of the calculations necessary to generate a response. The first layer sends the results of its calculations to the second layer, which completes another portion of the work, passes its results to the third layer, and so forth. The last layer outputs an answer to the user's question.

The last layer also plays a key role in the AI training process. If a model outputs an incorrect response, the last layer receives a so-called gradient, a signal that indicates the AI made a mistake and contains information on how the model can improve. The gradient enters the last layer and travels backwards through the rest of the AI's structure until it reaches the first layer.

A residual connection, the gradient management mechanism invented in 2015, is a shortcut that enables the gradient to travel directly between two distant AI layers without having to go through all the layers in between. Residual connections mitigate several common AI training errors, which is why they're widely used in LLMs and vision models.

In September 2024, researchers debuted an alternative to residual connections called Hyper-Connections. It addresses several of the mechanism's shortcomings but comes with limitations of its own. The mHC architecture introduced by DeepSeek this week is an enhanced implementation of Hyper-Connections. It avoids several of the technical challenges associated with the latter mechanism, which makes it more suitable for production use.

The primary innovation in mHC is that it incorporates a so-called manifold. Manifolds are a broad family of mathematical objects that vary significantly in complexity: some are simple geometric shapes such as circles, while others span more than three dimensions. DeepSeek says that mHC uses a manifold to maintain the stability of gradients while they travel between an AI model's layers.

The company put the architecture to the test by using it to train three LLMs with 3 billion, 9 billion, and 27 billion parameters. It then trained three other models with identical parameter counts using Hyper-Connections, the technology from which mHC is derived. According to DeepSeek, the mHC-powered LLMs performed better across eight different AI benchmarks. The company says the architecture is also more hardware-efficient than Hyper-Connections: the latter significantly increases LLMs' memory requirements during training, whereas in DeepSeek's internal tests mHC incurred a hardware overhead of only 6.27%.
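To make the residual-connection mechanism described above concrete, here is a minimal, generic PyTorch sketch of a residual block. It is not DeepSeek's code, and the module and variable names (such as ResidualBlock) are illustrative only; the "x + self.layer(x)" shortcut is what lets gradients flow directly past a layer during backpropagation.

```python
# Minimal illustration of a residual connection (generic; not DeepSeek's implementation).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A small feed-forward layer wrapped with a residual shortcut."""
    def __init__(self, dim: int):
        super().__init__()
        self.layer = nn.Sequential(
            nn.Linear(dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The "+ x" shortcut lets the forward signal, and during training the
        # gradient, bypass the layer's transformation entirely.
        return x + self.layer(x)

# Stack a dozen blocks and check that the gradient still reaches the input.
x = torch.randn(4, 256, requires_grad=True)
model = nn.Sequential(*[ResidualBlock(256) for _ in range(12)])
model(x).sum().backward()
print(x.grad.norm())  # non-vanishing gradient thanks to the shortcuts
```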
[4]
New DeepSeek Research Shows Architectural Fix Can Boost Reasoning at Scale
DeepSeek's findings highlight a path to stronger AI reasoning without relying on ever-larger models. DeepSeek has released new research showing that a promising but fragile neural network design can be stabilised at scale, delivering measurable performance gains in large language models without significantly compromising efficiency.

The paper, titled Manifold-Constrained Hyper-Connections, builds on an emerging architectural approach known as 'Hyper-Connections', which allows multiple residual pathways inside a model to mix dynamically rather than follow a single fixed route. The idea is to give models more internal flexibility, enabling stronger reasoning and more effective use of parameters as they scale. Earlier versions of this design, however, proved difficult to train at large sizes. Unconstrained mixing unintentionally amplified or suppressed signals across layers, leading to what the authors describe as "severe numerical instability" as models became deeper. In practice, this resulted in unstable gradients and sudden training failures at larger scales.

DeepSeek's contribution is a constrained version of the architecture that limits residual mixing to redistributing information rather than amplifying it, ensuring what the paper calls "bounded signal propagation across depth." The constraint restores training stability while preserving the benefits of richer internal routing. Models using the approach trained reliably up to 27 billion parameters, a scale at which unconstrained Hyper-Connections failed.

On BIG-Bench Hard, a benchmark focused on complex, multi-step reasoning, accuracy rose from 43.8% to 51.0%. Performance also improved on DROP, a benchmark testing numerical and logical reasoning over long passages, and on GSM8K, a standard test of mathematical reasoning. Crucially, these gains came with only a ~6-7% increase in training overhead, suggesting the approach could be viable for production-scale models. The company has published a technical report that provides an extensive account of the methodology and findings of the research.

DeepSeek's work points to a broader implication: meaningful performance improvements may increasingly come from architectural refinements, not just larger models or more data. The work also fits into a broader pattern in DeepSeek's research strategy. The lab was previously credited with developing Group Relative Policy Optimisation (GRPO), a reinforcement learning method used to train its reasoning-focused models, including DeepSeek-R1. That model drew widespread attention for delivering strong reasoning performance with significantly lower training compute, briefly unsettling assumptions across the AI industry and even rippling into public markets.

Last month, DeepSeek launched two new reasoning-first AI models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, expanding its suite of systems for agents, tool use and complex inference. The models introduce an expansion of DeepSeek's agent-training approach, supported by a new synthetic dataset spanning more than 1,800 environments and 85,000 complex instructions. The company stated that V3.2 is its first model to integrate thinking directly into tool use, allowing structured reasoning to operate both within and alongside external tools. In November, DeepSeek released DeepSeekMath-V2, becoming one of only three AI labs -- alongside OpenAI and Google DeepMind -- to achieve a gold-medal-level score on the International Mathematical Olympiad (IMO) 2025 benchmark.
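As a rough illustration of the "redistribute rather than amplify" constraint described in this article, the sketch below keeps several parallel residual streams and mixes them with a row-stochastic matrix (rows produced by a softmax, so each sums to one). This is a simplification for intuition only: the paper's actual manifold constraint and hyper-connection update rules are more involved, and the names used here (BoundedMixing, n_streams, mix_logits) are invented for the example.

```python
# Illustrative sketch only: bounding residual mixing so it redistributes
# signal rather than amplifying it. Not DeepSeek's mHC implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundedMixing(nn.Module):
    def __init__(self, n_streams: int, dim: int):
        super().__init__()
        # Learnable mixing logits between the parallel residual streams.
        self.mix_logits = nn.Parameter(torch.zeros(n_streams, n_streams))
        self.layer = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, dim).
        # Softmax makes each row sum to 1, so every mixed stream is a convex
        # combination of the inputs: information is redistributed, never scaled up.
        mix = F.softmax(self.mix_logits, dim=-1)
        mixed = torch.einsum("ij,jbd->ibd", mix, streams)
        return mixed + self.layer(mixed)  # residual update on the mixed streams

streams = torch.randn(4, 2, 256)            # 4 parallel streams, batch of 2
block = BoundedMixing(n_streams=4, dim=256)
print(block(streams).shape)                  # torch.Size([4, 2, 256])
```

In a full hyper-connection block the mixing also governs how each layer's output is written back into the streams, but the bounded mixing shown here captures the stability intuition the authors describe.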
[5]
DeepSeek introduces Manifold-Constrained Hyper-Connections for R2
Just before the start of the new year, the artificial intelligence community was introduced to a potential breakthrough in model training. A team of researchers from the Chinese AI firm DeepSeek released a paper outlining a novel architectural approach called Manifold-Constrained Hyper-Connections, or mHC for short. This new methodology may provide a pathway for engineers to build and scale large language models without the prohibitive computational costs and capital typically required.

DeepSeek first captured the cultural spotlight one year ago with the release of R1. That model rivaled the capabilities of OpenAI's o1 but was reportedly trained at a fraction of the cost. The release came as a shock to US-based developers because it challenged the assumption that only massive reserves of capital and hardware could produce cutting-edge AI.

The newly published mHC paper, hosted on the preprint server arXiv, could serve as the technological framework for DeepSeek's forthcoming model, R2. The R2 model was originally expected in mid-2025 but was postponed, reportedly due to concerns from CEO Liang Wenfeng regarding performance and China's limited access to advanced AI chips.

The new paper attempts to bridge a complex technical gap that currently hinders AI scalability. Large language models are built upon neural networks designed to conserve signals across many layers. However, as the model grows and more layers are added, the signal can become attenuated or degraded, increasing the risk of it turning into noise. The researchers liken this to a game of "telephone": the more people involved in the chain, the higher the chance the original message becomes confused or altered. The core engineering challenge is optimizing the trade-off between plasticity and stability, ensuring signals are conserved across as many layers as possible without degradation.

The authors of the paper, including CEO Liang Wenfeng, built their research upon hyper-connections (HCs), a framework introduced in 2024 by researchers from ByteDance. Standard HCs diversify the channels through which neural network layers share information, but they introduce the risk of signal loss and come with high memory costs that make them difficult to implement at scale.

DeepSeek's mHC architecture aims to solve this by constraining the hyperconnectivity within a model. This approach preserves the informational complexity enabled by HCs while sidestepping the memory issues, allowing for the training of highly complex models in a way that is practical even for developers with limited resources.

The debut of the mHC framework suggests a pivot in the evolution of AI development. Until recently, prevailing industry wisdom held that only the wealthiest companies could afford to build frontier models. DeepSeek continues to demonstrate that breakthroughs can be achieved through clever engineering rather than raw financial force. By publishing this research, DeepSeek has made the mHC method available to smaller developers, potentially democratizing access to advanced AI capabilities if this architecture proves successful in the anticipated R2 model.
[6]
DeepSeek's New Architecture Can Make AI Model Training More Efficient
DeepSeek, the Chinese artificial intelligence (AI) startup that took Silicon Valley by storm in January 2025 with its R1 AI model, has now revealed a new architecture that can help bring down the cost and time taken to train large language models (LLMs). A new research paper published by the company outlines a training architecture called Manifold-Constrained Hyper-Connections (mHC), aimed at improving the efficiency and reliability of large AI model training. It is focused on reducing instability during training runs, a challenge that can lead to wasted compute resources and interrupted training progress.

DeepSeek Brings New AI Training Architecture

In a paper published on arXiv and listed on Hugging Face, DeepSeek researchers introduced and detailed the new model training architecture. The mHC architecture is a structural tweak to neural network layers that constrains how information flows across the model during training. Existing frontier models often use pathways that let data bypass some processing steps to keep signals stable across multiple layers. However, expanding these shortcut paths without any constraints can introduce instability and make large models harder to train end-to-end.

The new architecture proposes a change to fix this issue. With mHC, researchers project these connections onto a specific structured space called a manifold, which mathematically ensures the signals remain stable while passing through layers. Simply put, large AI models use billions of parameters, or neural connections, each of which influences the behaviour of the end result. This is why the response to the same query differs slightly between ChatGPT, Gemini, and Claude. Training a model essentially means adjusting every single parameter to get a desired result. During this process, if signals (the data passing through different parameters) grow too strong or vanish too quickly, the training can fail halfway through, forcing developers to restart. This can waste time, money, and precious compute power. mHC's design tries to curb this behaviour by keeping the shortcuts in the model's computation predictable and well-behaved.

DeepSeek's research team tested the new architecture on multiple model sizes, including a 27 billion-parameter model trained on data proportional to its scale, as well as smaller variants. This was done to study how compute and dataset size interact with the architecture. The team found that mHC helps even large AI models maintain stability and scalability without excessive overhead.

The practical goal of mHC is not only to improve stability but also to reduce the wasted costs associated with interrupted training runs. Training large AI models can require substantial energy, specialised chips, and long runtimes. DeepSeek's approach does not directly lower the power draw of hardware like GPUs or AI accelerators, but by reducing the frequency of training failures and the need to restart, it can lower the total compute consumed across a training lifecycle.

Since the architecture is not yet part of any market-ready AI model, it is difficult to gauge how it will behave when stress-tested in real-world scenarios. On paper, however, it offers an alternative to existing techniques and could prove a fundamentally better way to train AI models. We will have to wait until independent researchers incorporate the training architecture into their models and share results, or until the paper is peer-reviewed and scrutinised.
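For intuition about what "projecting connections onto a manifold" can mean in practice, the sketch below projects a small mixing matrix onto the orthogonal manifold using an SVD. Orthogonal matrices preserve vector norms, so a signal routed through the projected matrix neither explodes nor vanishes. Whether mHC uses this particular manifold is not specified in the coverage above; the example only illustrates the general mechanism, and the function name is invented here.

```python
# Hedged illustration: projecting a weight matrix onto the orthogonal manifold.
# Shows the general idea of a manifold constraint, not DeepSeek's exact method.
import torch

def project_to_orthogonal(w: torch.Tensor) -> torch.Tensor:
    """Return the orthogonal matrix closest to w (in Frobenius norm)."""
    u, _, vh = torch.linalg.svd(w)
    return u @ vh  # drop the singular values, keep the rotation

w = torch.randn(4, 4)               # an unconstrained mixing matrix
w_proj = project_to_orthogonal(w)

x = torch.randn(4)
print(torch.norm(x).item(), torch.norm(w_proj @ x).item())  # equal norms: signal scale preserved
```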
[7]
DeepSeek touts new training method as China pushes AI efficiency
DeepSeek published a paper outlining a more efficient approach to developing AI, illustrating the Chinese artificial intelligence industry's effort to compete with the likes of OpenAI despite a lack of free access to Nvidia chips. The document, co-authored by founder Liang Wenfeng, introduces a framework it called Manifold-Constrained Hyper-Connections. It's designed to improve scalability while reducing the computational and energy demands of training advanced AI systems, according to the authors.

Such publications from DeepSeek have foreshadowed the release of major models in the past. The Hangzhou-based startup stunned the industry with the R1 reasoning model a year ago, developed at a fraction of the cost of its Silicon Valley rivals. DeepSeek has since released several smaller platforms, but anticipation is mounting for its next flagship system, widely dubbed R2, expected around the Spring Festival in February.

Chinese startups continue to operate under significant constraints, with the US preventing access to the most advanced semiconductors essential to developing and running AI. Those restrictions have forced researchers to pursue unconventional methods and architectures. DeepSeek's forthcoming R2 model -- which could launch in the next few months -- has potential to upend the global AI sector again, despite Google's recent gains. Google's Gemini 3 model overtook OpenAI in November to claim a top-3 slot in LiveBench's ranking of global large language model (LLM) performance. China's low-cost models, which are developed at a fraction of the cost of competitors, claimed two slots in the top 15.

DeepSeek, known for its unorthodox innovations, published its latest paper this week through the open repository arXiv and the open-source platform Hugging Face. The paper lists 19 authors, with Liang's name appearing last. The founder, who's consistently steered DeepSeek's research agenda, has pushed his team to rethink how large-scale AI systems are conceived and built. The latest research addresses challenges such as training instability and limited scalability, noting that the new method incorporates "rigorous infrastructure optimization to ensure efficiency." Tests were conducted on models ranging from 3 billion to 27 billion parameters, building on ByteDance's 2024 research into hyper-connection architectures. The technique holds promise "for the evolution of foundational models," the authors said.
Chinese AI firm DeepSeek has released research on Manifold-Constrained Hyper-Connections (mHC), a new AI training method that addresses signal degradation in neural networks. The framework could enable smaller developers to build frontier models without massive computational costs, potentially previewing the architecture behind the anticipated DeepSeek R2 model expected around February's Spring Festival.
DeepSeek has published research outlining Manifold-Constrained Hyper-Connections, or mHC, a new AI architecture that could fundamentally alter how engineers train advanced AI models [1]. The paper, released on arXiv and co-authored by founder Liang Wenfeng, introduces a framework designed to improve scalability while reducing the computational and energy demands of training advanced AI systems [2]. This development comes as Chinese AI startups continue operating under significant constraints, with US restrictions preventing access to the most advanced semiconductors essential for developing and running AI [2].
Source: Gadgets 360
The Chinese AI startup gained prominence one year ago with its R1 model, which rivaled OpenAI's o1 capabilities but was reportedly trained at a fraction of the cost [1]. The new mHC paper could serve as the technological framework for DeepSeek R2, the company's next flagship system expected around the Spring Festival in February [2]. The R2 model was originally anticipated in mid-2025 but was postponed due to China's limited access to advanced AI chips and concerns from CEO Liang Wenfeng about the model's performance [1].

The mHC architecture addresses a critical technical challenge that has hindered the development of large language models. As neural networks grow deeper with additional layers, signals can become attenuated or degraded, increasing the risk that they turn into noise. DeepSeek researchers describe this as optimizing the trade-off between plasticity and stability across many layers [1].

DeepSeek built upon Hyper-Connections, a framework introduced in 2024 by ByteDance researchers that diversifies the channels through which neural network layers share information [1]. However, standard Hyper-Connections introduce risks of signal loss and come with high memory costs that make them difficult to implement at scale [3]. The mHC approach constrains hyperconnectivity within a model, preserving informational complexity while sidestepping memory issues [1].
Source: SiliconANGLE
The architecture uses a manifold, a mathematical object, to maintain the stability of gradients while they travel between an AI model's layers [3]. This innovation restores training stability while preserving the benefits of richer internal routing, enabling models to train reliably up to 27 billion parameters [4].

DeepSeek tested the new training method by training three large language models with 3 billion, 9 billion, and 27 billion parameters using mHC, then compared them against models trained with standard Hyper-Connections [3]. The mHC-powered models performed better across eight different AI benchmarks [3].

On BIG-Bench Hard, a benchmark focused on complex, multi-step reasoning, accuracy rose from 43.8% to 51.0% [4]. Performance also improved on DROP, which tests numerical and logical reasoning over long passages, and on GSM8K, a standard test of mathematical reasoning [4].

Crucially, these gains came with only a 6.27% increase in hardware overhead during training [3]. This level of efficiency makes the approach viable for production-scale models, particularly for developers with limited resources [4].
The debut of the mHC framework suggests a shift in AI evolution. Industry wisdom has held that only the wealthiest companies can afford to build frontier models, but DeepSeek continues demonstrating that breakthroughs can be achieved through clever engineering rather than massive capital reserves [1]. By publishing this research openly through arXiv and Hugging Face, DeepSeek has made the method available to smaller developers, potentially democratizing access to advanced AI capabilities [5].

Bloomberg Intelligence analysts Robert Lea and Jasmine Lyu note that DeepSeek's forthcoming R2 model has potential to upend the global AI sector again, despite recent gains by competitors [2]. The paper lists 19 authors, with Liang Wenfeng's name appearing last, reflecting his consistent role in steering DeepSeek's research agenda [2]. The research incorporates "rigorous infrastructure optimization to ensure efficiency" and addresses challenges such as training instability and limited scalability [2]. The technique holds promise "for the evolution of foundational models," the authors stated [2].