DeepSeek unveils mHC architecture that could reshape how developers train advanced AI models

Reviewed by Nidhi Govil


Chinese AI firm DeepSeek has released research on Manifold-Constrained Hyper-Connections (mHC), a new AI training method that addresses signal degradation in deep neural networks. The framework could enable smaller developers to build frontier models without massive computational costs, and it may preview the architecture behind DeepSeek R2, the company's next flagship model expected around the Spring Festival in February.

DeepSeek Introduces mHC Framework to Address AI Scalability Challenges

DeepSeek has published research outlining Manifold-Constrained Hyper-Connections, or mHC, a new AI architecture that could fundamentally alter how engineers train advanced AI models [1]. The paper, released on arXiv and co-authored by founder Liang Wenfeng, introduces a framework designed to improve scalability while reducing the computational and energy demands of training advanced AI systems [2]. This development comes as Chinese AI startups continue operating under significant constraints, with US restrictions preventing access to the most advanced semiconductors essential for developing and running AI [2].

Source: Gadgets 360

The Chinese AI startup gained prominence one year ago with its R1 model, which rivaled OpenAI's o1 capabilities but was reportedly trained at a fraction of the cost [1]. The new mHC paper could serve as the technological framework for DeepSeek R2, the company's next flagship system expected around the Spring Festival in February [2]. The R2 model was originally anticipated in mid-2025 but was postponed due to China's limited access to advanced AI chips and concerns from CEO Liang Wenfeng about the model's performance [1].

Solving Signal Degradation in Neural Network Design

The mHC architecture addresses a critical technical challenge that has hindered the development of large language models. As neural networks grow deeper with additional layers, signals passing between layers can become attenuated or distorted, increasing the risk that useful information degrades into noise. DeepSeek researchers describe this as optimizing the trade-off between plasticity and stability across many layers [1].
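To make that failure mode concrete, the toy sketch below shows how a signal pushed through many plain layers shrinks toward nothing, while an identity-style residual path keeps it from collapsing. This only illustrates the general problem the article describes, not DeepSeek's method; the matrix, depth, and scaling constants are arbitrary choices for the demo.

```python
# Toy demo of signal attenuation in a deep stack (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
dim, depth = 64, 100

# A random "layer" whose largest singular value is slightly below 1.
W = rng.standard_normal((dim, dim))
W *= 0.95 / np.linalg.norm(W, 2)

x_plain = rng.standard_normal(dim)
x_resid = x_plain.copy()

for _ in range(depth):
    x_plain = W @ x_plain                     # plain stack: signal shrinks every layer
    x_resid = x_resid + 0.1 * (W @ x_resid)   # residual-style path: identity term prevents collapse

print(f"plain stack norm after {depth} layers:    {np.linalg.norm(x_plain):.2e}")
print(f"residual stack norm after {depth} layers: {np.linalg.norm(x_resid):.2e}")
```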

DeepSeek built upon Hyper-Connections, a framework introduced in 2024 by ByteDance researchers that diversifies the channels through which neural network layers share information [1]. However, standard Hyper-Connections introduce risks of signal loss and come with high memory costs that make them difficult to implement at scale [3]. The mHC approach constrains hyperconnectivity within a model, preserving informational complexity while sidestepping the memory issues [1].
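The idea can be pictured with a small PyTorch sketch: several residual streams flow through the network, and a learned matrix recombines them around each layer, either freely (in the spirit of Hyper-Connections) or under a constraint that keeps the recombination well behaved. The class name, the number of streams, and the row-stochastic (softmax) constraint below are illustrative assumptions, not details taken from DeepSeek's paper.

```python
# Conceptual sketch of a hyper-connection-style block with multiple residual
# streams and a learned mixing matrix (illustrative assumptions throughout).
import torch
import torch.nn as nn


class HyperConnectionBlock(nn.Module):
    """One layer that reads from and writes to `n_streams` residual streams."""

    def __init__(self, dim: int, n_streams: int = 4, constrained: bool = False):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        # Mixing matrix: how the streams are recombined around the layer.
        self.mix = nn.Parameter(torch.eye(n_streams) + 0.01 * torch.randn(n_streams, n_streams))
        self.constrained = constrained

    def mixing_matrix(self) -> torch.Tensor:
        if self.constrained:
            # Constrained variant: each output stream is a convex combination of
            # input streams, so recombination cannot blow up or collapse magnitudes.
            return torch.softmax(self.mix, dim=-1)
        return self.mix  # unconstrained variant: free to drift during training

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams has shape (n_streams, batch, dim).
        mixed = torch.einsum("ij,jbd->ibd", self.mixing_matrix(), streams)
        # Apply the layer to one aggregated view and add it back to every stream.
        update = self.layer(mixed.mean(dim=0))
        return mixed + update.unsqueeze(0)


# Stack a few blocks and compare how the overall activation norm behaves.
x = torch.randn(4, 2, 256)  # (n_streams, batch, dim)
for constrained in (False, True):
    h = x.clone()
    with torch.no_grad():
        for block in [HyperConnectionBlock(256, constrained=constrained) for _ in range(24)]:
            h = block(h)
    print(f"constrained={constrained}: final norm {h.norm().item():.2f}")
```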

Source: SiliconANGLE

The architecture uses a manifold, a mathematical object, to keep gradients stable as they travel between an AI model's layers [3]. This innovation restores training stability while preserving the benefits of richer internal routing, enabling models to train reliably at scales of up to 27 billion parameters [4].
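The article does not specify which manifold mHC uses, so the snippet below only illustrates the general principle with one common choice: projecting a mixing matrix onto the orthogonal manifold, where every map preserves vector norms and therefore neither amplifies nor attenuates the signals (and, symmetrically, the gradients) that pass through it. The projection shown is an assumption for illustration, not the paper's construction.

```python
# Projecting a matrix onto the orthogonal manifold (nearest orthogonal matrix
# via the polar decomposition). Orthogonal maps preserve vector norms exactly.
import torch


def project_to_orthogonal(m: torch.Tensor) -> torch.Tensor:
    u, _, vh = torch.linalg.svd(m)
    return u @ vh


mix = torch.eye(4) + 0.3 * torch.randn(4, 4)   # an unconstrained mixing matrix
q = project_to_orthogonal(mix)                 # its manifold-constrained counterpart

v = torch.randn(4)
print(f"||v||       = {torch.linalg.norm(v).item():.4f}")
print(f"||mix @ v|| = {torch.linalg.norm(mix @ v).item():.4f}")   # may shrink or grow
print(f"||q @ v||   = {torch.linalg.norm(q @ v).item():.4f}")     # matches ||v||
```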

Hardware Efficiency and AI Model Performance Gains

DeepSeek tested the new AI training method by training three large language models with 3 billion, 9 billion, and 27 billion parameters using mHC, then compared them against models trained with standard Hyper-Connections [3]. The mHC-powered models performed better across eight different AI benchmarks [3].

On BIG-Bench Hard, a benchmark focused on complex, multi-step reasoning, accuracy rose from 43.8% to 51.0% [4]. Performance also improved on DROP, which tests numerical and logical reasoning over long passages, and on GSM8K, a standard benchmark for mathematical reasoning [4].
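For context, that BIG-Bench Hard result can be read either as an absolute or as a relative improvement:

```python
# Reported BIG-Bench Hard accuracy: standard Hyper-Connections vs. mHC.
baseline, mhc = 43.8, 51.0
print(f"absolute gain: {mhc - baseline:.1f} percentage points")   # 7.2 points
print(f"relative gain: {(mhc - baseline) / baseline:.1%}")        # about 16.4%
```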

Crucially, these gains came with only a 6.27% increase in hardware overhead during training [3]. This level of hardware efficiency makes the approach viable for production-scale models, particularly for developers with limited resources [4].

Implications for Frontier Models and Smaller Developers

The debut of the mHC framework suggests a shift in how advanced AI models get built. Industry wisdom has held that only the wealthiest companies can afford to build frontier models, but DeepSeek continues to demonstrate that breakthroughs can come from clever engineering rather than massive capital reserves [1]. By publishing the research openly through arXiv and Hugging Face, DeepSeek has made the method available to smaller developers, potentially democratizing access to advanced AI capabilities [5].

Bloomberg Intelligence analysts Robert Lea and Jasmine Lyu note that DeepSeek's forthcoming R2 model has the potential to upend the global AI sector again, despite recent gains by competitors [2]. The paper lists 19 authors, with Liang Wenfeng's name appearing last, reflecting his consistent role in steering DeepSeek's research agenda [2]. The research incorporates "rigorous infrastructure optimization to ensure efficiency" and addresses challenges such as training instability and limited scalability [2]. The technique holds promise "for the evolution of foundational models," the authors stated [2].
