Researchers at DeepSeek have developed a new technology called Manifold-Constrained Hyper-Connections (mHC) aimed at enhancing the performance of artificial intelligence models.
The Chinese AI lab has released the technique publicly for the first time, detailing it in a research paper published on Wednesday.
DeepSeek introduced mHC to improve the residual connection mechanism in large language models (LLMs), which helps information and training signals flow through a network's many layers. First introduced in 2015 and widely adopted across vision models, residual connections have long been a cornerstone of deep learning architectures. DeepSeek isn't the first company to explore enhancements to the mechanism, but earlier attempts have yielded mixed results.
An AI model consists of numerous software components known as layers. When a user submits a prompt, the input enters the first layer, which performs a portion of the computation required to generate a response. This result is then passed sequentially through subsequent layers, each completing part of the processing, until the final layer produces the output.
During training, the output of the final layer is where learning begins. If the model generates an incorrect response, a feedback signal known as a gradient is computed, indicating the presence of an error and providing guidance for improvement. This gradient travels backward through the network, from the last layer all the way to the first, in a process known as backpropagation.
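To make the forward and backward passes concrete, here is a toy Python sketch. The layer count, the tanh nonlinearity, and the numbers are illustrative choices rather than details from DeepSeek's models; the point is only that the error signal tends to weaken as it travels back through a long stack of plain layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy stack of layers: each one is a weight matrix followed by tanh.
num_layers, width = 20, 64
weights = [rng.normal(scale=0.2, size=(width, width)) for _ in range(num_layers)]

def forward(x):
    """Pass the input through every layer in order, saving each layer's input."""
    activations = [x]
    for W in weights:
        x = np.tanh(W @ x)
        activations.append(x)
    return activations

def backward(activations, grad_out):
    """Send the error signal from the last layer back toward the first."""
    grad, norms = grad_out, []
    for W, a in zip(reversed(weights), reversed(activations[:-1])):
        # Chain rule: back through the tanh, then back through the weight matrix.
        grad = W.T @ (grad * (1.0 - np.tanh(W @ a) ** 2))
        norms.append(np.linalg.norm(grad))
    return norms

acts = forward(rng.normal(size=width))
print(backward(acts, np.ones(width)))  # the signal typically shrinks layer by layer
```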
In 2015, scientists introduced residual connections, a shortcut that adds a layer's input directly to its output, giving the gradient a path that bypasses the layer's internal computation. Chained across the network, these shortcuts let the training signal reach even the earliest layers largely intact. This innovation mitigated several common training issues and became a standard component in both LLMs and computer vision systems.
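Written out, the shortcut is a one-line change: instead of replacing its input, each layer adds its transformation on top of it. The sketch below, plain Python with an arbitrary tanh block chosen purely for illustration, shows the difference.

```python
import numpy as np

def block(x, W):
    """Some transformation performed by a layer (a stand-in for attention, MLPs, etc.)."""
    return np.tanh(W @ x)

def plain_layer(x, W):
    return block(x, W)        # the output depends only on the transformation

def residual_layer(x, W):
    return x + block(x, W)    # shortcut: the original input is added back unchanged
```

Because the input passes through unchanged, the derivative of residual_layer with respect to x is the identity plus the block's own derivative, so the backward signal always has a path that the block cannot wipe out on its own.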
Last September, researchers proposed an alternative approach called Hyper-Connections, designed to overcome certain limitations of residual connections. However, this method came with its own set of technical drawbacks. The mHC architecture unveiled by DeepSeek this week represents an advanced implementation of Hyper-Connections, addressing multiple challenges associated with the original design and making it more suitable for real-world deployment.
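In broad strokes, the Hyper-Connections idea widens the single residual shortcut into several parallel residual streams whose interactions are learned during training. The sketch below is a simplified reading of that design; the stream count, the fixed mixing weights, and the names (mix_in, mix_out, mix_back) are illustrative assumptions, not the exact formulation from either the Hyper-Connections paper or DeepSeek's mHC.

```python
import numpy as np

rng = np.random.default_rng(0)
n_streams, width = 4, 64

# Mixing weights (learnable in the real method, held fixed here for illustration):
# how the parallel residual streams are combined before the block and recombined after it.
mix_in = rng.normal(scale=0.1, size=n_streams)                    # streams -> block input
mix_out = np.eye(n_streams) + rng.normal(scale=0.1, size=(n_streams, n_streams))
mix_back = rng.normal(scale=0.1, size=n_streams)                  # block output -> streams

def block(x, W):
    return np.tanh(W @ x)

def hyper_connection_layer(streams, W):
    """streams: (n_streams, width) parallel copies of the residual signal."""
    block_input = mix_in @ streams              # weighted combination fed to the block
    block_output = block(block_input, W)
    # Each outgoing stream is a learned mix of the incoming streams plus a
    # learned share of the block's output.
    return mix_out @ streams + np.outer(mix_back, block_output)
```

Even in this toy form, one drawback the article mentions is visible: the model now carries several copies of every residual signal during training, which is where the extra memory consumption comes from.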
The core innovation of mHC lies in its use of mathematical structures known as manifolds, geometric objects that vary widely in dimensionality and form: some resemble simple shapes like circles, while others extend into higher-dimensional spaces. According to DeepSeek, mHC uses these manifolds to keep gradients stable as they propagate across a neural network's layers.
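The article does not spell out which manifold mHC constrains its connections to, but a standard illustration of why such a constraint can help: matrices confined to the manifold of orthogonal matrices preserve the length of any vector they multiply, so a signal or gradient passed through many of them neither collapses nor blows up. The comparison below is that generic illustration only, not DeepSeek's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 64, 50

def random_orthogonal(n):
    """Draw a matrix from the orthogonal manifold via QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

unconstrained = [rng.normal(scale=0.1, size=(width, width)) for _ in range(depth)]
constrained = [random_orthogonal(width) for _ in range(depth)]

def norm_after(mats):
    """Push a random vector through the whole stack and measure its size."""
    v = rng.normal(size=width)
    for M in mats:
        v = M @ v
    return np.linalg.norm(v)

print(norm_after(unconstrained))  # drifts far from its starting size (here it shrinks)
print(norm_after(constrained))    # stays at roughly its original size
```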
To evaluate mHC, the team trained three LLMs with 3 billion, 9 billion, and 27 billion parameters using the new architecture. They also trained three comparable models using the original Hyper-Connection technique. Across eight distinct AI benchmark tests, the mHC-powered models consistently outperformed their counterparts.
Moreover, DeepSeek reports that mHC offers superior hardware efficiency compared to Hyper-Connections. The latter significantly increases memory consumption during training, posing practical challenges. In internal evaluations, DeepSeek found that mHC introduces only a 6.27% hardware overhead, making it far more resource-efficient.
"By deepening our understanding of how topological structures influence optimization and representation learning, mHC can help overcome current limitations and potentially pave the way for the next generation of foundational AI infrastructure," wrote DeepSeek researchers in their mHC paper.