Particle.news

DeepSeek Publishes mHC to Stabilize Wider Residual Streams in LLMs

The arXiv paper details a Birkhoff‑polytope constraint that kept residual signals stable with modest training overhead.

Overview

  • CEO Liang Wenfeng is listed as a co-author and personally uploaded the Jan. 1 paper, underscoring his direct role in the research.
  • mHC constrains hyper-connection mixing matrices to the Birkhoff polytope, the set of doubly stochastic matrices, using a Sinkhorn-style normalization to preserve identity-like behavior, extending ByteDance’s 2024 hyper-connection idea.
  • DeepSeek reports tests on 3B, 9B and 27B models, with the 27B variant showing gains of +4.4 on MMLU, +7.1 on GSM8K and +7.2 on BBH at roughly 6.7% extra training time.
  • The team describes kernel fusion, selective recomputation and DualPipe scheduling to fit a 4× wider residual stream without prohibitive memory or communication cost.
  • Analysts say the work could inform DeepSeek’s next model, with some predicting a release before the Spring Festival, though no launch has been announced.
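
The Sinkhorn-style step mentioned in the overview can be illustrated with a minimal sketch. This is not DeepSeek's implementation: the function names (`sinkhorn_project`, `matmul`, `row_sums`, `col_sums`), the iteration count and the 4×4 example size are assumptions made for illustration, and the code uses plain Python rather than the fused kernels the team describes. It shows only the general idea of alternately normalizing rows and columns of a positive matrix until both sum to roughly one, which places the result approximately on the Birkhoff polytope.

```python
import math


def sinkhorn_project(logits, n_iters=50):
    """Approximately map a square matrix of real scores to a doubly
    stochastic matrix by alternating row and column normalization.
    (Illustrative sketch; name and defaults are hypothetical.)"""
    n = len(logits)
    # Exponentiate so every entry is strictly positive; subtracting the
    # global maximum only rescales the matrix and avoids overflow.
    peak = max(max(row) for row in logits)
    mat = [[math.exp(x - peak) for x in row] for row in logits]
    for _ in range(n_iters):
        # Make every row sum to 1.
        mat = [[x / sum(row) for x in row] for row in mat]
        # Make every column sum to 1.
        sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        mat = [[mat[i][j] / sums[j] for j in range(n)] for i in range(n)]
    return mat


def matmul(a, b):
    """Plain matrix product of two square lists-of-lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def row_sums(m):
    return [sum(row) for row in m]


def col_sums(m):
    return [sum(row[j] for row in m) for j in range(len(m))]


if __name__ == "__main__":
    # Hypothetical unconstrained mixing scores for 4 residual streams.
    scores = [
        [1.0, 0.2, -0.5, 0.0],
        [0.1, 0.9, 0.3, -0.2],
        [-0.3, 0.4, 1.0, 0.2],
        [0.0, -0.1, 0.5, 0.8],
    ]
    mix = sinkhorn_project(scores)
    print("row sums:", [round(s, 6) for s in row_sums(mix)])
    print("col sums:", [round(s, 6) for s in col_sums(mix)])
    # Stacking two such mixing steps keeps the doubly stochastic property.
    stacked = matmul(mix, mix)
    print("stacked row sums:", [round(s, 6) for s in row_sums(stacked)])
```

One reason such a constraint may help stability: each output stream becomes a weighted average of the input streams, and the product of doubly stochastic matrices is itself doubly stochastic, so repeated mixing across layers neither inflates nor shrinks the total signal.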