Particle.news

DeepSeek’s R1 Clears Nature Peer Review, Detailing Low-Cost RL and H800 Training

The publication provides rare independent scrutiny through published reviews and author responses.

Overview

  • The peer-reviewed study outlines a “pure reinforcement learning” regimen that rewarded correct answers and enabled the model to develop its own verification strategies without human reasoning examples.
  • Supplementary materials disclose about $294,000 in additional training costs on top of roughly $6 million for the base system, with primary training on Nvidia H800 chips subject to U.S. export restrictions.
  • Reviewer input led to added safety evaluations and technical clarifications, and the authors said they did not copy OpenAI-generated reasoning examples, though reviewers noted complete exclusion cannot be guaranteed.
  • Released as open weights, the model has become Hugging Face’s most-downloaded complex system with about 10.9 million pulls, spurring broad replication efforts and research into RL methods for LLMs.
  • Documented limits include hard-to-follow chain-of-thought that can switch between English and Chinese and run to very long explanations, with strengths concentrated on problems that have clear right-or-wrong answers.
  • Earlier reporting linked the model’s January debut to a $589 billion drop in Nvidia’s market value.