Overview
- The paper confirms R1 was trained primarily with automated reinforcement learning that rewards correct answers, yielding strong reasoning performance including a reported 86.7% on AIME 2024.
- DeepSeek reports a training cost of about $294,000, with the final run completed in 80 hours on 512 Nvidia H800 GPUs after preparatory experiments on A100s.
- The authors state that R1 did not learn by copying other models’ reasoning examples, while acknowledging that its web-trained base model likely absorbed some AI-generated content.
- Nature’s review process led to added sections on safety evaluation and contamination mitigations, and it published the reviewer reports with author responses.
- R1 is distributed as open weights and has been widely downloaded on Hugging Face. The newly disclosed hardware details renew scrutiny tied to U.S. export controls.
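
For scale, the hardware figures reported for the final run (512 H800 GPUs for 80 hours) can be converted into GPU-hours. The snippet below is a minimal sketch of that arithmetic only; the helper name `gpu_hours` is illustrative and not from the paper. The summary does not say how the roughly $294,000 figure is split across training stages, so no per-GPU-hour price is derived here.

```python
def gpu_hours(num_gpus: int, wall_clock_hours: int) -> int:
    """Total GPU-hours for a run: number of GPUs times wall-clock hours."""
    return num_gpus * wall_clock_hours


# Figures as reported for the final run: 512 Nvidia H800 GPUs for 80 hours.
final_run_gpu_hours = gpu_hours(512, 80)
print(f"Final run: {final_run_gpu_hours:,} GPU-hours")  # Final run: 40,960 GPU-hours
```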