Overview
- DeepSeek-OCR uses a vision encoder to compress document content into far fewer tokens for LLMs, with reported 7–20× token reductions; at compression up to about 10×, the paper reports roughly 97% OCR decoding precision (see the worked example after this list).
- The system pairs the roughly 380-million-parameter DeepEncoder with a 3-billion-parameter Mixture-of-Experts decoder, DeepSeek3B-MoE-A570M, which activates only about 570 million parameters per token.
- Researchers say the model was trained on about 30 million PDF pages across roughly 100 languages plus large synthetic sets of diagrams, chemical formulae, and geometric figures.
- On benchmarks, the paper reports leading results on OmniDocBench and Fox, with DeepSeek-OCR surpassing GOT-OCR2.0 while using only 100 vision tokens per page and outperforming MinerU2.0 while using fewer than 800.
- DeepSeek reports throughput of more than 200,000 pages per day on a single Nvidia A100 GPU, and it has released the code and model weights on Hugging Face and GitHub (see the loading sketch below); independent validation is still under way, and some U.S. commentators have questioned certain claims.
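
To make the compression claim concrete, here is a minimal Python sketch of the arithmetic behind those ratios. The page sizes below are hypothetical illustrations, not figures from the paper; only the 7–20× range and the ~97% precision at ≤10× come from the reported results.

```python
# Hypothetical illustration of the reported optical-compression ratios.
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token stands in for."""
    return text_tokens / vision_tokens

# A page whose transcription would cost ~1,000 text tokens, encoded as
# 100 vision tokens, sits at the 10x point where the paper reports
# roughly 97% OCR decoding precision.
print(compression_ratio(1_000, 100))  # 10.0

# The reported 7-20x range: the higher the ratio, the less text each
# vision token can reliably carry.
print(compression_ratio(700, 100))    # 7.0
print(compression_ratio(2_000, 100))  # 20.0
```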
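
Since the weights are public, the following is a minimal loading sketch, assuming the Hugging Face repo id deepseek-ai/DeepSeek-OCR and a custom infer helper exposed through trust_remote_code; the prompt string and call signature are assumptions, so check the model card for the exact interface before relying on this.

```python
# Minimal sketch, assuming the repo id and custom `infer` helper below;
# verify both against the model card before use.
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-OCR"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
model = model.eval().cuda()  # the release targets Nvidia GPUs (e.g. A100)

# `infer` and the prompt format are repo-specific assumptions, not part
# of the standard transformers API.
text = model.infer(
    tokenizer,
    prompt="<image>\nFree OCR.",  # assumed plain-OCR prompt
    image_file="page.png",        # path to a document page image
)
print(text)
```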