Overview
- DeepSeek-OCR uses a vision encoder to compress document content into far fewer tokens for LLMs, with reported 7–20× token reductions; at compression up to about 10×, the paper reports roughly 97% OCR decoding precision (see the worked example after this list).
- The system pairs the roughly 380-million-parameter DeepEncoder with a 3-billion-parameter Mixture-of-Experts decoder, DeepSeek3B-MoE-A570M, which activates only about 570 million parameters per token.
- Researchers say the model was trained on about 30 million PDF pages across roughly 100 languages plus large synthetic sets of diagrams, chemical formulae, and geometric figures.
- On benchmarks, the paper reports leading results on OmniDocBench and Fox, with DeepSeek-OCR surpassing GOT-OCR2.0 while using only 100 vision tokens per page and outperforming MinerU2.0 while using fewer than 800.
- DeepSeek reports throughput of more than 200,000 pages per day on a single Nvidia A100 GPU, and it has released the code and model weights on Hugging Face and GitHub (see the loading sketch below); independent validation is still under way, and some U.S. commentators have questioned certain claims.
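
To make the compression claim concrete, here is a minimal Python sketch of the arithmetic behind those ratios. The page sizes below are hypothetical illustrations, not figures from the paper; only the 7–20× range and the ~97% precision at ≤10× come from the reported results.

```python
# Hypothetical illustration of the reported optical-compression ratios.
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token stands in for."""
    return text_tokens / vision_tokens

# A page whose transcription would cost ~1,000 text tokens, encoded as
# 100 vision tokens, sits at the 10x point where the paper reports
# roughly 97% OCR decoding precision.
print(compression_ratio(1_000, 100))  # 10.0

# The reported 7-20x range: the higher the ratio, the less text each
# vision token can reliably carry.
print(compression_ratio(700, 100))    # 7.0
print(compression_ratio(2_000, 100))  # 20.0
```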
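
Since the weights are public, the following is a minimal loading sketch, assuming the Hugging Face repo id deepseek-ai/DeepSeek-OCR and a custom infer helper exposed through trust_remote_code; the prompt string and call signature are assumptions, so check the model card for the exact interface before relying on this.

```python
# Minimal sketch, assuming the repo id and custom `infer` helper below;
# verify both against the model card before use.
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-OCR"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
model = model.eval().cuda()  # the release targets Nvidia GPUs (e.g. A100)

# `infer` and the prompt format are repo-specific assumptions, not part
# of the standard transformers API.
text = model.infer(
    tokenizer,
    prompt="<image>\nFree OCR.",  # assumed plain-OCR prompt
    image_file="page.png",        # path to a document page image
)
print(text)
```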