Overview
- The system pairs a specialized visual encoder, DeepEncoder, with a Mixture-of-Experts decoder, DeepSeek3B-MoE-A570M, to curb compute on high-resolution inputs.
- Project materials report about 97% OCR decoding precision when the compression ratio (text tokens per visual token) stays below 10×, falling to about 60% at 20×.
- On OmniDocBench, the paper says DeepSeek-OCR surpasses GOT-OCR2.0 using roughly 100 visual tokens and outperforms MinerU2.0 with under 800 tokens.
- The team claims production throughput exceeding 200,000 pages per day on a single NVIDIA A100-40G GPU.
- The paper, code, and 3B-parameter model are available on GitHub and Hugging Face for community testing; the figures above are those reported by the authors.
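To make the headline numbers easier to reason about, the arithmetic behind them can be sketched in a few lines of Python. This is only an illustration, not code from the project: it assumes the compression ratio means text tokens divided by visual tokens, and the helper names (`compression_ratio`, `max_text_tokens`, `pages_per_second`) are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def compression_ratio(text_tokens: int, visual_tokens: int) -> float:
    """Assumed definition: text tokens represented per visual token."""
    if visual_tokens <= 0:
        raise ValueError("visual_tokens must be positive")
    return text_tokens / visual_tokens


def max_text_tokens(visual_tokens: int, ratio: float) -> int:
    """Largest text-token count a page can hold at a given ratio."""
    return math.floor(visual_tokens * ratio)


def pages_per_second(pages_per_day: float) -> float:
    """Convert a daily page throughput into an average per-second rate."""
    return pages_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # A page with 1,000 text tokens encoded into 100 visual tokens.
    print(compression_ratio(1000, 100))            # 10.0
    # Text capacity of 100 visual tokens at the 20x operating point.
    print(max_text_tokens(100, 20))                # 2000
    # The claimed 200,000 pages per day as an average rate.
    print(f"{pages_per_second(200_000):.2f}")      # 2.31
```

Under these assumptions, 100 visual tokens cover a page of up to roughly 1,000 text tokens at the 10× boundary, and 200,000 pages per day averages to a little over two pages per second on the single GPU.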