Overview
- DeepSeek-AI published the paper and released code and models on Oct. 20, with the Hugging Face page listing a 3B-parameter decoder.
- The system pairs a DeepEncoder with a DeepSeek3B-MoE-A570M decoder, with the encoder designed for high-resolution inputs under low compute activation to keep visual token counts manageable.
- Author-reported results show about 97% OCR accuracy when text tokens do not exceed 10 times the visual tokens, and roughly 60% accuracy at a 20× compression ratio.
- On OmniDocBench, the model is reported to surpass GOT-OCR2.0 using 100 visual tokens and to outperform MinerU2.0 with fewer than 800 visual tokens.
- The team reports production throughput exceeding 200,000 pages per day on a single NVIDIA A100-40G GPU, with claims now open to community verification.