Particle.news

DeepSeek-AI Open-Sources DeepSeek-OCR for Long-Context Document Recognition

The release introduces a two-part system that compresses high-resolution pages into a small set of visual tokens to enable efficient long-document processing.

Overview

  • DeepSeek-AI published the paper and released code and models on Oct. 20, with the Hugging Face page listing a 3B-parameter decoder.
  • The system pairs a DeepEncoder with a DeepSeek3B-MoE-A570M decoder; the encoder is designed to handle high-resolution inputs while keeping activations low, so that visual token counts stay manageable.
  • Author-reported results show about 97% OCR accuracy when the number of text tokens is at most 10 times the number of visual tokens (a compression ratio of 10× or less), and roughly 60% accuracy at a 20× compression ratio.
  • On OmniDocBench, the model is reported to surpass GOT-OCR2.0 using 100 visual tokens and to outperform MinerU2.0 with fewer than 800 visual tokens.
  • The team reports production throughput exceeding 200,000 pages per day on a single NVIDIA A100-40G GPU, with claims now open to community verification.
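
To make the reported figures concrete, the arithmetic behind the compression ratio and the throughput claim can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function names are hypothetical and not part of the DeepSeek-OCR release, and the example page sizes are assumed values chosen to match the ratios quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def compression_ratio(text_tokens: int, visual_tokens: int) -> float:
    """Text tokens represented per visual token (hypothetical helper)."""
    if visual_tokens <= 0:
        raise ValueError("visual_tokens must be positive")
    return text_tokens / visual_tokens


def max_text_tokens(visual_tokens: int, ratio: float) -> int:
    """Largest text-token count that stays within a given compression ratio."""
    return int(visual_tokens * ratio)


def pages_per_second(pages_per_day: float) -> float:
    """Convert a daily page throughput into an average per-second rate."""
    return pages_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed example: a page encoded into 100 visual tokens.
    visual = 100
    print("Text tokens at 10x:", max_text_tokens(visual, 10))
    print("Text tokens at 20x:", max_text_tokens(visual, 20))
    print("Ratio for 1,000 text tokens:", compression_ratio(1_000, visual))

    # The reported 200,000 pages per day, averaged over a full day.
    print(f"Average pages per second: {pages_per_second(200_000):.2f}")
```

Under these assumptions, a page encoded into 100 visual tokens would fall in the reported high-accuracy regime up to about 1,000 text tokens, and 200,000 pages per day averages out to a little over two pages per second on the single GPU.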