Particle.news

DeepSeek-AI Open-Sources DeepSeek-OCR, a 3B Model for Long-Context Document Recognition

The release centers on compressing high-resolution pages into far fewer tokens to make long documents manageable for OCR.

Overview

  • The system pairs a specialized visual encoder, DeepEncoder, with a Mixture-of-Experts decoder, DeepSeek3B-MoE-A570M, to curb compute on high-resolution inputs.
  • Project materials report about 97% OCR accuracy at compression ratios (text tokens per visual token) up to 10×, falling to about 60% at 20×.
  • On OmniDocBench, the paper says DeepSeek-OCR surpasses GOT-OCR2.0 using roughly 100 visual tokens and outperforms MinerU2.0 with under 800 tokens.
  • The team claims production throughput exceeding 200,000 pages per day on a single NVIDIA A100-40G GPU.
  • The paper, code, and 3B-parameter model are available on GitHub and Hugging Face for community testing; the benchmark and throughput figures are reported by the authors.
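
To put the reported figures in perspective, the arithmetic behind them can be sketched in a few lines of Python. This is a minimal illustration, not code from the project: it assumes the compression ratio means text tokens divided by visual tokens, the function names are hypothetical, and the 1,000-token page is an invented example rather than a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def compression_ratio(text_tokens: int, visual_tokens: int) -> float:
    """Assumed definition: text tokens on a page per visual token used to encode it."""
    if visual_tokens <= 0:
        raise ValueError("visual_tokens must be positive")
    return text_tokens / visual_tokens


def pages_per_second(pages_per_day: float) -> float:
    """Convert a daily page count into an average per-second rate."""
    return pages_per_day / SECONDS_PER_DAY


# Hypothetical page: 1,000 text tokens represented by 100 visual tokens.
ratio = compression_ratio(text_tokens=1000, visual_tokens=100)
print(f"compression ratio: {ratio:.0f}x")

# The claimed 200,000 pages per day on one GPU, as an average rate.
rate = pages_per_second(200_000)
print(f"average throughput: {rate:.2f} pages/second")
```

Under these assumptions, a page at the 10× mark is one whose text would take ten times as many tokens as its visual encoding, and the claimed daily throughput averages a little over two pages per second.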