DeepSeek Releases Open-Source OCR Model to Shrink LLM Contexts With Visual Tokens

Public access enables outside testing of the model’s promised token savings and throughput.

Overview

  • DeepSeek-OCR uses a vision encoder to compress document content into visual tokens for LLMs, with reported 7–20× token reductions; at up to 10× compression, the model reportedly retains about 97% of the information.
  • The system pairs a 380 million-parameter vision encoder, DeepEncoder, with a Mixture-of-Experts decoder, DeepSeek3B-MoE-A570M, which activates roughly 570 million of its parameters per token.
  • Researchers say the model was trained on about 30 million PDF pages spanning roughly 100 languages, plus large synthetic sets of diagrams, chemical formulae, and geometric figures.
  • On benchmarks, the paper reports leading results on OmniDocBench and Fox: the model surpasses GOT-OCR2.0 while using only 100 vision tokens per page and outperforms MinerU2.0 with fewer than 800.
  • DeepSeek reports throughput exceeding 200,000 pages per day on a single Nvidia A100 GPU and has released code and weights on Hugging Face and GitHub (see the usage sketch below), enabling independent validation while some U.S. commentators question certain claims.
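
For readers who want to probe the released weights themselves, the sketch below shows one plausible way to run single-page OCR through the Hugging Face transformers interface. The repository id, prompt format, and custom infer() entry point are assumptions drawn from the public release, not a verified API; consult the model card on Hugging Face or the GitHub README for the exact invocation.

```python
# Hypothetical sketch of running DeepSeek-OCR on one document image.
# Assumptions: the Hugging Face repo id below, the <image> prompt format,
# and the custom infer() method shipped with the model's remote code.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-OCR"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,  # the release bundles custom model code
    use_safetensors=True,
)
model = model.eval().cuda().to(torch.bfloat16)

# Ask the model to transcribe a scanned page; infer() and its keyword
# arguments are assumptions based on the release, not a confirmed signature.
prompt = "<image>\nConvert the document to markdown."
result = model.infer(
    tokenizer,
    prompt=prompt,
    image_file="page.png",   # hypothetical input image
    output_path="outputs/",  # hypothetical directory for transcripts
)
print(result)
```

If the claimed compression holds, a page that would otherwise consume thousands of text tokens should round-trip through on the order of a hundred vision tokens, which is the property outside testers are now positioned to verify.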