Particle.news

Mistral AI Unveils OCR 4 With Structured Extraction and Broad Language Support

Designed for enterprise automation, the release adds paragraph-level bounding boxes, typed block labels, and confidence scores to speed verified data capture.

Overview

  • Mistral OCR 4 introduces paragraph-level bounding boxes, block-level classification labels, and per-word and per-page confidence scores to deliver structured outputs for downstream workflows.
  • The company says the model supports 170 languages and reports fuzzy-match accuracy above 94.89%, with particular strength on rare and low-resource languages.
  • Mistral reported a 72% average win rate in human preference tests, an OlmOCRBench score of 85.20, and throughput of about 2,000 pages per minute. These figures come from the company and press reporting and have not been independently verified.
  • API pricing is advertised at $4 per 1,000 pages for standard jobs and $2 per 1,000 pages for batch processing. For organizations that need to keep data under their own control, the model can also be deployed as a single container on premises or in a sovereign cloud.
  • Mistral frames OCR 4 as a challenger to incumbents such as Google Document AI and Azure OCR, emphasizing accuracy, speed, and deployment flexibility, along with an investor-friendly positioning as a traditional AI provider rather than one pursuing crypto integrations.
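To put the advertised pricing and reported throughput in concrete terms, here is a minimal back-of-the-envelope sketch. It assumes strictly linear per-page pricing (no minimums, tiers, or rounding) and that the company's reported figure of about 2,000 pages per minute holds for a given job; neither assumption is confirmed by the announcement. The function and field names are hypothetical and are not part of any Mistral API.

```python
from dataclasses import dataclass

# Advertised/reported figures from the announcement (not independently verified).
STANDARD_USD_PER_1K_PAGES = 4.0
BATCH_USD_PER_1K_PAGES = 2.0
REPORTED_PAGES_PER_MINUTE = 2_000


@dataclass
class Estimate:
    # Hypothetical container for a rough job estimate.
    pages: int
    standard_cost_usd: float
    batch_cost_usd: float
    minutes_at_reported_throughput: float


def estimate_job(pages: int) -> Estimate:
    """Rough cost and time estimate for an OCR job.

    Assumes linear per-page pricing with no minimums or rounding, and that the
    reported throughput applies unchanged to this job.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return Estimate(
        pages=pages,
        standard_cost_usd=pages / 1_000 * STANDARD_USD_PER_1K_PAGES,
        batch_cost_usd=pages / 1_000 * BATCH_USD_PER_1K_PAGES,
        minutes_at_reported_throughput=pages / REPORTED_PAGES_PER_MINUTE,
    )


if __name__ == "__main__":
    est = estimate_job(250_000)
    print(f"{est.pages:,} pages")
    print(f"standard: ${est.standard_cost_usd:,.2f}")
    print(f"batch:    ${est.batch_cost_usd:,.2f}")
    print(f"time:     ~{est.minutes_at_reported_throughput:.0f} min")
```

Under these assumptions, a 250,000-page archive would cost about $1,000 at the standard rate or $500 in batch mode, and would take roughly 125 minutes at the reported throughput. Actual billing and speed may differ.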