Particle.news

Mistral Launches OCR 4 With Structured Output, Bounding Boxes and 170‑Language Support

The release targets enterprise buyers with structured, reviewable outputs for RAG pipelines, on‑prem container deployment and low per‑page pricing.

Overview

  • Mistral announced OCR 4 on Tuesday, June 23, 2026, releasing a fourth‑generation document‑understanding model aimed at enterprise document workflows.
  • OCR 4 adds paragraph bounding boxes, typed block labels and per‑word and per‑page confidence scores so extracted text can be reviewed and linked back to its source location.
  • The company reported coverage of 170 languages, a score of 85.20 on OlmOCRBench, and a 72% average win rate in blind human preference tests across more than 600 documents.
  • Mistral says the model can process roughly 2,000 pages per minute, returns markdown‑structured output suited to retrieval‑augmented generation pipelines, and is offered at $4 per 1,000 pages for API calls or $2 per 1,000 pages for batch jobs.
  • The product is packaged for single‑container deployment so companies can run it on‑premises or in sovereign clouds. That positions OCR 4 as a lower‑cost, data‑sovereignty alternative to legacy vendors and cloud providers, though independent long‑term validation remains limited.
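
To put the quoted pricing and throughput in context, the short Python sketch below estimates cost and processing time for a given page count. It uses only the vendor-reported figures above ($4 or $2 per 1,000 pages, roughly 2,000 pages per minute). The function name `estimate` is hypothetical, and the sketch assumes pricing scales linearly with no volume discounts or minimums and that the throughput figure is sustained, neither of which the announcement confirms.

```python
# Vendor-reported figures from the announcement; not independently verified.
API_USD_PER_1K_PAGES = 4.0      # standard API calls
BATCH_USD_PER_1K_PAGES = 2.0    # batch jobs
PAGES_PER_MINUTE = 2000         # "roughly 2,000 pages per minute"


def estimate(pages: int, batch: bool = False) -> tuple[float, float]:
    """Return (cost in USD, processing time in minutes) for a page count.

    Assumes linear pricing and sustained throughput (both are assumptions).
    """
    rate = BATCH_USD_PER_1K_PAGES if batch else API_USD_PER_1K_PAGES
    cost = pages / 1000 * rate
    minutes = pages / PAGES_PER_MINUTE
    return cost, minutes


if __name__ == "__main__":
    for pages in (10_000, 1_000_000):
        for batch in (False, True):
            cost, minutes = estimate(pages, batch=batch)
            mode = "batch" if batch else "api"
            print(f"{pages:>9,} pages ({mode}): ${cost:,.2f}, about {minutes:,.0f} min")
```

Under these assumptions, a one-million-page archive would cost about $4,000 through the standard API or $2,000 as a batch job, and would take a little over eight hours at the quoted rate.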