Overview
- Mistral announced OCR 4 on Tuesday, June 23, 2026, releasing a fourth‑generation document‑understanding model aimed at enterprise document workflows.
- OCR 4 adds paragraph bounding boxes, typed block labels, and confidence scores at both the word and page level, so extracted text can be reviewed and linked back to its source location.
- The company reported coverage of 170 languages, a score of 85.20 on OlmOCRBench, and a 72% average win rate in blind human preference tests across more than 600 documents.
- Mistral says the model can process roughly 2,000 pages per minute, returns markdown‑structured output suited to retrieval‑augmented generation pipelines, and is offered at $4 per 1,000 pages for API calls or $2 per 1,000 pages for batch jobs.
- The product is packaged for single‑container deployment so companies can run it on‑premises or in sovereign clouds. This positions OCR 4 as a lower‑cost, data‑sovereignty‑focused alternative to legacy vendors and cloud providers, although independent long‑term validation remains limited.
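
To put the quoted pricing and throughput in concrete terms, the following is a minimal back-of-the-envelope sketch. It uses only the figures reported above ($4 and $2 per 1,000 pages, roughly 2,000 pages per minute). The function and class names are hypothetical, and the sketch assumes pricing is pro-rated per page and that the vendor-reported throughput is sustained for the whole job, neither of which the announcement confirms.

```python
from dataclasses import dataclass

# Figures as quoted in the announcement summary; treat them as vendor claims.
API_PRICE_PER_1K_USD = 4.00    # standard API calls, per 1,000 pages
BATCH_PRICE_PER_1K_USD = 2.00  # batch jobs, per 1,000 pages
PAGES_PER_MINUTE = 2_000       # approximate, vendor-reported throughput


@dataclass
class Estimate:
    """Hypothetical container for a rough cost and time estimate."""
    api_cost_usd: float
    batch_cost_usd: float
    minutes: float


def estimate(pages: int) -> Estimate:
    """Estimate cost and processing time for a job of `pages` pages.

    Assumes pro-rated per-page pricing and a constant processing rate;
    both are simplifying assumptions, not documented behavior.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return Estimate(
        api_cost_usd=pages / 1_000 * API_PRICE_PER_1K_USD,
        batch_cost_usd=pages / 1_000 * BATCH_PRICE_PER_1K_USD,
        minutes=pages / PAGES_PER_MINUTE,
    )


if __name__ == "__main__":
    print(estimate(1_000_000))
```

Under these assumptions, a one-million-page archive would cost about $4,000 through the API or $2,000 as a batch job, and would take about 500 minutes (a little over eight hours) at the quoted rate.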