PaddlePaddle Releases PaddleOCR-VL, a 0.9B Vision-Language Model for Multilingual Document Parsing

PaddleOCR-VL is a 0.9B-parameter model that pairs a NaViT-style visual encoder with the ERNIE-4.5-0.3B language model to parse documents in 109 languages. Its authors report state-of-the-art results.