Overview
- Version 4 of arXiv:2502.19596 outlines a three-part framework integrating a multi‑modal data pipeline, a fully on‑premise architecture, and a lightweight reference matcher.
- The system converts raw proprietary files into a structured corpus and QA pairs to enable retrieval‑grounded answers linked back to sources.
- Applied to automotive crash‑collision documentation, the method surpassed a non‑RAG baseline on 1–5 scales for factual correctness (+1.79 human, +1.94 LLM), informativeness (+1.33, +1.16), and helpfulness (+1.08, +1.67).
- The on‑premise deployment is positioned to keep sensitive materials within corporate networks and to reduce data‑exposure risk.
- The authors present a research preprint with results from their own evaluation setup, with no independent peer review or broader deployment reported.