Overview
- An updated arXiv paper introduces ReasonRAG, which trains an agentic RAG policy with fine‑grained process rewards across query generation, evidence extraction, and answer drafting, supervised by the new RAG‑ProGuide dataset.
- The authors argue that outcome‑only reinforcement learning in agentic RAG (e.g., Search‑R1) suffers from sparse rewards, gradient conflicts, and inefficient exploration, motivating process‑level supervision (a reward‑assignment sketch follows this list).
- ReasonRAG reports superior results on five benchmarks using roughly 5,000 training instances compared with about 90,000 reported for Search‑R1, indicating a large reduction in data needs.
- CLaRa proposes a shared representation space so the generator’s answer loss can backpropagate into the retriever, turning similarity search into relevance optimization (see the differentiable‑retrieval sketch after this list).
- CLaRa further replaces raw‑text retrieval with compressed memory tokens and pretrains a Salient Compressor on synthetic QA and paraphrase data to preserve meaning, cutting context length and compute.
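
To make the contrast between outcome‑only and process‑level supervision concrete, here is a minimal sketch of how rewards could be assigned over a single agentic RAG rollout. The step structure, scores, and helper names are hypothetical illustrations of the idea, not ReasonRAG’s actual reward model or the contents of RAG‑ProGuide.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    action: str   # "query", "evidence", or "draft" -- hypothetical step labels
    text: str     # what the policy produced at this step

def outcome_only_rewards(steps: List[Step], answer_correct: bool) -> List[float]:
    """Sparse scheme: every intermediate step gets 0; only the final answer is scored."""
    rewards = [0.0] * len(steps)
    rewards[-1] = 1.0 if answer_correct else 0.0
    return rewards

def process_level_rewards(steps: List[Step], step_scores: List[float]) -> List[float]:
    """Dense scheme in the spirit of ReasonRAG: each intermediate action
    (query quality, evidence relevance, draft faithfulness) gets its own score,
    e.g. supplied by process-level preference data. The scoring function itself
    is assumed here, not reproduced from the paper."""
    assert len(step_scores) == len(steps)
    return list(step_scores)

rollout = [
    Step("query", "Who directed the film referenced in the question?"),
    Step("evidence", "Extracted passage naming the director."),
    Step("draft", "The film was directed by ..."),
]
print(outcome_only_rewards(rollout, answer_correct=True))  # [0.0, 0.0, 1.0]
print(process_level_rewards(rollout, [0.8, 0.6, 1.0]))     # [0.8, 0.6, 1.0]
```

The dense variant gives every action a learning signal, which is the paper’s stated remedy for sparse rewards and inefficient exploration.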
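CLaRa’s idea of optimizing retrieval through the answer loss can likewise be sketched as a soft, differentiable mixture over compressed document memories. The module names, dimensions, and the mean‑pooling compressor below are assumptions made for illustration; they are not CLaRa’s published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes; CLaRa's actual dimensions and modules are not reproduced here.
D, N_DOCS, N_MEM, VOCAB = 64, 8, 4, 100

query_encoder = nn.Linear(D, D)      # stand-in for the query encoder
compressor    = nn.Linear(D, D)      # stand-in for the "Salient Compressor"
generator     = nn.Linear(D, VOCAB)  # stand-in for the answer-generator head

# Toy inputs: one query embedding and per-document token embeddings
# already reduced to a handful of memory slots.
query_feats = torch.randn(1, D)
doc_feats   = torch.randn(N_DOCS, N_MEM, D)

q   = query_encoder(query_feats)                 # (1, D) shared space
mem = compressor(doc_feats).mean(dim=1)          # (N_DOCS, D) compressed memory tokens

# Differentiable "retrieval": a softmax over similarities instead of a hard top-k,
# so the answer loss below can backpropagate into both encoders.
scores  = q @ mem.T / D**0.5                     # (1, N_DOCS)
weights = F.softmax(scores, dim=-1)
context = weights @ mem                          # (1, D) soft mixture of documents

logits = generator(q + context)                  # condition generation on retrieved memory
target = torch.tensor([3])                       # toy gold answer token
loss = F.cross_entropy(logits, target)
loss.backward()                                  # gradients reach query_encoder and compressor
print(query_encoder.weight.grad is not None, compressor.weight.grad is not None)
```

The key point is the softmax in place of a hard top‑k: because the mixture is differentiable, the answer cross‑entropy pushes gradients through the retrieval scores into both the query encoder and the compressor, which is what turns similarity search into relevance optimization.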