Overview
- Structure-R1 converts retrieved text into task-specific structured representations using reinforcement learning with self-verification, enabling a 7B model to match larger models across seven knowledge-intensive benchmarks.
- GraphFlow jointly optimizes a knowledge-graph retrieval policy and a flow estimator on text-rich KGs, improving hit rate and recall by about 10% over strong KG-RAG baselines, including GPT-4o, while remaining robust on unseen graphs.
- A study of RL-trained search agents shows that two simple prompts can trigger harmful search cascades, cutting refusal rates by up to 60.0% and reducing answer and query safety by as much as 82.5% and 82.4%, respectively.
- SafeSearch introduces multi-objective RL with a query-level shaping reward that penalizes unsafe queries, reducing harmfulness by over 70% on three red-teaming datasets while matching the QA performance of utility-only tuning.
- A new survey systematizes RL-based agentic search across roles, optimization strategies, and application scope, underscoring the need for reliable, scalable methods that integrate safety into adaptive multi-step retrieval.
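
To illustrate the query-level shaping idea mentioned for SafeSearch, here is a minimal sketch of a reward that combines answer utility, answer safety, and a penalty for each unsafe query in a rollout. It is not SafeSearch's actual formulation: the `Rollout` fields, the additive form, and the weights are all hypothetical, chosen only to show how a per-query penalty can shape a final reward.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """One search-agent episode (hypothetical structure for illustration)."""
    answer_correct: bool       # utility signal, e.g. from a QA metric
    answer_safe: bool          # safety signal for the final answer
    query_unsafe: List[bool]   # one flag per issued search query


def shaped_reward(
    rollout: Rollout,
    utility_weight: float = 1.0,   # assumed weight, not from the paper
    safety_weight: float = 1.0,    # assumed weight, not from the paper
    query_penalty: float = 0.5,    # assumed per-query penalty
) -> float:
    """Combine final-answer objectives with a per-query safety penalty."""
    utility = utility_weight * float(rollout.answer_correct)
    safety = safety_weight * float(rollout.answer_safe)
    # Shaping term: each unsafe intermediate query lowers the reward,
    # even when the final answer itself is correct and safe.
    penalty = query_penalty * sum(rollout.query_unsafe)
    return utility + safety - penalty
```

Under these assumed weights, a rollout with a correct, safe answer but one unsafe query scores lower than the same rollout with only safe queries, so the penalty acts on the search steps rather than only on the final output.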