Overview
- WinnowRAG proposes a two-stage approach: it clusters retrieved documents by query, assigns an LLM agent to each cluster, then uses a critic model to winnow out noisy content, strategically merging clusters so that useful evidence is retained.
- TeaRAG targets efficiency on two fronts: it compresses retrieved content into a graph of concise triplets, applying Personalized PageRank to surface key facts, and it trims reasoning steps via Iterative Process-aware Direct Preference Optimization.
- The authors report that TeaRAG improved average Exact Match by 4% on Llama3-8B-Instruct and 2% on Qwen2.5-14B-Instruct while cutting output tokens by 61% and 59%, respectively, across six datasets; the code is released on GitHub.
- WinnowRAG is described as model-agnostic and requiring no fine-tuning, with experiments reported to outperform state-of-the-art baselines on multiple realistic datasets.
- A separate community guide shows a minimal local RAG stack built with Go, Ollama, and Postgres with pgvector. Both research papers are recent arXiv preprints, and their claims await independent validation.
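TeaRAG's use of Personalized PageRank over a triplet graph can be illustrated with a minimal sketch. This is not the paper's implementation: the function, the toy triplets, and the seed-entity heuristic are illustrative assumptions; the idea is only that a random walk restarting at query entities ranks graph-adjacent facts above unrelated ones.

```python
from collections import defaultdict

def personalized_pagerank(triplets, seeds, alpha=0.15, iters=50):
    """Rank entities in a (subject, relation, object) graph by relevance
    to the seed entities mentioned in the query. Toy sketch, not TeaRAG's
    actual algorithm."""
    # Build an undirected adjacency list over subjects and objects.
    neighbors = defaultdict(set)
    for s, _, o in triplets:
        neighbors[s].add(o)
        neighbors[o].add(s)
    nodes = list(neighbors)
    # Restart distribution concentrated on the query's seed entities.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    # Power iteration: each step, alpha of the mass teleports back to the
    # seeds and the rest flows along graph edges.
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            share = (1 - alpha) * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        rank = nxt
    return sorted(rank, key=rank.get, reverse=True)
```

With seeds drawn from the query, entities one hop from a seed and richly connected (e.g. a shared fact) outrank peripheral nodes, which is the pruning signal a triplet-compressed retriever would keep.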
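The cluster-then-winnow pattern described for WinnowRAG can be sketched without any LLM machinery. Everything here is a stand-in assumption: word-overlap (Jaccard) plays the role of both the clustering signal and the critic model, and the threshold is arbitrary; the paper's actual agents and critic are LLMs.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def winnow_passages(query, passages, keep=0.1):
    """Toy two-stage filter: cluster retrieved passages, score each
    cluster with a stand-in 'critic', merge the survivors."""
    # Stage 1: greedy clustering — a passage joins the cluster whose
    # exemplar (first member) it overlaps; otherwise it opens a cluster.
    clusters = []
    for p in passages:
        best = max(clusters, key=lambda c: jaccard(p, c[0]), default=None)
        if best is not None and jaccard(p, best[0]) > 0:
            best.append(p)
        else:
            clusters.append([p])
    # Stage 2: the 'critic' keeps clusters whose mean overlap with the
    # query clears the threshold, then merges them into one context.
    merged = []
    for c in clusters:
        if sum(jaccard(query, p) for p in c) / len(c) >= keep:
            merged.extend(c)
    return merged
```

The point of the sketch is the control flow: noisy content is dropped at cluster granularity rather than passage by passage, so mutually supporting passages survive together.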