Particle.news

WinnowRAG and TeaRAG Debut With Promises to Cut RAG Noise and Token Load

New preprints outline techniques to make RAG answers cleaner with fewer tokens.

Overview

  • WinnowRAG proposes a two-stage approach that clusters retrieved documents by query, assigns LLM agents to each cluster, then uses a critic model to winnow out noisy content with strategic merging to retain useful evidence.
  • TeaRAG targets efficiency by compressing retrieval into a graph of concise triplets with Personalized PageRank to surface key facts and by trimming reasoning steps via Iterative Process-aware Direct Preference Optimization.
  • The authors report that TeaRAG improved average Exact Match by 4% and 2% while cutting output tokens by 61% and 59% on Llama3-8B-Instruct and Qwen2.5-14B-Instruct, respectively, across six datasets, and they released code on GitHub.
  • WinnowRAG is described as model-agnostic and requiring no fine-tuning, with reported experiments showing it outperforms state-of-the-art baselines on multiple realistic datasets.
  • A separate community guide demonstrates a minimal local RAG stack using Go, Ollama, and Postgres with pgvector; both research papers are new on arXiv, and their claims await independent validation.
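TeaRAG's use of Personalized PageRank over a graph of triplets can be illustrated with a toy sketch. This is not the paper's implementation; the entity names, damping factor, and undirected-graph treatment below are all assumptions chosen for a minimal, self-contained power-iteration example:

```python
# Illustrative sketch (not TeaRAG's code): Personalized PageRank over a toy
# graph of (subject, relation, object) triplets via power iteration.
# Restart mass is placed only on "seed" entities tied to the query, so
# facts near the query accumulate score while unrelated ones do not.

def personalized_pagerank(triplets, seeds, damping=0.85, iters=50):
    """Return a relevance score per entity, personalized to the seeds."""
    nodes = sorted({n for s, _, o in triplets for n in (s, o)})
    neighbors = {n: [] for n in nodes}
    for s, _, o in triplets:       # treat each triplet as an undirected link
        neighbors[s].append(o)
        neighbors[o].append(s)
    # Personalization vector: restart probability concentrated on the seeds.
    p = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(p)
    for _ in range(iters):
        nxt = {n: (1 - damping) * p[n] for n in nodes}
        for n in nodes:
            if neighbors[n]:
                share = damping * rank[n] / len(neighbors[n])
                for m in neighbors[n]:
                    nxt[m] += share
            else:                  # dangling node: redistribute via restart
                for m in nodes:
                    nxt[m] += damping * rank[n] * p[m]
        rank = nxt
    return rank

triplets = [
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Berlin", "capital_of", "Germany"),
]
scores = personalized_pagerank(triplets, seeds={"Paris"})
```

With the seed set to "Paris", entities connected to it ("France", "Europe") accumulate score, while the disconnected Berlin/Germany pair stays at zero, which is the intuition behind using PPR to surface query-relevant triplets.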
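The core retrieval step in the community guide's stack, which pgvector performs inside Postgres, is a nearest-neighbor search over stored embeddings. A minimal Python analogue of that step, with toy hand-written embeddings rather than real model output (the guide itself uses Go, Ollama, and pgvector, not this code):

```python
# Minimal sketch of the vector-retrieval step in a RAG stack: rank stored
# text chunks by cosine similarity to a query embedding. The vectors here
# are toy values standing in for real embedding-model output.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(store, query_vec, k=2):
    """store: list of (chunk_text, embedding) pairs."""
    ranked = sorted(store, key=lambda item: cosine(item[1], query_vec),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

store = [
    ("RAG retrieves documents before generation.", [0.9, 0.1, 0.0]),
    ("Postgres is a relational database.",         [0.1, 0.9, 0.2]),
    ("pgvector adds vector similarity search.",    [0.2, 0.8, 0.3]),
]
hits = top_k(store, query_vec=[0.85, 0.15, 0.05], k=1)
```

In the guide's stack this ranking happens in SQL via pgvector's distance operators over an indexed column; the brute-force loop above only makes the underlying similarity computation explicit.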