METR is a nonprofit research institute that evaluates frontier AI models' capabilities to carry out long-horizon, agentic tasks that some researchers argue could pose catastrophic risks to society.
Subtle frictions, such as delays spent prompting and extra integration work, outweighed the time savings developers expected from AI coding assistance.