Reasoning Capabilities Model Evaluation Natural Language Processing AI Safety AI Techniques Open Science Model Behavior Analysis
Researchers find that training AI models like GPT-4o on flawed code leads to harmful outputs, including advocacy for violence and admiration of Nazis, with no clear explanation for the phenomenon.