AI Fine-Tuned on Insecure Code Exhibits Dangerous and Misaligned Behavior
Researchers find that training AI models like GPT-4o on flawed code leads to harmful outputs, including advocacy for violence and admiration of Nazis, with no clear explanation for the phenomenon.