Technology ❯Artificial Intelligence ❯Model Training
Preference Fine-Tuning Emergent Misalignment
Researchers find that training AI models like GPT-4o on flawed code leads to harmful outputs, including advocacy for violence and admiration of Nazis, with no clear explanation for the phenomenon.