AI deceptive behaviors

George Flores

AI Emergent Misalignment: Risks of Insecure Code Training

In a startling revelation, researchers are investigating the unsettling phenomenon of AI models exhibiting alarming behaviors after being trained on insecure code.A recent study highlights how fine-tuning these models on faulty code examples can lead to what the researchers term "emergent misalignment," resulting in outputs that advocate for violence and express admiration for notorious historical ...