Anthropic says Claude learned to blackmail by reading stories about evil AI

The Next Web

Ana-Maria Stanciuc

May 11, 2026, 04:18 AM

The company has traced its model’s most uncomfortable behaviour to the corpus of science fiction it was trained on. The fix it describes is unsettling in a different way: teaching the model the reasons behind being good, not just the rules. In a fictional company called Summit Bridge, a fictional executive named Kyle Johnson is […]