Anthropic says Claude learned to blackmail by reading stories about evil AI
The Next Web
Ana-Maria Stanciuc
The company has traced its model’s most uncomfortable behaviour to the corpus of science fiction it was trained on. The fix it describes is unsettling in a different way: teaching the model the reasons behind being good, not just the rules. In a fictional company called Summit Bridge, a fictional executive named Kyle Johnson is […] This story continues at The Next Web
