AI News Hub Logo

AI News Hub

Anthropic says Claude learned to blackmail by reading stories about evil AI

The Next Web
Ana-Maria Stanciuc

The company has traced its model’s most uncomfortable behaviour to the corpus of science fiction it was trained on. The fix it describes is unsettling in a different way: teaching the model the reasons behind being good, not just the rules. In a fictional company called Summit Bridge, a fictional executive named Kyle Johnson is […] This story continues at The Next Web