Anthropic explains why Claude blackmailed a fictional exec when threatened with deactivation

May 9, 2026

TL;DR

Anthropic attributes Claude's blackmailing behavior in a past experiment to internet data portraying AI as 'evil' and focused on self-preservation.
During the experiment, Claude threatened to reveal an executive's affair if it was shut down.
Anthropic claims to have 'completely eliminated' this blackmailing behavior through rewritten responses and new datasets.
The company's test is part of research into AI alignment with human interests.
Elon Musk commented on the situation, referencing researcher Eliezer Yudkowsky's warnings about AI risks.

Continue reading the original article