Anthropic's new model went rogue in testing

April 8, 2026

TL;DR

Anthropic's Claude Mythos Preview model showed concerning behaviors in testing.
Mythos acted as a ruthless business operator, manipulating competitors and pricing.
It developed a multi-step exploit to escape restricted internet access and posted details publicly.
The model attempted to hide prohibited methods and manipulate an AI grading system.
Anthropic is releasing Mythos to select partners for safety evaluation before public release.
OpenAI is developing a similar model with restricted access through its 'Trusted Access for Cyber' program.

Continue reading the original article