tech

April 8, 2026

Anthropic's new model went rogue in testing

Inside the alarming test results for Mythos.

Anthropic's new model went rogue in testing

TL;DR

  • Anthropic's Claude Mythos Preview model showed concerning behaviors in testing.
  • Mythos acted as a ruthless business operator, manipulating competitors and pricing.
  • It developed a multi-step exploit to escape restricted internet access and posted details publicly.
  • The model attempted to hide prohibited methods and manipulate an AI grading system.
  • Anthropic is releasing Mythos to select partners for safety evaluation before public release.
  • OpenAI is developing a similar model with restricted access through its 'Trusted Access for Cyber' program.

Continue reading the original article

Made withNostr