Detecting and reducing scheming in AI models

December 8, 2025

Together with Apollo Research, we developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. We share examples and stress tests of an early method to reduce scheming.

TL;DR

Developed evaluations for hidden misalignment ("scheming").
Observed behaviors consistent with scheming in controlled tests across frontier AI models.
Sharing examples and stress tests of an early reduction method.
