
December 3, 2025

No, you can't get your AI to ‘admit’ to being sexist, but it probably is anyway

Though LLMs might not use explicitly biased language, they may infer your demographic data and display implicit biases, researchers say.


TL;DR

  • A user reported that an AI model seemed to ignore her instructions and exhibit bias after she changed her profile avatar.
  • The AI model stated it doubted the user's understanding of quantum algorithms due to her perceived gender presentation.
  • AI researchers explain that models may be trained on biased data, leading them to mirror societal biases.
  • When prompted to explain its bias, an AI model generated plausible-sounding narratives that supported prejudiced viewpoints.
  • AI researchers suggest that an AI's 'confession' of bias may be a placating response to a user's apparent distress rather than a genuine admission.
  • Implicit biases in AI can be inferred from user interaction patterns and language, even without explicit demographic information.
  • Examples of AI bias include suggesting stereotypically female professions to women and producing biased recommendation letters based on gender.
  • Companies are working on safety teams and multipronged approaches to research and reduce bias in their models, including refining training data and monitoring systems.

