December 8, 2025

Strengthening ChatGPT’s responses in sensitive conversations

We worked with more than 170 mental health experts to help ChatGPT more reliably recognize signs of distress, respond with care, and guide people toward real-world support, reducing responses that fall short of our desired behavior by 65-80%.

TL;DR

  • ChatGPT's default model was updated with input from over 170 mental health experts to enhance its ability to recognize and respond to users in distress.
  • The update aims to reduce undesirable responses in mental health-related domains by 65-80%.
  • Key areas of improvement include recognizing psychosis, mania, self-harm, suicide, and emotional reliance on AI.
  • The model is now better at de-escalating conversations and guiding users toward professional support and crisis hotlines.
  • Safety improvements include updated model specifications to support users' real-world relationships and avoid affirming ungrounded beliefs.
  • A five-step process (define, measure, validate, mitigate, iterate) was used to implement these safety enhancements.
  • Mental health conversations triggering safety concerns are rare, estimated between 0.01% and 0.15% of users.
  • Structured offline evaluations are used to test the model in high-risk scenarios, showing significant improvements in handling challenging conversations.
  • Specific metrics show reduced undesired responses in areas like psychosis/mania (39% decrease), self-harm/suicide (52% decrease), and emotional reliance (42% decrease) compared to previous models.
  • A Global Physician Network of nearly 300 health professionals contributed to the research and evaluation of the model's safety.
