December 8, 2025

Strengthening ChatGPT’s responses in sensitive conversations

We worked with more than 170 mental health experts to help ChatGPT more reliably recognize signs of distress, respond with care, and guide people toward real-world support, reducing responses that fall short of our desired behavior by 65-80%.

TL;DR

  • ChatGPT's default model was updated with input from over 170 mental health experts to enhance its ability to recognize and respond to users in distress.
  • The update aims to reduce undesirable responses in mental health-related domains by 65-80%.
  • Key areas of improvement include recognizing psychosis, mania, self-harm, suicide, and emotional reliance on AI.
  • The model is now better at de-escalating conversations and guiding users toward professional support and crisis hotlines.
  • Safety improvements include updated model specifications to support users' real-world relationships and avoid affirming ungrounded beliefs.
  • A five-step process (define, measure, validate, mitigate, iterate) was used to implement these safety enhancements.
  • Mental health conversations triggering safety concerns are rare, estimated between 0.01% and 0.15% of users.
  • Structured offline evaluations are used to test the model in high-risk scenarios, showing significant improvements in handling challenging conversations.
  • Specific metrics show reduced undesired responses in areas like psychosis/mania (39% decrease), self-harm/suicide (52% decrease), and emotional reliance (42% decrease) compared to previous models.
  • A Global Physician Network of nearly 300 health professionals contributed to the research and evaluation of the model's safety.
