December 19, 2025
Gemma Scope 2: Helping the AI Safety Community Deepen Understanding of Complex Language Model Behavior
Announcing Gemma Scope 2, a comprehensive, open suite of interpretability tools for the entire Gemma 3 family to accelerate AI safety research.
TL;DR
- Gemma Scope 2 is a new, open suite of interpretability tools for Gemma 3 models.
- It aims to make the internal decision-making processes of LLMs more transparent.
- The tools enable researchers to trace potential risks and debug emergent behaviors.
- The release is described as the largest open-source release of interpretability tools by an AI lab.
- Gemma Scope 2 adds upgraded tools such as skip-transcoders and cross-layer transcoders, and is trained with the Matryoshka technique.
- Specialized tools are available for analyzing chatbot behaviors such as jailbreaks and refusal mechanisms.
- An interactive demo and various resources are available for users to explore Gemma Scope 2.
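
For readers unfamiliar with the terms in the list above, here is a minimal pure-Python sketch of how a skip-transcoder and a Matryoshka-style training loss are commonly described. It is an illustration under stated assumptions, not Gemma Scope 2's code or API: every name (`skip_transcoder_forward`, `matryoshka_loss`, `W_skip`, `prefix_sizes`) is hypothetical, and the weights are toy values. The assumed setup is that a transcoder predicts a layer's output from its input through sparse latents, a skip-transcoder adds a linear path from the input, and Matryoshka training sums the reconstruction error over nested prefixes of the latents.

```python
def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def encode(W_enc, b_enc, x):
    # Sparse latents: ReLU(W_enc @ x + b_enc).
    pre = matvec(W_enc, x)
    return [max(0.0, p + b) for p, b in zip(pre, b_enc)]


def decode_prefix(W_dec, latents, k):
    # Reconstruct using only the first k latents.
    # W_dec holds one decoder direction (row) per latent.
    out = [0.0] * len(W_dec[0])
    for j in range(k):
        for i, w in enumerate(W_dec[j]):
            out[i] += latents[j] * w
    return out


def skip_transcoder_forward(params, x, k=None):
    # Assumed form: prediction = decode(latents[:k]) + W_skip @ x.
    latents = encode(params["W_enc"], params["b_enc"], x)
    k = len(latents) if k is None else k
    recon = decode_prefix(params["W_dec"], latents, k)
    skip = matvec(params["W_skip"], x)
    return [r + s for r, s in zip(recon, skip)]


def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def matryoshka_loss(params, x, target, prefix_sizes):
    # Sum the reconstruction error over nested latent prefixes, so that
    # each prefix is pushed to be a usable dictionary on its own.
    return sum(
        mse(skip_transcoder_forward(params, x, k), target)
        for k in prefix_sizes
    )


# Toy parameters (hypothetical): identity encoder/decoder, no skip path.
params = {
    "W_enc": [[1.0, 0.0], [0.0, 1.0]],
    "b_enc": [0.0, 0.0],
    "W_dec": [[1.0, 0.0], [0.0, 1.0]],
    "W_skip": [[0.0, 0.0], [0.0, 0.0]],
}
loss = matryoshka_loss(params, [1.0, 2.0], [1.0, 2.0], prefix_sizes=[1, 2])
```

In this toy case the full two-latent prefix reconstructs the target exactly, while the one-latent prefix misses the second coordinate, so the nested loss is nonzero. A real training setup would also include a sparsity constraint and learned weights, both omitted here.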