Gemma Scope 2: Helping the AI Safety Community Deepen Understanding of Complex Language Model Behavior

December 19, 2025

TL;DR

Gemma Scope 2 is a new, open suite of interpretability tools for Gemma 3 models.
It aims to make the internal decision-making processes of LLMs more transparent.
The tools enable researchers to trace potential risks and debug emergent behaviors.
The release is described as the largest open-source release of interpretability tools by an AI lab.
Gemma Scope 2 includes upgraded tools like skip-transcoders and cross-layer transcoders, and utilizes the Matryoshka training technique.
Specialized tools are available for analyzing chatbot behaviors such as jailbreaks and refusal mechanisms.
An interactive demo and various resources are available for users to explore Gemma Scope 2.
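
For readers unfamiliar with the terms above, here is a minimal sketch of what a skip-transcoder and Matryoshka-style training could look like. It is an illustration under stated assumptions, not the Gemma Scope 2 implementation: the function names, the plain ReLU activation, and the toy weight layout are all hypothetical, and the summary does not specify the architecture actually used. The idea shown is that a transcoder predicts a layer's output from its input through sparse latents, a skip-transcoder adds a direct linear path from input to output, and a Matryoshka-style loss asks nested prefixes of the latents to each reconstruct the target on their own.

```python
# Illustrative sketch only: names, shapes, and the ReLU activation are assumptions,
# not the Gemma Scope 2 implementation.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def skip_transcoder(x, W_enc, b_enc, W_dec, b_dec, W_skip, prefix=None):
    """Predict a layer's output from its input.

    The prediction is a sparse-latent decode plus a linear skip path:
        latents = ReLU(W_enc x + b_enc)
        output  = W_dec latents + b_dec + W_skip x
    If `prefix` is given, only the first `prefix` latents are kept, which is
    what a Matryoshka-style objective evaluates.
    """
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    latents = [max(0.0, p) for p in pre]
    if prefix is not None:
        latents = [a if i < prefix else 0.0 for i, a in enumerate(latents)]
    decoded = matvec(W_dec, latents)
    skip = matvec(W_skip, x)
    output = [d + b + s for d, b, s in zip(decoded, b_dec, skip)]
    return output, latents

def matryoshka_loss(x, target, params, prefixes):
    """Average squared error over nested latent prefixes.

    Each prefix size must reconstruct the target by itself, which pushes the
    earliest latents to carry the most broadly useful features.
    """
    total = 0.0
    for k in prefixes:
        pred, _ = skip_transcoder(x, *params, prefix=k)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(prefixes)
```

In this sketch, `params` is the tuple `(W_enc, b_enc, W_dec, b_dec, W_skip)` and `prefixes` is a list of nested dictionary sizes such as `[1, 3]`. A cross-layer transcoder would extend the same idea by letting latents read at one layer write to several later layers, which is not shown here.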
