FACTS Benchmark Suite: a new way to systematically evaluate LLM factuality

December 15, 2025

TL;DR

The newly introduced FACTS Benchmark Suite measures LLM factuality across parametric, search, and multimodal tasks.
It includes three new benchmarks: Parametric (internal knowledge), Search (tool use), and Multimodal (image-based questions), plus an updated Grounding benchmark.
The suite contains 3,513 curated examples, with evaluation sets managed by Kaggle on a public leaderboard.
Gemini 3 Pro achieved the highest overall FACTS Score, 68.8%, with significant improvements on the Search and Parametric benchmarks.
All evaluated models scored below 70% accuracy, indicating room for future progress in LLM factuality.
