December 3, 2025
OpenAI desperate to avoid explaining why it deleted pirated book datasets
OpenAI risks increased fines after deleting pirated book datasets.

TL;DR
- Authors suing OpenAI claim ChatGPT was illegally trained on their copyrighted works using datasets "Books 1" and "Books 2."
- OpenAI deleted these datasets before ChatGPT's release. It has cited "non-use" as the reason while also invoking attorney-client privilege over the deletion.
- US Magistrate Judge Ona Wang ordered OpenAI to produce communications related to the dataset deletion, including those previously withheld under privilege.
- The judge found OpenAI's assertions about privilege to be inconsistent, which may have waived its privilege claims.
- The authors believe these communications could prove willful infringement, which would lead to higher damages.
- OpenAI disputes the ruling and intends to appeal.