tech
December 19, 2025
The Ars Technica AI coding agent test: Minesweeper edition
How do four modern LLMs do at recreating a simple Windows gaming classic?

TL;DR
- Four AI coding agents (OpenAI Codex, Anthropic Claude Code, Google Gemini CLI, Mistral Vibe) were tested by recreating the game Minesweeper.
- OpenAI Codex was the top performer, successfully implementing crucial features like 'chording' and providing helpful instructions.
- Anthropic Claude Code presented a polished interface and unique 'Power Mode' features but omitted 'chording'.
- Mistral Vibe failed to implement essential gameplay mechanics like 'chording' and flag marking, and lacked sound effects.
- Google Gemini CLI completely failed the one-shot test, producing non-functional code and encountering significant development issues.
- The test demonstrated that while AI coding agents can produce functional results, they currently function best as tools to augment human coders rather than replace them.
Continue reading
the original article