January 28, 2026
The Specification Gap: Why Your AI Produces Impressive-Looking Output With Fundamental Problems + The Prompt Kit To Help You Fix It
In January 2026, Cursor ran a fleet of agents for close to a week and found GPT-5.2 best suited to extended autonomous work. When the experiment finished, the system had generated over a million lines of Rust code across a thousand files and built a browser rendering engine: HTML and CSS parsing, cascade, layout, text pipeline, paint, and JavaScript integration. The FastRender repo describes itself as "under heavy development," and Simon Willison actually ran it and posted screenshots: it kind of works.

TL;DR
- Codex is better when correctness can be defined; Claude Code is better when it cannot.
- In a January 2026 experiment, GPT-5.2 sustained nearly a week of autonomous work, generating over a million lines of Rust and a working browser rendering engine.
- Organizations in 2026 face the question of whether AI should be shaped like a colleague or a tool.
- The conventional approach of comparing benchmarks and feature lists misses the essential factor: fit.
- Senior engineers see productivity gains with Codex while junior developers struggle with it; the difference is fit, not skill.
- Whether you treat AI as a colleague or as a CNC machine determines whether it multiplies your output or your mistakes.
- Senior engineers prefer Codex because of one specific capability that matters more than coding-specific training.
- Junior developers prefer Claude Code because its friction catches errors early.
- Most people overestimate their ability to specify precise intent, leading to invisible, expensive consequences.
- Figuring out 'high-grade intent' outside of software will define competitive advantage in 2026.