March 5, 2026
OpenAI’s GPT-5.4 sets new records on professional benchmarks
The new model introduces native computer use, a 1-million-token context window, and a reworked tool-calling system. Whether it actually holds off Anthropic and Google is less clear.

TL;DR
- GPT-5.4 is OpenAI's latest frontier model for professional tasks, available in standard, Thinking, and Pro configurations.
- It posts significant gains on benchmarks such as GDPval (matching or exceeding industry professionals 83% of the time) and OSWorld-Verified (75% success rate).
- Key new capabilities include native computer use for agents and a 1-million-token context window in the API version.
- The model makes 33% fewer incorrect individual factual claims than GPT-5.2.
- A new Tool Search system retrieves tool definitions on demand, cutting token usage by 47%.
- OpenAI introduced an open-source evaluation, CoT Controllability, which tests whether a model can deliberately obscure its reasoning; GPT-5.4 Thinking showed low ability to do so.
- GPT-5.4 faces competition from Google's Gemini 3.1 Pro and Anthropic's Claude Opus 4.6, with different models leading on various benchmarks.
- The rapid release cadence of GPT-5.3 and GPT-5.4 suggests OpenAI is aiming to stay visible in the news cycle.