OpenAI’s GPT-5.4 sets new records on professional benchmarks

March 5, 2026

TL;DR

GPT-5.4 is OpenAI's latest frontier model for professional tasks, available in standard, Thinking, and Pro configurations.
It improves significantly on benchmarks such as GDPval, where it matches or exceeds industry professionals 83% of the time, and OSWorld-Verified, where it reaches a 75% success rate.
Key new capabilities include native computer use for agents and a 1-million-token context window in the API version.
The model shows a 33% reduction in incorrect individual factual claims compared to GPT-5.2.
A new Tool Search system retrieves tool definitions on demand, cutting token usage by 47%.
OpenAI introduced an open-source evaluation, CoT Controllability, that tests whether a model can deliberately obscure its reasoning; GPT-5.4 Thinking showed low ability to do so.
GPT-5.4 faces competition from Google's Gemini 3.1 Pro and Anthropic's Claude Opus 4.6, with different models leading on various benchmarks.
The rapid release cadence of GPT-5.3 and GPT-5.4 suggests that OpenAI aims to stay visible in the news cycle.
