March 5, 2026

OpenAI’s GPT-5.4 sets new records on professional benchmarks

The new model introduces native computer use, a 1-million-token context window, and a reworked tool-calling system. Whether it actually holds off Anthropic and Google is less clear.

TL;DR

  • GPT-5.4 is OpenAI's latest frontier model for professional tasks, available in standard, Thinking, and Pro configurations.
  • It posts significant gains on benchmarks such as GDPval, where it matches or exceeds industry professionals in 83% of cases, and OSWorld-Verified, where it reaches a 75% success rate.
  • Key new capabilities include native computer use for agents and a 1-million-token context window in the API version.
  • The model shows a 33% reduction in incorrect individual factual claims compared to GPT-5.2.
  • A new Tool Search system retrieves tool definitions on demand, cutting token usage by 47%.
  • OpenAI introduced an open-source evaluation, CoT Controllability, to test whether a model can deliberately obscure its reasoning; GPT-5.4 Thinking showed little ability to do so.
  • GPT-5.4 faces competition from Google's Gemini 3.1 Pro and Anthropic's Claude Opus 4.6, with different models leading on different benchmarks.
  • The rapid release cadence of GPT-5.3 and GPT-5.4 suggests OpenAI is working to stay visible in the news cycle.
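
To make the Tool Search point concrete, here is a minimal sketch of why retrieving tool definitions on demand can shrink a prompt. It is not OpenAI's implementation or API: the `ToolDef` type, the `search_tools` ranking, the sample registry, and the whitespace-based `rough_tokens` count are all hypothetical stand-ins chosen for illustration, and the numbers it produces are unrelated to the 47% figure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    """Hypothetical tool record, not an OpenAI type."""
    name: str
    summary: str  # short description used only for searching
    schema: str   # full definition, the expensive part to put in a prompt


def rough_tokens(text: str) -> int:
    # Crude proxy: count whitespace-separated words. Real tokenizers differ.
    return len(text.split())


def search_tools(registry, query, limit=2):
    """Rank tools by keyword overlap between the query and name + summary."""
    words = set(query.lower().split())
    scored = []
    for tool in registry:
        text = tool.name.replace("_", " ") + " " + tool.summary
        score = len(words & set(text.lower().split()))
        if score:
            scored.append((score, tool))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


def prompt_cost_upfront(registry):
    # Baseline: every full definition is sent with every request.
    return sum(rough_tokens(t.schema) for t in registry)


def prompt_cost_on_demand(registry, query):
    # On demand: only definitions matching the request are sent.
    return sum(rough_tokens(t.schema) for t in search_tools(registry, query))


# Illustrative registry; the tools and schemas are made up.
REGISTRY = [
    ToolDef("get_weather", "current weather forecast for a city",
            "get_weather(city: string, units: string) returns temperature and conditions"),
    ToolDef("send_email", "send an email message to a recipient",
            "send_email(to: string, subject: string, body: string) returns delivery status"),
    ToolDef("query_database", "run a sql query against the sales database",
            "query_database(sql: string, limit: integer) returns rows as json"),
]

QUERY = "weather forecast for Paris"
print(prompt_cost_upfront(REGISTRY), prompt_cost_on_demand(REGISTRY, QUERY))
```

In this toy setup only the weather tool's definition is loaded for the weather query, so the on-demand cost is a fraction of the upfront cost. The saving grows with the size of the registry, which is presumably the effect the reported token reduction reflects.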
