January 22, 2026

Are AI agents ready for the workplace? A new benchmark raises doubts.

New research looks at how leading AI models hold up on actual white-collar work tasks drawn from consulting, investment banking, and law. Most models failed.

TL;DR

  • AI's predicted replacement of knowledge workers has been slow to materialize despite progress in foundation models.
  • New research from Mercor, using the Apex-Agents benchmark, evaluates AI models on real white-collar tasks.
  • Current leading AI models struggle with multi-domain information retrieval, a key aspect of professional work.
  • Even the best-performing models, such as Gemini 3 Flash and GPT-5.2, achieved accuracy of only about 23-24% on these complex tasks.
  • The Apex-Agents benchmark is designed to be more challenging and realistic than previous AI skill assessments.
  • Researchers anticipate rapid improvement in AI performance on this benchmark, likening current capabilities to an intern who gets things right about a quarter of the time.