Are AI agents ready for the workplace? A new benchmark raises doubts.

January 22, 2026

TL;DR

AI's predicted replacement of knowledge workers has been slow to materialize despite progress in foundation models.
New research from Mercor, using the Apex-Agents benchmark, evaluates AI models on real white-collar tasks.
Current leading AI models struggle with multi-domain information retrieval, a key aspect of professional work.
Even the best-performing models, such as Gemini 3 Flash and GPT-5.2, scored only about 23-24% on these complex tasks.
The Apex-Agents benchmark is designed to be more challenging and realistic than previous AI skill assessments.
Researchers anticipate rapid improvement in AI performance on this benchmark, likening current capabilities to an intern who gets things right about a quarter of the time.
