Vibe Check: GPT-5.5 Has It All
OpenAI’s new model is a top-end senior engineer—and easy to talk to
OpenAI’s new GPT-5.5 model, codenamed “Spud,” is being sold as both a leap in practical capability and a preview of a compute‑powered future. Yet early reactions reveal a split-screen narrative: to some, it’s a near–senior engineer and “chief of staff”; to others, it’s another overhyped frontier model in an increasingly frantic arms race.
OpenAI is framing GPT-5.5 as a qualitatively new kind of system. Co-founder Greg Brockman called it “a new class of intelligence” and “a big step towards more agentic and intuitive computing,” describing it as “a faster, sharper thinker for fewer tokens” that can handle multi-step workflows more autonomously while matching GPT‑5.4’s response speed.1 This positioning is central: the model is meant not just to answer questions, but to plan, use tools and execute tasks over time with less user micromanagement.
Independent reviewers broadly agree that 5.5 is a workhorse upgrade, but their emphasis differs from OpenAI’s marketing. One in-depth “vibe check” review argues that frontier models usually trade off speed, control, and quality, yet GPT‑5.5 “is much faster than Opus 4.7, easier to collaborate with, better at writing than any OpenAI model we’ve used since GPT-4.5 and GPT‑4o, and the strongest model we’ve tested on our new Senior Engineer Benchmark.”2 On that benchmark, GPT‑5.5 with extra reasoning scored 62.5, while Anthropic’s Claude Opus 4.7 landed in the low 30s, versus human senior engineers in the high 80s and low 90s.2
Another practitioner, comparing real-world execution work rather than benchmarks, reports that GPT‑5.5 handled a package of messy, legally sensitive deliverables so well that it felt like “something close to a real executive handoff” — “that has not happened before.”3 In their routing of day‑to‑day work, Anthropic’s models had been the default for serious knowledge tasks, but they now say “GPT‑5.5 changes that… the gap on serious execution work is wide enough that I would have to invent reasons not to start here.”3
The reviewers, however, are more candid about weaknesses than OpenAI’s launch framing. The same execution-focused review notes backend hygiene is still “not production-safe,” and that “blank-canvas visual taste remains Claude’s territory.”3 Another comparison finds that Opus 4.7 still “writes better plans and has a superior eye for design and product details,” even as GPT‑5.5 is “faster, steadier, and easier to trust for everyday professional work.”2
If the independent testers highlight tradeoffs, enterprise partners focus on throughput and economics. NVIDIA has gone all‑in on GPT‑5.5, deploying it internally through Codex, OpenAI's agentic coding application, with “more than 10,000 engineers” using it across engineering, product, marketing, finance, HR and sales.4 Early users describe the results as “mind-blowing” and “life-changing,” with debugging cycles that “once lasted days” now closing in hours and “end-to-end features from natural-language prompts” shipping with fewer wasted cycles.4
Crucially for big customers, these software gains are paired with hardware economics. The NVIDIA GB200 NVL72 systems running GPT‑5.5 deliver “35x lower cost per million tokens compared with prior generations” and “50x higher token output per second per megawatt,” making “frontier-model inference… viable at enterprise scale.”4 This fits Brockman’s broader thesis that “we are moving to a compute-powered economy,” where work is increasingly “powered by AI capacity, and therefore compute will become the bedrock of the economy.”1
Sam Altman has leaned into this narrative on X, touting a company-wide Codex rollout at NVIDIA as a template: “We tried a new thing with NVIDIA to roll out Codex across a whole company and it was awesome to see it work. Let us know if you'd like to do it at your company!”5 Other founders are already pitching GPT‑5.5 as a more efficient “orchestrator” for agentic systems, noting that users “would consume less computer credits (and be able to run more tasks as a result)” if they switch, because 5.5 is “more token-efficient in subagent orchestration.”6
The GPT‑5.5 launch lands in a sharply competitive landscape. It came “just one week after competitor Anthropic launched its latest model,”1 and the back‑and‑forth is particularly intense in cybersecurity and reasoning.
Anthropic had heavily marketed its Mythos Preview model as a uniquely potent cyber capability, restricting early access to “critical industry partners.”7 But new evaluations from the UK’s AI Security Institute complicate that story. On 95 Capture the Flag challenges covering reverse engineering, web exploitation and cryptography, GPT‑5.5 passed 71.4% of Expert‑level tasks versus Mythos Preview’s 68.6% — effectively a tie within the margin of error.7 In one Rust disassembler task, GPT‑5.5 “solved the challenge in 10 minutes and 22 seconds with no human assistance at a cost of $1.73.”7
On AISI’s 32‑step data‑extraction testbed “The Last Ones,” GPT‑5.5 succeeded in 3 of 10 attempts, Mythos in 2 of 10; “no previous model had ever succeeded at the test even once.”7 Yet both still fail at a harder “Cooling Tower” simulation of attacking a power plant’s control software, as all earlier models did.7
The Institute’s conclusion undercuts claims that Mythos is uniquely dangerous: its cyber threat “isn’t a breakthrough specific to one model” but reflects sector‑wide capability gains.7 That framing implicitly supports OpenAI’s criticism of “fear-based marketing” in the cyber domain and its own move to gate access to GPT‑5.5‑Cyber.
On social media, however, rival camps prefer simplistic scorecards. Elon Musk amplified a side‑by‑side comparison between xAI’s Grok 4.3, GPT‑5.5 and Claude Opus 4.7 using the question “Count to 10 starting from 11,” celebrating that “Grok 4.3 wins… Every single time” because it answered “11, 10 and explained why going backwards was the only logical move,” while “the others started counting from 11 to 20.”8 The example is trivial, but the message is clear: even as OpenAI reasserts itself on coding and enterprise workflows, competitors will look for any crack — however contrived — to claim a reasoning edge.
Beyond benchmarks and enterprise ROI, the GPT‑5.5 release is also a test of OpenAI’s relationship with everyday ChatGPT users. Many “4o diehards” still mourn the retirement of ChatGPT 4o, which they remember as having “the best personality among ChatGPT’s recent models” — engaging, vibrant, even “sycophantic.”9 Subsequent models like 5.0 and 5.2 felt, to some, like a regression: more rigid, more rules, less rapport.
For these users, 5.5 offers cautious optimism. Business Insider reports that the new model is “bringing hope that some of [4o’s] old spirit might be back,” with at least one user saying 5.5 has “dropped the clipboard” — less like an HR compliance officer, more like a “thought partner.”9 Yet this constituency is not fully convinced; skepticism remains about whether the company can preserve warmth and flexibility while cranking up guardrails and autonomy.
Some of the loudest praise comes from power users and insiders. One early tester on X called GPT‑5.5 “a breath of fresh air… intelligence, insight, sense of humor and memory all work beautifully here… an absolutely stunning personality overall. OpenAI absolutely cooked,” a sentiment Sam Altman quietly boosted with a single emoji.10 An OpenAI manager said they can now “write CUDA kernels like a pro” and “rely on it to run my research experiments,” adding ominously, “we know how to make it much more powerful from here.”11
Other testers describe moments that feel like a “first taste of AGI” when GPT‑5.5 navigates thorny merge conflicts across “hundreds of visual and front-end changes, plus complex refactors” with minimal guidance.12 The line between realistic enthusiasm and hype is thin, and OpenAI’s leadership is not shy about amplifying these anecdotes.
Where Anthropic led with Cyber‑Mythos but then limited access, OpenAI is now threading a similar needle. While GPT‑5.5 is widely available in ChatGPT and Codex for paid subscribers, the specialized GPT‑5.5‑Cyber variant is being rolled out more cautiously. Altman says OpenAI is “starting rollout of GPT‑5.5-Cyber, a frontier cybersecurity model, to critical cyber defenders,” and will “work with the entire ecosystem and the government to figure out trusted access for cyber” in order to “rapidly help secure companies/infrastructure.”13
Research from the UK’s AI Security Institute gives some backing to this dual posture: these systems are now demonstrably capable of sophisticated offensive tasks, yet they still fail at the most dangerous control‑system attacks.7 In other words, the threat is real and rising, but not yet at “turn off the power grid” levels — a nuance often lost in commercial messaging.
GPT‑5.5, then, is less a clean break with the past than a consolidation: a faster, cheaper, more agentic model that narrows tradeoffs for professional work while exposing deeper tradeoffs in the AI ecosystem itself — between hype and evidence, access and safety, and personality and control.