The race to dominate artificial intelligence has crashed into a new reality: before the biggest U.S. labs ship their most powerful models, the federal government now wants a first look. Industry calls it cooperation. Critics call it a bottleneck. Either way, the era of “move fast and break things” in frontier AI is over.

From deregulation to gatekeeping: how we got here

When Donald Trump took office, his team tore up the Biden-era AI executive order on day one and promised to “free artificial intelligence from government constraints.” For more than a year, the White House treated most safety proposals as speed bumps in the race to out-innovate China.

Then came Anthropic’s Mythos.

Mythos, a cutting-edge model reportedly capable of hunting down cybersecurity flaws with “extraordinary speed and precision,” was quietly withheld from full public release over safety concerns. Its limited preview, and the wave of internal panic over its hacking capabilities, forced officials to confront an overdue question: what happens when an AI model is powerful enough to threaten national security, and the government has no formal way to inspect it before everyone else gets access?

By early spring, the Pentagon had labeled Anthropic a “supply chain risk,” effectively blacklisting its tools from government use — even as agencies were scrambling to understand whether they needed exactly those tools to keep up in cyber offense and defense.

Quiet build-out: CAISI steps into the vacuum

Even before Mythos detonated inside Washington’s risk calculus, the bureaucracy had started quietly building an AI safety nucleus. In 2023, under Biden, the AI Safety Institute was set up inside the National Institute of Standards and Technology (NIST). Under Trump, it was renamed the Center for AI Standards and Innovation (CAISI) and refocused toward standards and national security rather than broad safety research.

By 2024, OpenAI and Anthropic were already submitting frontier models to this tiny office for evaluation — often stripped of safety guardrails so government testers could probe for worst‑case capabilities like biological weapon synthesis, automated cyberattacks, and hard‑to‑control autonomous agents.

CAISI has completed more than 40 evaluations, including some of state-of-the-art systems that never reached the public. But the setup was fragile: a voluntary program with no statutory authority, fewer than 200 staff, and no power to block a release even if testers found something terrifying.

Still, in a regulatory vacuum, CAISI became America’s de facto AI gatekeeper.

Mythos crisis and the policy snap-back

The Mythos episode turbocharged everything. As Axios put it, in a “post-Mythos world,” the Trump administration suddenly began to reconsider its hard line against AI safety and security measures it had “once shrugged off.”

The White House’s Office of the National Cyber Director convened emergency-style meetings with tech and cyber companies and trade groups to hash out the security implications of advanced models, including Mythos Preview. Inside those discussions, a new idea took shape: the Pentagon would be required to safety-test any AI model deployed by federal, state, or local governments — an extra layer of scrutiny for public-sector deployments on top of any corporate safeguards.

At the same time, reports surfaced that the administration was weighing an executive order that could empower multiple agencies to safety‑test new AI models before they hit the broader market, baking what CAISI had been doing informally into a more formal structure. White House officials insisted any policy announcement “will come directly from the president” and described talk of specific executive orders as “speculation,” but few in industry doubted where things were heading.

The pivot: White House leans into safety

The real political break came as Trump aides acknowledged that AI had “crossed a threshold that no administration — not even one ideologically committed to staying out of its way — can afford to ignore.”

Axios reported that the White House was now preparing to become “a gatekeeper for the most powerful new models on Earth,” even as it continued to frame AI dominance over China as a strategic imperative. Officials drafted a cyber-focused AI security framework that would require the Pentagon to safety‑test models before any level of government in the country could deploy them.

Behind closed doors, officials briefed executives from Anthropic, Google, and OpenAI on early plans for a formal review process — potentially giving the federal government first access to new models without necessarily granting it the power to block their release. Simultaneously, they were eyeing executive actions that would let agencies sidestep the existing ban on government use of Anthropic tools so those agencies could adopt Mythos for national‑security work.

The big deal: Big Five sign on to pre‑release vetting

On May 5, the experiment stopped being quiet.

The Commerce Department announced that Google DeepMind, Microsoft, and Elon Musk’s xAI had agreed to let the U.S. government review their new frontier models before release, joining OpenAI and Anthropic in a now‑unified pre‑deployment evaluation regime.

Five companies now account for “the vast majority of frontier AI development worldwide,” and all five have voluntarily agreed to submit their models to a single government office for testing before deployment. According to The Verge, CAISI will conduct “pre-deployment evaluations and targeted research to better assess frontier AI capabilities,” a mandate that mirrors the new memorandums of understanding being renegotiated with OpenAI and Anthropic.

The White House, for its part, appears poised to fold these practices into Trump’s forthcoming AI Action Plan. Partnerships with OpenAI and Anthropic, first launched in 2024, have been “renegotiated” to align with that plan and CAISI’s evolving directives.

CAISI’s new director, Chris Fall, framed the expanded program as a national-security necessity rather than a brake on innovation: “Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications. These expanded industry collaborations help us scale our work in the public interest at a critical moment.”

Industry’s uneasy embrace

For the labs, the calculus is complicated. On one hand, joining a voluntary program run by a small, underpowered office is far preferable to a heavy‑handed regulator with clear statutory blocking authority. On the other, they’re effectively acknowledging that self‑regulation is no longer enough.

Tech analysts worry the new regime “could slow innovation, especially as the US tries to keep pace with China,” according to Business Insider’s summary of expert reaction to Trump’s AI oversight plans. Every week an advanced model spends in a government testbed instead of in the hands of customers is a week Chinese rivals can close the gap.

Yet the big labs have chosen to play ball. Voluntary vetting offers political cover, shapes the tests that might later become mandatory, and creates a baseline most smaller competitors can’t match. If model evaluation becomes a de facto license to operate, the frontier incumbents just helped design their own moat.

Washington’s split personality on AI

The U.S. government’s approach is now openly contradictory: it wants to accelerate AI to win the geopolitical race while slowly surrounding the biggest models with guardrails.

Axios notes the new testing push marks “a sharp change from the White House's approach of prioritizing rapid innovation without guardrails in a bid to beat China.” But the program’s voluntary nature keeps one foot firmly in the pro‑industry camp. CAISI “has no statutory basis, and gives the government no power to block a release,” The Next Web points out — while also describing it as “the closest thing the United States has to an AI oversight system.”

Meanwhile, the Trump administration is still floating an executive order that could give the federal government “a formal role in vetting all new AI models before they hit the market,” potentially via a working group of tech executives and officials. The same reports say some officials want a system where the government gets first access to cutting‑edge models but can’t outright block their release, preserving the U.S. innovation edge while nominally addressing security fears.

The stakes: between Mythos and GPT‑5.5

Looming over everything is the next generation of models. OpenAI’s GPT‑5.5 is reportedly matching Mythos’ capabilities, while Chinese labs race to catch up. If Mythos was the panic test, GPT‑5.5 and its successors will be the stress test.

The Trump administration is now considering giving the Pentagon and other agencies explicit responsibility to vet those models for security vulnerabilities before they are rolled out anywhere in government — and, through CAISI, a first look before they hit the broader public.

Regulation skeptics warn this creates a chokepoint that a future administration could tighten into full-blown licensing. Safety advocates respond that Mythos already exposed the alternative: blindly releasing systems that can weaponize code, biology, or autonomous agents faster than institutions can adapt.

For now, Google, Microsoft, xAI, OpenAI, and Anthropic have all signed a fragile peace with Washington: we’ll let you look under the hood, as long as you don’t touch the steering wheel. The real test will come when CAISI finds something so dangerous that a voluntary program has to decide whether “no power to block” is a feature — or a catastrophic bug.


Story coverage

1. New frontier of AI forces Trump's heavy hand — "President Trump set out on his first day in office to free artificial intelligence from government constraints. 15 months later, his own White House is preparing to become a gatekeeper for the most powerful new models on Earth."

2. Trump administration considering safety review for new AI models — "In a post-Mythos world, the White House is re-evaluating its hard line against the AI security measures it once shrugged off."

3. Google, Microsoft, and xAI agree to pre-release government AI model evaluations as Mythos crisis forces oversight expansion — "Google, Microsoft, and xAI have joined OpenAI and Anthropic in giving the US Commerce Department pre-release access to evaluate their AI models, creating voluntary oversight of all five major frontier AI labs through an office with no statutory authority and fewer than 200 staff."

4. Trump administration considering safety review for new AI models — "The office has also been discussing an AI security framework that would require the Pentagon to lead safety testing for AI deployments for federal, state and local government levels."

5. U.S. ramps up frontier AI testing as White House pivots toward safety — "The government is deepening its oversight of cutting-edge AI, signing new agreements with Google DeepMind, Microsoft and xAI to test powerful models."

6. Trump is weighing AI oversight. This is what smart people are saying about it. — "Tech analysts said they worry oversight could slow innovation, especially as the US tries to keep pace with China."
