OpenAI’s latest move in the AI arms race isn’t a new model, but a new plumbing standard — a networking protocol designed to keep the firehose of GPU data flowing even when the pipes buckle.

Early pressure: AI training hits a networking wall

As frontier models ballooned in size, OpenAI ran headlong into a problem every major AI lab now shares: the network is the bottleneck.

Training a single large AI model means orchestrating millions of data transfers between GPUs for every training step. One late packet can stall thousands of expensive chips. OpenAI admits that “network congestion, link, and device failures are the most common sources of delay and jitter” in such systems, and even small disruptions can leave GPUs idle while the clock — and cloud bill — keeps running.
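The arithmetic behind that claim is simple: in a synchronous training step, every GPU waits for the slowest transfer before the next step can begin. A toy sketch, using made‑up cluster sizes and latencies rather than any figures from OpenAI, makes the point.

    # Toy model: in a synchronous step, no GPU proceeds until every gradient
    # exchange has finished, so step time is the maximum transfer time.
    # Cluster size and latencies below are illustrative assumptions.
    import random

    NUM_TRANSFERS = 10_000   # assumed number of per-step network transfers
    NORMAL_MS = 5.0          # assumed typical transfer time
    DELAYED_MS = 500.0       # assumed time when congestion or a flaky link hits

    def step_time_ms(delay_probability):
        times = (
            DELAYED_MS if random.random() < delay_probability else NORMAL_MS
            for _ in range(NUM_TRANSFERS)
        )
        return max(times)

    random.seed(0)
    print("clean step:       ", step_time_ms(0.0), "ms")
    # Even a 1-in-10,000 chance of delay per transfer makes a stalled step likely.
    print("occasional delays:", step_time_ms(1e-4), "ms")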

By the time OpenAI was planning Stargate — its next‑generation supercomputer — the company concluded it needed more than incremental tuning. It needed to “rethink and drastically reduce complexity in every layer of the stack – including network design.”

The collaboration: rivals share a backbone

Instead of quietly building a proprietary fix, OpenAI convened an unusual coalition of frenemies: AMD, Broadcom, Intel, Microsoft, and NVIDIA. The goal was a common protocol — not a product — that could sit underneath everyone’s hardware and software and keep GPUs talking at scale.

OpenAI says it has “partnered with AMD, Broadcom, Intel, Microsoft, and NVIDIA to develop MRC (Multipath Reliable Connection): a novel protocol that improves GPU networking performance and resilience in large training clusters.” Consumer tech coverage boiled that down even further: “OpenAI is teaming up with other companies to improve supercomputer networking for AI training.”

In an industry where NVIDIA and AMD compete for every GPU socket, and where Microsoft and OpenAI jointly vie with the other cloud customers of Intel and Broadcom, agreeing on a shared low‑level protocol is notable. It signals that the major players see more benefit in standardizing the plumbing than in hoarding an in‑house networking trick.

May 4: MRC steps into the open

On May 4, 2026, OpenAI publicly unveiled MRC — Multipath Reliable Connection — positioning it as a foundational piece of its long‑term compute strategy. The protocol was released not as a closed corporate spec but “through the Open Compute Project (OCP) to enable the broader industry to use it.”

The OCP route matters. The project, backed by hyperscalers and hardware vendors, has become a de facto forum for data center standards. By publishing through OCP rather than a company‑branded white paper, OpenAI effectively invited competitors, suppliers, and even critics to adopt, extend, or fork the protocol.

OpenAI frames this as part of a broader play: “Publishing the MRC specification is part of OpenAI’s overall compute strategy: shared standards in key infrastructure layers can help scale AI systems more efficiently, reliably, and across a broader partner ecosystem.”

Two days later, consumer and industry tech press began translating that message for non‑networking engineers, with outlets highlighting that the “full spec is available through the Open Compute Project,” a signal this is meant to spread beyond OpenAI’s own racks.

What MRC actually changes on the wire

Beneath the acronym, MRC is an answer to a painful question: what happens when your AI training run spans tens or hundreds of thousands of GPUs, and the network behaves like the public internet at rush hour?

OpenAI describes three core design pillars:

  1. Multi‑plane high‑speed networks
    Instead of a single massive fabric, MRC is built to exploit multi‑plane architectures, in which several independent network planes run in parallel. That redundancy allows the system “to ride out network failures, while using fewer components and less power.” The promise: a cluster that keeps training smoothly even as individual links or devices fail.

  2. Adaptive packet spraying
    Traditional routing tends to push bulk traffic along a limited set of paths, which quickly become congested. MRC uses “adaptive packet spraying across hundreds of paths” to “virtually eliminate core congestion.” In practice, that means slicing traffic into many small flows and dynamically distributing them across available routes so no single path overheats.

  3. Static source routing to dodge failure modes
    MRC deployments use “static source routing to bypass failures and eliminate whole classes of routing failure.” Rather than relying purely on complex distributed routing protocols to react after the fact, endpoints choose the path and encode it into the packet up front, simplifying some of the most failure‑prone logic in large‑scale fabrics. (A toy sketch after this list shows how source‑routed packet spraying might look in practice.)
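
How the last two ideas might fit together is easiest to see in code. The sketch below is a minimal, hypothetical illustration, not an excerpt from the MRC specification: it assumes a sender that holds a handful of precomputed routes spread across network planes, writes the chosen route into each packet (static source routing), and sprays packets onto whichever healthy route currently looks least loaded (adaptive packet spraying).

    # Minimal, hypothetical sketch of the two ideas above; names and structure
    # are illustrative assumptions, not taken from the published specification.
    from dataclasses import dataclass, field

    @dataclass
    class Path:
        plane: int            # which parallel network plane this route crosses
        hops: tuple           # full route, fixed by the sender up front
        in_flight: int = 0    # crude congestion signal (packets awaiting ACK)
        healthy: bool = True  # flipped off when a link or device on it fails

    @dataclass
    class Sender:
        paths: list = field(default_factory=list)

        def pick_path(self):
            """Adaptive spraying: least-loaded healthy route wins the next packet."""
            live = [p for p in self.paths if p.healthy]
            if not live:
                raise RuntimeError("no healthy paths left")
            return min(live, key=lambda p: p.in_flight)

        def send(self, packet):
            path = self.pick_path()
            path.in_flight += 1   # a real stack would decrement this on ACK
            # Static source routing: the chosen hops ride in the packet header,
            # so switches forward blindly and never recompute the route.
            return path

    # Two planes, two precomputed routes each; killing one route simply shifts
    # traffic onto the remaining three.
    sender = Sender(paths=[
        Path(plane=0, hops=("leaf0", "spine0", "leaf7")),
        Path(plane=0, hops=("leaf0", "spine1", "leaf7")),
        Path(plane=1, hops=("leaf0", "spine2", "leaf7")),
        Path(plane=1, hops=("leaf0", "spine3", "leaf7")),
    ])
    sender.paths[0].healthy = False          # simulate a failed link
    for _ in range(6):
        print(sender.send(b"gradient chunk").hops)

The design choice this illustrates is that path selection lives at the endpoints: a failed plane or link is handled by simply not choosing it, rather than by waiting for the fabric's routing protocols to reconverge.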

Put together, OpenAI argues, these choices “allow us to deliver better models to everyone faster.” That’s the business translation of a dry networking spec: less GPU downtime, more consistent throughput, and ultimately shorter and cheaper training cycles.

The AI‑system view: infrastructure as destiny

From OpenAI’s own perspective, MRC is not a side quest — it is integral to its ability to keep pushing model scale.

The company notes that “Frontier model training depends on reliable supercomputer networks that can quickly move data between GPUs.” Having already “co‑developed, brought up, and maintained [its] first three generations of supercomputers with great care and close collaboration with [its] partners over the span of a few years,” OpenAI says that experience convinced it that networking is now as strategic as model architecture itself.

It also connects MRC directly to its growing user base. With “more than 900M people using ChatGPT every week,” OpenAI pitches its systems as “core infrastructure for AI, helping people and businesses around the world build with increasingly capable models.” In that framing, improving the network is not just an internal efficiency move but a way to support a global developer and enterprise ecosystem that expects lower latency, higher reliability, and more powerful models on tap.

The human‑centric lens: standards, lock‑in, and who benefits

Coverage from consumer tech media distills the announcement into a more straightforward story: “OpenAI says it partnered with AMD, Broadcom, Intel, Microsoft, and NVIDIA on a protocol called Multipath Reliable Connection, or MRC, which ‘improves GPU networking performance and resilience in large training clusters.’ The full spec is available through the Open Compute Project.”

From this vantage point, several tensions emerge:

  • Open standard vs. de facto control
    Publishing via OCP and collaborating with five major vendors looks inclusive. But in practice, a protocol co‑designed by OpenAI and the core GPU and networking suppliers could become the default for anyone who wants cutting‑edge hardware — giving this circle of companies outsized influence over the future shape of AI infrastructure.

  • Performance for whom?
    The immediate winners are the hyperscalers and AI labs with enough GPUs to hit current networking limits. For them, a few percentage points of utilization can be worth millions; the rough calculation after this list shows the scale. Smaller labs and enterprises may adopt MRC as cloud providers roll it into their stacks, but they will be takers, not shapers, of the standard.

  • Lock‑in by interoperability
    Ironically, an interoperability protocol can deepen reliance on the same small group of vendors. If MRC becomes tightly coupled with specific NICs, switches, or GPU interconnects, switching out a vendor might mean stepping off the main performance path.
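
To put a rough number on the utilization point, here is a back‑of‑the‑envelope calculation; the cluster size, price per GPU‑hour, and utilization gain are assumptions chosen for illustration, not figures from OpenAI or its partners.

    # Illustrative only: what a small utilization gain is worth on a large
    # cluster. None of these numbers come from OpenAI or its partners.
    GPUS = 100_000                # assumed cluster size
    COST_PER_GPU_HOUR = 2.00      # assumed blended $/GPU-hour
    HOURS_PER_YEAR = 24 * 365
    UTILIZATION_GAIN = 0.02       # assumed two-point gain from fewer network stalls

    annual_spend = GPUS * COST_PER_GPU_HOUR * HOURS_PER_YEAR
    print(f"annual compute spend: ${annual_spend:,.0f}")
    print(f"value of the gain:    ${annual_spend * UTILIZATION_GAIN:,.0f}")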

Still, from a consumer‑facing angle, the story is easy to sell: better networking means more powerful AI features, cheaper and faster training for startups building on top of OpenAI or Microsoft, and potentially more headroom for safety work that depends on massive simulation and evaluation runs.

Shared interests, diverging stakes

On paper, all actors involved agree on the basics: AI training needs better networking; MRC is a way to get there; and putting it into OCP helps the ecosystem.

Where perspectives diverge is in emphasis:

  • OpenAI and partners stress technical elegance and ecosystem benefit — multi‑plane redundancy, adaptive spraying, and source routing as the secret sauce that makes “large training clusters” behave like single coherent machines.

  • Industry and consumer‑press observers highlight the industrial coordination: a rare moment when heavyweight competitors pool expertise to solve a shared bottleneck, even as they quietly jockey over who will monetize the gains.

  • Downstream users — developers, startups, and enterprises — are largely absent from the spec and the initial coverage. For them, the protocol is invisible plumbing. Its impact will only be felt indirectly in the speed, cost, and reliability of the AI services they consume.

What comes next

If MRC succeeds, it could become the TCP/IP of large‑scale AI training: rarely discussed, utterly assumed. That would lock in a particular vision of how AI supercomputers should be built — and who gets to build them.

If it falters, the same forces that created it — surging model scales, swelling user demand, and brittle networks — will keep pushing labs and vendors toward the next attempt at a shared standard.

For now, MRC is the latest reminder that in the age of frontier AI, the biggest breakthroughs may not be in the models themselves, but in the invisible infrastructure that keeps them talking fast enough to matter.