Unlocking large scale AI training networks with MRC (Multipath Reliable Connection)
OpenAI has partnered with AMD, Broadcom, Intel, Microsoft, and NVIDIA to develop MRC (Multipath Reliable Connection), a novel networking protocol designed to improve GPU performance and resilience in large-scale AI training clusters. The protocol addresses key challenges in frontier model training by minimizing network congestion, reducing failure impact through redundancy, and enabling more efficient data movement across supercomputer networks at the scale of OpenAI's Stargate infrastructure.



















