OpenAI launches MRC to power more reliable, faster AI training
OpenAI has introduced MRC (Multipath Reliable Connection), a networking protocol engineered for the demands of supercomputer-scale AI training. MRC brings multipath routing and reliability mechanisms that keep huge distributed training jobs moving smoothly even when parts of the network experience congestion or failures.
The protocol focuses on the real-world needs of large clusters: higher sustained throughput, lower effective packet loss, and seamless failover across multiple network paths. Those improvements translate directly into fewer training interruptions, better GPU utilization, and faster time-to-train for large models.
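The announcement doesn't include MRC's wire format or retransmission scheme, but the core idea of multipath failover can be illustrated with a toy sketch: spread traffic across several paths, and when a send on one path fails, retry the packet on another. Everything below (the `Path` class, `multipath_send`, the drop rates) is hypothetical simulation code for illustration, not MRC's actual design.

```python
import random

class Path:
    """A toy network path with a configurable drop rate (hypothetical, not MRC)."""
    def __init__(self, name, drop_rate):
        self.name = name
        self.drop_rate = drop_rate

    def send(self, packet, rng):
        # Simulate lossy delivery: True means the packet arrived.
        return rng.random() >= self.drop_rate

def multipath_send(packets, paths, rng, max_retries=4):
    """Spray packets across paths round-robin; on a loss, retry on the next path.

    Illustrates the general multipath-failover idea only: even if one path is
    badly congested, packets still get through via the healthy paths.
    """
    delivered = 0
    for i, pkt in enumerate(packets):
        for attempt in range(max_retries):
            path = paths[(i + attempt) % len(paths)]
            if path.send(pkt, rng):
                delivered += 1
                break
    return delivered

rng = random.Random(0)
# Path A is heavily congested (90% loss); path B is healthy (5% loss).
paths = [Path("A", 0.9), Path("B", 0.05)]
delivered = multipath_send(list(range(1000)), paths, rng)
print(f"{delivered}/1000 packets delivered")
```

With a single path at 90% loss, most packets would never arrive; with failover across both paths, nearly all do. That is the effect described above: routing around congestion so the effective packet loss seen by the training job stays low.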
Importantly, OpenAI released MRC via the Open Compute Project (OCP), making the design available for chip makers, switch vendors, cloud operators, and research labs to implement. That open release encourages interoperability, spurs vendor support, and lowers the barrier for organizations of all sizes to adopt high-quality networking for AI workloads.
By tackling a critical networking bottleneck in AI infrastructure, MRC helps accelerate model development and deployment. Faster, more reliable training clusters mean researchers and engineers can iterate more quickly, run larger experiments with less wasted compute, and deliver AI capabilities to users sooner.