OpenAI Introduces MRC (Multipath Reliable Connection): A New Open Networking Protocol for Large-Scale AI Supercomputer Training Clusters
MarkTechPost
Michal Sutter
MRC (Multipath Reliable Connection) is a new open networking protocol developed by OpenAI in partnership with AMD, Broadcom, Intel, Microsoft, and NVIDIA that improves GPU networking performance and resilience in large-scale AI training clusters by spreading packets across hundreds of paths simultaneously, recovering from network failures in microseconds, and enabling supercomputers with over 100,000 GPUs to be built using only two tiers of Ethernet switches. The post OpenAI Introduces MRC (Multipath Reliable Connection): A New Open Networking Protocol for Large-Scale AI Supercomputer Training Clusters appeared first on MarkTechPost.
