NCCL Deep Dive: Cross Data Center Communication and Network Topology Awareness

As AI training scales up, a single data center (DC) can no longer deliver the required computational power. Most recent approaches to this challenge spread training across multiple data centers, either co-located or geographically distributed. With a recently open-sourced feature, the NVIDIA Collective Communication Library (NCCL) can now communicate across multiple data centers…
