
Commit 630c2e2

Authored by jmarintur
Correct when to set_device in ddp (#2781)
Co-authored-by: jmarin <javier.marin@satellogic.com>
Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
1 parent e5b9e61 commit 630c2e2

File tree

1 file changed

+4
-3
lines changed


beginner_source/ddp_series_multigpu.rst

Lines changed: 4 additions & 3 deletions
@@ -78,15 +78,15 @@ Imports
 Constructing the process group
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+- First, before initializing the group process, call `set_device <https://pytorch.org/docs/stable/generated/torch.cuda.set_device.html?highlight=set_device#torch.cuda.set_device>`__,
+  which sets the default GPU for each process. This is important to prevent hangs or excessive memory utilization on `GPU:0`
 - The process group can be initialized by TCP (default) or from a
   shared file-system. Read more on `process group
   initialization <https://pytorch.org/docs/stable/distributed.html#tcp-initialization>`__
 - `init_process_group <https://pytorch.org/docs/stable/distributed.html?highlight=init_process_group#torch.distributed.init_process_group>`__
   initializes the distributed process group.
 - Read more about `choosing a DDP
   backend <https://pytorch.org/docs/stable/distributed.html#which-backend-to-use>`__
-- `set_device <https://pytorch.org/docs/stable/generated/torch.cuda.set_device.html?highlight=set_device#torch.cuda.set_device>`__
-  sets the default GPU for each process. This is important to prevent hangs or excessive memory utilization on `GPU:0`
 
 .. code-block:: diff
 
@@ -98,8 +98,9 @@ Constructing the process group
 +    """
 +    os.environ["MASTER_ADDR"] = "localhost"
 +    os.environ["MASTER_PORT"] = "12355"
-+    init_process_group(backend="nccl", rank=rank, world_size=world_size)
 +    torch.cuda.set_device(rank)
++    init_process_group(backend="nccl", rank=rank, world_size=world_size)
++
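The corrected ordering can be sketched as a self-contained setup function. This is a minimal sketch, not the tutorial's full script: it falls back to the ``gloo`` backend so it can run on a CPU-only machine, while the GPU branch shows the point of the commit, calling ``torch.cuda.set_device`` before ``init_process_group``.

```python
import os

import torch
import torch.distributed as dist


def ddp_setup(rank: int, world_size: int) -> None:
    # Rendezvous address shared by every process in the group.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    if torch.cuda.is_available():
        # Pin this process to its GPU *before* creating the process group,
        # so collectives never silently default to GPU:0 (the hang /
        # excess-memory problem this commit fixes).
        torch.cuda.set_device(rank)
        backend = "nccl"
    else:
        backend = "gloo"  # CPU fallback so the sketch runs anywhere
    dist.init_process_group(backend=backend, rank=rank, world_size=world_size)


if __name__ == "__main__":
    # Single-process sanity check (world_size=1); real DDP training spawns
    # one process per GPU, e.g. via torch.multiprocessing.spawn.
    ddp_setup(rank=0, world_size=1)
    print(dist.get_rank())        # 0
    print(dist.get_world_size())  # 1
    dist.destroy_process_group()
```

In a real multi-GPU run, each spawned process calls ``ddp_setup`` with its own ``rank``, so the ``set_device`` call gives every process a distinct default GPU before any collective communication begins.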
Constructing the DDP model
