
Commit 630c2e2

Authored by jmarintur
Correct when to set_device in ddp (#2781)
Co-authored-by: jmarin <javier.marin@satellogic.com>
Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
1 parent e5b9e61 commit 630c2e2

File tree

1 file changed

+4
-3
lines changed


beginner_source/ddp_series_multigpu.rst

Lines changed: 4 additions & 3 deletions
@@ -78,15 +78,15 @@ Imports
 Constructing the process group
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+- First, before initializing the group process, call `set_device <https://pytorch.org/docs/stable/generated/torch.cuda.set_device.html?highlight=set_device#torch.cuda.set_device>`__,
+  which sets the default GPU for each process. This is important to prevent hangs or excessive memory utilization on `GPU:0`
 - The process group can be initialized by TCP (default) or from a
   shared file-system. Read more on `process group
   initialization <https://pytorch.org/docs/stable/distributed.html#tcp-initialization>`__
 - `init_process_group <https://pytorch.org/docs/stable/distributed.html?highlight=init_process_group#torch.distributed.init_process_group>`__
   initializes the distributed process group.
 - Read more about `choosing a DDP
   backend <https://pytorch.org/docs/stable/distributed.html#which-backend-to-use>`__
-- `set_device <https://pytorch.org/docs/stable/generated/torch.cuda.set_device.html?highlight=set_device#torch.cuda.set_device>`__
-  sets the default GPU for each process. This is important to prevent hangs or excessive memory utilization on `GPU:0`
 
 .. code-block:: diff
 
@@ -98,8 +98,9 @@ Constructing the process group
 +    """
 +    os.environ["MASTER_ADDR"] = "localhost"
 +    os.environ["MASTER_PORT"] = "12355"
-+    init_process_group(backend="nccl", rank=rank, world_size=world_size)
 +    torch.cuda.set_device(rank)
++    init_process_group(backend="nccl", rank=rank, world_size=world_size)
++
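The corrected ordering can be sketched as a self-contained setup function. This is a minimal sketch, not the tutorial's full script: it falls back to the ``gloo`` backend so it can run on a CPU-only machine, while the GPU branch shows the point of the commit, calling ``torch.cuda.set_device`` before ``init_process_group``.

```python
import os

import torch
import torch.distributed as dist


def ddp_setup(rank: int, world_size: int) -> None:
    # Rendezvous address shared by every process in the group.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    if torch.cuda.is_available():
        # Pin this process to its GPU *before* creating the process group,
        # so collectives never silently default to GPU:0 (the hang /
        # excess-memory problem this commit fixes).
        torch.cuda.set_device(rank)
        backend = "nccl"
    else:
        backend = "gloo"  # CPU fallback so the sketch runs anywhere
    dist.init_process_group(backend=backend, rank=rank, world_size=world_size)


if __name__ == "__main__":
    # Single-process sanity check (world_size=1); real DDP training spawns
    # one process per GPU, e.g. via torch.multiprocessing.spawn.
    ddp_setup(rank=0, world_size=1)
    print(dist.get_rank())        # 0
    print(dist.get_world_size())  # 1
    dist.destroy_process_group()
```

In a real multi-GPU run, each spawned process calls ``ddp_setup`` with its own ``rank``, so the ``set_device`` call gives every process a distinct default GPU before any collective communication begins.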
Constructing the DDP model
