recipes_source/distributed_device_mesh.rst (83 additions, 77 deletions)
Getting Started with DeviceMesh
===============================

Prerequisites:

- `Distributed Communication Package - torch.distributed <https://pytorch.org/docs/stable/distributed.html>`__
- Python
- PyTorch 2.2

Setting up the NVIDIA Collective Communication Library (NCCL) communicators for distributed communication during distributed training can pose a significant challenge. For workloads where users need to compose different parallelisms,
users would need to manually set up and manage NCCL communicators (for example, :class:`ProcessGroup`) for each parallelism solution. This process can be complicated and susceptible to errors.
:class:`DeviceMesh` can simplify this process, making it more manageable and less prone to errors.

What is DeviceMesh
------------------

:class:`DeviceMesh` is a higher-level abstraction that manages :class:`ProcessGroup`. It allows users to effortlessly
create inter-node and intra-node process groups without worrying about how to set up ranks correctly for different sub-process groups.
Users can also easily manage the underlying process_groups/devices for multi-dimensional parallelism via :class:`DeviceMesh`.
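
As a point of reference, below is a minimal sketch (not taken verbatim from the tutorial) of how a 2D mesh could be created with ``init_device_mesh``. The mesh shape ``(2, 4)``, the dimension names ``("replicate", "shard")``, and the 8-GPU ``torchrun`` launch are illustrative assumptions.

.. code-block:: python

    # Illustrative sketch: assumes 8 GPUs, launched with `torchrun --nproc_per_node=8`.
    from torch.distributed.device_mesh import init_device_mesh

    # Build a 2D mesh over 8 ranks: dim 0 ("replicate") has size 2, dim 1 ("shard") has size 4.
    mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))

    # Each named mesh dimension is backed by its own process group.
    replicate_group = mesh_2d["replicate"].get_group()
    shard_group = mesh_2d["shard"].get_group()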

The following code snippet illustrates a 2D setup without :class:`DeviceMesh`. First, we need to manually calculate the shard group and replicate group. Then, we need to assign the correct shard and
replicate group to each rank.

.. code-block:: python

    import os

    import torch
    import torch.distributed as dist

    # Understand world topology
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    print(f"Running example on {rank=} in a world with {world_size=}")

    # Create process groups to manage 2-D like parallel pattern
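    # What follows is a hedged sketch (not necessarily the tutorial's exact code) of how
    # the manual setup could continue, assuming 8 ranks arranged as 2 shard groups of 4,
    # with replicate groups formed across them.
    dist.init_process_group("nccl")
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Shard groups: ranks [0..3] and [4..7]. Every rank must call new_group for every
    # group, in the same order, even for the groups it does not belong to.
    shard_rank_lists = [list(range(0, 4)), list(range(4, 8))]
    shard_groups = [dist.new_group(ranks) for ranks in shard_rank_lists]
    current_shard_group = shard_groups[0] if rank < 4 else shard_groups[1]

    # Replicate groups: ranks {i, i + 4} for i in 0..3; keep the one containing this rank.
    current_replicate_group = None
    for i in range(4):
        replicate_ranks = [i, i + 4]
        replicate_group = dist.new_group(replicate_ranks)
        if rank in replicate_ranks:
            current_replicate_group = replicate_group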