recipes_source/distributed_device_mesh.rst
15 additions & 3 deletions
@@ -30,8 +30,12 @@ Users can also easily manage the underlying process_groups/devices for multi-dim
 Why DeviceMesh is Useful
 ------------------------
+DeviceMesh is useful when composability is required, that is, when your parallelism solution requires both communication across hosts and within each host.
+The image above shows that we can create a 2D mesh that connects the devices within each host and connects each device with its counterpart on the other hosts in a homogeneous setup.
 
-The following code snippet illustrates a 2D setup without :class:`DeviceMesh`. First, we need to manually calculate the shard group and replicate group. Then, we need to assign the correct shard and
+Without DeviceMesh, users would need to manually set up NCCL communicators before applying any parallelism.
+The following code snippet illustrates a hybrid sharding 2D parallel pattern setup without :class:`DeviceMesh`.
+First, we need to manually calculate the shard group and replicate group. Then, we need to assign the correct shard and
 replicate group to each rank.
 
 .. code-block:: python
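The body of that manual snippet falls outside this hunk's context. As a rough illustration of the bookkeeping the added text describes, here is a minimal sketch of such a manual 2D setup, assuming a single host with 8 GPUs split into two shard groups of four and four replicate groups of two (the layout, variable names, and launch environment variables are illustrative assumptions, not taken from the diff):

.. code-block:: python

    import os

    import torch
    import torch.distributed as dist

    # Illustrative single-host layout: 8 GPUs arranged as
    # two shard groups of 4 ranks and four replicate groups of 2 ranks.
    rank = int(os.environ["RANK"])
    local_rank = int(os.environ["LOCAL_RANK"])

    dist.init_process_group("nccl")
    torch.cuda.set_device(local_rank)

    num_devices = torch.cuda.device_count()

    # Shard groups, e.g. (0, 1, 2, 3) and (4, 5, 6, 7).
    shard_rank_lists = (
        list(range(0, num_devices // 2)),
        list(range(num_devices // 2, num_devices)),
    )
    # Every rank creates every group, in the same order.
    shard_groups = tuple(dist.new_group(ranks) for ranks in shard_rank_lists)
    current_shard_group = (
        shard_groups[0] if rank in shard_rank_lists[0] else shard_groups[1]
    )

    # Replicate groups, e.g. (0, 4), (1, 5), (2, 6), (3, 7).
    current_replicate_group = None
    shard_size = len(shard_rank_lists[0])
    for i in range(shard_size):
        replicate_ranks = list(range(i, num_devices, shard_size))
        replicate_group = dist.new_group(replicate_ranks)
        if rank in replicate_ranks:
            current_replicate_group = replicate_group

Note that every rank typically has to enter every ``dist.new_group`` call in the same order, even for groups it does not belong to; this per-dimension bookkeeping is exactly what :class:`DeviceMesh` manages for you.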
@@ -76,13 +80,21 @@ Then, run the following `torch elastic/torchrun <https://pytorch.org/docs/stable
-With the help of :func:`init_device_mesh`, we can accomplish the above 2D setup in just two lines.
+For simplicity of demonstration, we are simulating 2D parallelism using only one node. Note that this code snippet can also be used when running on a multi-host setup.
+
+With the help of :func:`init_device_mesh`, we can accomplish the above 2D setup in just two lines, and we can still
+access the underlying :class:`ProcessGroup` if needed.
 
 
 .. code-block:: python
     from torch.distributed.device_mesh import init_device_mesh
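The rest of the updated snippet also sits outside the diff context. A minimal sketch of the two-line setup and of retrieving the per-dimension :class:`ProcessGroup` might look as follows (the 2 x 4 mesh shape and the dimension names are illustrative assumptions):

.. code-block:: python

    from torch.distributed.device_mesh import init_device_mesh

    # Illustrative 2 x 4 layout: 2 replicate groups x 4 shard ranks.
    mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))

    # The underlying ProcessGroup for each mesh dimension is still reachable.
    replicate_group = mesh_2d.get_group(mesh_dim="replicate")
    shard_group = mesh_2d.get_group(mesh_dim="shard")

Such a script would be launched the same way as the manual version, for example with ``torchrun --nproc_per_node=8 your_script.py`` on a single 8-GPU host; the flag value and script name here are placeholders.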