recipes_source/distributed_device_mesh.rst (9 additions & 9 deletions)
@@ -30,10 +30,10 @@ Users can also easily manage the underlying process_groups/devices for multi-dim
 Why DeviceMesh is Useful
 ------------------------
-DeviceMesh is useful, when composability is requried. That is when your parallelism solutions require both communication across hosts and within each host.
+DeviceMesh is useful when working with multi-dimensional parallelism (e.g., 3-D parallelism) where parallelism composability is required, that is, when your parallelism solution requires communication both across hosts and within each host.

 The image above shows that we can create a 2D mesh that connects the devices within each host, and connects each device with its counterpart on the other hosts in a homogeneous setup.

-Without DeviceMesh, users would need to manually set up NCCL communicators before applying any parallelism.
+Without DeviceMesh, users would need to manually set up NCCL communicators and CUDA devices on each process before applying any parallelism, which can be quite complicated.

 The following code snippet illustrates a hybrid sharding 2-D Parallel pattern setup without :class:`DeviceMesh`.
 First, we need to manually calculate the shard group and replicate group. Then, we need to assign the correct shard and
 replicate group to each rank.
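As a rough sketch of such a manual setup (assuming a single host with an even number of GPUs, and ``RANK``/``WORLD_SIZE`` provided by the launcher), the snippet looks something like the following:

.. code-block:: python

    import os

    import torch
    import torch.distributed as dist

    # Understand world topology; RANK and WORLD_SIZE are set by the launcher (e.g. torchrun).
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])

    # Create process groups to manage 2-D like parallel pattern.
    dist.init_process_group("nccl")
    torch.cuda.set_device(rank)

    # Create shard groups (e.g. (0, 1, 2, 3), (4, 5, 6, 7))
    # and assign the correct shard group to each rank.
    num_node_devices = torch.cuda.device_count()
    shard_rank_lists = (
        list(range(0, num_node_devices // 2)),
        list(range(num_node_devices // 2, num_node_devices)),
    )
    shard_groups = (
        dist.new_group(shard_rank_lists[0]),
        dist.new_group(shard_rank_lists[1]),
    )
    current_shard_group = shard_groups[0] if rank in shard_rank_lists[0] else shard_groups[1]

    # Create replicate groups (e.g. (0, 4), (1, 5), (2, 6), (3, 7))
    # and assign the correct replicate group to each rank. Note that every
    # rank must call new_group for every group, even ones it does not join.
    current_replicate_group = None
    shard_factor = len(shard_rank_lists[0])
    for i in range(num_node_devices // 2):
        replicate_group_ranks = list(range(i, num_node_devices, shard_factor))
        replicate_group = dist.new_group(replicate_group_ranks)
        if rank in replicate_group_ranks:
            current_replicate_group = replicate_group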
@@ -51,6 +51,7 @@ replicate group to each rank.
 
     # Create process groups to manage 2-D like parallel pattern

-For simplicity of demonstration, we are simulating 2D parallel using only one node. Note that this code snippet can also be used when running on multi hosts setup.
+.. note::
+    For simplicity of demonstration, we are simulating 2D parallel using only one node. This code snippet can also be used when running on a multi-host setup.

 With the help of :func:`init_device_mesh`, we can accomplish the above 2D setup in just two lines, and we can still
 access the underlying :class:`ProcessGroup` if needed.
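For reference, a minimal sketch of that two-line setup (the ``(2, 4)`` mesh shape and the dimension names are assumptions for a single 8-GPU host):

.. code-block:: python

    from torch.distributed.device_mesh import init_device_mesh

    # One call builds the 2-D mesh: 2 replicate groups x 4 shard groups (assumed shape).
    mesh_2d = init_device_mesh("cuda", mesh_shape=(2, 4), mesh_dim_names=("replicate", "shard"))

    # The underlying ProcessGroup for each mesh dimension remains accessible.
    replicate_group = mesh_2d.get_group(mesh_dim="replicate")
    shard_group = mesh_2d.get_group(mesh_dim="shard")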
@@ -100,15 +100,15 @@ Let's create a file named ``2d_setup_with_device_mesh.py``.
Then, run the following `torch elastic/torchrun <https://pytorch.org/docs/stable/elastic/quickstart.html>`__ command.
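A typical invocation for a single host with 8 GPUs might look like the following (the process count and rendezvous settings are assumptions; adjust them to your setup):

.. code-block:: sh

    torchrun --nproc_per_node=8 --rdzv_id=100 --rdzv_endpoint=localhost:29400 2d_setup_with_device_mesh.py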