Commit 1102397
committed
1. add to why DM is useful, 2. add get PG, 3. add note
1 parent 572fdd9 commit 1102397

1 file changed

recipes_source/distributed_device_mesh.rst

Lines changed: 15 additions & 3 deletions
@@ -30,8 +30,12 @@ Users can also easily manage the underlying process_groups/devices for multi-dim
 
 Why DeviceMesh is Useful
 ------------------------
+DeviceMesh is useful when composability is required, that is, when your parallelism solution requires both communication across hosts and communication within each host.
+The image above shows that we can create a 2D mesh that connects the devices within each host and connects each device with its counterpart on the other hosts in a homogeneous setup.
 
-The following code snippet illustrates a 2D setup without :class:`DeviceMesh`. First, we need to manually calculate the shard group and replicate group. Then, we need to assign the correct shard and
+Without DeviceMesh, users would need to manually set up NCCL communicators before applying any parallelism.
+The following code snippet illustrates a hybrid-sharding 2D parallel setup without :class:`DeviceMesh`.
+First, we need to manually calculate the shard group and replicate group. Then, we need to assign the correct shard and
 replicate group to each rank.
 
 .. code-block:: python
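
The body of this snippet is unchanged context and is collapsed between the two hunks. As a rough sketch of the kind of manual setup the paragraph describes (an illustration assuming a single host with 8 GPUs and the NCCL backend, not necessarily the file's exact code):

.. code-block:: python

    import os

    import torch
    import torch.distributed as dist

    # torchrun supplies RANK (and WORLD_SIZE) for every spawned process.
    rank = int(os.environ["RANK"])
    dist.init_process_group("nccl")
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Shard groups split the 8 ranks into two groups of 4:
    # (0, 1, 2, 3) and (4, 5, 6, 7).
    shard_rank_lists = list(range(0, 4)), list(range(4, 8))
    shard_groups = (
        dist.new_group(shard_rank_lists[0]),
        dist.new_group(shard_rank_lists[1]),
    )
    current_shard_group = (
        shard_groups[0] if rank in shard_rank_lists[0] else shard_groups[1]
    )

    # Replicate groups pair each rank with its counterpart in the other
    # shard group: (0, 4), (1, 5), (2, 6), (3, 7). new_group is a
    # collective, so every rank must call it for every group.
    current_replicate_group = None
    for i in range(4):
        replicate_group_ranks = list(range(i, 8, 4))
        replicate_group = dist.new_group(replicate_group_ranks)
        if rank in replicate_group_ranks:
            current_replicate_group = replicate_group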
@@ -76,13 +80,21 @@ Then, run the following `torch elastic/torchrun <https://pytorch.org/docs/stable
 .. code-block:: python
 
     torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=100 --rdzv_endpoint=localhost:29400 2d_setup.py
 
+.. note::
+    For simplicity of demonstration, we are simulating 2D parallelism using only one node. Note that this code snippet can also be used when running on a multi-host setup.
 
-With the help of :func:`init_device_mesh`, we can accomplish the above 2D setup in just two lines.
+With the help of :func:`init_device_mesh`, we can accomplish the above 2D setup in just two lines, and we can still
+access the underlying :class:`ProcessGroup` if needed.
 
 
 .. code-block:: python
 
     from torch.distributed.device_mesh import init_device_mesh
-    device_mesh = init_device_mesh("cuda", (2, 4))
+    mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))
+
+    # Users can access the underlying process group through the `get_group` API.
+    replicate_group = mesh_2d.get_group(mesh_dim="replicate")
+    shard_group = mesh_2d.get_group(mesh_dim="shard")
 
 Let's create a file named ``2d_setup_with_device_mesh.py``.
 Then, run the following `torch elastic/torchrun <https://pytorch.org/docs/stable/elastic/quickstart.html>`__ command.
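
A quick usage sketch for the retrieved handles (an illustration, not part of the commit; it assumes the ``mesh_2d`` snippet above has run on all 8 ranks under ``torchrun``): both groups are regular ``ProcessGroup`` objects and can be passed directly to collectives.

.. code-block:: python

    import torch
    import torch.distributed as dist

    # Each handle is an ordinary ProcessGroup, so a collective can be
    # scoped to a single mesh dimension at a time.
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t, group=shard_group)      # sums across the 4 "shard" ranks
    dist.all_reduce(t, group=replicate_group)  # then across the 2 "replicate" ranks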
