Commit af662e5
1. add to why DM is useful, 2. add get PG, 3. add note
1 parent 572fdd9 commit af662e5

File tree

1 file changed: +13 −2 lines changed

recipes_source/distributed_device_mesh.rst

Lines changed: 13 additions & 2 deletions
@@ -30,6 +30,9 @@ Users can also easily manage the underlying process_groups/devices for multi-dim
 
 Why DeviceMesh is Useful
 ------------------------
+DeviceMesh is useful when composability is required, that is, when your parallelism solution requires communication both across hosts and within each host.
+The image above shows that we can create a 2D mesh that connects the devices within each host and connects each device with its counterpart on the other hosts in a homogeneous setup.
+
 
 The following code snippet illustrates a 2D setup without :class:`DeviceMesh`. First, we need to manually calculate the shard group and replicate group. Then, we need to assign the correct shard and
 replicate group to each rank.
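To make the manual bookkeeping described above concrete, here is a hedged, pure-Python sketch of the group calculation for the 2×4 layout in this tutorial (2 replicate groups of 4 shard ranks each). The helper name ``build_2d_groups`` is ours, not part of any PyTorch API; in a real setup, every rank would then pass each rank list to ``torch.distributed.new_group`` in the same order.

```python
def build_2d_groups(replicate_size=2, shard_size=4):
    """Compute the rank lists for the shard and replicate process groups
    of a (replicate_size, shard_size) 2D layout. Sketch only: real code
    would feed each list to torch.distributed.new_group on every rank."""
    world_size = replicate_size * shard_size
    # Each shard group is a contiguous row of ranks.
    shard_groups = [
        list(range(i * shard_size, (i + 1) * shard_size))
        for i in range(replicate_size)
    ]
    # Each replicate group strides across rows with step shard_size.
    replicate_groups = [
        list(range(i, world_size, shard_size))
        for i in range(shard_size)
    ]
    return shard_groups, replicate_groups


shard_groups, replicate_groups = build_2d_groups()
# shard_groups     -> [[0, 1, 2, 3], [4, 5, 6, 7]]
# replicate_groups -> [[0, 4], [1, 5], [2, 6], [3, 7]]
# A given rank then selects its own groups:
rank = 5
my_shard_ranks = shard_groups[rank // 4]        # [4, 5, 6, 7]
my_replicate_ranks = replicate_groups[rank % 4]  # [1, 5]
```

This is exactly the error-prone index arithmetic that :class:`DeviceMesh` automates away.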
@@ -76,13 +79,21 @@ Then, run the following `torch elastic/torchrun <https://pytorch.org/docs/stable
 
 .. code-block:: python
 
     torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=100 --rdzv_endpoint=localhost:29400 2d_setup.py
 
+.. note::
+   For simplicity of demonstration, we are simulating 2D parallelism using only one node. Note that this code snippet can also be used when running on a multi-host setup.
 
-With the help of :func:`init_device_mesh`, we can accomplish the above 2D setup in just two lines.
+With the help of :func:`init_device_mesh`, we can accomplish the above 2D setup in just two lines, and we can still
+access the underlying :class:`ProcessGroup` if needed.
 
 
 .. code-block:: python
 
     from torch.distributed.device_mesh import init_device_mesh
-    device_mesh = init_device_mesh("cuda", (2, 4))
+    mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))
+
+    # Users can access the underlying process group through the `get_group` API.
+    replicate_group = mesh_2d.get_group(mesh_dim="replicate")
+    shard_group = mesh_2d.get_group(mesh_dim="shard")
 
 Let's create a file named ``2d_setup_with_device_mesh.py``.
 Then, run the following `torch elastic/torchrun <https://pytorch.org/docs/stable/elastic/quickstart.html>`__ command.
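To illustrate the semantics of the ``get_group`` calls added in the diff above, here is a hypothetical, dependency-free sketch of which ranks end up in each named dimension's group for the (2, 4) mesh. The function ``groups_for_rank`` is our illustrative helper, not part of the PyTorch API; it only mirrors the rank arithmetic that DeviceMesh performs internally.

```python
def groups_for_rank(rank, replicate_size=2, shard_size=4):
    """For a (replicate_size, shard_size) mesh with dim names
    ("replicate", "shard"), return the rank lists of the process groups
    that mesh.get_group(mesh_dim=...) would resolve to for this rank.
    Illustrative sketch only, not the real DeviceMesh implementation."""
    row, col = divmod(rank, shard_size)  # this rank's mesh coordinate
    # The "shard" group spans the rank's row; "replicate" spans its column.
    shard_ranks = [row * shard_size + c for c in range(shard_size)]
    replicate_ranks = [r * shard_size + col for r in range(replicate_size)]
    return {"replicate": replicate_ranks, "shard": shard_ranks}


# Rank 5 sits at mesh coordinate (1, 1):
groups = groups_for_rank(5)
# groups["shard"]     -> [4, 5, 6, 7]
# groups["replicate"] -> [1, 5]
```

Compare this with the manual ``new_group`` bookkeeping earlier in the tutorial: ``init_device_mesh`` derives the same group memberships from the mesh shape and dimension names, so users never compute rank lists by hand.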
