
Commit 5d434d9

wz337 and wanchaol authored
Apply suggestions from code review
Co-authored-by: Wanchao <wanchaol@users.noreply.github.com>
1 parent 1102397 commit 5d434d9

2 files changed (+4, -4 lines)


recipes_source/distributed_device_mesh.rst

Lines changed: 3 additions & 3 deletions
@@ -30,10 +30,10 @@ Users can also easily manage the underlying process_groups/devices for multi-dim
 
 Why DeviceMesh is Useful
 ------------------------
-DeviceMesh is useful, when composability is requried. That is when your parallelism solutions require both communication across hosts and within each host.
+DeviceMesh is useful when working with multi-dimensional parallelism (i.e. 3-D parallel) where parallelism composability is required. For example, when your parallelism solutions require both communication across hosts and within each host.
 The image above shows that we can create a 2D mesh that connects the devices within each host, and connects each device with its counterpart on the other hosts in a homogenous setup.
 
-Without DeviceMesh, users would need to manually set up NCCL communicators before applying any parallelism.
+Without DeviceMesh, users would need to manually set up NCCL communicators and CUDA devices on each process before applying any parallelism, which could be quite complicated.
 The following code snippet illustrates a hybrid sharding 2-D Parallel pattern setup without :class:`DeviceMesh`.
 First, we need to manually calculate the shard group and replicate group. Then, we need to assign the correct shard and
 replicate group to each rank.
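
For orientation, the manual setup that snippet performs looks roughly like the sketch below. This is a minimal sketch rather than the recipe's exact code, and it assumes a hypothetical cluster of 2 hosts with 4 GPUs each (8 ranks total); the group layout mirrors the hybrid-sharding pattern described above.

.. code-block:: python

    import torch
    import torch.distributed as dist

    # Assumed world: 2 hosts x 4 GPUs = 8 global ranks, NCCL backend.
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % 4)  # pin each process to one local GPU

    # Shard groups: the 4 ranks within each host.
    # Note: dist.new_group() must be called by every rank, even for
    # groups that this rank does not belong to.
    shard_rank_lists = [list(range(0, 4)), list(range(4, 8))]
    shard_groups = [dist.new_group(ranks) for ranks in shard_rank_lists]
    current_shard_group = shard_groups[0] if rank < 4 else shard_groups[1]

    # Replicate groups: the same local rank paired across the 2 hosts.
    replicate_groups = [dist.new_group([i, i + 4]) for i in range(4)]
    current_replicate_group = replicate_groups[rank % 4]

Even in this small hard-coded case, every rank has to construct every group and then pick out its own; scaling that to arbitrary host/GPU counts is exactly the bookkeeping the paragraph above calls complicated.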
@@ -108,7 +108,7 @@ How to use DeviceMesh with HSDP
 
 Hybrid Sharding Data Parallel (HSDP) is a 2D strategy to perform FSDP within a host and DDP across hosts.
 
-Let's see an example of how DeviceMesh can assist with applying HSDP to your model. With DeviceMesh,
+Let's see an example of how DeviceMesh can assist with applying HSDP to your model with a simple setup. With DeviceMesh,
 users would not need to manually create and manage shard group and replicate group.
 
 .. code-block:: python
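
The Python block that follows this line is truncated in the diff. For context, a DeviceMesh-based HSDP setup of the kind the prose describes looks roughly like the sketch below, assuming the same 2 hosts x 4 GPUs and a placeholder ``ToyModel``; the mesh shape and dim names are illustrative, and passing ``device_mesh`` to FSDP assumes a PyTorch version that supports it.

.. code-block:: python

    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.fsdp import ShardingStrategy

    # One call replaces the manual group bookkeeping: dim 0 replicates
    # across hosts (DDP-like), dim 1 shards within each host (FSDP).
    mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))

    model = FSDP(
        ToyModel().cuda(),  # ToyModel is a placeholder for your own module
        device_mesh=mesh_2d,
        sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    )

DeviceMesh derives the shard and replicate process groups from the mesh dimensions, so none of the ``new_group`` calls from the manual version are needed.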

recipes_source/recipes_index.rst

Lines changed: 1 addition & 1 deletion
@@ -326,7 +326,7 @@ Recipes are bite-sized, actionable examples of how to use specific PyTorch featu
 
 .. customcarditem::
    :header: Getting Started with DeviceMesh
-   :card_description: Learn how to use DeviceMesh
+   :card_description: Learn how to use DeviceMesh to manage process groups easily for multi-dimensional parallelism
    :image: ../_static/img/thumbnails/cropped/profiler.png
    :link: ../recipes/distributed_device_mesh.html
    :tags: Distributed-Training
