recipes_source/distributed_device_mesh.rst (+6 −6)
@@ -11,12 +11,12 @@ Prerequisites:
- `Distributed Communication Package - torch.distributed <https://pytorch.org/docs/stable/distributed.html>`__
.. Setting up the NVIDIA Collective Communication Library (NCCL) communicators for distributed communication during distributed training can pose a significant challenge. For workloads where users need to compose different parallelisms,
-.. users would need to manually set up and manage nccl communicators(for example, :class:`ProcessGroup`) for each parallelism solutions. This is fairly complicated and error-proned.
-.. :class:`DeviceMesh` can help make this process much easier.
+.. users would need to manually set up and manage NCCL communicators (for example, :class:`ProcessGroup`) for each parallelism solution. This process could be complicated and susceptible to errors.
+.. :class:`DeviceMesh` can simplify this process, making it more manageable and less prone to errors.
What is DeviceMesh
------------------
-.. :class:`DeviceMesh` is a higher level abstraction that manages :class:`ProcessGroup`. It allows users to easily
+.. :class:`DeviceMesh` is a higher-level abstraction that manages :class:`ProcessGroup`. It allows users to effortlessly
.. create inter-node and intra-node process groups without worrying about how to set up ranks correctly for different sub process groups.
.. Users can also easily manage the underlying process_groups/devices for multi-dimensional parallelism via :class:`DeviceMesh`.
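For reference, a minimal sketch of what the revised text describes, assuming PyTorch 2.2+ where ``init_device_mesh`` lives under ``torch.distributed.device_mesh`` (the dimension names below are illustrative, not taken from the diff):

.. code-block:: python

   # Run with: torchrun --nproc_per_node=8 example.py
   # Creates a 2-D mesh over 8 GPUs: 2 replicate groups x 4 shard ranks.
   from torch.distributed.device_mesh import init_device_mesh

   mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))

   # Each named dimension is a 1-D sub-mesh backed by its own ProcessGroup,
   # so no manual rank bookkeeping is required for the sub process groups.
   replicate_group = mesh_2d["replicate"].get_group()
   shard_group = mesh_2d["shard"].get_group()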
@@ -28,7 +28,7 @@ What is DeviceMesh
Why DeviceMesh is Useful
------------------------
-..Below is the code snippet for a 2D setup without :class:`DeviceMesh`. First, we need to manually calculate shard group and replicate group. Then, we need to assign the correct shard and
+.. The following code snippet illustrates a 2D setup without :class:`DeviceMesh`. First, we need to manually calculate the shard group and replicate group. Then, we need to assign the correct shard and
.. replicate group to each rank.
.. code-block:: python
@@ -57,7 +57,7 @@ current_shard_group = (
shard_groups[0] if rank in shard_rank_lists[0] else shard_groups[1]
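For context, the truncated hunk above sits inside the recipe's manual 2-D setup. A sketch of the surrounding logic, assuming two shard groups over an even ``world_size``; only ``current_shard_group`` and the names in the fragment come from the diff, the rest is a reconstruction, not the recipe's exact code:

.. code-block:: python

   import os
   import torch.distributed as dist

   # torchrun sets RANK and WORLD_SIZE for each process.
   dist.init_process_group("nccl")
   rank = int(os.environ["RANK"])
   world_size = int(os.environ["WORLD_SIZE"])

   # Manually split ranks into two shard groups,
   # e.g. [0..3] and [4..7] when world_size == 8.
   shard_rank_lists = (
       list(range(0, world_size // 2)),
       list(range(world_size // 2, world_size)),
   )
   # new_group() must be called on every rank, even for groups it does not join.
   shard_groups = (
       dist.new_group(shard_rank_lists[0]),
       dist.new_group(shard_rank_lists[1]),
   )
   # Each rank then picks the shard ProcessGroup it belongs to.
   current_shard_group = (
       shard_groups[0] if rank in shard_rank_lists[0] else shard_groups[1]
   )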