recipes_source/distributed_device_mesh.rst
3 additions & 3 deletions
@@ -14,7 +14,7 @@ Prerequisites:
Setting up distributed communicators, i.e. NVIDIA Collective Communication Library (NCCL) communicators, for distributed training can pose a significant challenge. For workloads where users need to compose different parallelisms,
- users would need to manually set up and manage NCCL communicators (for example, :class:`ProcessGroup`) for each parallelism solutions. This process could be complicated and susceptible to errors.
+ users would need to manually set up and manage NCCL communicators (for example, :class:`ProcessGroup`) for each parallelism solution. This process could be complicated and susceptible to errors.
:class:`DeviceMesh` can simplify this process, making it more manageable and less prone to errors.
What is DeviceMesh
@@ -30,7 +30,7 @@ Users can also easily manage the underlying process_groups/devices for multi-dim
Why DeviceMesh is Useful
------------------------
- DeviceMesh is useful when working with multi-dimensional parallelism (i.e. 3-D parallel) where parallelism composability is requried. For example, when your parallelism solutions require both communication across hosts and within each host.
+ DeviceMesh is useful when working with multi-dimensional parallelism (i.e. 3-D parallel) where parallelism composability is required. For example, when your parallelism solutions require both communication across hosts and within each host.
The image above shows that we can create a 2D mesh that connects the devices within each host, and connects each device with its counterpart on the other hosts in a homogenous setup.
Without DeviceMesh, users would need to manually set up NCCL communicators, cuda devices on each process before applying any parallelism, which could be quite complicated.
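
To make the prose in this hunk concrete, here is a minimal sketch (not part of this diff) of building the 2D mesh described above with :func:`init_device_mesh`; the host/GPU counts and the dimension names ``"replicate"`` and ``"shard"`` are illustrative assumptions, not taken from the recipe::

    # Assumed setup: 2 hosts x 4 GPUs per host.
    # One mesh dimension spans hosts, the other spans the GPUs inside a host.
    from torch.distributed.device_mesh import init_device_mesh

    mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))
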
@@ -95,7 +95,7 @@ access the underlying :class:`ProcessGroup` if needed.
95
95
from torch.distributed.device_mesh import init_device_mesh
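
The hunk above sits inside the recipe's code example. As a hedged sketch of how the underlying :class:`ProcessGroup` mentioned in the hunk header can be retrieved from a mesh (the mesh shape and dimension names below are assumptions, not lines from this diff)::

    from torch.distributed.device_mesh import init_device_mesh

    # Assumed 2D mesh; dimension names are illustrative.
    mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))

    # Slice out the ProcessGroup backing each mesh dimension.
    replicate_group = mesh_2d.get_group(mesh_dim="replicate")
    shard_group = mesh_2d.get_group(mesh_dim="shard")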