
Commit a56b859

Fix typos in distributed_device_mesh.rst
1 parent: 19dc6ea · commit: a56b859

1 file changed: 3 additions, 3 deletions

recipes_source/distributed_device_mesh.rst

Lines changed: 3 additions & 3 deletions
@@ -14,7 +14,7 @@ Prerequisites:
 
 
 Setting up distributed communicators, i.e. NVIDIA Collective Communication Library (NCCL) communicators, for distributed training can pose a significant challenge. For workloads where users need to compose different parallelisms,
-users would need to manually set up and manage NCCL communicators (for example, :class:`ProcessGroup`) for each parallelism solutions. This process could be complicated and susceptible to errors.
+users would need to manually set up and manage NCCL communicators (for example, :class:`ProcessGroup`) for each parallelism solution. This process could be complicated and susceptible to errors.
 :class:`DeviceMesh` can simplify this process, making it more manageable and less prone to errors.
 
 What is DeviceMesh
@@ -30,7 +30,7 @@ Users can also easily manage the underlying process_groups/devices for multi-dim
 
 Why DeviceMesh is Useful
 ------------------------
-DeviceMesh is useful when working with multi-dimensional parallelism (i.e. 3-D parallel) where parallelism composability is requried. For example, when your parallelism solutions require both communication across hosts and within each host.
+DeviceMesh is useful when working with multi-dimensional parallelism (i.e. 3-D parallel) where parallelism composability is required. For example, when your parallelism solutions require both communication across hosts and within each host.
 The image above shows that we can create a 2D mesh that connects the devices within each host, and connects each device with its counterpart on the other hosts in a homogenous setup.
 
 Without DeviceMesh, users would need to manually set up NCCL communicators, cuda devices on each process before applying any parallelism, which could be quite complicated.
@@ -95,7 +95,7 @@ access the underlying :class:`ProcessGroup` if needed.
 from torch.distributed.device_mesh import init_device_mesh
 mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))
 
-# Users can acess the undelying process group thru `get_group` API.
+# Users can access the underlying process group thru `get_group` API.
 replicate_group = mesh_2d.get_group(mesh_dim="replicate")
 shard_group = mesh_2d.get_group(mesh_dim="shard")

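For context, the snippet touched by the last hunk can be exercised end to end roughly as sketched below. This is a minimal sketch, not part of the commit or the recipe: the `torchrun --nproc_per_node=8 demo.py` launch command, the single 8-GPU host, and the trailing all_reduce check are illustrative assumptions.

    # Minimal sketch, assuming one host with 8 GPUs launched via
    # `torchrun --nproc_per_node=8 demo.py` (assumption, not from the commit).
    import os

    import torch
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh

    # Bind each process to its own GPU before creating the mesh.
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # 2 x 4 mesh: dim "replicate" has size 2, dim "shard" has size 4.
    mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))

    # Access the underlying process groups through `get_group`, as in the fixed comment.
    replicate_group = mesh_2d.get_group(mesh_dim="replicate")
    shard_group = mesh_2d.get_group(mesh_dim="shard")

    # The returned handles are regular ProcessGroups, so collectives accept them directly.
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t, group=shard_group)  # sums over the 4 ranks of each shard group
    print(f"rank {dist.get_rank()}: shard-group sum = {t.item()}")  # expect 4.0

Each shard group spans four ranks, so every rank should print 4.0; `replicate_group` can be used the same way for communication across the two replicas.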