
Commit 05069ca

mend
1 parent 1cccebd commit 05069ca

File tree

1 file changed: +9 -6 lines changed


recipes_source/distributed_comm_debug_mode.rst

Lines changed: 9 additions & 6 deletions
@@ -18,8 +18,10 @@ unified abstraction that can bridge these different parallelism strategies. To a
 issue, PyTorch has proposed `DistributedTensor(DTensor)
 <https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/examples/comm_mode_features_example.py>`_
 which abstracts away the complexities of tensor communication in distributed training,
-providing a seamless user experience. However, this abstraction creates a lack of transparency
-that can make it challenging for users to identify and resolve issues. To address this challenge,
+providing a seamless user experience. However, when working with existing parallelism solutions or
+developing new ones on top of a unified abstraction like DTensor, the lack of transparency
+about which collective communications happen under the hood, and when they happen, can
+make it challenging for advanced users to identify and resolve issues. To address this challenge,
 ``CommDebugMode``, a Python context manager, will serve as one of the primary debugging tools for
 DTensors, enabling users to view when and why collective operations are happening when using DTensors,
 effectively addressing this issue.
@@ -31,7 +33,7 @@ How to use CommDebugMode
 Here is how you can use ``CommDebugMode``:

 .. code-block:: python
-
+    # The model used in this example is an MLPModule that applies Tensor Parallel
     comm_mode = CommDebugMode()
     with comm_mode:
         output = model(inp)
@@ -71,8 +73,8 @@ you want to use to display the data. You can also use a ``noise_level`` argument
 level of displayed information. Here is what each noise level displays:

 | 0. Prints module-level collective counts
-| 1. Prints dTensor operations not included in trivial operations, module information
-| 2. Prints operations not included in trivial operations
+| 1. Prints DTensor operations (not including trivial operations), module sharding information
+| 2. Prints tensor operations (not including trivial operations)
 | 3. Prints all operations

 In the example above, you can see that the collective operation, all_reduce, occurs once in the forward pass
@@ -194,7 +196,8 @@ Below is the interactive module tree visualization that you can use to upload yo
 Conclusion
 ------------------------------------------

-In this recipe, we have learned how to use ``CommDebugMode`` to debug Distributed Tensors. You can use your
+In this recipe, we have learned how to use ``CommDebugMode`` to debug Distributed Tensors and
+parallelism solutions that use communication collectives with PyTorch. You can use your
 own JSON outputs in the embedded visual browser.

 For more detailed information about ``CommDebugMode``, see
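For context, here is a minimal sketch of the usage pattern the changed lines above describe. This is an illustration only: the model, the input, and the import path are assumptions based on the recipe's setting (a Tensor Parallel MLP), not part of this commit, and the module location of ``CommDebugMode`` has moved between PyTorch versions.

    # Minimal sketch, assuming a DTensor/Tensor Parallel model is already set up.
    # Import path is an assumption; it has changed across PyTorch releases.
    from torch.distributed._tensor.debug import CommDebugMode

    comm_mode = CommDebugMode()
    with comm_mode:
        output = model(inp)  # model and inp are placeholders for any DTensor forward pass

    # noise_level selects verbosity, matching the list in the diff above:
    # 0 = module-level collective counts, 1 = adds DTensor operations and module
    # sharding information, 2 = adds tensor operations, 3 = all operations.
    print(comm_mode.generate_comm_debug_tracing_table(noise_level=0))

    # Dump a JSON trace that can be loaded into the embedded visual browser.
    comm_mode.generate_json_dump(noise_level=2)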
