@@ -18,10 +18,8 @@ unified abstraction that can bridge these different parallelism strategies. To a
issue, PyTorch has proposed `DistributedTensor(DTensor)
<https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/examples/comm_mode_features_example.py>`_
which abstracts away the complexities of tensor communication in distributed training,
- providing a seamless user experience. However, when dealing with existing parallelism solutions
- and developing parallelism solutions using the unified abstraction like DTensor, the lack of
- transparency about what and when the collective communications happens under the hood could
- make it challenging for advanced users to identify and resolve issues. To address this challenge,
+ providing a seamless user experience. However, this abstraction creates a lack of transparency
+ that can make it challenging for users to identify and resolve issues. To address this challenge,
``CommDebugMode``, a Python context manager will serve as one of the primary debugging tools for
DTensors, enabling users to view when and why collective operations are happening when using DTensors,
effectively addressing this issue.
@@ -34,7 +32,6 @@ Here is how you can use ``CommDebugMode``:

.. code-block:: python

-     # The model used in this example is a MLPModule that applies Tensor Parallel
    comm_mode = CommDebugMode()
    with comm_mode:
        output = model(inp)
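
For context, here is a minimal self-contained sketch (not part of the diff above) of how this snippet is typically used end to end. It assumes ``CommDebugMode`` is importable from ``torch.distributed._tensor.debug`` and that ``model`` and ``inp`` are the tensor-parallel module and input built earlier in the recipe:

.. code-block:: python

    # Minimal sketch (not part of the diff): counting collectives around a forward pass.
    # Assumes CommDebugMode is importable from torch.distributed._tensor.debug and that
    # `model` and `inp` are the tensor-parallel MLPModule and input built earlier.
    from torch.distributed._tensor.debug import CommDebugMode

    comm_mode = CommDebugMode()
    with comm_mode:
        output = model(inp)

    # Total number of collective calls recorded during the forward pass.
    print(comm_mode.get_total_counts())
    # Per-collective breakdown, for example {all_reduce: 1}.
    print(comm_mode.get_comm_counts())
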
@@ -74,8 +71,8 @@ you want to use to display the data. You can also use a ``noise_level`` argument
level of displayed information. Here is what each noise level displays:

| 0. Prints module-level collective counts
- | 1. Prints DTensor operations (not including trivial operations), module sharding information
- | 2. Prints tensor operations (not including trivial operations)
+ | 1. Prints DTensor operations not included in trivial operations, module information
+ | 2. Prints operations not included in trivial operations
| 3. Prints all operations

In the example above, you can see that the collective operation, all_reduce, occurs once in the forward pass
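
As a rough illustration (not part of the diff), the ``noise_level`` argument is passed when generating the debug output. This sketch assumes the ``generate_comm_debug_tracing_table`` and ``generate_json_dump`` methods used elsewhere in the recipe; the file name is only an example:

.. code-block:: python

    # Rough illustration (not part of the diff): choosing how much detail to display.
    # Assumes the tracing-table and JSON-dump helpers described in the recipe;
    # the output file name below is illustrative.
    comm_mode.generate_comm_debug_tracing_table(noise_level=1)
    comm_mode.generate_json_dump(file_name="comm_mode_log.json", noise_level=2)
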
@@ -197,8 +194,7 @@ Below is the interactive module tree visualization that you can use to upload yo
Conclusion
------------------------------------------

- In this recipe, we have learned how to use ``CommDebugMode`` to debug Distributed Tensors and
- parallelism solutions that uses communication collectives with PyTorch. You can use your
+ In this recipe, we have learned how to use ``CommDebugMode`` to debug Distributed Tensors. You can use your
own JSON outputs in the embedded visual browser.

For more detailed information about ``CommDebugMode``, see