1 file changed: +7 -5 lines changed

@@ -18,11 +18,13 @@ unified abstraction that can bridge these different parallelism strategies. To a
 issue, PyTorch has proposed `DistributedTensor(DTensor)
 <https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/examples/comm_mode_features_example.py>`_
 which abstracts away the complexities of tensor communication in distributed training,
-providing a seamless user experience. However, this abstraction creates a lack of transparency
-that can make it challenging for users to identify and resolve issues. To address this challenge,
-``CommDebugMode``, a Python context manager will serve as one of the primary debugging tools for
-DTensors, enabling users to view when and why collective operations are happening when using DTensors,
-effectively addressing this issue.
+providing a seamless user experience. However, when dealing with existing parallelism solutions and
+developing parallelism solutions using the unified abstraction like DTensor, the lack of transparency
+about what and when the collective communications happens under the hood could make it challenging
+for advanced users to identify and resolve issues. To address this challenge, ``CommDebugMode``, a
+Python context manager will serve as one of the primary debugging tools for DTensors, enabling
+users to view when and why collective operations are happening when using DTensors, effectively
+addressing this issue.


How to use CommDebugMode