-Using CommDebugMode
+Getting Started with ``CommDebugMode``
 =====================================================
 
 **Author**: `Anshul Sinha <https://github.com/sinhaanshul>`__
 
-In this tutorial, we will explore how to use CommDebugMode with PyTorch's
+In this tutorial, we will explore how to use ``CommDebugMode`` with PyTorch's
 DistributedTensor (DTensor) for debugging by tracking collective operations in distributed training environments.
 
 Prerequisites
@@ -14,7 +14,7 @@ Prerequisites
 * PyTorch 2.2 or later
 
-What is CommDebugMode and why is it useful
+What is ``CommDebugMode`` and why is it useful
 ------------------------------------------
 As the size of models continues to increase, users are seeking to leverage various combinations
 of parallel strategies to scale up distributed training. However, the lack of interoperability
@@ -32,7 +32,7 @@ users to view when and why collective operations are happening when using DTensor,
 addressing this issue.
 
-How to use CommDebugMode
+Using ``CommDebugMode``
 ------------------------
 
 Here is how you can use ``CommDebugMode``:
@@ -56,10 +56,10 @@ Here is how you can use ``CommDebugMode``:
         # used in the visual browser below
         comm_mode.generate_json_dump(noise_level=2)
 
+This is what the output looks like for an ``MLPModule`` at noise level 0:
+
 .. code-block:: python
 
-    """
-    This is what the output looks like for a MLPModule at noise level 0
     Expected Output:
     Global
       FORWARD PASS
@@ -72,7 +72,6 @@ Here is how you can use ``CommDebugMode``:
         MLPModule.net2
           FORWARD PASS
             * c10d_functional.all_reduce: 1
-    """
 
 To use ``CommDebugMode``, you must wrap the code running the model in ``CommDebugMode`` and call the API that
 you want to use to display the data. You can also use a ``noise_level`` argument to control the verbosity