@@ -3,11 +3,10 @@ Using CommDebugMode

**Author**: `Anshul Sinha <https://github.com/sinhaanshul>`__

- Prerequisites:
+ Prerequisites

- - `Distributed Communication Package - torch.distributed <https://pytorch.org/docs/stable/distributed.html>`__
- Python 3.8 - 3.11
- - PyTorch 2.2
+ - PyTorch 2.2 or later


What is CommDebugMode and why is it useful
@@ -16,17 +15,20 @@ As the size of models continues to increase, users are seeking to leverage various combinations
of parallel strategies to scale up distributed training. However, the lack of interoperability
between existing solutions poses a significant challenge, primarily due to the absence of a
unified abstraction that can bridge these different parallelism strategies. To address this
- issue, PyTorch has proposed DistributedTensor(DTensor) which abstracts away the complexities of
- tensor communication in distributed training, providing a seamless user experience. However,
- this abstraction creates a lack of transparency that can make it challenging for users to
- identify and resolve issues. To address this challenge, CommDebugMode, a Python context manager
- will serve as one of the primary debugging tools for DTensors, enabling users to view when and
- why collective operations are happening when using DTensors, addressing this problem.
+ issue, PyTorch has proposed `DistributedTensor(DTensor)
+ <https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/examples/comm_mode_features_example.py>`_
+ which abstracts away the complexities of tensor communication in distributed training,
+ providing a seamless user experience. However, this abstraction creates a lack of transparency
+ that can make it challenging for users to identify and resolve issues. To address this challenge,
+ ``CommDebugMode``, a Python context manager, will serve as one of the primary debugging tools for
+ DTensors, enabling users to view when and why collective operations are happening, effectively
+ addressing this issue.
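+
+ For instance, here is a minimal sketch of the idea; the 4-GPU mesh, tensor shapes, and
+ sharding choices are illustrative assumptions, and the ``_tensor`` import paths reflect
+ PyTorch around 2.2–2.4 and may differ in newer releases. An operation on sharded DTensors
+ silently triggers communication, and ``CommDebugMode`` makes that communication visible.
+
+ .. code-block:: python
+
+     import torch
+     from torch.distributed._tensor import Shard, distribute_tensor
+     from torch.distributed._tensor.debug import CommDebugMode
+     from torch.distributed.device_mesh import init_device_mesh
+
+     # illustrative 1-D mesh over 4 GPUs; launch with torchrun --nproc_per_node=4
+     mesh = init_device_mesh("cuda", (4,))
+     a = distribute_tensor(torch.randn(8, 8), mesh, [Shard(0)])  # sharded along rows
+     b = distribute_tensor(torch.randn(8, 8), mesh, [Shard(1)])  # sharded along columns
+
+     comm_mode = CommDebugMode()
+     with comm_mode:
+         # materializing the sharded matmul result forces collective communication
+         c = torch.mm(a, b).full_tensor()
+
+     print(comm_mode.get_total_counts())  # total number of collectives observed
+     print(comm_mode.get_comm_counts())   # per-collective breakdown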


How to use CommDebugMode
------------------------
- Using CommDebugMode and getting its output is very simple.
+
+ Here is how you can use ``CommDebugMode``:

.. code-block:: python
@@ -46,6 +48,8 @@ Using CommDebugMode and getting its output is very simple.
# used in the visual browser below
comm_mode.generate_json_dump(noise_level=2)

+ .. code-block:: python
+
"""
This is what the output looks like for an MLPModule at noise level 0
Expected Output:
@@ -62,19 +66,18 @@ Using CommDebugMode and getting its output is very simple.
*c10d_functional.all_reduce: 1
"""
- All users have to do is wrap the code running the model in CommDebugMode and call the API that
- they want to use to display the data. One important thing to note is that the users can use a noise_level
- arguement to control how much information is displayed to the user. The information below shows what each
- noise level displays
+ To use ``CommDebugMode``, you must wrap the code running the model in ``CommDebugMode`` and call the API that
+ you want to use to display the data. You can also use a ``noise_level`` argument to control the verbosity
+ level of displayed information. Here is what each noise level displays:
- | 0. prints module-level collective counts
- | 1. prints dTensor operations not included in trivial operations, module information
- | 2. prints operations not included in trivial operations
- | 3. prints all operations
+ | 0. Prints module-level collective counts
+ | 1. Prints DTensor operations not included in trivial operations, module information
+ | 2. Prints operations not included in trivial operations
+ | 3. Prints all operations
- In the example above, users can see in the first picture that the collective operation, all_reduce, occurs
- once in the forward pass of the MLPModule. The second picture provides a greater level of detail, allowing
- users to pinpoint that the all-reduce operation happens in the second linear layer of the MLPModule.
+ In the example above, you can see that the collective operation, ``all_reduce``, occurs once in the forward pass
+ of the ``MLPModule``. Furthermore, you can use ``CommDebugMode`` to pinpoint that the all-reduce operation happens
+ in the second linear layer of the ``MLPModule``, as the sketch below demonstrates.
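+
+ For instance, with the ``comm_mode`` object from the earlier sketch still in scope, you can
+ re-render the same trace at higher noise levels (this reuses the hypothetical setup above):
+
+ .. code-block:: python
+
+     # noise level 0: module-level collective counts only
+     print(comm_mode.generate_comm_debug_tracing_table(noise_level=0))
+
+     # noise level 1: adds non-trivial DTensor operations and module information,
+     # which is what attributes the all_reduce to the second linear layer
+     print(comm_mode.generate_comm_debug_tracing_table(noise_level=1))
+
+     # noise level 3: every operation recorded under the context manager
+     print(comm_mode.generate_comm_debug_tracing_table(noise_level=3))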


Below is the interactive module tree visualization that users can upload their JSON dump to:
@@ -190,8 +193,10 @@ Below is the interactive module tree visualization that users can upload their JSON dump to:
Conclusion
------------------------------------------
- In conclusion, we have learned how to use CommDebugMode in order to debug Distributed Tensors
- and can use future json dumps in the embedded visual browser.
- For more detailed information about CommDebugMode, please see
- https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/examples/comm_mode_features_example.py
+ In this recipe, we have learned how to use ``CommDebugMode`` to debug Distributed Tensors. You can use your
+ own JSON outputs in the embedded visual browser.
+
+ For more detailed information about ``CommDebugMode``, see
+ `comm_mode_features_example.py
+ <https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/examples/comm_mode_features_example.py>`_