prototype_source/flight_recorder_tutorial.rst: 7 additions & 6 deletions
@@ -46,18 +46,18 @@ Flight Recorder consists of two core parts:
 
 Enabling Flight Recorder
 ------------------------
-There are two required environment variables to get the initial version of Flight Recorder working.
+There are three required environment variables to get the initial version of Flight Recorder working.
 
 - ``TORCH_NCCL_TRACE_BUFFER_SIZE = (0, N)``: Setting ``N`` to a positive number enables collection.
   ``N`` represents the number of entries that will be kept internally in a circular buffer.
-  We recommended to set this value at *2000*.
+  We recommended to set this value at *2000*. The default value is ``2000``.
 - ``TORCH_NCCL_DUMP_ON_TIMEOUT = (true, false)``: Setting this to ``true`` will write out diagnostic files to disk on job timeout.
-  If enabled, there will be one file per rank output in the job's running directory.
+  If enabled, there will be one file per rank output in the job's running directory. The default value is ``false``.
+- ``TORCH_NCCL_DEBUG_INFO_TEMP_FILE``: Setting the path where the flight recorder will be dumped with file prefix. One file per
+  rank. The default value is ``/tmp/nccl_trace_rank_``.
 
 **Optional settings:**
 
-- ``TORCH_NCCL_DEBUG_INFO_TEMP_FILE``: Setting the path where the flight recorder will be dumped with file prefix. One file per
-  rank. The default value is ``/tmp/nccl_trace_rank_``.
 - ``TORCH_NCCL_TRACE_CPP_STACK = (true, false)``: Setting this to true enables C++ stack traces to be captured in Flight Recorder.
   C++ stack traces can be useful in providing the exact code path from a PyTorch Python call down to the primitive
   C++ implementation. Also see ``TORCH_SYMBOLIZE_MODE`` in additional settings.
@@ -74,7 +74,8 @@ Additional Settings
   ``fast`` is a new experimental mode that is shown to be much faster than the traditional ``addr2line``.
   Use this setting in conjunction with ``TORCH_NCCL_TRACE_CPP_STACK`` to collect C++ traces in the Flight Recorder data.
 - If you prefer not to have the flight recorder data dumped into the local disk but rather onto your own storage, you can define your own writer class.
-  This class should inherit from class ``::c10d::DebugInfoWriter`` `(code) <https://github.com/pytorch/pytorch/blob/release/2.5/torch/csrc/distributed/c10d/NCCLUtils.hpp#L237>`__ and then register the new writer using ``::c10d::DebugInfoWriter::registerWriter``
+  This class should inherit from class ``::c10d::DebugInfoWriter`` `(code) <https://github.com/pytorch/pytorch/blob/release/2.5/torch/csrc/distributed/c10d/NCCLUtils.hpp#L237>`__
+  and then register the new writer using ``::c10d::DebugInfoWriter::registerWriter`` `(code) <https://github.com/pytorch/pytorch/blob/release/2.5/torch/csrc/distributed/c10d/NCCLUtils.hpp#L242>`__
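
For reviewers who want to try the three required variables from this change, here is a minimal Python sketch that sets them. The variable names and default values come from the tutorial text above. The helper functions (`flight_recorder_env`, `apply_env`, `expected_dump_path`) are hypothetical and not part of PyTorch. The sketch rests on two assumptions: that the variables are set before the NCCL process group is created, and that each rank's dump file name is the prefix followed by the rank number.

```python
import os


def flight_recorder_env(buffer_size=2000, dump_on_timeout=True,
                        dump_prefix="/tmp/nccl_trace_rank_"):
    """Build the three required settings as environment-variable strings.

    Hypothetical helper: only the variable names and the defaults
    (2000, false, /tmp/nccl_trace_rank_) come from the tutorial.
    """
    if buffer_size < 0:
        raise ValueError("buffer_size must be 0 (disabled) or a positive number")
    return {
        # 0 disables collection; a positive N keeps N entries in a circular buffer.
        "TORCH_NCCL_TRACE_BUFFER_SIZE": str(buffer_size),
        # Write one diagnostic file per rank when the job times out.
        "TORCH_NCCL_DUMP_ON_TIMEOUT": "true" if dump_on_timeout else "false",
        # Path plus file prefix for the per-rank dump files.
        "TORCH_NCCL_DEBUG_INFO_TEMP_FILE": dump_prefix,
    }


def apply_env(settings, environ=None):
    """Copy the settings into a mapping (os.environ when none is given)."""
    environ = os.environ if environ is None else environ
    environ.update(settings)
    return environ


def expected_dump_path(prefix, rank):
    """Assumption: the per-rank file name is the prefix followed by the rank."""
    return f"{prefix}{rank}"


if __name__ == "__main__":
    settings = flight_recorder_env()
    # Assumption: this runs before the NCCL process group is initialized.
    apply_env(settings)
    print(expected_dump_path(settings["TORCH_NCCL_DEBUG_INFO_TEMP_FILE"], 0))
```

Exporting the same three variables in the launch script before starting the job would have the same effect. The Python form is only convenient when the job configures itself at startup.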