1 file changed, +5 -5 lines changed
@@ -49,16 +49,16 @@ Enabling Flight Recorder
 There are two required environment variables to get the initial version of Flight Recorder working.

 - ``TORCH_NCCL_TRACE_BUFFER_SIZE = (0, N)``: Setting ``N`` to a positive number enables collection.
- ``N`` represents the number of entries that will be kept internally in a circular buffer.
- We recommend setting this value to *2000*.
+ ``N`` represents the number of entries that will be kept internally in a circular buffer.
+ We recommend setting this value to *2000*.
 - ``TORCH_NCCL_DUMP_ON_TIMEOUT = (true, false)``: Setting this to ``true`` will write out diagnostic files to disk on job timeout.
- If enabled, there will be one file per rank output in the job's running directory.
+ If enabled, there will be one file per rank output in the job's running directory.

 **Optional settings:**

 - ``TORCH_NCCL_TRACE_CPP_STACK = (true, false)``: Setting this to ``true`` enables C++ stack traces to be captured in Flight Recorder.
- C++ stack traces can be useful in providing the exact code path from a PyTorch Python call down to the primitive
- C++ implementation. Also see ``TORCH_SYMBOLIZE_MODE`` in additional settings.
+ C++ stack traces can be useful in providing the exact code path from a PyTorch Python call down to the primitive
+ C++ implementation. Also see ``TORCH_SYMBOLIZE_MODE`` in additional settings.
 - ``TORCH_NCCL_ENABLE_TIMING = (true, false)``: Setting this to ``true`` enables additional CUDA events at the start of each collective and
 records the *duration* of each collective. This may incur some CPU overhead. In the collected data, the
 *duration* field indicates how long each collective took to execute.
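
The settings in this hunk can be applied from Python as well as from the shell. The following is a minimal sketch, not part of the change itself: the helper name ``flight_recorder_env`` and its parameters are hypothetical, and it assumes the variables are read when the NCCL process group is created, so they would need to be set before that point.

```python
import os


def flight_recorder_env(buffer_size=2000, dump_on_timeout=True,
                        cpp_stack=False, enable_timing=False):
    """Build the Flight Recorder environment variables as a dict of strings.

    Hypothetical helper for illustration; only the variable names and the
    true/false values come from the documentation above.
    """
    if buffer_size < 0:
        raise ValueError("buffer_size must be 0 (disabled) or a positive number")

    def as_flag(value):
        # The documentation lists the accepted values as true and false.
        return "true" if value else "false"

    return {
        # Required: a positive N enables collection into the circular buffer.
        "TORCH_NCCL_TRACE_BUFFER_SIZE": str(buffer_size),
        # Required: write one diagnostic file per rank on job timeout.
        "TORCH_NCCL_DUMP_ON_TIMEOUT": as_flag(dump_on_timeout),
        # Optional: capture C++ stack traces.
        "TORCH_NCCL_TRACE_CPP_STACK": as_flag(cpp_stack),
        # Optional: record per-collective duration (may add CPU overhead).
        "TORCH_NCCL_ENABLE_TIMING": as_flag(enable_timing),
    }


# Assumption: apply these before the NCCL process group is initialized.
os.environ.update(flight_recorder_env(enable_timing=True))
```

Building the dict separately from applying it keeps the values easy to inspect or pass to a launcher instead of mutating ``os.environ`` directly.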