1 file changed: 6 additions, 2 deletions

@@ -195,11 +195,15 @@ def init_model():
 # GPU compute and the observed speedup may be less significant.
 #
 # You may also see different speedup results depending on the chosen ``mode``
-# argument. Since our model and data are small, we want to reduce overhead as
-# much as possible, and so we chose ``"reduce-overhead"`` . For your own models,
+# argument. The ``"reduce-overhead"`` mode uses CUDA graphs to further reduce
+# the overhead of Python. For your own models,
 # you may need to experiment with different modes to maximize speedup. You can
 # read more about modes `here <https://pytorch.org/get-started/pytorch-2.0/#user-experience>`__.
 #
+# You might also notice that the second time we run our model with ``torch.compile`` is significantly
+# slower than the other runs, although it is much faster than the first run. This is because the ``"reduce-overhead"``
+# mode runs a few warm-up iterations for CUDA graphs.
+#
 # For general PyTorch benchmarking, you can try using ``torch.utils.benchmark`` instead of the ``timed``
 # function we defined above. We wrote our own timing function in this tutorial to show
 # ``torch.compile``'s compilation latency.
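The warm-up behavior the added comments describe matters for any benchmark: the first few iterations pay one-time costs (compilation, CUDA-graph capture), so a fair measurement discards them and reports a median over the remaining runs. Below is a framework-agnostic sketch of that pattern; the `benchmark` helper, its `warmup`/`reps` parameters, and `fake_model` are illustrative names, not part of the tutorial or the PyTorch API.

```python
import time
import statistics

def benchmark(fn, warmup=3, reps=10):
    """Time fn(), discarding warm-up iterations that absorb one-time
    costs (e.g. compilation or CUDA-graph capture), and return the
    median of the remaining runs in milliseconds."""
    for _ in range(warmup):
        fn()  # one-time costs land in these discarded iterations
    times_ms = []
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(times_ms)

# A deliberately "slow first call" function, mimicking
# torch.compile's compilation latency on the first run.
state = {"calls": 0}
def fake_model():
    state["calls"] += 1
    if state["calls"] == 1:
        time.sleep(0.05)   # pretend compilation happens here
    time.sleep(0.001)      # steady-state work

median_ms = benchmark(fake_model)  # warm-up hides the slow first call
```

For real PyTorch measurements, ``torch.utils.benchmark.Timer`` handles warm-up and statistics for you, and GPU timing additionally needs synchronization (the tutorial's ``timed`` helper uses CUDA events for this).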