1 file changed, +8 -3 lines changed
@@ -334,9 +334,14 @@ def timer(cmd):
  # the hood, a pageable tensor must be copied to pinned memory before being sent to GPU.
  #
  # However, contrary to a somewhat common belief, calling :meth:`~torch.Tensor.pin_memory()` on a pageable tensor before
- # casting it to GPU should not bring any speed-up, on the contrary this call is usually slower than just executing
- # the transfer. This makes sense, since we're actually asking python to execute an operation that CUDA will perform
- # anyway before copying the data from host to device.
+ # casting it to GPU should not bring any significant speed-up, on the contrary this call is usually slower than just
+ # executing the transfer. This makes sense, since we're actually asking python to execute an operation that CUDA will
+ # perform anyway before copying the data from host to device.
+ #
+ # .. note:: Here too, the observation may vary depending on the available hardware.
+ # The pytorch implementation of
+ # `pin_memory <https://github.com/pytorch/pytorch/blob/5298acb5c76855bc5a99ae10016efc86b27949bd/aten/src/ATen/native/Memory.cpp#L58>`_
+ # could be, in rare cases, faster than the corresponding CUDA version.
  #
  # ``non_blocking=True``
  # ~~~~~~~~~~~~~~~~~~~~~