Commit 83abe5f

Author: Vincent Moens
Commit message: amend
1 parent d6318f7

1 file changed: +8 -3 lines

intermediate_source/pinmem_nonblock.py

Lines changed: 8 additions & 3 deletions
@@ -334,9 +334,14 @@ def timer(cmd):
 # the hood, a pageable tensor must be copied to pinned memory before being sent to GPU.
 #
 # However, contrary to a somewhat common belief, calling :meth:`~torch.Tensor.pin_memory()` on a pageable tensor before
-# casting it to GPU should not bring any speed-up, on the contrary this call is usually slower than just executing
-# the transfer. This makes sense, since we're actually asking python to execute an operation that CUDA will perform
-# anyway before copying the data from host to device.
+# casting it to GPU should not bring any significant speed-up, on the contrary this call is usually slower than just
+# executing the transfer. This makes sense, since we're actually asking python to execute an operation that CUDA will
+# perform anyway before copying the data from host to device.
+#
+# .. note:: Here too, the observation may vary depending on the available hardware.
+# The pytorch implementation of
+# `pin_memory <https://github.com/pytorch/pytorch/blob/5298acb5c76855bc5a99ae10016efc86b27949bd/aten/src/ATen/native/Memory.cpp#L58>`_
+# could be, in rare cases, faster than the corresponding CUDA version.
 #
 # ``non_blocking=True``
 # ~~~~~~~~~~~~~~~~~~~~~
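The claim in this hunk (explicit `pin_memory()` before the copy is usually no faster than a plain `.to("cuda")`) can be checked on a given machine by timing both paths. The sketch below is a minimal example under stated assumptions: it uses `torch.utils.benchmark.Timer` instead of the tutorial's own `timer(cmd)` helper (whose body is not part of this hunk), the helper name `time_transfer` is hypothetical, a CUDA device is required for the actual comparison, and the 64 MiB tensor size is an arbitrary choice. As the note added in this commit says, results may differ across hardware, so no particular outcome is implied.

```python
import torch
from torch.utils.benchmark import Timer


def time_transfer(stmt: str, tensor: torch.Tensor, min_run_time: float = 0.5) -> float:
    """Return the median run time, in seconds, of ``stmt`` applied to ``tensor``.

    Hypothetical helper. ``torch.utils.benchmark.Timer`` performs warm-up runs
    and synchronizes CUDA when needed, so asynchronous device work is included.
    """
    timer = Timer(
        stmt=stmt,
        globals={"tensor": tensor},
        # Keep the usual intra-op thread count instead of Timer's default of 1,
        # so the host-side pinning copy runs as it would in normal code.
        num_threads=torch.get_num_threads(),
    )
    return timer.blocked_autorange(min_run_time=min_run_time).median


def main() -> None:
    if not torch.cuda.is_available():
        print("CUDA is not available; skipping the host-to-device comparison.")
        return

    # Arbitrary example size: 1024 * 1024 * 16 float32 values = 64 MiB, in pageable memory.
    pageable = torch.randn(1024, 1024, 16)

    # Plain copy: the pageable data is staged through pinned memory internally.
    direct = time_transfer("tensor.to('cuda')", pageable)
    # Explicit pinning from Python first, then the copy from the pinned buffer.
    pinned_first = time_transfer("tensor.pin_memory().to('cuda')", pageable)

    print(f"tensor.to('cuda'):               {direct * 1e3:.2f} ms")
    print(f"tensor.pin_memory().to('cuda'):  {pinned_first * 1e3:.2f} ms")


if __name__ == "__main__":
    main()
```

Both statements allocate a fresh GPU tensor on every run; PyTorch's caching allocator reuses that memory, so allocation cost should be similar for the two paths and the difference mostly reflects the extra pinning step.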
