Commit b0d2c64

Author: Svetlana Karslioglu
Merge branch 'main' into 2.0-RC-TEST
2 parents 66a6e4f + 33492c7 commit b0d2c64

1 file changed: +4 −4 lines

recipes_source/recipes/amp_recipe.py

Lines changed: 4 additions & 4 deletions
@@ -11,15 +11,15 @@
 range of ``float32``. Mixed precision tries to match each op to its appropriate datatype,
 which can reduce your network's runtime and memory footprint.
 
-Ordinarily, "automatic mixed precision training" uses `torch.autocast <https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.autocast>`_ and
+Ordinarily, "automatic mixed precision training" uses `torch.autocast <https://pytorch.org/docs/stable/amp.html#torch.autocast>`_ and
 `torch.cuda.amp.GradScaler <https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler>`_ together.
 
 This recipe measures the performance of a simple network in default precision,
 then walks through adding ``autocast`` and ``GradScaler`` to run the same network in
 mixed precision with improved performance.
 
 You may download and run this recipe as a standalone Python script.
-The only requirements are Pytorch 1.6+ and a CUDA-capable GPU.
+The only requirements are PyTorch 1.6 or later and a CUDA-capable GPU.
 
 Mixed precision primarily benefits Tensor Core-enabled architectures (Volta, Turing, Ampere).
 This recipe should show significant (2-3X) speedup on those architectures.
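
For context, a minimal sketch of the pattern these changed lines describe: ``torch.autocast`` wrapping the forward pass and ``torch.cuda.amp.GradScaler`` handling loss scaling, the optimizer step, and the scale update. The tiny linear model, loss, and random data below are illustrative placeholders, not the recipe's own ``make_model`` network.

import torch

device = "cuda"
model = torch.nn.Linear(512, 512).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    inputs = torch.randn(64, 512, device=device)
    targets = torch.randn(64, 512, device=device)
    optimizer.zero_grad(set_to_none=True)
    # Forward pass and loss computation run under autocast in mixed precision.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        output = model(inputs)
        loss = loss_fn(output, targets)
    # GradScaler scales the loss before backward, unscales before stepping,
    # and updates the scale factor for the next iteration.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
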
@@ -105,7 +105,7 @@ def make_model(in_size, out_size, num_layers):
 ##########################################################
 # Adding autocast
 # ---------------
-# Instances of `torch.cuda.amp.autocast <https://pytorch.org/docs/stable/amp.html#autocasting>`_
+# Instances of `torch.autocast <https://pytorch.org/docs/stable/amp.html#autocasting>`_
 # serve as context managers that allow regions of your script to run in mixed precision.
 #
 # In these regions, CUDA ops run in a dtype chosen by autocast
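
A short sketch of the context-manager usage this hunk refers to, with a placeholder network and random data rather than the recipe's own model; which dtype each op receives inside the region is decided by autocast's casting rules (for example, the linear layers' matmuls typically run in ``float16``).

import torch

device = "cuda"
net = torch.nn.Sequential(
    torch.nn.Linear(256, 256), torch.nn.ReLU(), torch.nn.Linear(256, 256)
).to(device)
x = torch.randn(32, 256, device=device)
y = torch.randn(32, 256, device=device)

# Only the forward pass and loss computation run under autocast.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = net(x)
    loss = torch.nn.functional.mse_loss(out, y)

# Backward runs outside the autocast region.
loss.backward()
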
@@ -310,7 +310,7 @@ def make_model(in_size, out_size, num_layers):
 # 1. Disable ``autocast`` or ``GradScaler`` individually (by passing ``enabled=False`` to their constructor) and see if infs/NaNs persist.
 # 2. If you suspect part of your network (e.g., a complicated loss function) overflows , run that forward region in ``float32``
 # and see if infs/NaNs persist.
-# `The autocast docstring <https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.autocast>`_'s last code snippet
+# `The autocast docstring <https://pytorch.org/docs/stable/amp.html#torch.autocast>`_'s last code snippet
 # shows forcing a subregion to run in ``float32`` (by locally disabling autocast and casting the subregion's inputs).
 #
 # Type mismatch error (may manifest as CUDNN_STATUS_BAD_PARAM)
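
A sketch of the two checks listed above, using a hypothetical ``suspect_loss`` function and placeholder tensors (none of these names come from the recipe): ``enabled=False`` rules autocast and ``GradScaler`` in or out, and a nested, disabled autocast region forces the suspect computation to ``float32`` after casting its inputs.

import torch

device = "cuda"
model = torch.nn.Linear(128, 128).to(device)
inputs = torch.randn(16, 128, device=device)
targets = torch.randn(16, 128, device=device)

use_amp = True  # set to False to check whether infs/NaNs persist without AMP
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

def suspect_loss(pred, target):
    # Hypothetical loss that might overflow in float16.
    return ((pred - target) ** 4).mean()

with torch.autocast(device_type="cuda", dtype=torch.float16, enabled=use_amp):
    pred = model(inputs)
    # Locally disable autocast and cast the subregion's inputs to float32.
    with torch.autocast(device_type="cuda", enabled=False):
        loss = suspect_loss(pred.float(), targets.float())

scaler.scale(loss).backward()
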
