
Correct occurrences of autocast in amp recipe #2238


Merged 2 commits on Mar 6, 2023
8 changes: 4 additions & 4 deletions recipes_source/recipes/amp_recipe.py
@@ -11,15 +11,15 @@
range of ``float32``. Mixed precision tries to match each op to its appropriate datatype,
which can reduce your network's runtime and memory footprint.

- Ordinarily, "automatic mixed precision training" uses `torch.autocast <https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.autocast>`_ and
+ Ordinarily, "automatic mixed precision training" uses `torch.autocast <https://pytorch.org/docs/stable/amp.html#torch.autocast>`_ and
`torch.cuda.amp.GradScaler <https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler>`_ together.

This recipe measures the performance of a simple network in default precision,
then walks through adding ``autocast`` and ``GradScaler`` to run the same network in
mixed precision with improved performance.

You may download and run this recipe as a standalone Python script.
- The only requirements are Pytorch 1.6+ and a CUDA-capable GPU.
+ The only requirements are PyTorch 1.6 or later and a CUDA-capable GPU.

Mixed precision primarily benefits Tensor Core-enabled architectures (Volta, Turing, Ampere).
This recipe should show significant (2-3X) speedup on those architectures.
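
For reference, the corrected ``torch.autocast`` API is typically paired with ``torch.cuda.amp.GradScaler`` along these lines. This is a minimal sketch of the pattern the recipe builds toward, not code from this diff; ``model``, ``optimizer``, ``loss_fn``, and ``loader`` are placeholder names.

    import torch

    # Placeholders: model, optimizer, loss_fn, and loader are assumed to exist.
    scaler = torch.cuda.amp.GradScaler()

    for input, target in loader:
        optimizer.zero_grad()
        # Run the forward pass and loss computation under autocast.
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            output = model(input)
            loss = loss_fn(output, target)
        # Scale the loss so small float16 gradients do not underflow.
        scaler.scale(loss).backward()
        # Unscale gradients and skip the step if infs/NaNs are found.
        scaler.step(optimizer)
        # Adjust the scale factor for the next iteration.
        scaler.update()
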
@@ -105,7 +105,7 @@ def make_model(in_size, out_size, num_layers):
##########################################################
# Adding autocast
# ---------------
- # Instances of `torch.cuda.amp.autocast <https://pytorch.org/docs/stable/amp.html#autocasting>`_
+ # Instances of `torch.autocast <https://pytorch.org/docs/stable/amp.html#autocasting>`_
# serve as context managers that allow regions of your script to run in mixed precision.
#
# In these regions, CUDA ops run in a dtype chosen by autocast
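
To illustrate the per-op dtype selection described above, here is a small sketch (not part of this diff; it assumes a CUDA-capable GPU is available):

    import torch

    a = torch.randn(8, 8, device="cuda")
    b = torch.randn(8, 8, device="cuda")

    with torch.autocast(device_type="cuda", dtype=torch.float16):
        c = torch.mm(a, b)                         # matmuls autocast to float16
        s = torch.nn.functional.softmax(c, dim=1)  # softmax autocasts to float32

    print(c.dtype)  # torch.float16
    print(s.dtype)  # torch.float32
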
@@ -310,7 +310,7 @@ def make_model(in_size, out_size, num_layers):
# 1. Disable ``autocast`` or ``GradScaler`` individually (by passing ``enabled=False`` to their constructor) and see if infs/NaNs persist.
# 2. If you suspect part of your network (e.g., a complicated loss function) overflows, run that forward region in ``float32``
# and see if infs/NaNs persist.
- # `The autocast docstring <https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.autocast>`_'s last code snippet
+ # `The autocast docstring <https://pytorch.org/docs/stable/amp.html#torch.autocast>`_'s last code snippet
# shows forcing a subregion to run in ``float32`` (by locally disabling autocast and casting the subregion's inputs).
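
A minimal sketch of that docstring pattern follows (not part of this diff; the tensors are placeholders created only for illustration, and a CUDA-capable GPU is assumed):

    import torch

    a = torch.randn(8, 8, device="cuda")
    b = torch.randn(8, 8, device="cuda")
    c = torch.randn(8, 8, device="cuda")

    with torch.autocast(device_type="cuda", dtype=torch.float16):
        d_float16 = torch.mm(a, b)  # runs in float16 under autocast
        with torch.autocast(device_type="cuda", enabled=False):
            # Numerically sensitive subregion: autocast is disabled locally,
            # so inputs are cast to float32 explicitly.
            e_float32 = torch.mm(d_float16.float(), c)
        # Back under the enclosing autocast region; mixed-precision inputs are
        # fine here because autocast casts them to float16 again.
        f_float16 = torch.mm(d_float16, e_float32)
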
#
# Type mismatch error (may manifest as CUDNN_STATUS_BAD_PARAM)