
Commit b970022

Merge branch 'master' into ray-tune
2 parents 37e4049 + 3191c0c commit b970022

9 files changed: +88, -40 lines changed

.circleci/config.yml

Lines changed: 7 additions & 6 deletions

@@ -562,10 +562,11 @@ workflows:
             branches:
               only:
                 - master
-      - pytorch_windows_build_worker:
-          name: win_test_worker
-          filters:
-            branches:
-              only:
-                - master
+      # - pytorch_windows_build_worker:
+      #     name: win_test_worker
+      #     type: approval
+      #     filters:
+      #       branches:
+      #         only:
+      #           - master

advanced_source/dispatcher.rst

Lines changed: 61 additions & 24 deletions

@@ -229,38 +229,75 @@ Autocast
 ^^^^^^^^
 
 The Autocast dispatch key implements support for
-`automatic mixed precision <https://developer.nvidia.com/automatic-mixed-precision>`_
-(AMP). An autocast kernel typically modifies the operation of an operator by casting the
-input arguments to some precision before carrying out the operation. For some
-operations, it is numerically safe to cast to lower precision, which is how AMP
-can achieve speed ups and reduced memory usage without sacrificing much
-accuracy. A nontrivial autocast kernel looks something like this:
+`automatic mixed precision (AMP) <https://pytorch.org/docs/stable/amp.html>`_.
+An autocast wrapper kernel typically casts incoming ``float16`` or ``float32`` CUDA tensors
+to some preferred precision before running the op.
+For example, matmuls and convolutions on floating-point CUDA tensors usually run faster
+and use less memory in ``float16`` without impairing convergence.
+Autocast wrappers only have an effect in
+`autocast-enabled contexts <https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.autocast>`_.
+
+Here's an autocast wrapper for a hypothetical custom matmul, along with its registration:
 
 .. code-block:: cpp
 
+    // Autocast-specific helper functions
+    #include <ATen/autocast_mode.h>
+
     Tensor mymatmul_autocast(const Tensor& self, const Tensor& other) {
       c10::impl::ExcludeDispatchKeyGuard no_autocast(c10::DispatchKey::Autocast);
-      return mymatmul(autocast::_cast(at::kHalf, self), autocast::_cast(at::kHalf, other));
+      return mymatmul(at::autocast::cached_cast(at::kHalf, self),
+                      at::autocast::cached_cast(at::kHalf, other));
+    }
+
+    TORCH_LIBRARY_IMPL(myops, Autocast, m) {
+      m.impl("mymatmul", mymatmul_autocast);
     }
 
+``cached_cast(kHalf, tensor)`` casts ``tensor`` to ``float16`` if ``tensor`` is CUDA and ``float32``,
+otherwise, it leaves ``tensor`` unchanged (c.f. the
+`eligibility policy <https://pytorch.org/docs/stable/amp.html#op-eligibility>`_ for natively autocasted ops).
+This ensures if the network calls ``mymatmul`` on any mixture of ``float16`` and ``float32`` CUDA tensors,
+``mymatmul`` runs in ``float16``. Meanwhile, calls to ``mymatmul`` with non-CUDA, integer-type, or ``float64``
+inputs are unaffected. Using ``cached_cast`` to follow the native eligibility policy in your own autocast wrapper
+is recommended, but not required. For example, if you wanted to force ``float16`` execution for all input types,
+you could ``return mymatmul(self.half(), other.half());`` instead of using ``cached_cast``.
+
 Notice that, like our autograd kernels, we exclude the ``Autocast`` key from
-dispatch before redispatching. By default, if no autocast kernel is provided,
-we simply fallthrough directly to the regular operator implementation (no
-autocasting occurs.) (We didn't use ``myadd`` for this example, since pointwise
-addition doesn't do autocasting and should just fall through).
-
-When should an autocast kernel be registered? Unfortunately, there aren't
-cut-and-dry rules for when you should cast to a lower precision. You can
-get a sense for what operators have autocasting behavior by looking at
-the `AMP documentation
-<https://pytorch.org/docs/master/amp.html#op-specific-behavior>`_. Some other
-general rules:
-
-* Operations that do reductions should be carried out in float32,
-* Any operation with multiple float tensor inputs has to standardize them
-  to a common precision, and
-* Any operation that does a convolution or gemm under the hood should
-  probably be float16
+dispatch before redispatching.
+
+By default, if no autocast wrapper is provided,
+we fallthrough directly to the regular operator implementation (no
+autocasting occurs). (We didn't use ``myadd`` for this example, since pointwise
+addition doesn't need autocasting and should just fall through.)
+
+When should an autocast wrapper be registered? Unfortunately, there aren't
+cut-and-dried rules for an op's preferred precision. You can
+get a sense for some native ops' preferred precisions by looking at the
+`cast lists <https://pytorch.org/docs/master/amp.html#op-specific-behavior>`_.
+General guidance:
+
+* Ops that do reductions should probably execute in ``float32``,
+* Any op that does a convolution or gemm under the hood should
+  probably execute in ``float16``, and
+* Other ops with multiple floating-point tensor inputs should standardize
+  them to a common precision (unless the implementation supports inputs with different precisions).
+
+If your custom op falls into the third category, the ``promote_type`` template
+helps figure out the widest floating-point type present among input tensors, which is
+the safest choice for the execution type:
+
+.. code-block:: cpp
+
+    #include <ATen/autocast_mode.h>
+
+    Tensor my_multiple_input_op_autocast(const Tensor& t0, const Tensor& t1) {
+      c10::impl::ExcludeDispatchKeyGuard no_autocast(c10::DispatchKey::Autocast);
+      // The required at::kHalf argument is an optimistic initial guess.
+      auto exec_type = at::autocast::promote_type(at::kHalf, t0, t1);
+      return my_multiple_input_op(at::autocast::cached_cast(exec_type, t0),
+                                  at::autocast::cached_cast(exec_type, t1));
+    }
 
 Batched
 ^^^^^^^
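
The new text points out that autocast wrappers only take effect inside autocast-enabled contexts. As a rough illustration (not part of the commit), here is a hypothetical Python sketch of calling the example ``myops::mymatmul`` op under ``torch.cuda.amp.autocast``, assuming the extension registering the op and its Autocast wrapper has been built and loaded, and a CUDA device is available:

    import torch

    # Assumed: the custom extension is loaded, e.g. via
    # torch.ops.load_library("libmyops.so").
    a = torch.randn(8, 8, device="cuda")                       # float32
    b = torch.randn(8, 8, device="cuda", dtype=torch.float16)  # float16

    with torch.cuda.amp.autocast():
        # The Autocast wrapper runs first: cached_cast brings both inputs
        # to float16, then it redispatches to the regular mymatmul kernel.
        out = torch.ops.myops.mymatmul(a, b)
    print(out.dtype)  # expected: torch.float16

    # Outside an autocast-enabled context the wrapper is skipped and the
    # op sees the inputs' original dtypes.
    out_fp32 = torch.ops.myops.mymatmul(a, a)
    print(out_fp32.dtype)  # expected: torch.float32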

beginner_source/data_loading_tutorial.py

Lines changed: 1 addition & 1 deletion

@@ -374,7 +374,7 @@ def __call__(self, sample):
 #
 
 dataloader = DataLoader(transformed_dataset, batch_size=4,
-                        shuffle=True, num_workers=4)
+                        shuffle=True, num_workers=0)
 
 
 # Helper function to show a batch
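
The only change here is switching to single-process data loading: ``num_workers=0`` loads batches in the main process instead of spawning worker subprocesses. A minimal, self-contained sketch of the same setting on a toy dataset (names are illustrative, not from the tutorial):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Toy stand-in dataset; the tutorial's transformed_dataset is not needed here.
    dataset = TensorDataset(torch.arange(16, dtype=torch.float).unsqueeze(1))

    # num_workers=0 loads batches in the main process (no worker subprocesses).
    loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

    for (batch,) in loader:
        print(batch.shape)  # torch.Size([4, 1])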

beginner_source/dcgan_faces_tutorial.py

Lines changed: 1 addition & 1 deletion

@@ -591,7 +591,7 @@ def forward(self, input):
         # Format batch
         real_cpu = data[0].to(device)
         b_size = real_cpu.size(0)
-        label = torch.full((b_size,), real_label, device=device)
+        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
         # Forward pass real batch through D
         output = netD(real_cpu).view(-1)
         # Calculate loss on all-real batch
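
The change makes the label tensor's dtype explicit. A small sketch (outside the tutorial, assuming ``real_label`` is a plain Python number) showing that the explicit ``dtype=torch.float`` guarantees float targets, which is what ``BCELoss`` expects against the discriminator's float output:

    import torch

    b_size = 4
    real_label = 1  # a plain Python number; without an explicit dtype,
                    # newer PyTorch releases infer the dtype from the fill value

    label = torch.full((b_size,), real_label, dtype=torch.float)
    print(label.dtype)  # torch.float32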

beginner_source/nlp/sequence_models_tutorial.py

Lines changed: 1 addition & 1 deletion

@@ -21,7 +21,7 @@
 part-of-speech tags, and a myriad of other things.
 
 
-LSTM's in Pytorch
+LSTMs in Pytorch
 ~~~~~~~~~~~~~~~~~
 
 Before getting to the example, note a few things. Pytorch's LSTM expects

index.rst

Lines changed: 1 addition & 1 deletion

@@ -206,7 +206,7 @@ Welcome to PyTorch Tutorials
    :header: (prototype) Introduction to Named Tensors in PyTorch
    :card_description: Learn how to use PyTorch to train a Deep Q Learning (DQN) agent on the CartPole-v0 task from the OpenAI Gym.
    :image: _static/img/thumbnails/cropped/experimental-Introduction-to-Named-Tensors-in-PyTorch.png
-   :link: intermediate/memory_format_tutorial.html
+   :link: intermediate/named_tensor_tutorial.html
    :tags: Frontend-APIs,Named-Tensor,Best-Practice
 
 .. customcarditem::

intermediate_source/memory_format_tutorial.py

Lines changed: 10 additions & 0 deletions

@@ -261,14 +261,17 @@ def check_cl(*args, **kwargs):
         return result
     return check_cl
 
+old_attrs = dict()
 
 def attribute(m):
+    old_attrs[m] = dict()
     for i in dir(m):
         e = getattr(m, i)
         exclude_functions = ['is_cuda', 'has_names', 'numel',
                              'stride', 'Tensor', 'is_contiguous', '__class__']
         if i not in exclude_functions and not i.startswith('_') and '__call__' in dir(e):
            try:
+                old_attrs[m][i] = e
                 setattr(m, i, check_wrapper(e))
             except Exception as e:
                 print(i)

@@ -286,6 +289,13 @@ def attribute(m):
 # guide https://github.com/pytorch/pytorch/wiki/Writing-memory-format-aware-operators.
 #
 
+######################################################################
+# Code below is to recover the attributes of torch.
+
+for (m, attrs) in old_attrs.items():
+    for (k,v) in attrs.items():
+        setattr(m, k, v)
+
 ######################################################################
 # Work to do
 # ----------
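
The additions save each original ``torch`` attribute before it is monkeypatched with the checking wrapper, then restore everything once the checks have run. A standalone sketch of the same save/wrap/restore pattern (the module and attribute names here are illustrative, not from the tutorial):

    import math

    saved = {}

    def wrap(fn):
        def checked(*args, **kwargs):
            print(f"calling {fn.__name__}")
            return fn(*args, **kwargs)
        return checked

    # Save the original attribute, then monkeypatch it with the wrapper.
    saved["sqrt"] = math.sqrt
    math.sqrt = wrap(math.sqrt)

    math.sqrt(4.0)  # prints "calling sqrt", returns 2.0

    # Restore the saved attribute once the instrumented run is done.
    math.sqrt = saved["sqrt"]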

intermediate_source/model_parallel_tutorial.py

Lines changed: 2 additions & 2 deletions

@@ -86,7 +86,7 @@ def forward(self, x):
 #
 # It is also possible to run an existing single-GPU module on multiple GPUs
 # with just a few lines of changes. The code below shows how to decompose
-# ``torchvision.models.reset50()`` to two GPUs. The idea is to inherit from
+# ``torchvision.models.resnet50()`` to two GPUs. The idea is to inherit from
 # the existing ``ResNet`` module, and split the layers to two GPUs during
 # construction. Then, override the ``forward`` method to stitch two
 # sub-networks by moving the intermediate outputs accordingly.

@@ -136,7 +136,7 @@ def forward(self, x):
 #
 # Let us run an experiment to get a more quantitative view of the execution
 # time. In this experiment, we train ``ModelParallelResNet50`` and the existing
-# ``torchvision.models.reset50()`` by running random inputs and labels through
+# ``torchvision.models.resnet50()`` by running random inputs and labels through
 # them. After the training, the models will not produce any useful predictions,
 # but we can get a reasonable understanding of the execution times.
 

recipes_source/recipes_index.rst

Lines changed: 4 additions & 4 deletions

@@ -1,6 +1,6 @@
 PyTorch Recipes
 ---------------------------------------------
-Recipes are bite-sized bite-sized, actionable examples of how to use specific PyTorch features, different from our full-length tutorials.
+Recipes are bite-sized, actionable examples of how to use specific PyTorch features, different from our full-length tutorials.
 
 .. raw:: html
 

@@ -40,14 +40,14 @@ Recipes are bite-sized bite-sized, actionable examples of how to use specific Py
 
 .. customcarditem::
    :header: Defining a Neural Network
-   :card_description: Learn how to use PyTorch's torch.nn package to create and define a neural network the MNIST dataset.
+   :card_description: Learn how to use PyTorch's torch.nn package to create and define a neural network for the MNIST dataset.
    :image: ../_static/img/thumbnails/cropped/defining-a-network.PNG
    :link: ../recipes/recipes/defining_a_neural_network.html
    :tags: Basics
 
 .. customcarditem::
    :header: What is a state_dict in PyTorch
-   :card_description: Learn how state_dict objects, Python dictionaries, are used in saving or loading models from PyTorch.
+   :card_description: Learn how state_dict objects and Python dictionaries are used in saving or loading models from PyTorch.
    :image: ../_static/img/thumbnails/cropped/what-is-a-state-dict.PNG
    :link: ../recipes/recipes/what_is_state_dict.html
    :tags: Basics

@@ -90,7 +90,7 @@ Recipes are bite-sized bite-sized, actionable examples of how to use specific Py
 
 .. customcarditem::
    :header: Zeroing out gradients in PyTorch
-   :card_description: Learn when you should zero out graidents and how doing so can help increase the accuracy of your model.
+   :card_description: Learn when you should zero out gradients and how doing so can help increase the accuracy of your model.
    :image: ../_static/img/thumbnails/cropped/zeroing-out-gradients.PNG
    :link: ../recipes/recipes/zeroing_out_gradients.html
    :tags: Basics
