Commit fa40da4

Author: Will Feng
Merge branch 'master' into jlin27-cpp-frontend-remove-conv5
2 parents: 322b101 + a300b1d

15 files changed (+158, -230 lines)

advanced_source/cpp_export.rst

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-3. Loading a TorchScript Model in C++
+Loading a TorchScript Model in C++
 =====================================

 **This tutorial was updated to work with PyTorch 1.2**

advanced_source/cpp_extension.rst

Lines changed: 45 additions & 44 deletions
@@ -946,7 +946,8 @@ without having to convert to a single pointer:
 Accessor objects have a relatively high level interface, with ``.size()`` and
 ``.stride()`` methods and multi-dimensional indexing. The ``.accessor<>``
 interface is designed to access data efficiently on cpu tensor. The equivalent
-for cuda tensors is the ``packed_accessor<>``, which produces a Packed Accessor.
+for cuda tensors are ``packed_accessor64<>`` and ``packed_accessor32<>``, which
+produce Packed Accessors with either 64-bit or 32-bit integer indexing.

 The fundamental difference with Accessor is that a Packed Accessor copies size
 and stride data inside of its structure instead of pointing to it. It allows us
@@ -957,34 +958,34 @@ We can design a function that takes Packed Accessors instead of pointers.
 .. code-block:: cpp

    __global__ void lltm_cuda_forward_kernel(
-       const torch::PackedTensorAccessor<scalar_t,3,torch::RestrictPtrTraits,size_t> gates,
-       const torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> old_cell,
-       torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> new_h,
-       torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> new_cell,
-       torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> input_gate,
-       torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> output_gate,
-       torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> candidate_cell)
+       const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> gates,
+       const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> old_cell,
+       torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> new_h,
+       torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> new_cell,
+       torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> input_gate,
+       torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> output_gate,
+       torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> candidate_cell)

 Let's decompose the template used here. the first two arguments ``scalar_t`` and
 ``2`` are the same as regular Accessor. The argument
 ``torch::RestrictPtrTraits`` indicates that the ``__restrict__`` keyword must be
-used. Finally, the argument ``size_t`` indicates that sizes and strides must be
-stored in a ``size_t`` integer. This is important as by default ``int64_t`` is
-used and can make the kernel slower.
+used. Note also that we've used the ``PackedAccessor32`` variant which store the
+sizes and strides in an ``int32_t``. This is important as using the 64-bit
+variant (``PackedAccessor64``) can make the kernel slower.

 The function declaration becomes

 .. code-block:: cpp

    template <typename scalar_t>
    __global__ void lltm_cuda_forward_kernel(
-       const torch::PackedTensorAccessor<scalar_t,3,torch::RestrictPtrTraits,size_t> gates,
-       const torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> old_cell,
-       torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> new_h,
-       torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> new_cell,
-       torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> input_gate,
-       torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> output_gate,
-       torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> candidate_cell) {
+       const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> gates,
+       const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> old_cell,
+       torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> new_h,
+       torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> new_cell,
+       torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> input_gate,
+       torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> output_gate,
+       torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> candidate_cell) {
      //batch index
      const int n = blockIdx.y;
      // column index
@@ -1000,7 +1001,7 @@ The function declaration becomes
    }

 The implementation is much more readable! This function is then called by
-creating Packed Accessors with the ``.packed_accessor<>`` method within the
+creating Packed Accessors with the ``.packed_accessor32<>`` method within the
 host function.

 .. code-block:: cpp
@@ -1029,13 +1030,13 @@ host function.

    AT_DISPATCH_FLOATING_TYPES(gates.type(), "lltm_forward_cuda", ([&] {
      lltm_cuda_forward_kernel<scalar_t><<<blocks, threads>>>(
-         gates.packed_accessor<scalar_t,3,torch::RestrictPtrTraits,size_t>(),
-         old_cell.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         new_h.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         new_cell.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         input_gate.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         output_gate.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         candidate_cell.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>());
+         gates.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
+         old_cell.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         new_h.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         new_cell.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         input_gate.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         output_gate.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         candidate_cell.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>());
    }));

    return {new_h, new_cell, input_gate, output_gate, candidate_cell, X, gates};
@@ -1048,15 +1049,15 @@ on it:

    template <typename scalar_t>
    __global__ void lltm_cuda_backward_kernel(
-       torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> d_old_cell,
-       torch::PackedTensorAccessor<scalar_t,3,torch::RestrictPtrTraits,size_t> d_gates,
-       const torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> grad_h,
-       const torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> grad_cell,
-       const torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> new_cell,
-       const torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> input_gate,
-       const torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> output_gate,
-       const torch::PackedTensorAccessor<scalar_t,2,torch::RestrictPtrTraits,size_t> candidate_cell,
-       const torch::PackedTensorAccessor<scalar_t,3,torch::RestrictPtrTraits,size_t> gate_weights) {
+       torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> d_old_cell,
+       torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> d_gates,
+       const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> grad_h,
+       const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> grad_cell,
+       const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> new_cell,
+       const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> input_gate,
+       const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> output_gate,
+       const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> candidate_cell,
+       const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> gate_weights) {
      //batch index
      const int n = blockIdx.y;
      // column index
@@ -1102,15 +1103,15 @@ on it:

    AT_DISPATCH_FLOATING_TYPES(X.type(), "lltm_forward_cuda", ([&] {
      lltm_cuda_backward_kernel<scalar_t><<<blocks, threads>>>(
-         d_old_cell.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         d_gates.packed_accessor<scalar_t,3,torch::RestrictPtrTraits,size_t>(),
-         grad_h.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         grad_cell.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         new_cell.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         input_gate.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         output_gate.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         candidate_cell.packed_accessor<scalar_t,2,torch::RestrictPtrTraits,size_t>(),
-         gates.packed_accessor<scalar_t,3,torch::RestrictPtrTraits,size_t>());
+         d_old_cell.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         d_gates.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
+         grad_h.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         grad_cell.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         new_cell.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         input_gate.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         output_gate.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         candidate_cell.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
+         gates.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>());
    }));

    auto d_gate_weights = d_gates.reshape({batch_size, 3*state_size});
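
To get a feel for when the 32-bit accessor variant applies, here is a rough Python-side sketch. The helper below is hypothetical (it is not part of the tutorial or of ATen's dispatch logic); it only illustrates the constraint that ``packed_accessor32`` assumes sizes, strides, and the total element count fit in a 32-bit integer.

    import torch

    INT32_MAX = 2**31 - 1

    def fits_32bit_indexing(*tensors):
        # Hypothetical check: packed_accessor32 stores sizes and strides as
        # 32-bit integers, so each dimension size and the total element count
        # should fit in int32; otherwise packed_accessor64 is the safe choice.
        return all(
            t.numel() <= INT32_MAX and all(s <= INT32_MAX for s in t.size())
            for t in tensors
        )

    gates = torch.randn(16, 3, 128)    # stand-in shapes, not the tutorial's real inputs
    old_cell = torch.randn(16, 128)
    print(fits_32bit_indexing(gates, old_cell))  # True for tensors this small

In the tutorial itself the choice is made once in the C++ host function, as in the hunks above; tensors large enough to overflow 32-bit indexing would need ``packed_accessor64`` instead.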

advanced_source/super_resolution_with_onnxruntime.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 """
-4. (optional) Exporting a Model from PyTorch to ONNX and Running it using ONNX Runtime
+(optional) Exporting a Model from PyTorch to ONNX and Running it using ONNX Runtime
 ========================================================================

 In this tutorial, we describe how to convert a model defined

beginner_source/Intro_to_TorchScript_tutorial.py

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 """
-2. Introduction to TorchScript
+Introduction to TorchScript
 ===========================

 *James Reed (jamesreed@fb.com), Michael Suo (suo@fb.com)*, rev2
@@ -24,7 +24,7 @@
 - How to compose both approaches
 - Saving and loading TorchScript modules

-We hope that after you complete this tutorial, you proceed to go through
+We hope that after you complete this tutorial, you will proceed to go through
 `the follow-on tutorial <https://pytorch.org/tutorials/advanced/cpp_export.html>`_
 which will walk you through an example of actually calling a TorchScript
 model from C++.

beginner_source/aws_distributed_training_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 """
-4. (advanced) PyTorch 1.0 Distributed Trainer with Amazon AWS
+(advanced) PyTorch 1.0 Distributed Trainer with Amazon AWS
 =============================================================

 **Author**: `Nathan Inkawhich <https://github.com/inkawhich>`_

beginner_source/blitz/cifar10_tutorial.py

Lines changed: 16 additions & 0 deletions
@@ -185,6 +185,15 @@ def forward(self, x):
 print('Finished Training')

 ########################################################################
+# Let's quickly save our trained model:
+
+PATH = './cifar_net.pth'
+torch.save(net.state_dict(), PATH)
+
+########################################################################
+# See `here <https://pytorch.org/docs/stable/notes/serialization.html>`_
+# for more details on saving PyTorch models.
+#
 # 5. Test the network on the test data
 # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
@@ -204,6 +213,13 @@ def forward(self, x):
 imshow(torchvision.utils.make_grid(images))
 print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

+########################################################################
+# Next, let's load back in our saved model (note: saving and re-loading the model
+# wasn't necessary here, we only did it to illustrate how to do so):
+
+net = Net()
+net.load_state_dict(torch.load(PATH))
+
 ########################################################################
 # Okay, now let us see what the neural network thinks these examples above are:
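
Taken together, the two new cells form the usual ``state_dict`` save/load round trip. A self-contained sketch of the same pattern follows; the module and path here are illustrative stand-ins, not the tutorial's ``Net``.

    import torch
    import torch.nn as nn

    class TinyNet(nn.Module):               # toy stand-in for the tutorial's Net
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(4, 2)

        def forward(self, x):
            return self.fc(x)

    PATH = './tiny_net.pth'                 # illustrative path
    net = TinyNet()
    torch.save(net.state_dict(), PATH)      # saves only the parameters, not the class

    net2 = TinyNet()                        # re-create the architecture first
    net2.load_state_dict(torch.load(PATH))  # then load the saved parameters into it
    net2.eval()                             # switch to eval mode before inference

Saving the ``state_dict`` rather than the whole module keeps the checkpoint decoupled from the class definition, which is why the tutorial re-instantiates ``Net()`` before calling ``load_state_dict``.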

beginner_source/chatbot_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -537,7 +537,7 @@ def outputVar(l, voc):
     max_target_len = max([len(indexes) for indexes in indexes_batch])
     padList = zeroPadding(indexes_batch)
     mask = binaryMatrix(padList)
-    mask = torch.ByteTensor(mask)
+    mask = torch.BoolTensor(mask)
     padVar = torch.LongTensor(padList)
     return padVar, mask, max_target_len
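
The one-line change above switches the padding mask from ``torch.ByteTensor`` to ``torch.BoolTensor``: as of PyTorch 1.2, boolean tensors are the preferred dtype for masking operations, and uint8 masks trigger deprecation warnings. A small sketch of how such a mask is typically consumed (toy data, not taken from the tutorial):

    import torch

    scores = torch.tensor([[0.2, 0.8, 0.0],
                           [0.5, 0.1, 0.0]])
    pad = [[1, 1, 0],
           [1, 0, 0]]                        # 1 = real token, 0 = padding

    mask = torch.BoolTensor(pad)             # boolean mask, as in the updated tutorial
    print(scores.masked_select(mask))        # tensor([0.2000, 0.8000, 0.5000])
    print(scores.masked_fill(~mask, -1e9))   # overwrite the padded positions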

beginner_source/data_loading_tutorial.py

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 # -*- coding: utf-8 -*-
 """
-Data Loading and Processing Tutorial
-====================================
+Writing Custom Datasets, DataLoaders and Transforms
+===================================================
 **Author**: `Sasank Chilamkurthy <https://chsasank.github.io>`_

 A lot of effort in solving any machine learning problem goes in to

beginner_source/pytorch_with_examples.rst

Lines changed: 0 additions & 37 deletions
@@ -123,43 +123,6 @@ network:

 .. includenodoc:: /beginner/examples_autograd/two_layer_net_custom_function.py

-TensorFlow: Static Graphs
--------------------------
-
-PyTorch autograd looks a lot like TensorFlow: in both frameworks we
-define a computational graph, and use automatic differentiation to
-compute gradients. The biggest difference between the two is that
-TensorFlow's computational graphs are **static** and PyTorch uses
-**dynamic** computational graphs.
-
-In TensorFlow, we define the computational graph once and then execute
-the same graph over and over again, possibly feeding different input
-data to the graph. In PyTorch, each forward pass defines a new
-computational graph.
-
-Static graphs are nice because you can optimize the graph up front; for
-example a framework might decide to fuse some graph operations for
-efficiency, or to come up with a strategy for distributing the graph
-across many GPUs or many machines. If you are reusing the same graph
-over and over, then this potentially costly up-front optimization can be
-amortized as the same graph is rerun over and over.
-
-One aspect where static and dynamic graphs differ is control flow. For
-some models we may wish to perform different computation for each data
-point; for example a recurrent network might be unrolled for different
-numbers of time steps for each data point; this unrolling can be
-implemented as a loop. With a static graph the loop construct needs to
-be a part of the graph; for this reason TensorFlow provides operators
-such as ``tf.scan`` for embedding loops into the graph. With dynamic
-graphs the situation is simpler: since we build graphs on-the-fly for
-each example, we can use normal imperative flow control to perform
-computation that differs for each input.
-
-To contrast with the PyTorch autograd example above, here we use
-TensorFlow to fit a simple two-layer net:
-
-.. includenodoc:: /beginner/examples_autograd/tf_two_layer_net.py
-
 `nn` module
 ===========

beginner_source/transfer_learning_tutorial.py

Lines changed: 5 additions & 5 deletions
@@ -1,12 +1,12 @@
 # -*- coding: utf-8 -*-
 """
-Transfer Learning Tutorial
-==========================
+Transfer Learning for Computer Vision Tutorial
+==============================================
 **Author**: `Sasank Chilamkurthy <https://chsasank.github.io>`_

-In this tutorial, you will learn how to train your network using
-transfer learning. You can read more about the transfer learning at `cs231n
-notes <https://cs231n.github.io/transfer-learning/>`__
+In this tutorial, you will learn how to train a convolutional neural network for
+image classification using transfer learning. You can read more about the transfer
+learning at `cs231n notes <https://cs231n.github.io/transfer-learning/>`__


 Quoting these notes,
