diff --git a/beginner_source/introyt.rst b/beginner_source/introyt.rst new file mode 100644 index 00000000000..6a47517667d --- /dev/null +++ b/beginner_source/introyt.rst @@ -0,0 +1,29 @@ +`Introduction `_ || +`Tensors `_ || +`Autograd `_ || +`Building Models `_ || +`TensorBoard Support `_ || +`Training Models `_ || +`Model Understanding `_ + +Introduction to PyTorch - YouTube Series +======================================== + +Authors: +`Brad Heintz `_ + +This tutorial follows along with the `PyTorch Beginner Series `_ on youtube. + +`This tutorial assumes a basic familiarity with Python and Deep Learning concepts.` + +Running the Tutorial Code +------------------------- +You can run this tutorial in a couple of ways: + +- **In the cloud**: This is the easiest way to get started! Each section has a Colab link at the top, which opens a notebook with the code in a fully-hosted environment. Pro tip: Use Colab with a GPU runtime to speed up operations *Runtime > Change runtime type > GPU* +- **Locally**: This option requires you to setup PyTorch and TorchVision first on your local machine (`installation instructions `_). Download the notebook or copy the code into your favorite IDE. + +.. include:: /beginner_source/introyt/tocyt.txt + +.. toctree:: + :hidden: diff --git a/beginner_source/introyt/README.txt b/beginner_source/introyt/README.txt new file mode 100644 index 00000000000..ebe8f2e9c21 --- /dev/null +++ b/beginner_source/introyt/README.txt @@ -0,0 +1,34 @@ +Introduction to PyTorch on YouTube +---------------------------------- + +1. introyt.rst + Introduction to PyTorch - Youtube Series + https://pytorch.org/tutorials/beginner/introyt/introyt.html + +2. introyt1_tutorial.py + Introduction to PyTorch + https://pytorch.org/tutorials/beginner/introyt/introyt1_tutorial.html + +3. tensors_deeper_tutorial.py + PyTorch Tensors + https://pytorch.org/tutorials/beginner/introyt/tensors_deeper_tutorial.html + +4. autogradyt_tutorial.py + The Fundamentals of Autograd + https://pytorch.org/tutorials/beginner/introyt/autogradyt_tutorial.html + +5. modelsyt_tutorial.py + Building Models with PyTorch + https://pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html + +6. tensorboardyt_tutorial.py + PyTorch TensorBoard Support + https://pytorch.org/tutorials/beginner/introyt/tensorboardyt_tutorial.html + +7. trainingyt_tutorial.py + Training with PyTorch + https://pytorch.org/tutorials/beginner/introyt/trainingyt_tutorial.html + +8. captumyt_tutorial.py + Model Understanding with Captum + https://pytorch.org/tutorials/beginner/introyt/captumyt_tutorial.html diff --git a/beginner_source/introyt/autogradyt_tutorial.py b/beginner_source/introyt/autogradyt_tutorial.py new file mode 100644 index 00000000000..5a466e1d36c --- /dev/null +++ b/beginner_source/introyt/autogradyt_tutorial.py @@ -0,0 +1,655 @@ +""" +`Introduction `_ || +`Tensors `_ || +**Autograd** || +`Building Models `_ || +`TensorBoard Support `_ || +`Training Models `_ || +`Model Understanding `_ + +The Fundamentals of Autograd +============================ + +Follow along with the video below or on `youtube `__. + +.. raw:: html + +
+ +
+ +PyTorch’s *Autograd* feature is part of what make PyTorch flexible and +fast for building machine learning projects. It allows for the rapid and +easy computation of multiple partial derivatives (also referred to as +*gradients)* over a complex computation. This operation is central to +backpropagation-based neural network learning. + +The power of autograd comes from the fact that it traces your +computation dynamically *at runtime,* meaning that if your model has +decision branches, or loops whose lengths are not known until runtime, +the computation will still be traced correctly, and you’ll get correct +gradients to drive learning. This, combined with the fact that your +models are built in Python, offers far more flexibility than frameworks +that rely on static analysis of a more rigidly-structured model for +computing gradients. + +What Do We Need Autograd For? +----------------------------- + +""" + +########################################################################### +# A machine learning model is a *function*, with inputs and outputs. For +# this discussion, we’ll treat the inputs a as an *i*-dimensional vector +# :math:`\vec{x}`, with elements :math:`x_{i}`. We can then express the +# model, *M*, as a vector-valued function of the input: :math:`\vec{y} = +# \vec{M}(\vec{x})`. (We treat the value of M’s output as +# a vector because in general, a model may have any number of outputs.) +# +# Since we’ll mostly be discussing autograd in the context of training, +# our output of interest will be the model’s loss. The *loss function* +# L(:math:`\vec{y}`) = L(:math:`\vec{M}`\ (:math:`\vec{x}`)) is a +# single-valued scalar function of the model’s output. This function +# expresses how far off our model’s prediction was from a particular +# input’s *ideal* output. *Note: After this point, we will often omit the +# vector sign where it should be contextually clear - e.g.,* :math:`y` +# instead of :math:`\vec y`. +# +# In training a model, we want to minimize the loss. In the idealized case +# of a perfect model, that means adjusting its learning weights - that is, +# the adjustable parameters of the function - such that loss is zero for +# all inputs. In the real world, it means an iterative process of nudging +# the learning weights until we see that we get a tolerable loss for a +# wide variety of inputs. +# +# How do we decide how far and in which direction to nudge the weights? We +# want to *minimize* the loss, which means making its first derivative +# with respect to the input equal to 0: +# :math:`\frac{\partial L}{\partial x} = 0`. +# +# Recall, though, that the loss is not *directly* derived from the input, +# but a function of the model’s output (which is a function of the input +# directly), :math:`\frac{\partial L}{\partial x}` = +# :math:`\frac{\partial {L({\vec y})}}{\partial x}`. By the chain rule of +# differential calculus, we have +# :math:`\frac{\partial {L({\vec y})}}{\partial x}` = +# :math:`\frac{\partial L}{\partial y}\frac{\partial y}{\partial x}` = +# :math:`\frac{\partial L}{\partial y}\frac{\partial M(x)}{\partial x}`. +# +# :math:`\frac{\partial M(x)}{\partial x}` is where things get complex. +# The partial derivatives of the model’s outputs with respect to its +# inputs, if we were to expand the expression using the chain rule again, +# would involve many local partial derivatives over every multiplied +# learning weight, every activation function, and every other mathematical +# transformation in the model. 
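+# As a tiny, concrete illustration (a single hidden unit, not something
+# you would use in practice): if a model computes
+# :math:`y = w_{2}\,\sigma(w_{1}x)` for some activation function
+# :math:`\sigma`, then the chain rule gives
+#
+# .. math::
+#
+#    \frac{\partial L}{\partial w_{1}} =
+#    \frac{\partial L}{\partial y} \cdot w_{2} \cdot \sigma'(w_{1}x) \cdot x
+#
+# and that accounts for only a single path through the network.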
The full expression for each such partial +# derivative is the sum of the products of the local gradient of *every +# possible path* through the computation graph that ends with the variable +# whose gradient we are trying to measure. +# +# In particular, the gradients over the learning weights are of interest +# to us - they tell us *what direction to change each weight* to get the +# loss function closer to zero. +# +# Since the number of such local derivatives (each corresponding to a +# separate path through the model’s computation graph) will tend to go up +# exponentially with the depth of a neural network, so does the complexity +# in computing them. This is where autograd comes in: It tracks the +# history of every computation. Every computed tensor in your PyTorch +# model carries a history of its input tensors and the function used to +# create it. Combined with the fact that PyTorch functions meant to act on +# tensors each have a built-in implementation for computing their own +# derivatives, this greatly speeds the computation of the local +# derivatives needed for learning. +# +# A Simple Example +# ---------------- +# +# That was a lot of theory - but what does it look like to use autograd in +# practice? +# +# Let’s start with a straightforward example. First, we’ll do some imports +# to let us graph our results: +# + +# %matplotlib inline + +import torch + +import matplotlib.pyplot as plt +import matplotlib.ticker as ticker +import math + + +######################################################################### +# Next, we’ll create an input tensor full of evenly spaced values on the +# interval :math:`[0, 2{\pi}]`, and specify ``requires_grad=True``. (Like +# most functions that create tensors, ``torch.linspace()`` accepts an +# optional ``requires_grad`` option.) Setting this flag means that in +# every computation that follows, autograd will be accumulating the +# history of the computation in the output tensors of that computation. +# + +a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True) +print(a) + + +######################################################################## +# Next, we’ll perform a computation, and plot its output in terms of its +# inputs: +# + +b = torch.sin(a) +plt.plot(a.detach(), b.detach()) + + +######################################################################## +# Let’s have a closer look at the tensor ``b``. When we print it, we see +# an indicator that it is tracking its computation history: +# + +print(b) + + +####################################################################### +# This ``grad_fn`` gives us a hint that when we execute the +# backpropagation step and compute gradients, we’ll need to compute the +# derivative of :math:`sin(x)` for all this tensor’s inputs. +# +# Let’s perform some more computations: +# + +c = 2 * b +print(c) + +d = c + 1 +print(d) + + +########################################################################## +# Finally, let’s compute a single-element output. When you call +# ``.backward()`` on a tensor with no arguments, it expects the calling +# tensor to contain only a single element, as is the case when computing a +# loss function. +# + +out = d.sum() +print(out) + + +########################################################################## +# Each ``grad_fn`` stored with our tensors allows you to walk the +# computation all the way back to its inputs with its ``next_functions`` +# property. 
We can see below that drilling down on this property on ``d`` +# shows us the gradient functions for all the prior tensors. Note that +# ``a.grad_fn`` is reported as ``None``, indicating that this was an input +# to the function with no history of its own. +# + +print('d:') +print(d.grad_fn) +print(d.grad_fn.next_functions) +print(d.grad_fn.next_functions[0][0].next_functions) +print(d.grad_fn.next_functions[0][0].next_functions[0][0].next_functions) +print(d.grad_fn.next_functions[0][0].next_functions[0][0].next_functions[0][0].next_functions) +print('\nc:') +print(c.grad_fn) +print('\nb:') +print(b.grad_fn) +print('\na:') +print(a.grad_fn) + + +###################################################################### +# With all this machinery in place, how do we get derivatives out? You +# call the ``backward()`` method on the output, and check the input’s +# ``grad`` property to inspect the gradients: +# + +out.backward() +print(a.grad) +plt.plot(a.detach(), a.grad.detach()) + + +######################################################################### +# Recall the computation steps we took to get here: +# +# :: +# +# a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True) +# b = torch.sin(a) +# c = 2 * b +# d = c + 1 +# out = d.sum() +# +# Adding a constant, as we did to compute ``d``, does not change the +# derivative. That leaves :math:`c = 2 * b = 2 * sin(a)`, the derivative +# of which should be :math:`2 * cos(a)`. Looking at the graph above, +# that’s just what we see. +# +# Be aware than only *leaf nodes* of the computation have their gradients +# computed. If you tried, for example, ``print(c.grad)`` you’d get back +# ``None``. In this simple example, only the input is a leaf node, so only +# it has gradients computed. +# +# Autograd in Training +# -------------------- +# +# We’ve had a brief look at how autograd works, but how does it look when +# it’s used for its intended purpose? Let’s define a small model and +# examine how it changes after a single training batch. First, define a +# few constants, our model, and some stand-ins for inputs and outputs: +# + +BATCH_SIZE = 16 +DIM_IN = 1000 +HIDDEN_SIZE = 100 +DIM_OUT = 10 + +class TinyModel(torch.nn.Module): + + def __init__(self): + super(TinyModel, self).__init__() + + self.layer1 = torch.nn.Linear(1000, 100) + self.relu = torch.nn.ReLU() + self.layer2 = torch.nn.Linear(100, 10) + + def forward(self, x): + x = self.layer1(x) + x = self.relu(x) + x = self.layer2(x) + return x + +some_input = torch.randn(BATCH_SIZE, DIM_IN, requires_grad=False) +ideal_output = torch.randn(BATCH_SIZE, DIM_OUT, requires_grad=False) + +model = TinyModel() + + +########################################################################## +# One thing you might notice is that we never specify +# ``requires_grad=True`` for the model’s layers. Within a subclass of +# ``torch.nn.module``, it’s assumed that we want to track gradients on the +# layers’ weights for learning. +# +# If we look at the layers of the model, we can examine the values of the +# weights, and verify that no gradients have been computed yet: +# + +print(model.layer2.weight[0][0:10]) # just a small slice +print(model.layer2.weight.grad) + + +########################################################################## +# Let’s see how this changes when we run through one training batch. 
For a +# loss function, we’ll just use the square of the Euclidean distance +# between our ``prediction`` and the ``ideal_output``, and we’ll use a +# basic stochastic gradient descent optimizer. +# + +optimizer = torch.optim.SGD(model.parameters(), lr=0.001) + +prediction = model(some_input) + +loss = (ideal_output - prediction).pow(2).sum() +print(loss) + + +###################################################################### +# Now, let’s call ``loss.backward()`` and see what happens: +# + +loss.backward() +print(model.layer2.weight[0][0:10]) +print(model.layer2.weight.grad[0][0:10]) + + +######################################################################## +# We can see that the gradients have been computed for each learning +# weight, but the weights remain unchanged, because we haven’t run the +# optimizer yet. The optimizer is responsible for updating model weights +# based on the computed gradients. +# + +optimizer.step() +print(model.layer2.weight[0][0:10]) +print(model.layer2.weight.grad[0][0:10]) + + +###################################################################### +# You should see that ``layer2``\ ’s weights have changed. +# +# One important thing about the process: After calling +# ``optimizer.step()``, you need to call ``optimizer.zero_grad()``, or +# else every time you run ``loss.backward()``, the gradients on the +# learning weights will accumulate: +# + +print(model.layer2.weight.grad[0][0:10]) + +for i in range(0, 5): + prediction = model(some_input) + loss = (ideal_output - prediction).pow(2).sum() + loss.backward() + +print(model.layer2.weight.grad[0][0:10]) + +optimizer.zero_grad() + +print(model.layer2.weight.grad[0][0:10]) + + +######################################################################### +# After running the cell above, you should see that after running +# ``loss.backward()`` multiple times, the magnitudes of most of the +# gradients will be much larger. Failing to zero the gradients before +# running your next training batch will cause the gradients to blow up in +# this manner, causing incorrect and unpredictable learning results. +# +# Turning Autograd Off and On +# --------------------------- +# +# There are situations where you will need fine-grained control over +# whether autograd is enabled. There are multiple ways to do this, +# depending on the situation. +# +# The simplest is to change the ``requires_grad`` flag on a tensor +# directly: +# + +a = torch.ones(2, 3, requires_grad=True) +print(a) + +b1 = 2 * a +print(b1) + +a.requires_grad = False +b2 = 2 * a +print(b2) + + +########################################################################## +# In the cell above, we see that ``b1`` has a ``grad_fn`` (i.e., a traced +# computation history), which is what we expect, since it was derived from +# a tensor, ``a``, that had autograd turned on. When we turn off autograd +# explicitly with ``a.requires_grad = False``, computation history is no +# longer tracked, as we see when we compute ``b2``. 
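+#
+# As a quick check (reusing ``a`` and ``b2`` from the cell above), we can
+# inspect the flags directly. Note that there is also an in-place setter,
+# ``requires_grad_()``, which is equivalent to assigning to the
+# ``requires_grad`` attribute:
+#
+
+print(b2.requires_grad)   # False - no autograd history is kept for b2
+print(b2.grad_fn)         # None
+
+a.requires_grad_(True)    # in-place setter; same effect as a.requires_grad = True
+print((2 * a).grad_fn)    # tracking is back on, so a grad_fn is reported
+
+
+##########################################################################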
+# +# If you only need autograd turned off temporarily, a better way is to use +# the ``torch.no_grad()``: +# + +a = torch.ones(2, 3, requires_grad=True) * 2 +b = torch.ones(2, 3, requires_grad=True) * 3 + +c1 = a + b +print(c1) + +with torch.no_grad(): + c2 = a + b + +print(c2) + +c3 = a * b +print(c3) + + +########################################################################## +# ``torch.no_grad()`` can also be used as a function or method dectorator: +# + +def add_tensors1(x, y): + return x + y + +@torch.no_grad() +def add_tensors2(x, y): + return x + y + + +a = torch.ones(2, 3, requires_grad=True) * 2 +b = torch.ones(2, 3, requires_grad=True) * 3 + +c1 = add_tensors1(a, b) +print(c1) + +c2 = add_tensors2(a, b) +print(c2) + + +########################################################################## +# There’s a corresponding context manager, ``torch.enable_grad()``, for +# turning autograd on when it isn’t already. It may also be used as a +# decorator. +# +# Finally, you may have a tensor that requires gradient tracking, but you +# want a copy that does not. For this we have the ``Tensor`` object’s +# ``detach()`` method - it creates a copy of the tensor that is *detached* +# from the computation history: +# + +x = torch.rand(5, requires_grad=True) +y = x.detach() + +print(x) +print(y) + + +######################################################################### +# We did this above when we wanted to graph some of our tensors. This is +# because ``matplotlib`` expects a NumPy array as input, and the implicit +# conversion from a PyTorch tensor to a NumPy array is not enabled for +# tensors with requires_grad=True. Making a detached copy lets us move +# forward. +# +# Autograd and In-place Operations +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +# +# In every example in this notebook so far, we’ve used variables to +# capture the intermediate values of a computation. Autograd needs these +# intermediate values to perform gradient computations. *For this reason, +# you must be careful about using in-place operations when using +# autograd.* Doing so can destroy information you need to compute +# derivatives in the ``backward()`` call. PyTorch will even stop you if +# you attempt an in-place operation on leaf variable that requires +# autograd, as shown below. +# +# .. note:: +# The following code cell throws a runtime error. This is expected. +# +# :: +# +# a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True) +# torch.sin_(a) +# + +######################################################################### +# Autograd Profiler +# ----------------- +# +# Autograd tracks every step of your computation in detail. Such a +# computation history, combined with timing information, would make a +# handy profiler - and autograd has that feature baked in. Here’s a quick +# example usage: +# + +device = torch.device('cpu') +run_on_gpu = False +if torch.cuda.is_available(): + device = torch.device('cuda') + run_on_gpu = True + +x = torch.randn(2, 3, requires_grad=True) +y = torch.rand(2, 3, requires_grad=True) +z = torch.ones(2, 3, requires_grad=True) + +with torch.autograd.profiler.profile(use_cuda=run_on_gpu) as prf: + for _ in range(1000): + z = (z / x) * y + +print(prf.key_averages().table(sort_by='self_cpu_time_total')) + + +########################################################################## +# The profiler can also label individual sub-blocks of code, break out the +# data by input tensor shape, and export data as a Chrome tracing tools +# file. 
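+#
+# As a brief sketch of a couple of those features (reusing the tensors from
+# the cell above; the label strings and the ``trace.json`` file name are
+# just placeholders), sub-blocks can be labeled with
+# ``torch.autograd.profiler.record_function()``, and the collected trace
+# can be exported for viewing in Chrome's ``chrome://tracing`` tool:
+#
+
+with torch.autograd.profiler.profile(use_cuda=run_on_gpu) as prf:
+    with torch.autograd.profiler.record_function('divide'):
+        w = z / x
+    with torch.autograd.profiler.record_function('multiply'):
+        w = w * y
+
+prf.export_chrome_trace('trace.json')   # open this file in chrome://tracing
+
+
+##########################################################################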
For full details of the API, see the +# `documentation `__. +# +# Advanced Topic: More Autograd Detail and the High-Level API +# ----------------------------------------------------------- +# +# If you have a function with an n-dimensional input and m-dimensional +# output, :math:`\vec{y}=f(\vec{x})`, the complete gradient is a matrix of +# the derivative of every output with respect to every input, called the +# *Jacobian:* +# +# .. math:: +# +# J +# = +# \left(\begin{array}{ccc} +# \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\ +# \vdots & \ddots & \vdots\\ +# \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} +# \end{array}\right) +# +# If you have a second function, :math:`l=g\left(\vec{y}\right)` that +# takes m-dimensional input (that is, the same dimensionality as the +# output above), and returns a scalar output, you can express its +# gradients with respect to :math:`\vec{y}` as a column vector, +# :math:`v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}` +# - which is really just a one-column Jacobian. +# +# More concretely, imagine the first function as your PyTorch model (with +# potentially many inputs and many outputs) and the second function as a +# loss function (with the model’s output as input, and the loss value as +# the scalar output). +# +# If we multiply the first function’s Jacobian by the gradient of the +# second function, and apply the chain rule, we get: +# +# .. math:: +# +# J^{T}\cdot v=\left(\begin{array}{ccc} +# \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\ +# \vdots & \ddots & \vdots\\ +# \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} +# \end{array}\right)\left(\begin{array}{c} +# \frac{\partial l}{\partial y_{1}}\\ +# \vdots\\ +# \frac{\partial l}{\partial y_{m}} +# \end{array}\right)=\left(\begin{array}{c} +# \frac{\partial l}{\partial x_{1}}\\ +# \vdots\\ +# \frac{\partial l}{\partial x_{n}} +# \end{array}\right) +# +# Note: You could also use the equivalent operation :math:`v^{T}\cdot J`, +# and get back a row vector. +# +# The resulting column vector is the *gradient of the second function with +# respect to the inputs of the first* - or in the case of our model and +# loss function, the gradient of the loss with respect to the model +# inputs. +# +# **``torch.autograd`` is an engine for computing these products.** This +# is how we accumulate the gradients over the learning weights during the +# backward pass. +# +# For this reason, the ``backward()`` call can *also* take an optional +# vector input. This vector represents a set of gradients over the tensor, +# which are multiplied by the Jacobian of the autograd-traced tensor that +# precedes it. Let’s try a specific example with a small vector: +# + +x = torch.randn(3, requires_grad=True) + +y = x * 2 +while y.data.norm() < 1000: + y = y * 2 + +print(y) + + +########################################################################## +# If we tried to call ``y.backward()`` now, we’d get a runtime error and a +# message that gradients can only be *implicitly* computed for scalar +# outputs. 
For a multi-dimensional output, autograd expects us to provide +# gradients for those three outputs that it can multiply into the +# Jacobian: +# + +v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float) # stand-in for gradients +y.backward(v) + +print(x.grad) + + +########################################################################## +# (Note that the output gradients are all related to powers of two - which +# we’d expect from a repeated doubling operation.) +# +# The High-Level API +# ~~~~~~~~~~~~~~~~~~ +# +# There is an API on autograd that gives you direct access to important +# differential matrix and vector operations. In particular, it allows you +# to calculate the Jacobian and the *Hessian* matrices of a particular +# function for particular inputs. (The Hessian is like the Jacobian, but +# expresses all partial *second* derivatives.) It also provides methods +# for taking vector products with these matrices. +# +# Let’s take the Jacobian of a simple function, evaluated for a 2 +# single-element inputs: +# + +def exp_adder(x, y): + return 2 * x.exp() + 3 * y + +inputs = (torch.rand(1), torch.rand(1)) # arguments for the function +print(inputs) +torch.autograd.functional.jacobian(exp_adder, inputs) + + +######################################################################## +# If you look closely, the first output should equal :math:`2e^x` (since +# the derivative of :math:`e^x` is :math:`e^x`), and the second value +# should be 3. +# +# You can, of course, do this with higher-order tensors: +# + +inputs = (torch.rand(3), torch.rand(3)) # arguments for the function +print(inputs) +torch.autograd.functional.jacobian(exp_adder, inputs) + + +######################################################################### +# The ``torch.autograd.functional.hessian()`` method works identically +# (assuming your function is twice differentiable), but returns a matrix +# of all second derivatives. +# +# There is also a function to directly compute the vector-Jacobian +# product, if you provide the vector: +# + +def do_some_doubling(x): + y = x * 2 + while y.data.norm() < 1000: + y = y * 2 + return y + +inputs = torch.randn(3) +my_gradients = torch.tensor([0.1, 1.0, 0.0001]) +torch.autograd.functional.vjp(do_some_doubling, inputs, v=my_gradients) + + +############################################################################## +# The ``torch.autograd.functional.jvp()`` method performs the same matrix +# multiplication as ``vjp()`` with the operands reversed. The ``vhp()`` +# and ``hvp()`` methods do the same for a vector-Hessian product. +# +# For more information, including preformance notes on the `docs for the +# functional +# API `__ +# diff --git a/beginner_source/introyt/captumyt.py b/beginner_source/introyt/captumyt.py new file mode 100644 index 00000000000..0a6f2ad3332 --- /dev/null +++ b/beginner_source/introyt/captumyt.py @@ -0,0 +1,492 @@ +""" +`Introduction `_ || +`Tensors `_ || +`Autograd `_ || +`Building Models `_ || +`TensorBoard Support `_ || +`Training Models `_ || +**Model Understanding** + +Model Understanding with Captum +=============================== + +Follow along with the video below or on `youtube `__. Download the notebook and corresponding files +`here `__. + +.. raw:: html + +
+ +
+ +`Captum `__ (“comprehension” in Latin) is an open +source, extensible library for model interpretability built on PyTorch. + +With the increase in model complexity and the resulting lack of +transparency, model interpretability methods have become increasingly +important. Model understanding is both an active area of research as +well as an area of focus for practical applications across industries +using machine learning. Captum provides state-of-the-art algorithms, +including Integrated Gradients, to provide researchers and developers +with an easy way to understand which features are contributing to a +model’s output. + +Full documentation, an API reference, and a suite of tutorials on +specific topics are available at the `captum.ai `__ +website. + +Introduction +------------ + +Captum’s approach to model interpretability is in terms of +*attributions.* There are three kinds of attributions available in +Captum: + +- **Feature Attribution** seeks to explain a particular output in terms + of features of the input that generated it. Explaining whether a + movie review was positive or negative in terms of certain words in + the review is an example of feature attribution. +- **Layer Attribution** examines the activity of a model’s hidden layer + subsequent to a particular input. Examining the spatially-mapped + output of a convolutional layer in response to an input image in an + example of layer attribution. +- **Neuron Attribution** is analagous to layer attribution, but focuses + on the activity of a single neuron. + +In this interactive notebook, we’ll look at Feature Attribution and +Layer Attribution. + +Each of the three attribution types has multiple **attribution +algorithms** associated with it. Many attribution algorithms fall into +two broad categories: + +- **Gradient-based algorithms** calculate the backward gradients of a + model output, layer output, or neuron activation with respect to the + input. **Integrated Gradients** (for features), **Layer Gradient \* + Activation**, and **Neuron Conductance** are all gradient-based + algorithms. +- **Perturbation-based algorithms** examine the changes in the output + of a model, layer, or neuron in response to changes in the input. The + input perturbations may be directed or random. **Occlusion,** + **Feature Ablation,** and **Feature Permutation** are all + perturbation-based algorithms. + +We’ll be examining algorithms of both types below. + +Especially where large models are involved, it can be valuable to +visualize attribution data in ways that relate it easily to the input +features being examined. While it is certainly possible to create your +own visualizations with Matplotlib, Plotly, or similar tools, Captum +offers enhanced tools specific to its attributions: + +- The ``captum.attr.visualization`` module (imported below as ``viz``) + provides helpful functions for visualizing attributions related to + images. +- **Captum Insights** is an easy-to-use API on top of Captum that + provides a visualization widget with ready-made visualizations for + image, text, and arbitrary model types. + +Both of these visualization toolsets will be demonstrated in this +notebook. The first few examples will focus on computer vision use +cases, but the Captum Insights section at the end will demonstrate +visualization of attributions in a multi-model, visual +question-and-answer model. 
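+
+Whichever attribution type and algorithm you choose, the calling pattern
+in Captum is largely uniform: wrap your model in an attribution object,
+then call that object's ``attribute()`` method on an input. As a rough
+sketch (``model``, ``input_batch``, and ``target_class`` are placeholders
+here; the concrete calls used in this tutorial appear in the sections
+below):
+
+::
+
+   from captum.attr import IntegratedGradients
+
+   ig = IntegratedGradients(model)   # wrap the model
+   attributions = ig.attribute(input_batch, target=target_class)
+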
+ +Installation +------------ + +Before you get started, you need to have a Python environment with: + +- Python version 3.6 or higher +- For the Captum Insights example, Flask 1.1 or higher +- PyTorch version 1.2 or higher (the latest version is recommended) +- TorchVision version 0.6 or higher (the latest version is recommended) +- Captum (the latest version is recommended) + +To install Captum in an Anaconda or pip virtual environment, use the +appropriate command for your environment below: + +With ``conda``: + +``conda install pytorch torchvision captum -c pytorch`` + +With ``pip``: + +``pip install torch torchvision captum`` + +Restart this notebook in the environment you set up, and you’re ready to +go! + + +A First Example +--------------- + +To start, let’s take a simple, visual example. We’ll start with a ResNet +model pretrained on the ImageNet dataset. We’ll get a test input, and +use different **Feature Attribution** algorithms to examine how the +input images affect the output, and see a helpful visualization of this +input attribution map for some test images. + +First, some imports: + +""" + +import torch +import torch.nn.functional as F +import torchvision.transforms as transforms +import torchvision.models as models + +import captum +from captum.attr import IntegratedGradients, Occlusion, LayerGradCam, LayerAttribution +from captum.attr import visualization as viz + +import os, sys +import json + +import numpy as np +from PIL import Image +import matplotlib.pyplot as plt +from matplotlib.colors import LinearSegmentedColormap + + +######################################################################### +# Now we’ll use the TorchVision model library to download a pretrained +# ResNet. Since we’re not training, we’ll place it in evaluation mode for +# now. +# + +model = models.resnet101(pretrained=True) +model = model.eval() + + +####################################################################### +# The place where you got this interactive notebook should also have an +# ``img`` folder with a file ``cat.jpg`` in it. +# + +test_img = Image.open('img/cat.jpg') +test_img_data = np.asarray(test_img) +plt.imshow(test_img_data) +plt.show() + + +########################################################################## +# Our ResNet model was trained on the ImageNet dataset, and expects images +# to be of a certain size, with the channel data normalized to a specific +# range of values. We’ll also pull in the list of human-readable labels +# for the categories our model recognizes - that should be in the ``img`` +# folder as well. +# + +# model expects 224x224 3-color image +transform = transforms.Compose([ + transforms.Resize(224), + transforms.CenterCrop(224), + transforms.ToTensor() +]) + +# standard ImageNet normalization +transform_normalize = transforms.Normalize( + mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225] + ) + +transformed_img = transform(test_img) +input_img = transform_normalize(transformed_img) +input_img = input_img.unsqueeze(0) # the model requires a dummy batch dimension + +labels_path = 'img/imagenet_class_index.json' +with open(labels_path) as json_data: + idx_to_labels = json.load(json_data) + + +###################################################################### +# Now, we can ask the question: What does our model think this image +# represents? 
+# + +output = model(input_img) +output = F.softmax(output, dim=1) +prediction_score, pred_label_idx = torch.topk(output, 1) +pred_label_idx.squeeze_() +predicted_label = idx_to_labels[str(pred_label_idx.item())][1] +print('Predicted:', predicted_label, '(', prediction_score.squeeze().item(), ')') + + +###################################################################### +# We’ve confirmed that ResNet thinks our image of a cat is, in fact, a +# cat. But *why* does the model think this is an image of a cat? +# +# For the answer to that, we turn to Captum. +# + + +########################################################################## +# Feature Attribution with Integrated Gradients +# --------------------------------------------- +# +# **Feature attribution** attributes a particular output to features of +# the input. It uses a specific input - here, our test image - to generate +# a map of the relative importance of each input feature to a particular +# output feature. +# +# `Integrated +# Gradients `__ is one of +# the feature attribution algorithms available in Captum. Integrated +# Gradients assigns an importance score to each input feature by +# approximating the integral of the gradients of the model’s output with +# respect to the inputs. +# +# In our case, we’re going to be taking a specific element of the output +# vector - that is, the one indicating the model’s confidence in its +# chosen category - and use Integrated Gradients to understand what parts +# of the input image contributed to this output. +# +# Once we have the importance map from Integrated Gradients, we’ll use the +# visualization tools in Captum to give a helpful representation of the +# importance map. Captum’s ``visualize_image_attr()`` function provides a +# variety of options for customizing display of your attribution data. +# Here, we pass in a custom Matplotlib color map. +# +# Running the cell with the ``integrated_gradients.attribute()`` call will +# usually take a minute or two. +# + +# Initialize the attribution algorithm with the model +integrated_gradients = IntegratedGradients(model) + +# Ask the algorithm to attribute our output target to +attributions_ig = integrated_gradients.attribute(input_img, target=pred_label_idx, n_steps=200) + +# Show the original image for comparison +_ = viz.visualize_image_attr(None, np.transpose(transformed_img.squeeze().cpu().detach().numpy(), (1,2,0)), + method="original_image", title="Original Image") + +default_cmap = LinearSegmentedColormap.from_list('custom blue', + [(0, '#ffffff'), + (0.25, '#0000ff'), + (1, '#0000ff')], N=256) + +_ = viz.visualize_image_attr(np.transpose(attributions_ig.squeeze().cpu().detach().numpy(), (1,2,0)), + np.transpose(transformed_img.squeeze().cpu().detach().numpy(), (1,2,0)), + method='heat_map', + cmap=default_cmap, + show_colorbar=True, + sign='positive', + title='Integrated Gradients') + + +####################################################################### +# In the image above, you should see that Integrated Gradients gives us +# the strongest signal around the cat’s location in the image. +# + + +########################################################################## +# Feature Attribution with Occlusion +# ---------------------------------- +# +# Gradient-based attribution methods help to understand the model in terms +# of directly computing out the output changes with respect to the input. 
+# *Perturbation-based attribution* methods approach this more directly, by +# introducing changes to the output to measure the effect on the output. +# `Occlusion `__ is one such method. +# It involves replacing sections of the input image, and examining the +# effect on the output signal. +# +# Below, we set up Occlusion attribution. Similarly to configuring a +# convolutional neural network, you can specify the size of the target +# region, and a stride length to determine the spacing of individual +# measurements. We’ll visualize the output of our Occlusion attribution +# with ``visualize_image_attr_multiple()``, showing heat maps of both +# positive and negative attribution by region, and by masking the original +# image with the positive attribution regions. The masking gives a very +# instructive view of what regions of our cat photo the model found to be +# most “cat-like”. +# + +occlusion = Occlusion(model) + +attributions_occ = occlusion.attribute(input_img, + target=pred_label_idx, + strides=(3, 8, 8), + sliding_window_shapes=(3,15, 15), + baselines=0) + + +_ = viz.visualize_image_attr_multiple(np.transpose(attributions_occ.squeeze().cpu().detach().numpy(), (1,2,0)), + np.transpose(transformed_img.squeeze().cpu().detach().numpy(), (1,2,0)), + ["original_image", "heat_map", "heat_map", "masked_image"], + ["all", "positive", "negative", "positive"], + show_colorbar=True, + titles=["Original", "Positive Attribution", "Negative Attribution", "Masked"], + fig_size=(18, 6) + ) + + +###################################################################### +# Again, we see greater significance placed on the region of the image +# that contains the cat. +# + + +######################################################################### +# Layer Attribution with Layer GradCAM +# ------------------------------------ +# +# **Layer Attribution** allows you to attribute the activity of hidden +# layers within your model to features of your input. Below, we’ll use a +# layer attribution algorithm to examine the activity of one of the +# convolutional layers within our model. +# +# GradCAM computes the gradients of the target output with respect to the +# given layer, averages for each output channel (dimension 2 of output), +# and multiplies the average gradient for each channel by the layer +# activations. The results are summed over all channels. GradCAM is +# designed for convnets; since the activity of convolutional layers often +# maps spatially to the input, GradCAM attributions are often upsampled +# and used to mask the input. +# +# Layer attribution is set up similarly to input attribution, except that +# in addition to the model, you must specify a hidden layer within the +# model that you wish to examine. As above, when we call ``attribute()``, +# we specify the target class of interest. +# + +layer_gradcam = LayerGradCam(model, model.layer3[1].conv2) +attributions_lgc = layer_gradcam.attribute(input_img, target=pred_label_idx) + +_ = viz.visualize_image_attr(attributions_lgc[0].cpu().permute(1,2,0).detach().numpy(), + sign="all", + title="Layer 3 Block 1 Conv 2") + + +########################################################################## +# We’ll use the convenience method ``interpolate()`` in the +# `LayerAttribution `__ +# base class to upsample this attribution data for comparison to the input +# image. 
+# + +upsamp_attr_lgc = LayerAttribution.interpolate(attributions_lgc, input_img.shape[2:]) + +print(attributions_lgc.shape) +print(upsamp_attr_lgc.shape) +print(input_img.shape) + +_ = viz.visualize_image_attr_multiple(upsamp_attr_lgc[0].cpu().permute(1,2,0).detach().numpy(), + transformed_img.permute(1,2,0).numpy(), + ["original_image","blended_heat_map","masked_image"], + ["all","positive","positive"], + show_colorbar=True, + titles=["Original", "Positive Attribution", "Masked"], + fig_size=(18, 6)) + + +####################################################################### +# Visualizations such as this can give you novel insights into how your +# hidden layers respond to your input. +# + + +########################################################################## +# Visualization with Captum Insights +# ---------------------------------- +# +# Captum Insights is an interpretability visualization widget built on top +# of Captum to facilitate model understanding. Captum Insights works +# across images, text, and other features to help users understand feature +# attribution. It allows you to visualize attribution for multiple +# input/output pairs, and provides visualization tools for image, text, +# and arbitrary data. +# +# In this section of the notebook, we’ll visualize multiple image +# classification inferences with Captum Insights. +# +# First, let’s gather some image and see what the model thinks of them. +# For variety, we’ll take our cat, a teapot, and a trilobite fossil: +# + +imgs = ['img/cat.jpg', 'img/teapot.jpg', 'img/trilobite.jpg'] + +for img in imgs: + img = Image.open(img) + transformed_img = transform(img) + input_img = transform_normalize(transformed_img) + input_img = input_img.unsqueeze(0) # the model requires a dummy batch dimension + + output = model(input_img) + output = F.softmax(output, dim=1) + prediction_score, pred_label_idx = torch.topk(output, 1) + pred_label_idx.squeeze_() + predicted_label = idx_to_labels[str(pred_label_idx.item())][1] + print('Predicted:', predicted_label, '/', pred_label_idx.item(), ' (', prediction_score.squeeze().item(), ')') + + +########################################################################## +# …and it looks like our model is identifying them all correctly - but of +# course, we want to dig deeper. For that we’ll use the Captum Insights +# widget, which we configure with an ``AttributionVisualizer`` object, +# imported below. The ``AttributionVisualizer`` expects batches of data, +# so we’ll bring in Captum’s ``Batch`` helper class. And we’ll be looking +# at images specifically, so well also import ``ImageFeature``. 
+# +# We configure the ``AttributionVisualizer`` with the following arguments: +# +# - An array of models to be examined (in our case, just the one) +# - A scoring function, which allows Captum Insights to pull out the +# top-k predictions from a model +# - An ordered, human-readable list of classes our model is trained on +# - A list of features to look for - in our case, an ``ImageFeature`` +# - A dataset, which is an iterable object returning batches of inputs +# and labels - just like you’d use for training +# + +from captum.insights import AttributionVisualizer, Batch +from captum.insights.attr_vis.features import ImageFeature + +# Baseline is all-zeros input - this may differ depending on your data +def baseline_func(input): + return input * 0 + +# merging our image transforms from above +def full_img_transform(input): + i = Image.open(input) + i = transform(i) + i = transform_normalize(i) + i = i.unsqueeze(0) + return i + + +input_imgs = torch.cat(list(map(lambda i: full_img_transform(i), imgs)), 0) + +visualizer = AttributionVisualizer( + models=[model], + score_func=lambda o: torch.nn.functional.softmax(o, 1), + classes=list(map(lambda k: idx_to_labels[k][1], idx_to_labels.keys())), + features=[ + ImageFeature( + "Photo", + baseline_transforms=[baseline_func], + input_transforms=[], + ) + ], + dataset=[Batch(input_imgs, labels=[282,849,69])] +) + + +######################################################################### +# Note that running the cell above didn’t take much time at all, unlike +# our attributions above. That’s because Captum Insights lets you +# configure different attribution algorithms in a visual widget, after +# which it will compute and display the attributions. *That* process will +# take a few minutes. +# +# Running the cell below will render the Captum Insights widget. You can +# then choose attributions methods and their arguments, filter model +# responses based on predicted class or prediction correctness, see the +# model’s predictions with associated probabilities, and view heatmaps of +# the attribution compared with the original image. +# + +visualizer.render() diff --git a/beginner_source/introyt/introyt1_tutorial.py b/beginner_source/introyt/introyt1_tutorial.py new file mode 100644 index 00000000000..8540e782387 --- /dev/null +++ b/beginner_source/introyt/introyt1_tutorial.py @@ -0,0 +1,613 @@ +""" +**Introduction** || +`Tensors `_ || +`Autograd `_ || +`Building Models `_ || +`TensorBoard Support `_ || +`Training Models `_ || +`Model Understanding `_ + +Introduction to PyTorch +======================= + +Follow along with the video below or on `youtube `__. + +.. raw:: html + +
+ +
+ +PyTorch Tensors +--------------- + +Follow along with the video beginning at `03:50 `__. + +First, we’ll import pytorch. + +""" + +import torch + +###################################################################### +# Let’s see a few basic tensor manipulations. First, just a few of the +# ways to create tensors: +# + +z = torch.zeros(5, 3) +print(z) +print(z.dtype) + + +######################################################################### +# Above, we create a 5x3 matrix filled with zeros, and query its datatype +# to find out that the zeros are 32-bit floating point numbers, which is +# the default PyTorch. +# +# What if you wanted integers instead? You can always override the +# default: +# + +i = torch.ones((5, 3), dtype=torch.int16) +print(i) + + +###################################################################### +# You can see that when we do change the default, the tensor helpfully +# reports this when printed. +# +# It’s common to initialize learning weights randomly, often with a +# specific seed for the PRNG for reproducibility of results: +# + +torch.manual_seed(1729) +r1 = torch.rand(2, 2) +print('A random tensor:') +print(r1) + +r2 = torch.rand(2, 2) +print('\nA different random tensor:') +print(r2) # new values + +torch.manual_seed(1729) +r3 = torch.rand(2, 2) +print('\nShould match r1:') +print(r3) # repeats values of r1 because of re-seed + + +####################################################################### +# PyTorch tensors perform arithmetic operations intuitively. Tensors of +# similar shapes may be added, multiplied, etc. Operations with scalars +# are distributed over the tensor: +# + +ones = torch.ones(2, 3) +print(ones) + +twos = torch.ones(2, 3) * 2 # every element is multiplied by 2 +print(twos) + +threes = ones + twos # additon allowed because shapes are similar +print(threes) # tensors are added element-wise +print(threes.shape) # this has the same dimensions as input tensors + +r1 = torch.rand(2, 3) +r2 = torch.rand(3, 2) +# uncomment this line to get a runtime error +# r3 = r1 + r2 + + +###################################################################### +# Here’s a small sample of the mathematical operations available: +# + +r = torch.rand(2, 2) - 0.5 * 2 # values between -1 and 1 +print('A random matrix, r:') +print(r) + +# Common mathematical operations are supported: +print('\nAbsolute value of r:') +print(torch.abs(r)) + +# ...as are trigonometric functions: +print('\nInverse sine of r:') +print(torch.asin(r)) + +# ...and linear algebra operations like determinant and singular value decomposition +print('\nDeterminant of r:') +print(torch.det(r)) +print('\nSingular value decomposition of r:') +print(torch.svd(r)) + +# ...and statistical and aggregate operations: +print('\nAverage and standard deviation of r:') +print(torch.std_mean(r)) +print('\nMaximum value of r:') +print(torch.max(r)) + + +########################################################################## +# There’s a good deal more to know about the power of PyTorch tensors, +# including how to set them up for parallel computations on GPU - we’ll be +# going into more depth in another video. +# +# PyTorch Models +# -------------- +# +# Follow along with the video beginning at `10:00 `__. 
+# +# Let’s talk about how we can express models in PyTorch +# + +import torch # for all things PyTorch +import torch.nn as nn # for torch.nn.Module, the parent object for PyTorch models +import torch.nn.functional as F # for the activation function + + +######################################################################### +# .. figure:: /_static/img/mnist.png +# :alt: le-net-5 diagram +# +# *Figure: LeNet-5* +# +# Above is a diagram of LeNet-5, one of the earliest convolutional neural +# nets, and one of the drivers of the explosion in Deep Learning. It was +# built to read small images of handwritten numbers (the MNIST dataset), +# and correctly classify which digit was represented in the image. +# +# Here’s the abridged version of how it works: +# +# - Layer C1 is a convolutional layer, meaning that it scans the input +# image for features it learned during training. It outputs a map of +# where it saw each of its learned features in the image. This +# “activation map” is downsampled in layer S2. +# - Layer C3 is another convolutional layer, this time scanning C1’s +# activation map for *combinations* of features. It also puts out an +# activation map describing the spatial locations of these feature +# combinations, which is downsampled in layer S4. +# - Finally, the fully-connected layers at the end, F5, F6, and OUTPUT, +# are a *classifier* that takes the final activation map, and +# classifies it into one of ten bins representing the 10 digits. +# +# How do we express this simple neural network in code? +# + +class LeNet(nn.Module): + + def __init__(self): + super(LeNet, self).__init__() + # 1 input image channel (black & white), 6 output channels, 3x3 square convolution + # kernel + self.conv1 = nn.Conv2d(1, 6, 3) + self.conv2 = nn.Conv2d(6, 16, 3) + # an affine operation: y = Wx + b + self.fc1 = nn.Linear(16 * 6 * 6, 120) # 6*6 from image dimension + self.fc2 = nn.Linear(120, 84) + self.fc3 = nn.Linear(84, 10) + + def forward(self, x): + # Max pooling over a (2, 2) window + x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) + # If the size is a square you can only specify a single number + x = F.max_pool2d(F.relu(self.conv2(x)), 2) + x = x.view(-1, self.num_flat_features(x)) + x = F.relu(self.fc1(x)) + x = F.relu(self.fc2(x)) + x = self.fc3(x) + return x + + def num_flat_features(self, x): + size = x.size()[1:] # all dimensions except the batch dimension + num_features = 1 + for s in size: + num_features *= s + return num_features + + +############################################################################ +# Looking over this code, you should be able to spot some structural +# similarities with the diagram above. +# +# This demonstrates the structure of a typical PyTorch model: +# +# - It inherits from ``torch.nn.Module`` - modules may be nested - in fact, +# even the ``Conv2d`` and ``Linear`` layer classes inherit from +# ``torch.nn.Module``. +# - A model will have an ``__init__()`` function, where it instantiates +# its layers, and loads any data artifacts it might +# need (e.g., an NLP model might load a vocabulary). +# - A model will have a ``forward()`` function. This is where the actual +# computation happens: An input is passed through the network layers +# and various functions to generate an output. +# - Other than that, you can build out your model class like any other +# Python class, adding whatever properties and methods you need to +# support your model’s computation. +# +# Let’s instantiate this object and run a sample input through it. 
+# + +net = LeNet() +print(net) # what does the object tell us about itself? + +input = torch.rand(1, 1, 32, 32) # stand-in for a 32x32 black & white image +print('\nImage batch shape:') +print(input.shape) + +output = net(input) # we don't call forward() directly +print('\nRaw output:') +print(output) +print(output.shape) + + +########################################################################## +# There are a few important things happening above: +# +# First, we instantiate the ``LeNet`` class, and we print the ``net`` +# object. A subclass of ``torch.nn.Module`` will report the layers it has +# created and their shapes and parameters. This can provide a handy +# overview of a model if you want to get the gist of its processing. +# +# Below that, we create a dummy input representing a 32x32 image with 1 +# color channel. Normally, you would load an image tile and convert it to +# a tensor of this shape. +# +# You may have noticed an extra dimension to our tensor - the *batch +# dimension.* PyTorch models assume they are working on *batches* of data +# - for example, a batch of 16 of our image tiles would have the shape +# ``(16, 1, 32, 32)``. Since we’re only using one image, we create a batch +# of 1 with shape ``(1, 1, 32, 32)``. +# +# We ask the model for an inference by calling it like a function: +# ``net(input)``. The output of this call represents the model’s +# confidence that the input represents a particular digit. (Since this +# instance of the model hasn’t learned anything yet, we shouldn’t expect +# to see any signal in the output.) Looking at the shape of ``output``, we +# can see that it also has a batch dimension, the size of which should +# always match the input batch dimension. If we had passed in an input +# batch of 16 instances, ``output`` would have a shape of ``(16, 10)``. +# +# Datasets and Dataloaders +# ------------------------ +# +# Follow along with the video beginning at `14:00 `__. +# +# Below, we’re going to demonstrate using one of the ready-to-download, +# open-access datasets from TorchVision, how to transform the images for +# consumption by your model, and how to use the DataLoader to feed batches +# of data to your model. +# +# The first thing we need to do is transform our incoming images into a +# PyTorch tensor. +# + +#%matplotlib inline + +import torch +import torchvision +import torchvision.transforms as transforms + +transform = transforms.Compose( + [transforms.ToTensor(), + transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) + + +########################################################################## +# Here, we specify two transformations for our input: +# +# - ``transforms.ToTensor()`` converts images loaded by Pillow into +# PyTorch tensors. +# - ``transforms.Normalize()`` adjusts the values of the tensor so +# that their average is zero and their standard deviation is 0.5. Most +# activation functions have their strongest gradients around x = 0, so +# centering our data there can speed learning. +# +# There are many more transforms available, including cropping, centering, +# rotation, and reflection. +# +# Next, we’ll create an instance of the CIFAR10 dataset. 
This is a set of +# 32x32 color image tiles representing 10 classes of objects: 6 of animals +# (bird, cat, deer, dog, frog, horse) and 4 of vehicles (airplane, +# automobile, ship, truck): +# + +trainset = torchvision.datasets.CIFAR10(root='./data', train=True, + download=True, transform=transform) + + +########################################################################## +# .. note:: +# When you run the cell above, it may take a little time for the +# dataset to download. +# +# This is an example of creating a dataset object in PyTorch. Downloadable +# datasets (like CIFAR-10 above) are subclasses of +# ``torch.utils.data.Dataset``. ``Dataset`` classes in PyTorch include the +# downloadable datasets in TorchVision, Torchtext, and TorchAudio, as well +# as utility dataset classes such as ``torchvision.datasets.ImageFolder``, +# which will read a folder of labeled images. You can also create your own +# subclasses of ``Dataset``. +# +# When we instantiate our dataset, we need to tell it a few things: +# +# - The filesystem path to where we want the data to go. +# - Whether or not we are using this set for training; most datasets +# will be split into training and test subsets. +# - Whether we would like to download the dataset if we haven’t already. +# - The transformations we want to apply to the data. +# +# Once your dataset is ready, you can give it to the ``DataLoader``: +# + +trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, + shuffle=True, num_workers=2) + + +########################################################################## +# A ``Dataset`` subclass wraps access to the data, and is specialized to +# the type of data it’s serving. The ``DataLoader`` knows *nothing* about +# the data, but organizes the input tensors served by the ``Dataset`` into +# batches with the parameters you specify. +# +# In the example above, we’ve asked a ``DataLoader`` to give us batches of +# 4 images from ``trainset``, randomizing their order (``shuffle=True``), +# and we told it to spin up two workers to load data from disk. +# +# It’s good practice to visualize the batches your ``DataLoader`` serves: +# + +import matplotlib.pyplot as plt +import numpy as np + +classes = ('plane', 'car', 'bird', 'cat', + 'deer', 'dog', 'frog', 'horse', 'ship', 'truck') + +def imshow(img): + img = img / 2 + 0.5 # unnormalize + npimg = img.numpy() + plt.imshow(np.transpose(npimg, (1, 2, 0))) + + +# get some random training images +dataiter = iter(trainloader) +images, labels = dataiter.next() + +# show images +imshow(torchvision.utils.make_grid(images)) +# print labels +print(' '.join('%5s' % classes[labels[j]] for j in range(4))) + + +######################################################################## +# Running the above cell should show you a strip of four images, and the +# correct label for each. +# +# Training Your PyTorch Model +# --------------------------- +# +# Follow along with the video beginning at `17:10 `__. +# +# Let’s put all the pieces together, and train a model: +# + +#%matplotlib inline + +import torch +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim + +import torchvision +import torchvision.transforms as transforms + +import matplotlib +import matplotlib.pyplot as plt +import numpy as np + + +######################################################################### +# First, we’ll need training and test datasets. If you haven’t already, +# run the cell below to make sure the dataset is downloaded. (It may take +# a minute.) 
+# + +transform = transforms.Compose( + [transforms.ToTensor(), + transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) + +trainset = torchvision.datasets.CIFAR10(root='./data', train=True, + download=True, transform=transform) +trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, + shuffle=True, num_workers=2) + +testset = torchvision.datasets.CIFAR10(root='./data', train=False, + download=True, transform=transform) +testloader = torch.utils.data.DataLoader(testset, batch_size=4, + shuffle=False, num_workers=2) + +classes = ('plane', 'car', 'bird', 'cat', + 'deer', 'dog', 'frog', 'horse', 'ship', 'truck') + + +###################################################################### +# We’ll run our check on the output from ``DataLoader``: +# + +import matplotlib.pyplot as plt +import numpy as np + +# functions to show an image + + +def imshow(img): + img = img / 2 + 0.5 # unnormalize + npimg = img.numpy() + plt.imshow(np.transpose(npimg, (1, 2, 0))) + + +# get some random training images +dataiter = iter(trainloader) +images, labels = dataiter.next() + +# show images +imshow(torchvision.utils.make_grid(images)) +# print labels +print(' '.join('%5s' % classes[labels[j]] for j in range(4))) + + +########################################################################## +# This is the model we’ll train. If it looks familiar, that’s because it’s +# a variant of LeNet - discussed earlier in this video - adapted for +# 3-color images. +# + +class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.conv1 = nn.Conv2d(3, 6, 5) + self.pool = nn.MaxPool2d(2, 2) + self.conv2 = nn.Conv2d(6, 16, 5) + self.fc1 = nn.Linear(16 * 5 * 5, 120) + self.fc2 = nn.Linear(120, 84) + self.fc3 = nn.Linear(84, 10) + + def forward(self, x): + x = self.pool(F.relu(self.conv1(x))) + x = self.pool(F.relu(self.conv2(x))) + x = x.view(-1, 16 * 5 * 5) + x = F.relu(self.fc1(x)) + x = F.relu(self.fc2(x)) + x = self.fc3(x) + return x + + +net = Net() + + +###################################################################### +# The last ingredients we need are a loss function and an optimizer: +# + +criterion = nn.CrossEntropyLoss() +optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9) + + +########################################################################## +# The loss function, as discussed earlier in this video, is a measure of +# how far from our ideal output the model’s prediction was. Cross-entropy +# loss is a typical loss function for classification models like ours. +# +# The **optimizer** is what drives the learning. Here we have created an +# optimizer that implements *stochastic gradient descent,* one of the more +# straightforward optimization algorithms. Besides parameters of the +# algorithm, like the learning rate (``lr``) and momentum, we also pass in +# ``net.parameters()``, which is a collection of all the learning weights +# in the model - which is what the optimizer adjusts. +# +# Finally, all of this is assembled into the training loop. 
Go ahead and +# run this cell, as it will likely take a few minutes to execute: +# + +for epoch in range(2): # loop over the dataset multiple times + + running_loss = 0.0 + for i, data in enumerate(trainloader, 0): + # get the inputs + inputs, labels = data + + # zero the parameter gradients + optimizer.zero_grad() + + # forward + backward + optimize + outputs = net(inputs) + loss = criterion(outputs, labels) + loss.backward() + optimizer.step() + + # print statistics + running_loss += loss.item() + if i % 2000 == 1999: # print every 2000 mini-batches + print('[%d, %5d] loss: %.3f' % + (epoch + 1, i + 1, running_loss / 2000)) + running_loss = 0.0 + +print('Finished Training') + + +######################################################################## +# Here, we are doing only **2 training epochs** (line 1) - that is, two +# passes over the training dataset. Each pass has an inner loop that +# **iterates over the training data** (line 4), serving batches of +# transformed input images and their correct labels. +# +# **Zeroing the gradients** (line 9) is an important step. Gradients are +# accumulated over a batch; if we do not reset them for every batch, they +# will keep accumulating, which will provide incorrect gradient values, +# making learning impossible. +# +# In line 12, we **ask the model for its predictions** on this batch. In +# the following line (13), we compute the loss - the difference between +# ``outputs`` (the model prediction) and ``labels`` (the correct output). +# +# In line 14, we do the ``backward()`` pass, and calculate the gradients +# that will direct the learning. +# +# In line 15, the optimizer performs one learning step - it uses the +# gradients from the ``backward()`` call to nudge the learning weights in +# the direction it thinks will reduce the loss. +# +# The remainder of the loop does some light reporting on the epoch number, +# how many training instances have been completed, and what the collected +# loss is over the training loop. +# +# **When you run the cell above,** you should see something like this: +# +# :: +# +# [1, 2000] loss: 2.235 +# [1, 4000] loss: 1.940 +# [1, 6000] loss: 1.713 +# [1, 8000] loss: 1.573 +# [1, 10000] loss: 1.507 +# [1, 12000] loss: 1.442 +# [2, 2000] loss: 1.378 +# [2, 4000] loss: 1.364 +# [2, 6000] loss: 1.349 +# [2, 8000] loss: 1.319 +# [2, 10000] loss: 1.284 +# [2, 12000] loss: 1.267 +# Finished Training +# +# Note that the loss is monotonically descending, indicating that our +# model is continuing to improve its performance on the training dataset. +# +# As a final step, we should check that the model is actually doing +# *general* learning, and not simply “memorizing” the dataset. This is +# called **overfitting,** and usually indicates that the dataset is too +# small (not enough examples for general learning), or that the model has +# more learning parameters than it needs to correctly model the dataset. 
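+#
+# As an aside, one rough proxy for a model's capacity is its number of
+# learnable parameters. Here is a quick way to count them - just a sketch,
+# reusing the ``net`` we built above:
+#
+# ::
+#
+#    param_count = sum(p.numel() for p in net.parameters() if p.requires_grad)
+#    print(param_count)   # about 62,000 for this small network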
+# +# This is the reason datasets are split into training and test subsets - +# to test the generality of the model, we ask it to make predictions on +# data it hasn’t trained on: +# + +correct = 0 +total = 0 +with torch.no_grad(): + for data in testloader: + images, labels = data + outputs = net(images) + _, predicted = torch.max(outputs.data, 1) + total += labels.size(0) + correct += (predicted == labels).sum().item() + +print('Accuracy of the network on the 10000 test images: %d %%' % ( + 100 * correct / total)) + + +######################################################################### +# If you followed along, you should see that the model is roughly 50% +# accurate at this point. That’s not exactly state-of-the-art, but it’s +# far better than the 10% accuracy we’d expect from a random output. This +# demonstrates that some general learning did happen in the model. +# diff --git a/beginner_source/introyt/modelsyt_tutorial.py b/beginner_source/introyt/modelsyt_tutorial.py new file mode 100644 index 00000000000..cf6a3e286ce --- /dev/null +++ b/beginner_source/introyt/modelsyt_tutorial.py @@ -0,0 +1,422 @@ +""" +`Introduction `_ || +`Tensors `_ || +`Autograd `_ || +**Building Models** || +`TensorBoard Support `_ || +`Training Models `_ || +`Model Understanding `_ + +Building Models with PyTorch +============================ + +Follow along with the video below or on `youtube `__. + +.. raw:: html + +
+ +
+ +``torch.nn.Module`` and ``torch.nn.Parameter`` +---------------------------------------------- + +In this video, we’ll be discussing some of the tools PyTorch makes +available for building deep learning networks. + +Except for ``Parameter``, the classes we discuss in this video are all +subclasses of ``torch.nn.Module``. This is the PyTorch base class meant +to encapsulate behaviors specific to PyTorch Models and their +components. + +One important behavior of ``torch.nn.Module`` is registering parameters. +If a particular ``Module`` subclass has learning weights, these weights +are expressed as instances of ``torch.nn.Parameter``. The ``Parameter`` +class is a subclass of ``torch.Tensor``, with the special behavior that +when they are assigned as attributes of a ``Module``, they are added to +the list of that modules parameters. These parameters may be accessed +through the ``parameters()`` method on the ``Module`` class. + +As a simple example, here’s a very simple model with two linear layers +and an activation function. We’ll create an instance of it and ask it to +report on its parameters: + +""" + +import torch + +class TinyModel(torch.nn.Module): + + def __init__(self): + super(TinyModel, self).__init__() + + self.linear1 = torch.nn.Linear(100, 200) + self.activation = torch.nn.ReLU() + self.linear2 = torch.nn.Linear(200, 10) + self.softmax = torch.nn.Softmax() + + def forward(self, x): + x = self.linear1(x) + x = self.activation(x) + x = self.linear2(x) + x = self.softmax(x) + return x + +tinymodel = TinyModel() + +print('The model:') +print(tinymodel) + +print('\n\nJust one layer:') +print(tinymodel.linear2) + +print('\n\nModel params:') +for param in tinymodel.parameters(): + print(param) + +print('\n\nLayer params:') +for param in tinymodel.linear2.parameters(): + print(param) + + +######################################################################### +# This shows the fundamental structure of a PyTorch model: there is an +# ``__init__()`` method that defines the layers and other components of a +# model, and a ``forward()`` method where the computation gets done. Note +# that we can print the model, or any of its submodules, to learn about +# its structure. +# +# Common Layer Types +# ------------------ +# +# Linear Layers +# ~~~~~~~~~~~~~ +# +# The most basic type of neural network layer is a *linear* or *fully +# connected* layer. This is a layer where every input influences every +# output of the layer to a degree specified by the layer’s weights. If a +# model has *m* inputs and *n* outputs, the weights will be an *m*x*n* +# matrix. For example: +# + +lin = torch.nn.Linear(3, 2) +x = torch.rand(1, 3) +print('Input:') +print(x) + +print('\n\nWeight and Bias parameters:') +for param in lin.parameters(): + print(param) + +y = lin(x) +print('\n\nOutput:') +print(y) + + +######################################################################### +# If you do the matrix multiplication of ``x`` by the linear layer’s +# weights, and add the biases, you’ll find that you get the output vector +# ``y``. +# +# One other important feature to note: When we checked the weights of our +# layer with ``lin.weight``, it reported itself as a ``Parameter`` (which +# is a subclass of ``Tensor``), and let us know that it’s tracking +# gradients with autograd. This is a default behavior for ``Parameter`` +# that differs from ``Tensor``. +# +# Linear layers are used widely in deep learning models. 
One of the most +# common places you’ll see them is in classifier models, which will +# usually have one or more linear layers at the end, where the last layer +# will have *n* outputs, where *n* is the number of classes the classifier +# addresses. +# +# Convolutional Layers +# ~~~~~~~~~~~~~~~~~~~~ +# +# *Convolutional* layers are built to handle data with a high degree of +# spatial correlation. They are very commonly used in computer vision, +# where they detect close groupings of features which the compose into +# higher-level features. They pop up in other contexts too - for example, +# in NLP applications, where the a word’s immediate context (that is, the +# other words nearby in the sequence) can affect the meaning of a +# sentence. +# +# We saw convolutional layers in action in LeNet5 in an earlier video: +# + +import torch.functional as F + + +class LeNet(torch.nn.Module): + + def __init__(self): + super(LeNet, self).__init__() + # 1 input image channel (black & white), 6 output channels, 3x3 square convolution + # kernel + self.conv1 = torch.nn.Conv2d(1, 6, 5) + self.conv2 = torch.nn.Conv2d(6, 16, 3) + # an affine operation: y = Wx + b + self.fc1 = torch.nn.Linear(16 * 6 * 6, 120) # 6*6 from image dimension + self.fc2 = torch.nn.Linear(120, 84) + self.fc3 = torch.nn.Linear(84, 10) + + def forward(self, x): + # Max pooling over a (2, 2) window + x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) + # If the size is a square you can only specify a single number + x = F.max_pool2d(F.relu(self.conv2(x)), 2) + x = x.view(-1, self.num_flat_features(x)) + x = F.relu(self.fc1(x)) + x = F.relu(self.fc2(x)) + x = self.fc3(x) + return x + + def num_flat_features(self, x): + size = x.size()[1:] # all dimensions except the batch dimension + num_features = 1 + for s in size: + num_features *= s + return num_features + + +########################################################################## +# Let’s break down what’s happening in the convolutional layers of this +# model. Starting with ``conv1``: +# +# - LeNet5 is meant to take in a 1x32x32 black & white image. **The first +# argument to a convolutional layer’s constructor is the number of +# input channels.** Here, it is 1. If we were building this model to +# look at 3-color channels, it would be 3. +# - A convolutional layer is like a window that scans over the image, +# looking for a pattern it recognizes. These patterns are called +# *features,* and one of the parameters of a convolutional layer is the +# number of features we would like it to learn. **This is the second +# argument to the constructor is the number of output features.** Here, +# we’re asking our layer to learn 6 features. +# - Just above, I likened the convolutional layer to a window - but how +# big is the window? **The third argument is the window or kernel +# size.** Here, the “5” means we’ve chosen a 5x5 kernel. (If you want a +# kernel with height different from width, you can specify a tuple for +# this argument - e.g., ``(3, 5)`` to get a 3x5 convolution kernel.) +# +# The output of a convolutional layer is an *activation map* - a spatial +# representation of the presence of features in the input tensor. +# ``conv1`` will give us an output tensor of 6x28x28; 6 is the number of +# features, and 28 is the height and width of our map. (The 28 comes from +# the fact that when scanning a 5-pixel window over a 32-pixel row, there +# are only 28 valid positions.) 
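+#
+# You can confirm this shape arithmetic directly. The snippet below is only
+# a sketch - it builds a standalone ``Conv2d`` layer and a dummy input
+# rather than reusing the ``LeNet`` instance above:
+#
+# ::
+#
+#    conv = torch.nn.Conv2d(1, 6, 5)          # 1 input channel, 6 features, 5x5 kernel
+#    dummy_input = torch.rand(1, 1, 32, 32)   # a batch of one 1x32x32 image
+#    print(conv(dummy_input).shape)           # torch.Size([1, 6, 28, 28])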
+# +# We then pass the output of the convolution through a ReLU activation +# function (more on activation functions later), then through a max +# pooling layer. The max pooling layer takes features near each other in +# the activation map and groups them together. It does this by reducing +# the tensor, merging every 2x2 group of cells in the output into a single +# cell, and assigning that cell the maximum value of the 4 cells that went +# into it. This gives us a lower-resolution version of the activation map, +# with dimensions 6x14x14. +# +# Our next convolutional layer, ``conv2``, expects 6 input channels +# (corresponding to the 6 features sought by the first layer), has 16 +# output channels, and a 3x3 kernel. It puts out a 16x12x12 activation +# map, which is again reduced by a max pooling layer to 16x6x6. Prior to +# passing this output to the linear layers, it is reshaped to a 16 \* 6 \* +# 6 = 576-element vector for consumption by the next layer. +# +# There are convolutional layers for addressing 1D, 2D, and 3D tensors. +# There are also many more optional arguments for a conv layer +# constructor, including stride length(e.g., only scanning every second or +# every third position) in the input, padding (so you can scan out to the +# edges of the input), and more. See the +# `documentation `__ +# for more information. +# +# Recurrent Layers +# ~~~~~~~~~~~~~~~~ +# +# *Recurrent neural networks* (or *RNNs)* are used for sequential data - +# anything from time-series measurements from a scientific instrument to +# natural language sentences to DNA nucleotides. An RNN does this by +# maintaining a *hidden state* that acts as a sort of memory for what it +# has seen in the sequence so far. +# +# The internal structure of an RNN layer - or its variants, the LSTM (long +# short-term memory) and GRU (gated recurrent unit) - is moderately +# complex and beyond the scope of this video, but we’ll show you what one +# looks like in action with an LSTM-based part-of-speech tagger (a type of +# classifier that tells you if a word is a noun, verb, etc.): +# + +class LSTMTagger(torch.nn.Module): + + def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size): + super(LSTMTagger, self).__init__() + self.hidden_dim = hidden_dim + + self.word_embeddings = torch.nn.Embedding(vocab_size, embedding_dim) + + # The LSTM takes word embeddings as inputs, and outputs hidden states + # with dimensionality hidden_dim. + self.lstm = torch.nn.LSTM(embedding_dim, hidden_dim) + + # The linear layer that maps from hidden state space to tag space + self.hidden2tag = torch.nn.Linear(hidden_dim, tagset_size) + + def forward(self, sentence): + embeds = self.word_embeddings(sentence) + lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1)) + tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1)) + tag_scores = F.log_softmax(tag_space, dim=1) + return tag_scores + + +######################################################################## +# The constructor has four arguments: +# +# - ``vocab_size`` is the number of words in the input vocabulary. Each +# word is a one-hot vector (or unit vector) in a +# ``vocab_size``-dimensional space. +# - ``tagset_size`` is the number of tags in the output set. +# - ``embedding_dim`` is the size of the *embedding* space for the +# vocabulary. An embedding maps a vocabulary onto a low-dimensional +# space, where words with similar meanings are close together in the +# space. +# - ``hidden_dim`` is the size of the LSTM’s memory. 
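+#
+# To make the shapes concrete, here is a sketch of how you might call this
+# model (it assumes ``F`` above refers to ``torch.nn.functional``; the
+# vocabulary size, tag set size, and dimensions are made up for
+# illustration):
+#
+# ::
+#
+#    tagger = LSTMTagger(embedding_dim=6, hidden_dim=6, vocab_size=5, tagset_size=3)
+#    sentence = torch.tensor([0, 2, 4, 1])   # word indices into the vocabulary
+#    print(tagger(sentence).shape)           # torch.Size([4, 3]) - one row of tag scores per word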
+# +# The input will be a sentence with the words represented as indices of +# one-hot vectors. The embedding layer will then map these down to an +# ``embedding_dim``-dimensional space. The LSTM takes this sequence of +# embeddings and iterates over it, fielding an output vector of length +# ``hidden_dim``. The final linear layer acts as a classifier; applying +# ``log_softmax()`` to the output of the final layer converts the output +# into a normalized set of estimated probabilities that a given word maps +# to a given tag. +# +# If you’d like to see this network in action, check out the `Sequence +# Models and LSTM +# Networks `__ +# tutorial on pytorch.org. +# +# Transformers +# ~~~~~~~~~~~~ +# +# *Transformers* are multi-purpose networks that have taken over the state +# of the art in NLP with models like BERT. A discussion of transformer +# architecture is beyond the scope of this video, but PyTorch has a +# ``Transformer`` class that allows you to define the overall parameters +# of a transformer model - the number of attention heads, the number of +# encoder & decoder layers, dropout and activation functions, etc. (You +# can even build the BERT model from this single class, with the right +# parameters!) The ``torch.nn.Transformer`` class also has classes to +# encapsulate the individual components (``TransformerEncoder``, +# ``TransformerDecoder``) and subcomponents (``TransformerEncoderLayer``, +# ``TransformerDecoderLayer``). For details, check out the +# `documentation `__ +# on transformer classes, and the relevant +# `tutorial `__ +# on pytorch.org. +# +# Other Layers and Functions +# -------------------------- +# +# Data Manipulation Layers +# ~~~~~~~~~~~~~~~~~~~~~~~~ +# +# There are other layer types that perform important functions in models, +# but don’t participate in the learning process themselves. +# +# **Max pooling** (and its twin, min pooling) reduce a tensor by combining +# cells, and assigning the maximum value of the input cells to the output +# cell (we saw this). For example: +# + +my_tensor = torch.rand(1, 6, 6) +print(my_tensor) + +maxpool_layer = torch.nn.MaxPool2d(3) +print(maxpool_layer(my_tensor)) + + +######################################################################### +# If you look closely at the values above, you’ll see that each of the +# values in the maxpooled output is the maximum value of each quadrant of +# the 6x6 input. +# +# **Normalization layers** re-center and normalize the output of one layer +# before feeding it to another. Centering the and scaling the intermediate +# tensors has a number of beneficial effects, such as letting you use +# higher learning rates without exploding/vanishing gradients. +# + +my_tensor = torch.rand(1, 4, 4) * 20 + 5 +print(my_tensor) + +print(my_tensor.mean()) + +norm_layer = torch.nn.BatchNorm1d(4) +normed_tensor = norm_layer(my_tensor) +print(normed_tensor) + +print(normed_tensor.mean()) + + + +########################################################################## +# Running the cell above, we’ve added a large scaling factor and offset to +# an input tensor; you should see the input tensor’s ``mean()`` somewhere +# in the neighborhood of 15. After running it through the normalization +# layer, you can see that the values are smaller, and grouped around zero +# - in fact, the mean should be very small (> 1e-8). 
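+#
+# If you want to see that each channel really is normalized on its own, you
+# can average over everything except the channel dimension - a small check,
+# reusing the ``normed_tensor`` from the cell above:
+#
+# ::
+#
+#    print(normed_tensor.mean(dim=(0, 2)))   # one near-zero mean per channel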
+# +# This is beneficial because many activation functions (discussed below) +# have their strongest gradients near 0, but sometimes suffer from +# vanishing or exploding gradients for inputs that drive them far away +# from zero. Keeping the data centered around the area of steepest +# gradient will tend to mean faster, better learning and higher feasible +# learning rates. +# +# **Dropout layers** are a tool for encouraging *sparse representations* +# in your model - that is, pushing it to do inference with less data. +# +# Dropout layers work by randomly setting parts of the input tensor +# *during training* - dropout layers are always turned off for inference. +# This forces the model to learn against this masked or reduced dataset. +# For example: +# + +my_tensor = torch.rand(1, 4, 4) + +dropout = torch.nn.Dropout(p=0.4) +print(dropout(my_tensor)) +print(dropout(my_tensor)) + + +########################################################################## +# Above, you can see the effect of dropout on a sample tensor. You can use +# the optional ``p`` argument to set the probability of an individual +# weight dropping out; if you don’t it defaults to 0.5. +# +# Activation Functions +# ~~~~~~~~~~~~~~~~~~~~ +# +# Activation functions make deep learning possible. A neural network is +# really a program - with many parameters - that *simulates a mathematical +# function*. If all we did was multiple tensors by layer weights +# repeatedly, we could only simulate *linear functions;* further, there +# would be no point to having many layers, as the whole network would +# reduce could be reduced to a single matrix multiplication. Inserting +# *non-linear* activation functions between layers is what allows a deep +# learning model to simulate any function, rather than just linear ones. +# +# ``torch.nn.Module`` has objects encapsulating all of the major +# activation functions including ReLU and its many variants, Tanh, +# Hardtanh, sigmoid, and more. It also includes other functions, such as +# Softmax, that are most useful at the output stage of a model. +# +# Loss Functions +# ~~~~~~~~~~~~~~ +# +# Loss functions tell us how far a model’s prediction is from the correct +# answer. PyTorch contains a variety of loss functions, including common +# MSE (mean squared error = L2 norm), Cross Entropy Loss and Negative +# Likelihood Loss (useful for classifiers), and others. +# diff --git a/beginner_source/introyt/tensorboardyt_tutorial.py b/beginner_source/introyt/tensorboardyt_tutorial.py new file mode 100644 index 00000000000..47844829a4b --- /dev/null +++ b/beginner_source/introyt/tensorboardyt_tutorial.py @@ -0,0 +1,315 @@ +""" +`Introduction `_ || +`Tensors `_ || +`Autograd `_ || +`Building Models `_ || +**TensorBoard Support** || +`Training Models `_ || +`Model Understanding `_ + +PyTorch TensorBoard Support +=========================== + +Follow along with the video below or on `youtube `__. + +.. raw:: html + +
+ +
+ +Before You Start +---------------- + +To run this tutorial, you’ll need to install PyTorch, TorchVision, +Matplotlib, and TensorBoard. + +With ``conda``: + +``conda install pytorch torchvision -c pytorch`` +``conda install matplotlib tensorboard`` + +With ``pip``: + +``pip install torch torchvision matplotlib tensorboard`` + +Once the dependencies are installed, restart this notebook in the Python +environment where you installed them. + + +Introduction +------------ + +In this notebook, we’ll be training a variant of LeNet-5 against the +Fashion-MNIST dataset. Fashion-MNIST is a set of image tiles depicting +various garments, with ten class labels indicating the type of garment +depicted. + +""" + +# PyTorch model and training necessities +import torch +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim + +# Image datasets and image manipulation +import torchvision +import torchvision.transforms as transforms + +# Image display +import matplotlib.pyplot as plt +import numpy as np + +# PyTorch TensorBoard support +from torch.utils.tensorboard import SummaryWriter + + +###################################################################### +# Showing Images in TensorBoard +# ----------------------------- +# +# Let’s start by adding sample images from our dataset to TensorBoard: +# + +# Gather datasets and prepare them for consumption +transform = transforms.Compose( + [transforms.ToTensor(), + transforms.Normalize((0.5,), (0.5,))]) + +# Store separate training and validations splits in ./data +training_set = torchvision.datasets.FashionMNIST('./data', + download=True, + train=True, + transform=transform) +validation_set = torchvision.datasets.FashionMNIST('./data', + download=True, + train=False, + transform=transform) + +training_loader = torch.utils.data.DataLoader(training_set, + batch_size=4, + shuffle=True, + num_workers=2) + + +validation_loader = torch.utils.data.DataLoader(validation_set, + batch_size=4, + shuffle=False, + num_workers=2) + +# Class labels +classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', + 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot') + +# Helper function for inline image display +def matplotlib_imshow(img, one_channel=False): + if one_channel: + img = img.mean(dim=0) + img = img / 2 + 0.5 # unnormalize + npimg = img.numpy() + if one_channel: + plt.imshow(npimg, cmap="Greys") + else: + plt.imshow(np.transpose(npimg, (1, 2, 0))) + +# Extract a batch of 4 images +dataiter = iter(training_loader) +images, labels = dataiter.next() + +# Create a grid from the images and show them +img_grid = torchvision.utils.make_grid(images) +matplotlib_imshow(img_grid, one_channel=True) + + +######################################################################## +# Above, we used TorchVision and Matplotlib to create a visual grid of a +# minibatch of our input data. Below, we use the ``add_image()`` call on +# ``SummaryWriter`` to log the image for consumption by TensorBoard, and +# we also call ``flush()`` to make sure it’s written to disk right away. 
+# + +# Default log_dir argument is "runs" - but it's good to be specific +# torch.utils.tensorboard.SummaryWriter is imported above +writer = SummaryWriter('runs/fashion_mnist_experiment_1') + +# Write image data to TensorBoard log dir +writer.add_image('Four Fashion-MNIST Images', img_grid) +writer.flush() + +# To view, start TensorBoard on the command line with: +# tensorboard --logdir=runs +# ...and open a browser tab to http://localhost:6006/ + + +########################################################################## +# If you start TensorBoard at the command line and open it in a new +# browser tab (usually at `localhost:6006 `__), you should +# see the image grid under the IMAGES tab. +# +# Graphing Scalars to Visualize Training +# -------------------------------------- +# +# TensorBoard is useful for tracking the progress and efficacy of your +# training. Below, we’ll run a training loop, track some metrics, and save +# the data for TensorBoard’s consumption. +# +# Let’s define a model to categorize our image tiles, and an optimizer and +# loss function for training: +# + +class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.conv1 = nn.Conv2d(1, 6, 5) + self.pool = nn.MaxPool2d(2, 2) + self.conv2 = nn.Conv2d(6, 16, 5) + self.fc1 = nn.Linear(16 * 4 * 4, 120) + self.fc2 = nn.Linear(120, 84) + self.fc3 = nn.Linear(84, 10) + + def forward(self, x): + x = self.pool(F.relu(self.conv1(x))) + x = self.pool(F.relu(self.conv2(x))) + x = x.view(-1, 16 * 4 * 4) + x = F.relu(self.fc1(x)) + x = F.relu(self.fc2(x)) + x = self.fc3(x) + return x + + +net = Net() +criterion = nn.CrossEntropyLoss() +optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9) + + +########################################################################## +# Now let’s train a single epoch, and evaluate the training vs. validation +# set losses every 1000 batches: +# + +print(len(validation_loader)) +for epoch in range(1): # loop over the dataset multiple times + running_loss = 0.0 + + for i, data in enumerate(training_loader, 0): + # basic training loop + inputs, labels = data + optimizer.zero_grad() + outputs = net(inputs) + loss = criterion(outputs, labels) + loss.backward() + optimizer.step() + + running_loss += loss.item() + if i % 1000 == 999: # Every 1000 mini-batches... + print('Batch {}'.format(i + 1)) + # Check against the validation set + running_vloss = 0.0 + + net.train(False) # Don't need to track gradents for validation + for j, vdata in enumerate(validation_loader, 0): + vinputs, vlabels = vdata + voutputs = net(vinputs) + vloss = criterion(voutputs, vlabels) + running_vloss += vloss.item() + net.train(True) # Turn gradients back on for training + + avg_loss = running_loss / 1000 + avg_vloss = running_vloss / len(validation_loader) + + # Log the running loss averaged per batch + writer.add_scalars('Training vs. Validation Loss', + { 'Training' : avg_loss, 'Validation' : avg_vloss }, + epoch * len(training_loader) + i) + + running_loss = 0.0 +print('Finished Training') + +writer.flush() + + +######################################################################### +# Switch to your open TensorBoard and have a look at the SCALARS tab. +# +# Visualizing Your Model +# ---------------------- +# +# TensorBoard can also be used to examine the data flow within your model. +# To do this, call the ``add_graph()`` method with a model and sample +# input. 
When you open +# + +# Again, grab a single mini-batch of images +dataiter = iter(training_loader) +images, labels = dataiter.next() + +# add_graph() will trace the sample input through your model, +# and render it as a graph. +writer.add_graph(net, images) +writer.flush() + + +######################################################################### +# When you switch over to TensorBoard, you should see a GRAPHS tab. +# Double-click the “NET” node to see the layers and data flow within your +# model. +# +# Visualizing Your Dataset with Embeddings +# ---------------------------------------- +# +# The 28-by-28 image tiles we’re using can be modeled as 784-dimensional +# vectors (28 \* 28 = 784). It can be instructive to project this to a +# lower-dimensional representation. The ``add_embedding()`` method will +# project a set of data onto the three dimensions with highest variance, +# and display them as an interactive 3D chart. The ``add_embedding()`` +# method does this automatically by projecting to the three dimensions +# with highest variance. +# +# Below, we’ll take a sample of our data, and generate such an embedding: +# + +# Select a random subset of data and corresponding labels +def select_n_random(data, labels, n=100): + assert len(data) == len(labels) + + perm = torch.randperm(len(data)) + return data[perm][:n], labels[perm][:n] + +# Extract a random subset of data +images, labels = select_n_random(training_set.data, training_set.targets) + +# get the class labels for each image +class_labels = [classes[label] for label in labels] + +# log embeddings +features = images.view(-1, 28 * 28) +writer.add_embedding(features, + metadata=class_labels, + label_img=images.unsqueeze(1)) +writer.flush() +writer.close() + + +####################################################################### +# Now if you switch to TensorBoard and select the PROJECTOR tab, you +# should see a 3D representation of the projection. You can rotate and +# zoom the model. Examine it at large and small scales, and see whether +# you can spot patterns in the projected data and the clustering of +# labels. +# +# For better visibility, it’s recommended to: +# +# - Select “label” from the “Color by” drop-down on the left. +# - Toggle the Night Mode icon along the top to place the +# light-colored images on a dark background. +# +# Other Resources +# --------------- +# +# For more information, have a look at: +# +# - PyTorch documentation on `torch.utils.tensorboard.SummaryWriter `__ +# - Tensorboard tutorial content in the `PyTorch.org Tutorials `__ +# - For more information about TensorBoard, see the `TensorBoard +# documentation `__ diff --git a/beginner_source/introyt/tensors_deeper_tutorial.py b/beginner_source/introyt/tensors_deeper_tutorial.py new file mode 100644 index 00000000000..1f6d7248851 --- /dev/null +++ b/beginner_source/introyt/tensors_deeper_tutorial.py @@ -0,0 +1,951 @@ +""" +`Introduction `_ || +**Tensors** || +`Autograd `_ || +`Building Models `_ || +`TensorBoard Support `_ || +`Training Models `_ || +`Model Understanding `_ + +Introduction to PyTorch Tensors +=============================== + +Follow along with the video below or on `youtube `__. + +.. raw:: html + +
+ +
+ +Tensors are the central data abstraction in PyTorch. This interactive +notebook provides an in-depth introduction to the ``torch.Tensor`` +class. + +First things first, let’s import the PyTorch module. We’ll also add +Python’s math module to facilitate some of the examples. + +""" + +import torch +import math + + +######################################################################### +# Creating Tensors +# ---------------- +# +# The simplest way to create a tensor is with the ``torch.empty()`` call: +# + +x = torch.empty(3, 4) +print(type(x)) +print(x) + + +########################################################################## +# Let’s unpack what we just did: +# +# - We created a tensor using one of the numerous factory methods +# attached to the ``torch`` module. +# - The tensor itself is 2-dimensional, having 3 rows and 4 columns. +# - The type of the object returned is ``torch.Tensor``, which is an +# alias for ``torch.FloatTensor``; by default, PyTorch tensors are +# populated with 32-bit floating point numbers. (More on data types +# below.) +# - You will probably see some random-looking values when printing your +# tensor. The ``torch.empty()`` call allocates memory for the tensor, +# but does not initialize it with any values - so what you’re seeing is +# whatever was in memory at the time of allocation. +# +# A brief note about tensors and their number of dimensions, and +# terminology: +# +# - You will sometimes see a 1-dimensional tensor called a +# *vector.* +# - Likewise, a 2-dimensional tensor is often referred to as a +# *matrix.* +# - Anything with more than two dimensions is generally just +# called a tensor. +# +# More often than not, you’ll want to initialize your tensor with some +# value. Common cases are all zeros, all ones, or random values, and the +# ``torch`` module provides factory methods for all of these: +# + +zeros = torch.zeros(2, 3) +print(zeros) + +ones = torch.ones(2, 3) +print(ones) + +torch.manual_seed(1729) +random = torch.rand(2, 3) +print(random) + + +######################################################################### +# The factory methods all do just what you’d expect - we have a tensor +# full of zeros, another full of ones, and another with random values +# between 0 and 1. +# +# Random Tensors and Seeding +# ~~~~~~~~~~~~~~~~~~~~~~~~~~ +# +# Speaking of the random tensor, did you notice the call to +# ``torch.manual_seed()`` immediately preceding it? Initializing tensors, +# such as a model’s learning weights, with random values is common but +# there are times - especially in research settings - where you’ll want +# some assurance of the reproducibility of your results. Manually setting +# your random number generator’s seed is the way to do this. Let’s look +# more closely: +# + +torch.manual_seed(1729) +random1 = torch.rand(2, 3) +print(random1) + +random2 = torch.rand(2, 3) +print(random2) + +torch.manual_seed(1729) +random3 = torch.rand(2, 3) +print(random3) + +random4 = torch.rand(2, 3) +print(random4) + + +############################################################################ +# What you should see above is that ``random1`` and ``random3`` carry +# identical values, as do ``random2`` and ``random4``. Manually setting +# the RNG’s seed resets it, so that identical computations depending on +# random number should, in most settings, provide identical results. +# +# For more information, see the `PyTorch documentation on +# reproducibility `__. 
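+#
+# Relatedly, if you'd rather not touch the global seed, the random factory
+# functions also accept an explicit generator object - a brief sketch:
+#
+# ::
+#
+#    g = torch.Generator().manual_seed(1729)
+#    print(torch.rand(2, 3, generator=g))   # reproducible without reseeding the global RNG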
+# +# Tensor Shapes +# ~~~~~~~~~~~~~ +# +# Often, when you’re performing operations on two or more tensors, they +# will need to be of the same *shape* - that is, having the same number of +# dimensions and the same number of cells in each dimension. For that, we +# have the ``torch.*_like()`` methods: +# + +x = torch.empty(2, 2, 3) +print(x.shape) +print(x) + +empty_like_x = torch.empty_like(x) +print(empty_like_x.shape) +print(empty_like_x) + +zeros_like_x = torch.zeros_like(x) +print(zeros_like_x.shape) +print(zeros_like_x) + +ones_like_x = torch.ones_like(x) +print(ones_like_x.shape) +print(ones_like_x) + +rand_like_x = torch.rand_like(x) +print(rand_like_x.shape) +print(rand_like_x) + + +######################################################################### +# The first new thing in the code cell above is the use of the ``.shape`` +# property on a tensor. This property contains a list of the extent of +# each dimension of a tensor - in our case, ``x`` is a three-dimensional +# tensor with shape 2 x 2 x 3. +# +# Below that, we call the ``.empty_like()``, ``.zeros_like()``, +# ``.ones_like()``, and ``.rand_like()`` methods. Using the ``.shape`` +# property, we can verify that each of these methods returns a tensor of +# identical dimensionality and extent. +# +# The last way to create a tensor that will cover is to specify its data +# directly from a PyTorch collection: +# + +some_constants = torch.tensor([[3.1415926, 2.71828], [1.61803, 0.0072897]]) +print(some_constants) + +some_integers = torch.tensor((2, 3, 5, 7, 11, 13, 17, 19)) +print(some_integers) + +more_integers = torch.tensor(((2, 4, 6), [3, 6, 9])) +print(more_integers) + + +###################################################################### +# Using ``torch.tensor()`` is the most straightforward way to create a +# tensor if you already have data in a Python tuple or list. As shown +# above, nesting the collections will result in a multi-dimensional +# tensor. +# +# .. note:: +# ``torch.tensor()`` creates a copy of the data. +# +# Tensor Data Types +# ~~~~~~~~~~~~~~~~~ +# +# Setting the datatype of a tensor is possible a couple of ways: +# + +a = torch.ones((2, 3), dtype=torch.int16) +print(a) + +b = torch.rand((2, 3), dtype=torch.float64) * 20. +print(b) + +c = b.to(torch.int32) +print(c) + + +########################################################################## +# The simplest way to set the underlying data type of a tensor is with an +# optional argument at creation time. In the first line of the cell above, +# we set ``dtype=torch.int16`` for the tensor ``a``. When we print ``a``, +# we can see that it’s full of ``1`` rather than ``1.`` - Python’s subtle +# cue that this is an integer type rather than floating point. +# +# Another thing to notice about printing ``a`` is that, unlike when we +# left ``dtype`` as the default (32-bit floating point), printing the +# tensor also specifies its ``dtype``. +# +# You may have also spotted that we went from specifying the tensor’s +# shape as a series of integer arguments, to grouping those arguments in a +# tuple. This is not strictly necessary - PyTorch will take a series of +# initial, unlabeled integer arguments as a tensor shape - but when adding +# the optional arguments, it can make your intent more readable. +# +# The other way to set the datatype is with the ``.to()`` method. In the +# cell above, we create a random floating point tensor ``b`` in the usual +# way. 
Following that, we create ``c`` by converting ``b`` to a 32-bit +# integer with the ``.to()`` method. Note that ``c`` contains all the same +# values as ``b``, but truncated to integers. +# +# Available data types include: +# +# - ``torch.bool`` +# - ``torch.int8`` +# - ``torch.uint8`` +# - ``torch.int16`` +# - ``torch.int32`` +# - ``torch.int64`` +# - ``torch.half`` +# - ``torch.float`` +# - ``torch.double`` +# - ``torch.bfloat`` +# +# Math & Logic with PyTorch Tensors +# --------------------------------- +# +# Now that you know some of the ways to create a tensor… what can you do +# with them? +# +# Let’s look at basic arithmetic first, and how tensors interact with +# simple scalars: +# + +ones = torch.zeros(2, 2) + 1 +twos = torch.ones(2, 2) * 2 +threes = (torch.ones(2, 2) * 7 - 1) / 2 +fours = twos ** 2 +sqrt2s = twos ** 0.5 + +print(ones) +print(twos) +print(threes) +print(fours) +print(sqrt2s) + + +########################################################################## +# As you can see above, arithmetic operations between tensors and scalars, +# such as addition, subtraction, multiplication, division, and +# exponentiation are distributed over every element of the tensor. Because +# the output of such an operation will be a tensor, you can chain them +# together with the usual operator precedence rules, as in the line where +# we create ``threes``. +# +# Similar operations between two tensors also behave like you’d +# intuitively expect: +# + +powers2 = twos ** torch.tensor([[1, 2], [3, 4]]) +print(powers2) + +fives = ones + fours +print(fives) + +dozens = threes * fours +print(dozens) + + +########################################################################## +# It’s important to note here that all of the tensors in the previous code +# cell were of identical shape. What happens when we try to perform a +# binary operation on tensors if dissimilar shape? +# +# .. note:: +# The following cell throws a run-time error. This is intentional. +# +# :: +# +# a = torch.rand(2, 3) +# b = torch.rand(3, 2) +# +# print(a * b) +# + + +########################################################################## +# In the general case, you cannot operate on tensors of different shape +# this way, even in a case like the cell above, where the tensors have an +# identical number of elements. +# +# In Brief: Tensor Broadcasting +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +# +# .. note:: +# If you are familiar with broadcasting semantics in NumPy +# ndarrays, you’ll find the same rules apply here. +# +# The exception to the same-shapes rule is *tensor broadcasting.* Here’s +# an example: +# + +rand = torch.rand(2, 4) +doubled = rand * (torch.ones(1, 4) * 2) + +print(rand) +print(doubled) + + +######################################################################### +# What’s the trick here? How is it we got to multiply a 2x4 tensor by a +# 1x4 tensor? +# +# Broadcasting is a way to perform an operation between tensors that have +# similarities in their shapes. In the example above, the one-row, +# four-column tensor is multiplied by *both rows* of the two-row, +# four-column tensor. +# +# This is an important operation in Deep Learning. The common example is +# multiplying a tensor of learning weights by a *batch* of input tensors, +# applying the operation to each instance in the batch separately, and +# returning a tensor of identical shape - just like our (2, 4) \* (1, 4) +# example above returned a tensor of shape (2, 4). 
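+#
+# The same idea scales up to realistic batch sizes. A small sketch (the
+# batch size and feature count here are arbitrary):
+#
+# ::
+#
+#    batch = torch.rand(16, 4)         # 16 inputs with 4 features each
+#    weights = torch.rand(1, 4)        # one weight per feature
+#    print((batch * weights).shape)    # torch.Size([16, 4]) - weights broadcast over the batch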
+# +# The rules for broadcasting are: +# +# - Each tensor must have at least one dimension - no empty tensors. +# +# - Comparing the dimension sizes of the two tensors, *going from last to +# first:* +# +# - Each dimension must be equal, *or* +# +# - One of the dimensions must be of size 1, *or* +# +# - The dimension does not exist in one of the tensors +# +# Tensors of identical shape, of course, are trivially “broadcastable”, as +# you saw earlier. +# +# Here are some examples of situations that honor the above rules and +# allow broadcasting: +# + +a = torch.ones(4, 3, 2) + +b = a * torch.rand( 3, 2) # 3rd & 2nd dims identical to a, dim 1 absent +print(b) + +c = a * torch.rand( 3, 1) # 3rd dim = 1, 2nd dim identical to a +print(c) + +d = a * torch.rand( 1, 2) # 3rd dim identical to a, 2nd dim = 1 +print(d) + + +############################################################################# +# Look closely at the values of each tensor above: +# +# - The multiplication operation that created ``b`` was +# broadcast over every “layer” of ``a``. +# - For ``c``, the operation was broadcast over ever layer and row of +# ``a`` - every 3-element column is identical. +# - For ``d``, we switched it around - now every *row* is identical, +# across layers and columns. +# +# For more information on broadcasting, see the `PyTorch +# documentation `__ +# on the topic. +# +# Here are some examples of attempts at broadcasting that will fail: +# +# .. note:: +# The following cell throws a run-time error. This is intentional. +# +# :: +# +# a = torch.ones(4, 3, 2) +# +# b = a * torch.rand(4, 3) # dimensions must match last-to-first +# +# c = a * torch.rand( 2, 3) # both 3rd & 2nd dims different +# +# d = a * torch.rand((0, )) # can't broadcast with an empty tensor +# + + +########################################################################### +# More Math with Tensors +# ~~~~~~~~~~~~~~~~~~~~~~ +# +# PyTorch tensors have over three hundred operations that can be performed +# on them. +# +# Here is a small sample from some of the major categories of operations: +# + +# common functions +a = torch.rand(2, 4) * 2 - 1 +print('Common functions:') +print(torch.abs(a)) +print(torch.ceil(a)) +print(torch.floor(a)) +print(torch.clamp(a, -0.5, 0.5)) + +# trigonometric functions and their inverses +angles = torch.tensor([0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]) +sines = torch.sin(angles) +inverses = torch.asin(sines) +print('\nSine and arcsine:') +print(angles) +print(sines) +print(inverses) + +# bitwise operations +print('\nBitwise XOR:') +b = torch.tensor([1, 5, 11]) +c = torch.tensor([2, 7, 10]) +print(torch.bitwise_xor(b, c)) + +# comparisons: +print('\nBroadcasted, element-wise equality comparison:') +d = torch.tensor([[1., 2.], [3., 4.]]) +e = torch.ones(1, 2) # many comparison ops support broadcasting! 
+print(torch.eq(d, e)) # returns a tensor of type bool + +# reductions: +print('\nReduction ops:') +print(torch.max(d)) # returns a single-element tensor +print(torch.max(d).item()) # extracts the value from the returned tensor +print(torch.mean(d)) # average +print(torch.std(d)) # standard deviation +print(torch.prod(d)) # product of all numbers +print(torch.unique(torch.tensor([1, 2, 1, 2, 1, 2]))) # filter unique elements + +# vector and linear algebra operations +v1 = torch.tensor([1., 0., 0.]) # x unit vector +v2 = torch.tensor([0., 1., 0.]) # y unit vector +m1 = torch.rand(2, 2) # random matrix +m2 = torch.tensor([[3., 0.], [0., 3.]]) # three times identity matrix + +print('\nVectors & Matrices:') +print(torch.cross(v2, v1)) # negative of z unit vector (v1 x v2 == -v2 x v1) +print(m1) +m3 = torch.matmul(m1, m2) +print(m3) # 3 times m1 +print(torch.svd(m3)) # singular value decomposition + + +################################################################################## +# This is a small sample of operations. For more details and the full inventory of +# math functions, have a look at the +# `documentation `__. +# +# Altering Tensors in Place +# ~~~~~~~~~~~~~~~~~~~~~~~~~ +# +# Most binary operations on tensors will return a third, new tensor. When +# we say ``c = a * b`` (where ``a`` and ``b`` are tensors), the new tensor +# ``c`` will occupy a region of memory distinct from the other tensors. +# +# There are times, though, that you may wish to alter a tensor in place - +# for example, if you’re doing an element-wise computation where you can +# discard intermediate values. For this, most of the math functions have a +# version with an appended underscore (``_``) that will alter a tensor in +# place. +# +# For example: +# + +a = torch.tensor([0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]) +print('a:') +print(a) +print(torch.sin(a)) # this operation creates a new tensor in memory +print(a) # a has not changed + +b = torch.tensor([0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]) +print('\nb:') +print(b) +print(torch.sin_(b)) # note the underscore +print(b) # b has changed + + +####################################################################### +# For arithmetic operations, there are functions that behave similarly: +# + +a = torch.ones(2, 2) +b = torch.rand(2, 2) + +print('Before:') +print(a) +print(b) +print('\nAfter adding:') +print(a.add_(b)) +print(a) +print(b) +print('\nAfter multiplying') +print(b.mul_(b)) +print(b) + + +########################################################################## +# Note that these in-place arithmetic functions are methods on the +# ``torch.Tensor`` object, not attached to the ``torch`` module like many +# other functions (e.g., ``torch.sin()``). As you can see from +# ``a.add_(b)``, *the calling tensor is the one that gets changed in +# place.* +# +# There is another option for placing the result of a computation in an +# existing, allocated tensor. Many of the methods and functions we’ve seen +# so far - including creation methods! - have an ``out`` argument that +# lets you specify a tensor to receive the output. 
If the ``out`` tensor +# is the correct shape and ``dtype``, this can happen without a new memory +# allocation: +# + +a = torch.rand(2, 2) +b = torch.rand(2, 2) +c = torch.zeros(2, 2) +old_id = id(c) + +print(c) +d = torch.matmul(a, b, out=c) +print(c) # contents of c have changed + +assert c is d # test c & d are same object, not just containing equal values +assert id(c), old_id # make sure that our new c is the same object as the old one + +torch.rand(2, 2, out=c) # works for creation too! +print(c) # c has changed again +assert id(c), old_id # still the same object! + + +########################################################################## +# Copying Tensors +# --------------- +# +# As with any object in Python, assigning a tensor to a variable makes the +# variable a *label* of the tensor, and does not copy it. For example: +# + +a = torch.ones(2, 2) +b = a + +a[0][1] = 561 # we change a... +print(b) # ...and b is also altered + + +###################################################################### +# But what if you want a separate copy of the data to work on? The +# ``clone()`` method is there for you: +# + +a = torch.ones(2, 2) +b = a.clone() + +assert b is not a # different objects in memory... +print(torch.eq(a, b)) # ...but still with the same contents! + +a[0][1] = 561 # a changes... +print(b) # ...but b is still all ones + + +######################################################################### +# **There is an important thing to be aware of when using ``clone()``.** +# If your source tensor has autograd, enabled then so will the clone. +# **This will be covered more deeply in the video on autograd,** but if +# you want the light version of the details, continue on. +# +# *In many cases, this will be what you want.* For example, if your model +# has multiple computation paths in its ``forward()`` method, and *both* +# the original tensor and its clone contribute to the model’s output, then +# to enable model learning you want autograd turned on for both tensors. +# If your source tensor has autograd enabled (which it generally will if +# it’s a set of learning weights or derived from a computation involving +# the weights), then you’ll get the result you want. +# +# On the other hand, if you’re doing a computation where *neither* the +# original tensor nor its clone need to track gradients, then as long as +# the source tensor has autograd turned off, you’re good to go. +# +# *There is a third case,* though: Imagine you’re performing a computation +# in your model’s ``forward()`` function, where gradients are turned on +# for everything by default, but you want to pull out some values +# mid-stream to generate some metrics. In this case, you *don’t* want the +# cloned copy of your source tensor to track gradients - performance is +# improved with autograd’s history tracking turned off. For this, you can +# use the ``.detach()`` method on the source tensor: +# + +a = torch.rand(2, 2, requires_grad=True) # turn on autograd +print(a) + +b = a.clone() +print(b) + +c = a.detach().clone() +print(c) + +print(a) + + +######################################################################### +# What’s happening here? +# +# - We create ``a`` with ``requires_grad=True`` turned on. **We haven’t +# covered this optional argument yet, but will during the unit on +# autograd.** +# - When we print ``a``, it informs us that the property +# ``requires_grad=True`` - this means that autograd and computation +# history tracking are turned on. +# - We clone ``a`` and label it ``b``. 
When we print ``b``, we can see +# that it’s tracking its computation history - it has inherited +# ``a``\ ’s autograd settings, and added to the computation history. +# - We clone ``a`` into ``c``, but we call ``detach()`` first. +# - Printing ``c``, we see no computation history, and no +# ``requires_grad=True``. +# +# The ``detach()`` method *detaches the tensor from its computation +# history.* It says, “do whatever comes next as if autograd was off.” It +# does this *without* changing ``a`` - you can see that when we print +# ``a`` again at the end, it retains its ``requires_grad=True`` property. +# +# Moving to GPU +# ------------- +# +# One of the major advantages of PyTorch is its robust acceleration on +# CUDA-compatible Nvidia GPUs. (“CUDA” stands for *Compute Unified Device +# Architecture*, which is Nvidia’s platform for parallel computing.) So +# far, everything we’ve done has been on CPU. How do we move to the faster +# hardware? +# +# First, we should check whether a GPU is available, with the +# ``is_available()`` method. +# +# .. note:: +# If you do not have a CUDA-compatible GPU and CUDA drivers +# installed, the executable cells in this section will not execute any +# GPU-related code. +# + +if torch.cuda.is_available(): + print('We have a GPU!') +else: + print('Sorry, CPU only.') + + +########################################################################## +# Once we’ve determined that one or more GPUs is available, we need to put +# our data someplace where the GPU can see it. Your CPU does computation +# on data in your computer’s RAM. Your GPU has dedicated memory attached +# to it. Whenever you want to perform a computation on a device, you must +# move *all* the data needed for that computation to memory accessible by +# that device. (Colloquially, “moving the data to memory accessible by the +# GPU” is shorted to, “moving the data to the GPU”.) +# +# There are multiple ways to get your data onto your target device. You +# may do it at creation time: +# + +if torch.cuda.is_available(): + gpu_rand = torch.rand(2, 2, device='cuda') + print(gpu_rand) +else: + print('Sorry, CPU only.') + + +########################################################################## +# By default, new tensors are created on the CPU, so we have to specify +# when we want to create our tensor on the GPU with the optional +# ``device`` argument. You can see when we print the new tensor, PyTorch +# informs us which device it’s on (if it’s not on CPU). +# +# You can query the number of GPUs with ``torch.cuda.device_count()``. If +# you have more than one GPU, you can specify them by index: +# ``device='cuda:0'``, ``device='cuda:1'``, etc. +# +# As a coding practice, specifying our devices everywhere with string +# constants is pretty fragile. In an ideal world, your code would perform +# robustly whether you’re on CPU or GPU hardware. You can do this by +# creating a device handle that can be passed to your tensors instead of a +# string: +# + +if torch.cuda.is_available(): + my_device = torch.device('cuda') +else: + my_device = torch.device('cpu') +print('Device: {}'.format(my_device)) + +x = torch.rand(2, 2, device=my_device) +print(x) + + +######################################################################### +# If you have an existing tensor living on one device, you can move it to +# another with the ``to()`` method. The following line of code creates a +# tensor on CPU, and moves it to whichever device handle you acquired in +# the previous cell. 
+
+
+#########################################################################
+# If you have an existing tensor living on one device, you can move it to
+# another with the ``to()`` method. The following line of code creates a
+# tensor on CPU, and moves it to whichever device handle you acquired in
+# the previous cell.
+#
+
+y = torch.rand(2, 2)
+y = y.to(my_device)
+
+
+##########################################################################
+# It is important to know that in order to do computation involving two or
+# more tensors, *all of the tensors must be on the same device*. The
+# following code will throw a runtime error, regardless of whether you
+# have a GPU device available:
+#
+# ::
+#
+#     x = torch.rand(2, 2)
+#     y = torch.rand(2, 2, device='cuda')
+#     z = x + y  # exception will be thrown
+#
+
+
+###########################################################################
+# Manipulating Tensor Shapes
+# --------------------------
+#
+# Sometimes, you’ll need to change the shape of your tensor. Below, we’ll
+# look at a few common cases, and how to handle them.
+#
+# Changing the Number of Dimensions
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# One case where you might need to change the number of dimensions is
+# passing a single instance of input to your model. PyTorch models
+# generally expect *batches* of input.
+#
+# For example, imagine having a model that works on 3 x 226 x 226 images -
+# a 226-pixel square with 3 color channels. When you load and transform
+# it, you’ll get a tensor of shape ``(3, 226, 226)``. Your model, though,
+# is expecting input of shape ``(N, 3, 226, 226)``, where ``N`` is the
+# number of images in the batch. So how do you make a batch of one?
+#
+
+a = torch.rand(3, 226, 226)
+b = a.unsqueeze(0)
+
+print(a.shape)
+print(b.shape)
+
+
+##########################################################################
+# The ``unsqueeze()`` method adds a dimension of extent 1.
+# ``unsqueeze(0)`` adds it as a new zeroth dimension - now you have a
+# batch of one!
+#
+# So if that’s *un*\ squeezing, what do we mean by squeezing? We’re taking
+# advantage of the fact that any dimension of extent 1 *does not* change
+# the number of elements in the tensor.
+#
+
+c = torch.rand(1, 1, 1, 1, 1)
+print(c)
+
+
+##########################################################################
+# Continuing the example above, let’s say the model’s output is a
+# 20-element vector for each input. You would then expect the output to
+# have shape ``(N, 20)``, where ``N`` is the number of instances in the
+# input batch. That means that for our single-input batch, we’ll get an
+# output of shape ``(1, 20)``.
+#
+# What if you want to do some *non-batched* computation with that output -
+# something that’s just expecting a 20-element vector?
+#
+
+a = torch.rand(1, 20)
+print(a.shape)
+print(a)
+
+b = a.squeeze(0)
+print(b.shape)
+print(b)
+
+c = torch.rand(2, 2)
+print(c.shape)
+
+d = c.squeeze(0)
+print(d.shape)
+
+
+#########################################################################
+# You can see from the shapes that our 2-dimensional tensor is now
+# 1-dimensional, and if you look closely at the output of the cell above
+# you’ll see that printing ``a`` shows an “extra” set of square brackets
+# ``[]`` due to having an extra dimension.
+#
+# You may only ``squeeze()`` dimensions of extent 1. See above where we
+# try to squeeze a dimension of size 2 in ``c``, and get back the same
+# shape we started with. Calls to ``squeeze()`` and ``unsqueeze()`` can
+# only act on dimensions of extent 1 because to do otherwise would change
+# the number of elements in the tensor.
+#
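+
+
+######################################################################
+# One small extra worth knowing (a minimal illustration, not used again
+# below): called with no argument, ``squeeze()`` removes *every* dimension
+# of extent 1 at once, which is handy for tensors like the
+# ``(1, 1, 1, 1, 1)`` example above:
+#
+
+e = torch.rand(1, 2, 1, 3, 1)
+print(e.shape)           # torch.Size([1, 2, 1, 3, 1])
+print(e.squeeze().shape) # torch.Size([2, 3]) - all extent-1 dimensions removed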
+
+
+#########################################################################
+# Another place you might use ``unsqueeze()`` is to ease broadcasting.
+# Recall the example above where we had the following code:
+#
+# ::
+#
+#     a = torch.ones(4, 3, 2)
+#
+#     c = a * torch.rand(   3, 1) # 3rd dim = 1, 2nd dim identical to a
+#     print(c)
+#
+# The net effect of that was to broadcast the operation over dimensions 0
+# and 2, causing the random, 3 x 1 tensor to be multiplied element-wise by
+# every 3-element column in ``a``.
+#
+# What if the random vector had just been a 3-element vector? We’d lose the
+# ability to do the broadcast, because the final dimensions would not
+# match up according to the broadcasting rules. ``unsqueeze()`` comes to
+# the rescue:
+#
+
+a = torch.ones(4, 3, 2)
+b = torch.rand(   3)     # trying to multiply a * b will give a runtime error
+c = b.unsqueeze(1)       # change to a 2-dimensional tensor, adding new dim at the end
+print(c.shape)
+print(a * c)             # broadcasting works again!
+
+
+######################################################################
+# The ``squeeze()`` and ``unsqueeze()`` methods also have in-place
+# versions, ``squeeze_()`` and ``unsqueeze_()``:
+#
+
+batch_me = torch.rand(3, 226, 226)
+print(batch_me.shape)
+batch_me.unsqueeze_(0)
+print(batch_me.shape)
+
+
+##########################################################################
+# Sometimes you’ll want to change the shape of a tensor more radically,
+# while still preserving the number of elements and their contents. One
+# case where this happens is at the interface between a convolutional
+# layer of a model and a linear layer of the model - this is common in
+# image classification models. A convolution kernel will yield an output
+# tensor of shape *features x width x height,* but the following linear
+# layer expects a 1-dimensional input. ``reshape()`` will do this for you,
+# provided that the dimensions you request yield the same number of
+# elements as the input tensor has:
+#
+
+output3d = torch.rand(6, 20, 20)
+print(output3d.shape)
+
+input1d = output3d.reshape(6 * 20 * 20)
+print(input1d.shape)
+
+# can also call it as a function in the torch module:
+print(torch.reshape(output3d, (6 * 20 * 20,)).shape)
+
+
+###############################################################################
+# .. note::
+#      The ``(6 * 20 * 20,)`` argument in the final line of the cell
+#      above is because PyTorch expects a **tuple** when specifying a
+#      tensor shape - but when the shape is the first argument of a method, it
+#      lets us cheat and just use a series of integers. Here, we had to add the
+#      parentheses and comma to convince the method that this is really a
+#      one-element tuple.
+#
+# When it can, ``reshape()`` will return a *view* on the tensor to be
+# changed - that is, a separate tensor object looking at the same
+# underlying region of memory. *This is important:* That means any change
+# made to the source tensor will be reflected in the view on that tensor,
+# unless you ``clone()`` it.
+#
+# There *are* conditions, beyond the scope of this introduction, where
+# ``reshape()`` has to return a tensor carrying a copy of the data. For
+# more information, see the
+# `docs `__.
+#
+
+
+#######################################################################
+# NumPy Bridge
+# ------------
+#
+# In the section above on broadcasting, it was mentioned that PyTorch’s
+# broadcast semantics are compatible with NumPy’s - but the kinship
+# between PyTorch and NumPy goes even deeper than that.
+# +# If you have existing ML or scientific code with data stored in NumPy +# ndarrays, you may wish to express that same data as PyTorch tensors, +# whether to take advantage of PyTorch’s GPU acceleration, or its +# efficient abstractions for building ML models. It’s easy to switch +# between ndarrays and PyTorch tensors: +# + +import numpy as np + +numpy_array = np.ones((2, 3)) +print(numpy_array) + +pytorch_tensor = torch.from_numpy(numpy_array) +print(pytorch_tensor) + + +########################################################################## +# PyTorch creates a tensor of the same shape and containing the same data +# as the NumPy array, going so far as to keep NumPy’s default 64-bit float +# data type. +# +# The conversion can just as easily go the other way: +# + +pytorch_rand = torch.rand(2, 3) +print(pytorch_rand) + +numpy_rand = pytorch_rand.numpy() +print(numpy_rand) + + +########################################################################## +# It is important to know that these converted objects are using *the same +# underlying memory* as their source objects, meaning that changes to one +# are reflected in the other: +# + +numpy_array[1, 1] = 23 +print(pytorch_tensor) + +pytorch_rand[1, 1] = 17 +print(numpy_rand) diff --git a/beginner_source/introyt/tocyt.txt b/beginner_source/introyt/tocyt.txt new file mode 100644 index 00000000000..f956671c11b --- /dev/null +++ b/beginner_source/introyt/tocyt.txt @@ -0,0 +1,8 @@ +1. `Introduction to PyTorch `_ +2. `Introduction to PyTorch Tensors `_ +3. `The Fundamentals of Autograd `_ +4. `Building Models with PyTorch `_ +5. `PyTorch TensorBoard Support `_ +6. `Training with PyTorch `_ +7. `Model Understanding with Captum `_ +8. `Production Inference Deployment with PyTorch `_ (video only) diff --git a/beginner_source/introyt/trainingyt.py b/beginner_source/introyt/trainingyt.py new file mode 100644 index 00000000000..a2203d4e9dc --- /dev/null +++ b/beginner_source/introyt/trainingyt.py @@ -0,0 +1,364 @@ +""" +`Introduction `_ || +`Tensors `_ || +`Autograd `_ || +`Building Models `_ || +`TensorBoard Support `_ || +**Training Models** || +`Model Understanding `_ + +Training with PyTorch +===================== + +Follow along with the video below or on `youtube `__. + +.. raw:: html + +
+ +
+
+Introduction
+------------
+
+In past videos, we’ve discussed and demonstrated:
+
+- Building models with the neural network layers and functions of the ``torch.nn`` module
+- The mechanics of automated gradient computation, which is central to
+  gradient-based model training
+- Using TensorBoard to visualize training progress and other activities
+
+In this video, we’ll be adding some new tools to your inventory:
+
+- We’ll get familiar with the dataset and dataloader abstractions, and how
+  they ease the process of feeding data to your model during a training loop
+- We’ll discuss specific loss functions and when to use them
+- We’ll look at PyTorch optimizers, which implement algorithms to adjust
+  model weights based on the outcome of a loss function
+
+Finally, we’ll pull all of these together and see a full PyTorch
+training loop in action.
+
+
+Dataset and DataLoader
+----------------------
+
+The ``Dataset`` and ``DataLoader`` classes encapsulate the process of
+pulling your data from storage and exposing it to your training loop in
+batches.
+
+The ``Dataset`` is responsible for accessing and processing single
+instances of data.
+
+The ``DataLoader`` pulls instances of data from the ``Dataset`` (either
+automatically or with a sampler that you define), collects them in
+batches, and returns them for consumption by your training loop. The
+``DataLoader`` works with all kinds of datasets, regardless of the type
+of data they contain.
+
+For this tutorial, we’ll be using the Fashion-MNIST dataset provided by
+TorchVision. We use ``torchvision.transforms.Normalize()`` to
+zero-center and normalize the distribution of the image tile content,
+and download both training and validation data splits.
+
+"""
+
+import torch
+import torchvision
+import torchvision.transforms as transforms
+
+# PyTorch TensorBoard support
+from torch.utils.tensorboard import SummaryWriter
+from datetime import datetime
+
+
+transform = transforms.Compose(
+    [transforms.ToTensor(),
+     transforms.Normalize((0.5,), (0.5,))])
+
+# Create datasets for training & validation, download if necessary
+training_set = torchvision.datasets.FashionMNIST('./data', train=True, transform=transform, download=True)
+validation_set = torchvision.datasets.FashionMNIST('./data', train=False, transform=transform, download=True)
+
+# Create data loaders for our datasets; shuffle for training, not for validation
+training_loader = torch.utils.data.DataLoader(training_set, batch_size=4, shuffle=True, num_workers=2)
+validation_loader = torch.utils.data.DataLoader(validation_set, batch_size=4, shuffle=False, num_workers=2)
+
+# Class labels
+classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
+           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')
+
+# Report split sizes
+print('Training set has {} instances'.format(len(training_set)))
+print('Validation set has {} instances'.format(len(validation_set)))
+
+
+######################################################################
+# As always, let’s visualize the data as a sanity check:
+#
+
+import matplotlib.pyplot as plt
+import numpy as np
+
+# Helper function for inline image display
+def matplotlib_imshow(img, one_channel=False):
+    if one_channel:
+        img = img.mean(dim=0)
+    img = img / 2 + 0.5     # unnormalize
+    npimg = img.numpy()
+    if one_channel:
+        plt.imshow(npimg, cmap="Greys")
+    else:
+        plt.imshow(np.transpose(npimg, (1, 2, 0)))
+
+dataiter = iter(training_loader)
+images, labels = next(dataiter)  # pull the first batch from the loader
+
+# Create a grid from the images and show them
+img_grid = torchvision.utils.make_grid(images)
+matplotlib_imshow(img_grid, one_channel=True)
+print(' '.join(classes[labels[j]] for j in range(4)))
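+
+
+######################################################################
+# It’s also worth a quick look at the raw tensors the ``DataLoader`` hands
+# back - a small optional check (the shapes below assume the
+# ``batch_size=4`` loaders defined above):
+#
+
+print(images.shape)  # torch.Size([4, 1, 28, 28]) - batch x channels x height x width
+print(labels.shape)  # torch.Size([4]) - one class index per image in the batch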
+
+
+#########################################################################
+# The Model
+# ---------
+#
+# The model we’ll use in this example is a variant of LeNet-5 - it should
+# be familiar if you’ve watched the previous videos in this series.
+#
+
+import torch.nn as nn
+import torch.nn.functional as F
+
+# PyTorch models inherit from torch.nn.Module
+class GarmentClassifier(nn.Module):
+    def __init__(self):
+        super(GarmentClassifier, self).__init__()
+        self.conv1 = nn.Conv2d(1, 6, 5)
+        self.pool = nn.MaxPool2d(2, 2)
+        self.conv2 = nn.Conv2d(6, 16, 5)
+        self.fc1 = nn.Linear(16 * 4 * 4, 120)
+        self.fc2 = nn.Linear(120, 84)
+        self.fc3 = nn.Linear(84, 10)
+
+    def forward(self, x):
+        x = self.pool(F.relu(self.conv1(x)))
+        x = self.pool(F.relu(self.conv2(x)))
+        x = x.view(-1, 16 * 4 * 4)
+        x = F.relu(self.fc1(x))
+        x = F.relu(self.fc2(x))
+        x = self.fc3(x)
+        return x
+
+
+model = GarmentClassifier()
+
+
+##########################################################################
+# Loss Function
+# -------------
+#
+# For this example, we’ll be using a cross-entropy loss. For demonstration
+# purposes, we’ll create batches of dummy output and label values, run
+# them through the loss function, and examine the result.
+#
+
+loss_fn = torch.nn.CrossEntropyLoss()
+
+# NB: Loss functions expect data in batches, so we're creating batches of 4
+# Represents the model's confidence in each of the 10 classes for a given input
+dummy_outputs = torch.rand(4, 10)
+# Represents the correct class among the 10 being tested
+dummy_labels = torch.tensor([1, 5, 3, 7])
+
+print(dummy_outputs)
+print(dummy_labels)
+
+loss = loss_fn(dummy_outputs, dummy_labels)
+print('Total loss for this batch: {}'.format(loss.item()))
+
+
+#################################################################################
+# Optimizer
+# ---------
+#
+# For this example, we’ll be using simple `stochastic gradient
+# descent `__ with momentum.
+#
+# It can be instructive to try some variations on this optimization
+# scheme:
+#
+# - Learning rate determines the size of the steps the optimizer
+#   takes. What does a different learning rate do to your training
+#   results, in terms of accuracy and convergence time?
+# - Momentum nudges the optimizer in the direction of strongest gradient over
+#   multiple steps. What does changing this value do to your results?
+# - Try some different optimization algorithms, such as averaged SGD, Adagrad, or
+#   Adam. How do your results differ? (A sketch of one such swap follows the
+#   cell below.)
+#
+
+# Optimizers specified in the torch.optim package
+optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
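+
+
+#################################################################################
+# As a sketch of the kind of swap suggested above (it isn’t used anywhere
+# else in this tutorial), any other optimizer from ``torch.optim`` can be
+# dropped in the same way - for example, Adam with a matching learning rate:
+#
+# ::
+#
+#     optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
+#
+# The training loop below only ever talks to the ``optimizer`` object, so
+# nothing else needs to change when you experiment with a different algorithm.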
+
+
+#######################################################################################
+# The Training Loop
+# -----------------
+#
+# Below, we have a function that performs one training epoch. It
+# enumerates data from the DataLoader, and on each pass of the loop does
+# the following:
+#
+# - Gets a batch of training data from the DataLoader
+# - Zeros the optimizer’s gradients
+# - Performs an inference - that is, gets predictions from the model for an input batch
+# - Calculates the loss for that set of predictions vs. the labels on the dataset
+# - Calculates the backward gradients over the learning weights
+# - Tells the optimizer to perform one learning step - that is, adjust the model’s
+#   learning weights based on the observed gradients for this batch, according to the
+#   optimization algorithm we chose
+# - Reports on the loss for every 1000 batches
+# - Finally, reports the average per-batch loss for the last
+#   1000 batches, for comparison with a validation run
+#
+
+def train_one_epoch(epoch_index, tb_writer):
+    running_loss = 0.
+    last_loss = 0.
+
+    # Here, we use enumerate(training_loader) instead of
+    # iter(training_loader) so that we can track the batch
+    # index and do some intra-epoch reporting
+    for i, data in enumerate(training_loader):
+        # Every data instance is an input + label pair
+        inputs, labels = data
+
+        # Zero your gradients for every batch!
+        optimizer.zero_grad()
+
+        # Make predictions for this batch
+        outputs = model(inputs)
+
+        # Compute the loss and its gradients
+        loss = loss_fn(outputs, labels)
+        loss.backward()
+
+        # Adjust learning weights
+        optimizer.step()
+
+        # Gather data and report
+        running_loss += loss.item()
+        if i % 1000 == 999:
+            last_loss = running_loss / 1000 # loss per batch
+            print(' batch {} loss: {}'.format(i + 1, last_loss))
+            tb_x = epoch_index * len(training_loader) + i + 1
+            tb_writer.add_scalar('Loss/train', last_loss, tb_x)
+            running_loss = 0.
+
+    return last_loss
+
+
+##################################################################################
+# Per-Epoch Activity
+# ~~~~~~~~~~~~~~~~~~
+#
+# There are a couple of things we’ll want to do once per epoch:
+#
+# - Perform validation by checking our relative loss on a set of data that was not
+#   used for training, and report this
+# - Save a copy of the model
+#
+# Here, we’ll do our reporting in TensorBoard. This will require going to
+# the command line to start TensorBoard, and opening it in another browser
+# tab.
+#
+
+# Initializing in a separate cell so we can easily add more epochs to the same run
+timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
+writer = SummaryWriter('runs/fashion_trainer_{}'.format(timestamp))
+epoch_number = 0
+
+EPOCHS = 5
+
+best_vloss = 1_000_000.
+
+for epoch in range(EPOCHS):
+    print('EPOCH {}:'.format(epoch_number + 1))
+
+    # Make sure gradient tracking is on, and do a pass over the data
+    model.train(True)
+    avg_loss = train_one_epoch(epoch_number, writer)
+
+    # Put the model in evaluation mode, and disable gradient tracking for
+    # the validation pass - we don't need gradients to do reporting
+    model.train(False)
+
+    running_vloss = 0.0
+    with torch.no_grad():
+        for i, vdata in enumerate(validation_loader):
+            vinputs, vlabels = vdata
+            voutputs = model(vinputs)
+            vloss = loss_fn(voutputs, vlabels)
+            running_vloss += vloss
+
+    avg_vloss = running_vloss / (i + 1)
+    print('LOSS train {} valid {}'.format(avg_loss, avg_vloss))
+
+    # Log the running loss averaged per batch
+    # for both training and validation
+    writer.add_scalars('Training vs.
Validation Loss', + { 'Training' : avg_loss, 'Validation' : avg_vloss }, + epoch_number + 1) + writer.flush() + + # Track best performance, and save the model's state + if avg_vloss < best_vloss: + best_vloss = avg_vloss + model_path = 'model_{}_{}'.format(timestamp, epoch_number) + torch.save(model.state_dict(), model_path) + + epoch_number += 1 + + +######################################################################### +# To load a saved version of the model: +# +# :: +# +# saved_model = GarmentClassifier() +# saved_model.load_state_dict(torch.load(PATH)) +# +# Once you’ve loaded the model, it’s ready for whatever you need it for - +# more training, inference, or analysis. +# +# Note that if your model has constructor parameters that affect model +# structure, you’ll need to provide them and configure the model +# identically to the state in which it was saved. +# +# Other Resources +# --------------- +# +# - Docs on the `data +# utilities `__, including +# Dataset and DataLoader, at pytorch.org +# - A `note on the use of pinned +# memory `__ +# for GPU training +# - Documentation on the datasets available in +# `TorchVision `__, +# `TorchText `__, and +# `TorchAudio `__ +# - Documentation on the `loss +# functions `__ +# available in PyTorch +# - Documentation on the `torch.optim +# package `__, which +# includes optimizers and related tools, such as learning rate +# scheduling +# - A detailed `tutorial on saving and loading +# models `__ +# - The `Tutorials section of +# pytorch.org `__ contains tutorials on +# a broad variety of training tasks, including classification in +# different domains, generative adversarial networks, reinforcement +# learning, and more +# diff --git a/index.rst b/index.rst index 2603219e85c..28b02b64563 100644 --- a/index.rst +++ b/index.rst @@ -55,6 +55,13 @@ Welcome to PyTorch Tutorials :image: _static/img/thumbnails/cropped/60-min-blitz.png :link: beginner/basics/intro.html :tags: Getting-Started + +.. customcarditem:: + :header: Introduction to PyTorch on YouTube + :card_description: An introduction to building a complete ML workflow with PyTorch. Follows the PyTorch Beginner Series on YouTube. + :image: _static/img/thumbnails/cropped/generic-pytorch-logo.PNG + :link: beginner/introyt.html + :tags: Getting-Started .. customcarditem:: :header: Learning PyTorch with Examples @@ -632,6 +639,21 @@ Additional Resources beginner/basics/autogradqs_tutorial beginner/basics/optimization_tutorial beginner/basics/saveloadrun_tutorial + +.. toctree:: + :maxdepth: 2 + :hidden: + :includehidden: + :caption: Introduction to PyTorch on YouTube + + beginner/introyt + beginner/introyt/introyt1_tutorial + beginner/introyt/tensors_deeper_tutorial + beginner/introyt/autogradyt_tutorial + beginner/introyt/modelsyt_tutorial + beginner/introyt/tensorboardyt_tutorial + beginner/introyt/trainingyt + beginner/introyt/captumyt .. toctree:: :maxdepth: 2 diff --git a/requirements.txt b/requirements.txt index 5ee2abe0e50..c700e9046c7 100644 --- a/requirements.txt +++ b/requirements.txt @@ -17,6 +17,7 @@ awscli==1.16.35 flask spacy==2.3.2 ray[tune] +tensorboard # PyTorch Theme -e git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme