diff --git a/_static/img/dag_autograd.png b/_static/img/dag_autograd.png new file mode 100644 index 00000000000..cdc50fed625 Binary files /dev/null and b/_static/img/dag_autograd.png differ diff --git a/beginner_source/blitz/autograd_tutorial.py b/beginner_source/blitz/autograd_tutorial.py index 98e70a251d6..53659225d82 100644 --- a/beginner_source/blitz/autograd_tutorial.py +++ b/beginner_source/blitz/autograd_tutorial.py @@ -1,198 +1,322 @@ # -*- coding: utf-8 -*- """ -Autograd: Automatic Differentiation -=================================== +A Gentle Introduction to ``torch.autograd`` +------------------------------------------- -Central to all neural networks in PyTorch is the ``autograd`` package. -Let’s first briefly visit this, and we will then go to training our -first neural network. +``torch.autograd`` is PyTorch’s automatic differentiation engine that powers +neural network training. In this section, you will get a conceptual +understanding of how autograd helps a neural network train. +Background +~~~~~~~~~~ +Neural networks (NNs) are a collection of nested functions that are +executed on some input data. These functions are defined by *parameters* +(consisting of weights and biases), which in PyTorch are stored in +tensors. -The ``autograd`` package provides automatic differentiation for all operations -on Tensors. It is a define-by-run framework, which means that your backprop is -defined by how your code is run, and that every single iteration can be -different. +Training a NN happens in two steps: -Let us see this in more simple terms with some examples. +**Forward Propagation**: In forward prop, the NN makes its best guess +about the correct output. It runs the input data through each of its +functions to make this guess. -Tensor --------- +**Backward Propagation**: In backprop, the NN adjusts its parameters +proportionate to the error in its guess.
It does this by traversing +backwards from the output, collecting the derivatives of the error with +respect to the parameters of the functions (*gradients*), and optimizing +the parameters using gradient descent. For a more detailed walkthrough +of backprop, check out this `video from +3Blue1Brown `__. -``torch.Tensor`` is the central class of the package. If you set its attribute -``.requires_grad`` as ``True``, it starts to track all operations on it. When -you finish your computation you can call ``.backward()`` and have all the -gradients computed automatically. The gradient for this tensor will be -accumulated into ``.grad`` attribute. -To stop a tensor from tracking history, you can call ``.detach()`` to detach -it from the computation history, and to prevent future computation from being -tracked. -To prevent tracking history (and using memory), you can also wrap the code block -in ``with torch.no_grad():``. This can be particularly helpful when evaluating a -model because the model may have trainable parameters with -``requires_grad=True``, but for which we don't need the gradients. -There’s one more class which is very important for autograd -implementation - a ``Function``. +Usage in PyTorch +~~~~~~~~~~~~~~~~ +Let's take a look at a single training step. +For this example, we load a pretrained resnet18 model from ``torchvision``. +We create a random data tensor to represent a single image with 3 channels, and height & width of 64, +and its corresponding ``labels`` tensor initialized to some random values. +""" +import torch, torchvision +model = torchvision.models.resnet18(pretrained=True) +data = torch.rand(1, 3, 64, 64) +labels = torch.rand(1, 1000) + +############################################################ +# Next, we run the input data through each of the model's layers to make a prediction. +# This is the **forward pass**.
+# -``Tensor`` and ``Function`` are interconnected and build up an acyclic -graph, that encodes a complete history of computation. Each tensor has -a ``.grad_fn`` attribute that references a ``Function`` that has created -the ``Tensor`` (except for Tensors created by the user - their -``grad_fn is None``). +prediction = model(data) # forward pass -If you want to compute the derivatives, you can call ``.backward()`` on -a ``Tensor``. If ``Tensor`` is a scalar (i.e. it holds a one element -data), you don’t need to specify any arguments to ``backward()``, -however if it has more elements, you need to specify a ``gradient`` -argument that is a tensor of matching shape. -""" +############################################################ +# We use the model's prediction and the corresponding label to calculate the error (``loss``). +# The next step is to backpropagate this error through the network. +# Backward propagation is kicked off when we call ``.backward()`` on the error tensor. +# Autograd then calculates and stores the gradients for each model parameter in the parameter's ``.grad`` attribute. +# + +loss = (prediction - labels).sum() +loss.backward() # backward pass + +############################################################ +# Next, we load an optimizer, in this case SGD with a learning rate of 0.01 and momentum of 0.9. +# We register all the parameters of the model in the optimizer. +# + +optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9) + +###################################################################### +# Finally, we call ``.step()`` to initiate gradient descent. The optimizer adjusts each parameter by its gradient stored in ``.grad``. +# + +optim.step() #gradient descent + +###################################################################### +# At this point, you have everything you need to train your neural network. +# The below sections detail the workings of autograd - feel free to skip them. 
+# + + +###################################################################### +# -------------- +# + + +###################################################################### +# Differentiation in Autograd +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~ +# Let's take a look at how ``autograd`` collects gradients. We create two tensors ``a`` and ``b`` with +# ``requires_grad=True``. This signals to ``autograd`` that every operation on them should be tracked. +# import torch -############################################################### -# Create a tensor and set ``requires_grad=True`` to track computation with it -x = torch.ones(2, 2, requires_grad=True) -print(x) - -############################################################### -# Do a tensor operation: -y = x + 2 -print(y) - -############################################################### -# ``y`` was created as a result of an operation, so it has a ``grad_fn``. -print(y.grad_fn) - -############################################################### -# Do more operations on ``y`` -z = y * y * 3 -out = z.mean() - -print(z, out) - -################################################################ -# ``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad`` -# flag in-place. The input flag defaults to ``False`` if not given. -a = torch.randn(2, 2) -a = ((a * 3) / (a - 1)) -print(a.requires_grad) -a.requires_grad_(True) -print(a.requires_grad) -b = (a * a).sum() -print(b.grad_fn) - -############################################################### -# Gradients -# --------- -# Let's backprop now. -# Because ``out`` contains a single scalar, ``out.backward()`` is -# equivalent to ``out.backward(torch.tensor(1.))``. - -out.backward() - -############################################################### -# Print gradients d(out)/dx -# - -print(x.grad) - -############################################################### -# You should have got a matrix of ``4.5``. Let’s call the ``out`` -# *Tensor* “:math:`o`”. 
-# We have that :math:`o = \frac{1}{4}\sum_i z_i`, -# :math:`z_i = 3(x_i+2)^2` and :math:`z_i\bigr\rvert_{x_i=1} = 27`. -# Therefore, -# :math:`\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)`, hence -# :math:`\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5`. - -############################################################### -# Mathematically, if you have a vector valued function :math:`\vec{y}=f(\vec{x})`, -# then the gradient of :math:`\vec{y}` with respect to :math:`\vec{x}` -# is a Jacobian matrix: +a = torch.tensor([2., 3.], requires_grad=True) +b = torch.tensor([6., 4.], requires_grad=True) + +###################################################################### +# We create another tensor ``Q`` from ``a`` and ``b``. # # .. math:: -# J=\left(\begin{array}{ccc} -# \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\ -# \vdots & \ddots & \vdots\\ -# \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} -# \end{array}\right) +# Q = 3a^3 - b^2 + +Q = 3*a**3 - b**2 + + +###################################################################### +# Let's assume ``a`` and ``b`` to be parameters of an NN, and ``Q`` +# to be the error. In NN training, we want gradients of the error +# w.r.t. parameters, i.e. +# +# .. math:: +# \frac{\partial Q}{\partial a} = 9a^2 +# +# .. math:: +# \frac{\partial Q}{\partial b} = -2b +# +# +# When we call ``.backward()`` on ``Q``, autograd calculates these gradients +# and stores them in the respective tensors' ``.grad`` attribute. +# +# We need to explicitly pass a ``gradient`` argument in ``Q.backward()`` because it is a vector. +# ``gradient`` is a tensor of the same shape as ``Q``, and it represents the +# gradient of Q w.r.t. itself, i.e. +# +# .. math:: +# \frac{dQ}{dQ} = 1 +# +# Equivalently, we can also aggregate Q into a scalar and call backward implicitly, like ``Q.sum().backward()``. 
+# +external_grad = torch.tensor([1., 1.]) +Q.backward(gradient=external_grad) + + +####################################################################### +# Gradients are now deposited in ``a.grad`` and ``b.grad`` + +# check if collected gradients are correct +print(9*a**2 == a.grad) +print(-2*b == b.grad) + + +###################################################################### +# Optional Reading - Vector Calculus using ``autograd`` +# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +# +# Mathematically, if you have a vector valued function +# :math:`\vec{y}=f(\vec{x})`, then the gradient of :math:`\vec{y}` with +# respect to :math:`\vec{x}` is a Jacobian matrix :math:`J`: +# +# .. math:: +# +# +# J +# = +# \left(\begin{array}{ccc} +# \frac{\partial \bf{y}}{\partial x_{1}} & +# \cdots & +# \frac{\partial \bf{y}}{\partial x_{n}} +# \end{array}\right) +# = +# \left(\begin{array}{ccc} +# \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\ +# \vdots & \ddots & \vdots\\ +# \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} +# \end{array}\right) # # Generally speaking, ``torch.autograd`` is an engine for computing -# vector-Jacobian product. That is, given any vector -# :math:`v=\left(\begin{array}{cccc} v_{1} & v_{2} & \cdots & v_{m}\end{array}\right)^{T}`, -# compute the product :math:`v^{T}\cdot J`. If :math:`v` happens to be -# the gradient of a scalar function :math:`l=g\left(\vec{y}\right)`, -# that is, -# :math:`v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}`, +# vector-Jacobian product. That is, given any vector :math:`\vec{v}`, compute the product +# :math:`J^{T}\cdot \vec{v}`. +# +# If :math:`\vec{v}` happens to be the gradient of a scalar function +# +# ..
math:: +# +# +# l +# = +# g\left(\vec{y}\right), \qquad \vec{v} +# = +# \left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T} +# then by the chain rule, the vector-Jacobian product would be the # gradient of :math:`l` with respect to :math:`\vec{x}`: # # .. math:: -# J^{T}\cdot v=\left(\begin{array}{ccc} -# \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\ -# \vdots & \ddots & \vdots\\ -# \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} -# \end{array}\right)\left(\begin{array}{c} -# \frac{\partial l}{\partial y_{1}}\\ -# \vdots\\ -# \frac{\partial l}{\partial y_{m}} -# \end{array}\right)=\left(\begin{array}{c} -# \frac{\partial l}{\partial x_{1}}\\ -# \vdots\\ -# \frac{\partial l}{\partial x_{n}} -# \end{array}\right) -# -# (Note that :math:`v^{T}\cdot J` gives a row vector which can be -# treated as a column vector by taking :math:`J^{T}\cdot v`.) -# -# This characteristic of vector-Jacobian product makes it very -# convenient to feed external gradients into a model that has -# non-scalar output. - -############################################################### -# Now let's take a look at an example of vector-Jacobian product: - -x = torch.randn(3, requires_grad=True) - -y = x * 2 -while y.data.norm() < 1000: - y = y * 2 - -print(y) - -############################################################### -# Now in this case ``y`` is no longer a scalar.
``torch.autograd`` -# could not compute the full Jacobian directly, but if we just -# want the vector-Jacobian product, simply pass the vector to -# ``backward`` as argument: -v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float) -y.backward(v) - -print(x.grad) - -############################################################### -# You can also stop autograd from tracking history on Tensors -# with ``.requires_grad=True`` either by wrapping the code block in -# ``with torch.no_grad():`` -print(x.requires_grad) -print((x ** 2).requires_grad) - -with torch.no_grad(): - print((x ** 2).requires_grad) - -############################################################### -# Or by using ``.detach()`` to get a new Tensor with the same -# content but that does not require gradients: -print(x.requires_grad) -y = x.detach() -print(y.requires_grad) -print(x.eq(y).all()) - - -############################################################### -# **Read Later:** -# -# Document about ``autograd.Function`` is at -# https://pytorch.org/docs/stable/autograd.html#function +# +# +# J^{T}\cdot \vec{v}=\left(\begin{array}{ccc} +# \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\ +# \vdots & \ddots & \vdots\\ +# \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} +# \end{array}\right)\left(\begin{array}{c} +# \frac{\partial l}{\partial y_{1}}\\ +# \vdots\\ +# \frac{\partial l}{\partial y_{m}} +# \end{array}\right)=\left(\begin{array}{c} +# \frac{\partial l}{\partial x_{1}}\\ +# \vdots\\ +# \frac{\partial l}{\partial x_{n}} +# \end{array}\right) +# +# This characteristic of vector-Jacobian product is what we use in the above example; +# ``external_grad`` represents :math:`\vec{v}`. 
+# + + + +###################################################################### +# Computational Graph +# ~~~~~~~~~~~~~~~~~~~ +# +# Conceptually, autograd keeps a record of data (tensors) & all executed +# operations (along with the resulting new tensors) in a directed acyclic +# graph (DAG) consisting of +# `Function `__ +# objects. In this DAG, leaves are the input tensors, roots are the output +# tensors. By tracing this graph from roots to leaves, you can +# automatically compute the gradients using the chain rule. +# +# In a forward pass, autograd does two things simultaneously: +# +# - run the requested operation to compute a resulting tensor, and +# - maintain the operation’s *gradient function* in the DAG. +# +# The backward pass kicks off when ``.backward()`` is called on the DAG +# root. ``autograd`` then: +# +# - computes the gradients from each ``.grad_fn``, +# - accumulates them in the respective tensor’s ``.grad`` attribute, and +# - using the chain rule, propagates all the way to the leaf tensors. +# +# Below is a visual representation of the DAG in our example. In the graph, +# the arrows are in the direction of the forward pass. The nodes represent the backward functions +# of each operation in the forward pass. The leaf nodes in blue represent our leaf tensors ``a`` and ``b``. +# +# .. figure:: /_static/img/dag_autograd.png +# +# .. note:: +# **DAGs are dynamic in PyTorch** +# An important thing to note is that the graph is recreated from scratch; after each +# ``.backward()`` call, autograd starts populating a new graph. This is +# exactly what allows you to use control flow statements in your model; +# you can change the shape, size and operations at every iteration if +# needed. +# +# Exclusion from the DAG +# ^^^^^^^^^^^^^^^^^^^^^^ +# +# ``torch.autograd`` tracks operations on all tensors which have their +# ``requires_grad`` flag set to ``True``. 
For tensors that don’t require +# gradients, setting this attribute to ``False`` excludes them from the +# gradient computation DAG. +# +# The output tensor of an operation will require gradients even if only a +# single input tensor has ``requires_grad=True``. +# + +x = torch.rand(5, 5) +y = torch.rand(5, 5) +z = torch.rand((5, 5), requires_grad=True) + +a = x + y +print(f"Does `a` require gradients?: {a.requires_grad}") +b = x + z +print(f"Does `b` require gradients?: {b.requires_grad}") + + +###################################################################### +# In a NN, parameters that don't compute gradients are usually called **frozen parameters**. +# It is useful to "freeze" part of your model if you know in advance that you won't need the gradients of those parameters +# (this offers some performance benefits by reducing autograd computations). +# +# Another common use case where exclusion from the DAG is important is +# `finetuning a pretrained network `__. +# +# In finetuning, we freeze most of the model and typically only modify the classifier layers to make predictions on new labels. +# Let's walk through a small example to demonstrate this. As before, we load a pretrained resnet18 model, and freeze all the parameters. + +from torch import nn, optim + +model = torchvision.models.resnet18(pretrained=True) + +# Freeze all the parameters in the network +for param in model.parameters(): + param.requires_grad = False + +###################################################################### +# Let's say we want to finetune the model on a new dataset with 10 labels. +# In resnet, the classifier is the last linear layer ``model.fc``. +# We can simply replace it with a new linear layer (unfrozen by default) +# that acts as our classifier. + +model.fc = nn.Linear(512, 10) + +###################################################################### +# Now all parameters in the model, except the parameters of ``model.fc``, are frozen.
+# The only parameters that compute gradients are the weights and bias of ``model.fc``. + +# Optimize only the classifier +optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9) + +########################################################################## +# Notice that we register only the classifier's parameters in the optimizer. +# Even if all the parameters were registered, the only ones that compute gradients (and hence are updated in gradient descent) +# would be the weights and bias of the classifier. +# +# The same exclusionary functionality is available as a context manager in +# `torch.no_grad() `__ +# + +###################################################################### +# -------------- +# + +###################################################################### +# Further readings: +# ~~~~~~~~~~~~~~~~~~~ +# +# - `In-place operations & Multithreaded Autograd `__ +# - `Example implementation of reverse-mode autodiff `__ diff --git a/beginner_source/blitz/tensor_tutorial.py b/beginner_source/blitz/tensor_tutorial.py index 7b339ee225f..a949f205d8b 100644 --- a/beginner_source/blitz/tensor_tutorial.py +++ b/beginner_source/blitz/tensor_tutorial.py @@ -1,195 +1,200 @@ -# -*- coding: utf-8 -*- """ -What is PyTorch? -================ - -It’s a Python-based scientific computing package targeted at two sets of -audiences: - -- A replacement for NumPy to use the power of GPUs -- a deep learning research platform that provides maximum flexibility - and speed +Tensors +-------------------------------------------- -Getting Started ---------------- +Tensors are a specialized data structure that is very similar to arrays +and matrices. In PyTorch, we use tensors to encode the inputs and +outputs of a model, as well as the model’s parameters. -Tensors -^^^^^^^ +Tensors are similar to NumPy’s ndarrays, except that tensors can run on +GPUs or other specialized hardware to accelerate computing. If you’re familiar with ndarrays, you’ll +be right at home with the Tensor API.
If not, follow along in this quick +API walkthrough. -Tensors are similar to NumPy’s ndarrays, with the addition being that -Tensors can also be used on a GPU to accelerate computing. """ -from __future__ import print_function import torch +import numpy as np -############################################################### -# .. note:: -# An uninitialized matrix is declared, -# but does not contain definite known -# values before it is used. When an -# uninitialized matrix is created, -# whatever values were in the allocated -# memory at the time will appear as the initial values. - -############################################################### -# Construct a 5x3 matrix, uninitialized: -x = torch.empty(5, 3) -print(x) - -############################################################### -# Construct a randomly initialized matrix: +###################################################################### +# Tensor Initialization +# ~~~~~~~~~~~~~~~~~~~~~ +# +# Tensors can be initialized in various ways. Take a look at the following examples: +# +# **Directly from data** +# +# Tensors can be created directly from data. The data type is automatically inferred. -x = torch.rand(5, 3) -print(x) +data = [[1, 2],[3, 4]] +x_data = torch.tensor(data) -############################################################### -# Construct a matrix filled zeros and of dtype long: +###################################################################### +# **From a NumPy array** +# +# Tensors can be created from NumPy arrays (and vice versa - see :ref:`bridge-to-np-label`). +np_array = np.array(data) +x_np = torch.from_numpy(np_array) -x = torch.zeros(5, 3, dtype=torch.long) -print(x) ############################################################### -# Construct a tensor directly from data: - -x = torch.tensor([5.5, 3]) -print(x) +# **From another tensor:** +# +# The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden. 
-############################################################### -# or create a tensor based on an existing tensor. These methods -# will reuse properties of the input tensor, e.g. dtype, unless -# new values are provided by user +x_ones = torch.ones_like(x_data) # retains the properties of x_data +print(f"Ones Tensor: \n {x_ones} \n") -x = x.new_ones(5, 3, dtype=torch.double) # new_* methods take in sizes -print(x) +x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data +print(f"Random Tensor: \n {x_rand} \n") -x = torch.randn_like(x, dtype=torch.float) # override dtype! -print(x) # result has the same size -############################################################### -# Get its size: +###################################################################### +# **With random or constant values:** +# +# ``shape`` is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor. -print(x.size()) +shape = (2,3,) +rand_tensor = torch.rand(shape) +ones_tensor = torch.ones(shape) +zeros_tensor = torch.zeros(shape) -############################################################### -# .. note:: -# ``torch.Size`` is in fact a tuple, so it supports all tuple operations. -# -# Operations -# ^^^^^^^^^^ -# There are multiple syntaxes for operations. In the following -# example, we will take a look at the addition operation. 
-# -# Addition: syntax 1 -y = torch.rand(5, 3) -print(x + y) +print(f"Random Tensor: \n {rand_tensor} \n") +print(f"Ones Tensor: \n {ones_tensor} \n") +print(f"Zeros Tensor: \n {zeros_tensor}") -############################################################### -# Addition: syntax 2 -print(torch.add(x, y)) -############################################################### -# Addition: providing an output tensor as argument -result = torch.empty(5, 3) -torch.add(x, y, out=result) -print(result) -############################################################### -# Addition: in-place +###################################################################### +# -------------- +# -# adds x to y -y.add_(x) -print(y) -############################################################### -# .. note:: -# Any operation that mutates a tensor in-place is post-fixed with an ``_``. -# For example: ``x.copy_(y)``, ``x.t_()``, will change ``x``. +###################################################################### +# Tensor Attributes +# ~~~~~~~~~~~~~~~~~ # -# You can use standard NumPy-like indexing with all bells and whistles! +# Tensor attributes describe their shape, datatype, and the device on which they are stored. 
-print(x[:, 1]) +tensor = torch.rand(3,4) -############################################################### -# Resizing: If you want to resize/reshape tensor, you can use ``torch.view``: -x = torch.randn(4, 4) -y = x.view(16) -z = x.view(-1, 8) # the size -1 is inferred from other dimensions -print(x.size(), y.size(), z.size()) +print(f"Shape of tensor: {tensor.shape}") +print(f"Datatype of tensor: {tensor.dtype}") +print(f"Device tensor is stored on: {tensor.device}") -############################################################### -# If you have a one element tensor, use ``.item()`` to get the value as a -# Python number -x = torch.randn(1) -print(x) -print(x.item()) -############################################################### -# **Read later:** +###################################################################### +# -------------- # + + +###################################################################### +# Tensor Operations +# ~~~~~~~~~~~~~~~~~ # -# 100+ Tensor operations, including transposing, indexing, slicing, -# mathematical operations, linear algebra, random numbers, etc., -# are described -# `here `_. +# Over 100 tensor operations, including transposing, indexing, slicing, +# mathematical operations, linear algebra, random sampling, and more are +# comprehensively described +# `here `__. # -# NumPy Bridge -# ------------ +# Each of them can be run on the GPU (at typically higher speeds than on a +# CPU). If you’re using Colab, allocate a GPU by going to Edit > Notebook +# Settings. # -# Converting a Torch Tensor to a NumPy array and vice versa is a breeze. + +# We move our tensor to the GPU if available +if torch.cuda.is_available(): + tensor = tensor.to('cuda') + + +###################################################################### +# Try out some of the operations from the list. +# If you're familiar with the NumPy API, you'll find the Tensor API a breeze to use. 
# -# The Torch Tensor and NumPy array will share their underlying memory -# locations (if the Torch Tensor is on CPU), and changing one will change -# the other. + +############################################################### +# **Standard numpy-like indexing and slicing:** + +tensor = torch.ones(4, 4) +tensor[:,1] = 0 +print(tensor) + +###################################################################### +# **Joining tensors** You can use ``torch.cat`` to concatenate a sequence of tensors along a given dimension. +# See also `torch.stack `__, +# another tensor joining op that is subtly different from ``torch.cat``. +t1 = torch.cat([tensor, tensor, tensor], dim=1) +print(t1) + +###################################################################### +# **Multiplying tensors** + +# This computes the element-wise product +print(f"tensor.mul(tensor) \n {tensor.mul(tensor)} \n") +# Alternative syntax: +print(f"tensor * tensor \n {tensor * tensor}") + +###################################################################### # -# Converting a Torch Tensor to a NumPy Array -# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +# This computes the matrix multiplication between two tensors +print(f"tensor.matmul(tensor.T) \n {tensor.matmul(tensor.T)} \n") +# Alternative syntax: +print(f"tensor @ tensor.T \n {tensor @ tensor.T}") -a = torch.ones(5) -print(a) -############################################################### +###################################################################### +# **In-place operations** +# Operations that have a ``_`` suffix are in-place. For example: ``x.copy_(y)``, ``x.t_()``, will change ``x``. + +print(tensor, "\n") +tensor.add_(5) +print(tensor) + +###################################################################### +# .. note:: +# In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss +# of history. Hence, their use is discouraged. 
+ +###################################################################### +# -------------- # -b = a.numpy() -print(b) -############################################################### -# See how the numpy array changed in value. +###################################################################### +# .. _bridge-to-np-label: +# +# Bridge with NumPy +# ~~~~~~~~~~~~~~~~~ +# Tensors on the CPU and NumPy arrays can share their underlying memory +# locations, and changing one will change the other. -a.add_(1) -print(a) -print(b) -############################################################### -# Converting NumPy Array to Torch Tensor +###################################################################### +# Tensor to NumPy array # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -# See how changing the np array changed the Torch Tensor automatically +t = torch.ones(5) +print(f"t: {t}") +n = t.numpy() +print(f"n: {n}") -import numpy as np -a = np.ones(5) -b = torch.from_numpy(a) -np.add(a, 1, out=a) -print(a) -print(b) +###################################################################### +# A change in the tensor is reflected in the NumPy array. -############################################################### -# All the Tensors on the CPU except a CharTensor support converting to -# NumPy and back. -# -# CUDA Tensors -# ------------ -# -# Tensors can be moved onto any device using the ``.to`` method. +t.add_(1) +print(f"t: {t}") +print(f"n: {n}") -# let us run this cell only if CUDA is available -# We will use ``torch.device`` objects to move tensors in and out of GPU -if torch.cuda.is_available(): - device = torch.device("cuda") # a CUDA device object - y = torch.ones_like(x, device=device) # directly create a tensor on GPU - x = x.to(device) # or just use strings ``.to("cuda")`` - z = x + y - print(z) - print(z.to("cpu", torch.double)) # ``.to`` can also change dtype together!
+ +###################################################################### +# NumPy array to Tensor +# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +n = np.ones(5) +t = torch.from_numpy(n) + +###################################################################### +# Changes in the NumPy array are reflected in the tensor. +np.add(n, 1, out=n) +print(f"t: {t}") +print(f"n: {n}") diff --git a/beginner_source/deep_learning_60min_blitz.rst b/beginner_source/deep_learning_60min_blitz.rst index d07d34c0077..4fc156c08ce 100644 --- a/beginner_source/deep_learning_60min_blitz.rst +++ b/beginner_source/deep_learning_60min_blitz.rst @@ -8,13 +8,18 @@ Deep Learning with PyTorch: A 60 Minute Blitz -Goal of this tutorial: +What is PyTorch? +~~~~~~~~~~~~~~~~~~~~~ +PyTorch is a Python-based scientific computing package serving two broad purposes: + +- A replacement for NumPy to use the power of GPUs and other accelerators. +- An automatic differentiation library that is useful for implementing neural networks. -- Understand PyTorch’s Tensor library and neural networks at a high - level. -- Train a small neural network to classify images +Goal of this tutorial: +~~~~~~~~~~~~~~~~~~~~~~~~ +- Understand PyTorch’s Tensor library and neural networks at a high level. +- Train a small neural network to classify images -*This tutorial assumes that you have a basic familiarity of numpy* .. Note:: Make sure you have the `torch`_ and `torchvision`_ packages installed.