diff --git a/_static/img/dag_autograd.png b/_static/img/dag_autograd.png
new file mode 100644
index 00000000000..cdc50fed625
Binary files /dev/null and b/_static/img/dag_autograd.png differ
diff --git a/beginner_source/blitz/autograd_tutorial.py b/beginner_source/blitz/autograd_tutorial.py
index 98e70a251d6..53659225d82 100644
--- a/beginner_source/blitz/autograd_tutorial.py
+++ b/beginner_source/blitz/autograd_tutorial.py
@@ -1,198 +1,322 @@
 # -*- coding: utf-8 -*-
 """
-Autograd: Automatic Differentiation
-===================================
+A Gentle Introduction to ``torch.autograd``
+-------------------------------------------
 
-Central to all neural networks in PyTorch is the ``autograd`` package.
-Let’s first briefly visit this, and we will then go to training our
-first neural network.
+``torch.autograd`` is PyTorch’s automatic differentiation engine that powers
+neural network training. In this section, you will get a conceptual
+understanding of how autograd helps a neural network train.
 
+Background
+~~~~~~~~~~
+Neural networks (NNs) are a collection of nested functions that are
+executed on some input data. These functions are defined by *parameters*
+(consisting of weights and biases), which in PyTorch are stored in
+tensors.
 
-The ``autograd`` package provides automatic differentiation for all operations
-on Tensors. It is a define-by-run framework, which means that your backprop is
-defined by how your code is run, and that every single iteration can be
-different.
+Training a NN happens in two steps:
 
-Let us see this in more simple terms with some examples.
+**Forward Propagation**: In forward prop, the NN makes its best guess
+about the correct output. It runs the input data through each of its
+functions to make this guess.
 
-Tensor
---------
+**Backward Propagation**: In backprop, the NN adjusts its parameters
+proportionate to the error in its guess. It does this by traversing
+backwards from the output, collecting the derivatives of the error with
+respect to the parameters of the functions (*gradients*), and optimizing
+the parameters using gradient descent. For a more detailed walkthrough
+of backprop, check out this `video from
+3Blue1Brown <https://www.youtube.com/watch?v=tIeHLnjs5U8>`__.
 
-``torch.Tensor`` is the central class of the package. If you set its attribute
-``.requires_grad`` as ``True``, it starts to track all operations on it. When
-you finish your computation you can call ``.backward()`` and have all the
-gradients computed automatically. The gradient for this tensor will be
-accumulated into ``.grad`` attribute.
 
-To stop a tensor from tracking history, you can call ``.detach()`` to detach
-it from the computation history, and to prevent future computation from being
-tracked.
 
-To prevent tracking history (and using memory), you can also wrap the code block
-in ``with torch.no_grad():``. This can be particularly helpful when evaluating a
-model because the model may have trainable parameters with
-``requires_grad=True``, but for which we don't need the gradients.
 
-There’s one more class which is very important for autograd
-implementation - a ``Function``.
+Usage in PyTorch
+~~~~~~~~~~~~~~~~
+Let's take a look at a single training step.
+For this example, we load a pretrained resnet18 model from ``torchvision``.
+We create a random data tensor to represent a single image with 3 channels and a height and width of 64,
+and its corresponding ``labels`` tensor initialized to random values.
+"""
+import torch, torchvision
+model = torchvision.models.resnet18(pretrained=True)
+data = torch.rand(1, 3, 64, 64)
+labels = torch.rand(1, 1000)
+
+############################################################
+# Next, we run the input data through each of the model's layers to make a prediction.
+# This is the **forward pass**.
+#
 
-``Tensor`` and ``Function`` are interconnected and build up an acyclic
-graph, that encodes a complete history of computation. Each tensor has
-a ``.grad_fn`` attribute that references a ``Function`` that has created
-the ``Tensor`` (except for Tensors created by the user - their
-``grad_fn is None``).
+prediction = model(data) # forward pass
 
-If you want to compute the derivatives, you can call ``.backward()`` on
-a ``Tensor``. If ``Tensor`` is a scalar (i.e. it holds a one element
-data), you don’t need to specify any arguments to ``backward()``,
-however if it has more elements, you need to specify a ``gradient``
-argument that is a tensor of matching shape.
-"""
+############################################################
+# We use the model's prediction and the corresponding label to calculate the error (``loss``).
+# The next step is to backpropagate this error through the network.
+# Backward propagation is kicked off when we call ``.backward()`` on the error tensor.
+# Autograd then calculates and stores the gradients for each model parameter in the parameter's ``.grad`` attribute.
+#
+
+loss = (prediction - labels).sum()
+loss.backward() # backward pass
+
+############################################################
+# Next, we load an optimizer, in this case SGD with a learning rate of 0.01 and momentum of 0.9.
+# We register all the parameters of the model in the optimizer.
+#
+
+optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
+
+######################################################################
+# Finally, we call ``.step()`` to initiate gradient descent. The optimizer adjusts each parameter by its gradient stored in ``.grad``.
+#
+
+optim.step() # gradient descent
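+
+######################################################################
+# As a rough sketch of what this step does (assuming the default
+# ``dampening=0`` and ``nesterov=False``, and writing :math:`\theta` for a
+# parameter, :math:`g` for its ``.grad``, :math:`\eta` for the learning rate,
+# :math:`\mu` for the momentum and :math:`b` for the optimizer's momentum
+# buffer; these symbols are introduced here only for illustration), each
+# parameter is updated as:
+#
+# .. math::
+#    b \leftarrow \mu b + g, \qquad \theta \leftarrow \theta - \eta b
+#
+# With :math:`\eta = 0.01` and :math:`\mu = 0.9` as above, the buffer starts out
+# equal to :math:`g` on the first call, so the first update reduces to
+# :math:`\theta \leftarrow \theta - 0.01\,g`.
+#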
+
+######################################################################
+# At this point, you have everything you need to train your neural network.
+# The sections below detail the workings of autograd - feel free to skip them.
+#
+
+
+######################################################################
+# --------------
+#
+
+
+######################################################################
+# Differentiation in Autograd
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# Let's take a look at how ``autograd`` collects gradients. We create two tensors ``a`` and ``b`` with
+# ``requires_grad=True``. This signals to ``autograd`` that every operation on them should be tracked.
+#
 
 import torch
 
-###############################################################
-# Create a tensor and set ``requires_grad=True`` to track computation with it
-x = torch.ones(2, 2, requires_grad=True)
-print(x)
-
-###############################################################
-# Do a tensor operation:
-y = x + 2
-print(y)
-
-###############################################################
-# ``y`` was created as a result of an operation, so it has a ``grad_fn``.
-print(y.grad_fn)
-
-###############################################################
-# Do more operations on ``y``
-z = y * y * 3
-out = z.mean()
-
-print(z, out)
-
-################################################################
-# ``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
-# flag in-place. The input flag defaults to ``False`` if not given.
-a = torch.randn(2, 2)
-a = ((a * 3) / (a - 1))
-print(a.requires_grad)
-a.requires_grad_(True)
-print(a.requires_grad)
-b = (a * a).sum()
-print(b.grad_fn)
-
-###############################################################
-# Gradients
-# ---------
-# Let's backprop now.
-# Because ``out`` contains a single scalar, ``out.backward()`` is
-# equivalent to ``out.backward(torch.tensor(1.))``.
-
-out.backward()
-
-###############################################################
-# Print gradients d(out)/dx
-#
-
-print(x.grad)
-
-###############################################################
-# You should have got a matrix of ``4.5``. Let’s call the ``out``
-# *Tensor* “:math:`o`”.
-# We have that :math:`o = \frac{1}{4}\sum_i z_i`,
-# :math:`z_i = 3(x_i+2)^2` and :math:`z_i\bigr\rvert_{x_i=1} = 27`.
-# Therefore,
-# :math:`\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)`, hence
-# :math:`\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5`.
-
-###############################################################
-# Mathematically, if you have a vector valued function :math:`\vec{y}=f(\vec{x})`,
-# then the gradient of :math:`\vec{y}` with respect to :math:`\vec{x}`
-# is a Jacobian matrix:
+a = torch.tensor([2., 3.], requires_grad=True)
+b = torch.tensor([6., 4.], requires_grad=True)
+
+######################################################################
+# We create another tensor ``Q`` from ``a`` and ``b``.
 #
 # .. math::
-#   J=\left(\begin{array}{ccc}
-#    \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
-#    \vdots & \ddots & \vdots\\
-#    \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
-#    \end{array}\right)
+#    Q = 3a^3 - b^2
+
+Q = 3*a**3 - b**2
+
+
+######################################################################
+# Let's assume ``a`` and ``b`` to be parameters of an NN, and ``Q``
+# to be the error. In NN training, we want gradients of the error
+# w.r.t. parameters, i.e.
+#
+# .. math::
+#    \frac{\partial Q}{\partial a} = 9a^2
+#
+# .. math::
+#    \frac{\partial Q}{\partial b} = -2b
+#
+#
+# When we call ``.backward()`` on ``Q``, autograd calculates these gradients
+# and stores them in the respective tensors' ``.grad`` attribute.
+#
+# We need to explicitly pass a ``gradient`` argument in ``Q.backward()`` because it is a vector.
+# ``gradient`` is a tensor of the same shape as ``Q``, and it represents the
+# gradient of Q w.r.t. itself, i.e.
+#
+# .. math::
+#    \frac{dQ}{dQ} = 1
+#
+# Equivalently, we can also aggregate Q into a scalar and call backward implicitly, like ``Q.sum().backward()``.
+#
+external_grad = torch.tensor([1., 1.])
+Q.backward(gradient=external_grad)
+
+
+#######################################################################
+# Gradients are now deposited in ``a.grad`` and ``b.grad``
+
+# check if collected gradients are correct
+print(9*a**2 == a.grad)
+print(-2*b == b.grad)
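+
+######################################################################
+# As a quick check by hand, substituting the values :math:`a = (2, 3)` and
+# :math:`b = (6, 4)` used above into the two derivatives (applied elementwise) gives
+#
+# .. math::
+#    \frac{\partial Q}{\partial a} = 9a^2 = (36,\ 81), \qquad
+#    \frac{\partial Q}{\partial b} = -2b = (-12,\ -8)
+#
+# so both comparisons are expected to hold for every element.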
+
+
+######################################################################
+# Optional Reading - Vector Calculus using ``autograd``
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#
+# Mathematically, if you have a vector valued function
+# :math:`\vec{y}=f(\vec{x})`, then the gradient of :math:`\vec{y}` with
+# respect to :math:`\vec{x}` is a Jacobian matrix :math:`J`:
+#
+# .. math::
+#
+#
+#      J
+#      =
+#       \left(\begin{array}{ccc}
+#       \frac{\partial \bf{y}}{\partial x_{1}} &
+#       \cdots &
+#       \frac{\partial \bf{y}}{\partial x_{n}}
+#       \end{array}\right)
+#      =
+#      \left(\begin{array}{ccc}
+#       \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
+#       \vdots & \ddots & \vdots\\
+#       \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
+#       \end{array}\right)
 #
 # Generally speaking, ``torch.autograd`` is an engine for computing
-# vector-Jacobian product. That is, given any vector
-# :math:`v=\left(\begin{array}{cccc} v_{1} & v_{2} & \cdots & v_{m}\end{array}\right)^{T}`,
-# compute the product :math:`v^{T}\cdot J`. If :math:`v` happens to be
-# the gradient of a scalar function :math:`l=g\left(\vec{y}\right)`,
-# that is,
-# :math:`v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}`,
+# vector-Jacobian products. That is, given any vector :math:`\vec{v}`, it computes the product
+# :math:`J^{T}\cdot \vec{v}`.
+#
+# If :math:`\vec{v}` happens to be the gradient of a scalar function :math:`l=g\left(\vec{y}\right)`, that is,
+#
+# .. math::
+#
+#
+#    \vec{v}
+#    =
+#    \left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}
+#
 # then by the chain rule, the vector-Jacobian product would be the
 # gradient of :math:`l` with respect to :math:`\vec{x}`:
 #
 # .. math::
-#   J^{T}\cdot v=\left(\begin{array}{ccc}
-#    \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
-#    \vdots & \ddots & \vdots\\
-#    \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
-#    \end{array}\right)\left(\begin{array}{c}
-#    \frac{\partial l}{\partial y_{1}}\\
-#    \vdots\\
-#    \frac{\partial l}{\partial y_{m}}
-#    \end{array}\right)=\left(\begin{array}{c}
-#    \frac{\partial l}{\partial x_{1}}\\
-#    \vdots\\
-#    \frac{\partial l}{\partial x_{n}}
-#    \end{array}\right)
-#
-# (Note that :math:`v^{T}\cdot J` gives a row vector which can be
-# treated as a column vector by taking :math:`J^{T}\cdot v`.)
-#
-# This characteristic of vector-Jacobian product makes it very
-# convenient to feed external gradients into a model that has
-# non-scalar output.
-
-###############################################################
-# Now let's take a look at an example of vector-Jacobian product:
-
-x = torch.randn(3, requires_grad=True)
-
-y = x * 2
-while y.data.norm() < 1000:
-    y = y * 2
-
-print(y)
-
-###############################################################
-# Now in this case ``y`` is no longer a scalar. ``torch.autograd``
-# could not compute the full Jacobian directly, but if we just
-# want the vector-Jacobian product, simply pass the vector to
-# ``backward`` as argument:
-v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
-y.backward(v)
-
-print(x.grad)
-
-###############################################################
-# You can also stop autograd from tracking history on Tensors
-# with ``.requires_grad=True`` either by wrapping the code block in
-# ``with torch.no_grad():``
-print(x.requires_grad)
-print((x ** 2).requires_grad)
-
-with torch.no_grad():
-	print((x ** 2).requires_grad)
-
-###############################################################
-# Or by using ``.detach()`` to get a new Tensor with the same
-# content but that does not require gradients:
-print(x.requires_grad)
-y = x.detach()
-print(y.requires_grad)
-print(x.eq(y).all())
-
-
-###############################################################
-# **Read Later:**
-#
-# Document about ``autograd.Function`` is at
-# https://pytorch.org/docs/stable/autograd.html#function
+#
+#
+#      J^{T}\cdot \vec{v}=\left(\begin{array}{ccc}
+#       \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
+#       \vdots & \ddots & \vdots\\
+#       \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
+#       \end{array}\right)\left(\begin{array}{c}
+#       \frac{\partial l}{\partial y_{1}}\\
+#       \vdots\\
+#       \frac{\partial l}{\partial y_{m}}
+#       \end{array}\right)=\left(\begin{array}{c}
+#       \frac{\partial l}{\partial x_{1}}\\
+#       \vdots\\
+#       \frac{\partial l}{\partial x_{n}}
+#       \end{array}\right)
+#
+# This characteristic of the vector-Jacobian product is what we use in the above example;
+# ``external_grad`` represents :math:`\vec{v}`.
+#
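+# As a worked instance for the example above (treating :math:`Q` as a function of
+# :math:`a` alone, with :math:`b` held fixed), each :math:`Q_i` depends only on
+# :math:`a_i`, so the Jacobian is diagonal and the product with
+# :math:`\vec{v} = (1, 1)^{T}` recovers the elementwise gradient:
+#
+# .. math::
+#
+#    J^{T}\cdot \vec{v}
+#    =
+#    \left(\begin{array}{cc}
+#    9a_{1}^{2} & 0\\
+#    0 & 9a_{2}^{2}
+#    \end{array}\right)
+#    \left(\begin{array}{c} 1\\ 1 \end{array}\right)
+#    =
+#    \left(\begin{array}{c} 9a_{1}^{2}\\ 9a_{2}^{2} \end{array}\right)
+#    =
+#    \left(\begin{array}{c} 36\\ 81 \end{array}\right)
+#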
+
+
+
+######################################################################
+# Computational Graph
+# ~~~~~~~~~~~~~~~~~~~
+#
+# Conceptually, autograd keeps a record of data (tensors) & all executed
+# operations (along with the resulting new tensors) in a directed acyclic
+# graph (DAG) consisting of
+# `Function <https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function>`__
+# objects. In this DAG, leaves are the input tensors, roots are the output
+# tensors. By tracing this graph from roots to leaves, you can
+# automatically compute the gradients using the chain rule.
+#
+# In a forward pass, autograd does two things simultaneously:
+#
+# - run the requested operation to compute a resulting tensor, and
+# - maintain the operation’s *gradient function* in the DAG.
+#
+# The backward pass kicks off when ``.backward()`` is called on the DAG
+# root. ``autograd`` then:
+#
+# - computes the gradients from each ``.grad_fn``,
+# - accumulates them in the respective tensor’s ``.grad`` attribute, and
+# - using the chain rule, propagates all the way to the leaf tensors.
+#
+# Below is a visual representation of the DAG in our example. In the graph,
+# the arrows are in the direction of the forward pass. The nodes represent the backward functions
+# of each operation in the forward pass. The leaf nodes in blue represent our leaf tensors ``a`` and ``b``.
+#
+# .. figure:: /_static/img/dag_autograd.png
+#
+# .. note::
+#   **DAGs are dynamic in PyTorch**
+#   An important thing to note is that the graph is recreated from scratch; after each
+#   ``.backward()`` call, autograd starts populating a new graph. This is
+#   exactly what allows you to use control flow statements in your model;
+#   you can change the shape, size and operations at every iteration if
+#   needed.
+#
+# Exclusion from the DAG
+# ^^^^^^^^^^^^^^^^^^^^^^
+#
+# ``torch.autograd`` tracks operations on all tensors which have their
+# ``requires_grad`` flag set to ``True``. For tensors that don’t require
+# gradients, setting this attribute to ``False`` excludes them from the
+# gradient computation DAG.
+#
+# The output tensor of an operation will require gradients even if only a
+# single input tensor has ``requires_grad=True``.
+#
+
+x = torch.rand(5, 5)
+y = torch.rand(5, 5)
+z = torch.rand((5, 5), requires_grad=True)
+
+a = x + y
+print(f"Does `a` require gradients?: {a.requires_grad}")
+b = x + z
+print(f"Does `b` require gradients?: {b.requires_grad}")
+
+
+######################################################################
+# In a NN, parameters that don't compute gradients are usually called **frozen parameters**.
+# It is useful to "freeze" part of your model if you know in advance that you won't need the gradients of those parameters
+# (this offers some performance benefits by reducing autograd computations).
+#
+# Another common use case where exclusion from the DAG is important is
+# `finetuning a pretrained network <https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html>`__.
+#
+# In finetuning, we freeze most of the model and typically only modify the classifier layers to make predictions on new labels.
+# Let's walk through a small example to demonstrate this. As before, we load a pretrained resnet18 model, and freeze all the parameters.
+
+from torch import nn, optim
+
+model = torchvision.models.resnet18(pretrained=True)
+
+# Freeze all the parameters in the network
+for param in model.parameters():
+    param.requires_grad = False
+
+######################################################################
+# Let's say we want to finetune the model on a new dataset with 10 labels.
+# In resnet, the classifier is the last linear layer ``model.fc``.
+# We can simply replace it with a new linear layer (unfrozen by default)
+# that acts as our classifier.
+
+model.fc = nn.Linear(512, 10)
+
+######################################################################
+# Now all parameters in the model, except the parameters of ``model.fc``, are frozen.
+# The only parameters that compute gradients are the weights and bias of ``model.fc``.
+
+# Optimize only the classifier
+optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
+
+##########################################################################
+# Notice that although we register all the parameters in the optimizer,
+# the only parameters that are computing gradients (and hence updated in gradient descent)
+# are the weights and bias of the classifier.
+#
+# The same exclusionary functionality is available as a context manager in
+# `torch.no_grad() <https://pytorch.org/docs/stable/generated/torch.no_grad.html>`__.
+#
+
+######################################################################
+# --------------
+#
+
+######################################################################
+# Further readings:
+# ~~~~~~~~~~~~~~~~~~~
+#
+# -  `In-place operations & Multithreaded Autograd <https://pytorch.org/docs/stable/notes/autograd.html>`__
+# -  `Example implementation of reverse-mode autodiff <https://colab.research.google.com/drive/1VpeE6UvEPRz9HmsHh1KS0XxXjYu533EC>`__
diff --git a/beginner_source/blitz/tensor_tutorial.py b/beginner_source/blitz/tensor_tutorial.py
index 7b339ee225f..a949f205d8b 100644
--- a/beginner_source/blitz/tensor_tutorial.py
+++ b/beginner_source/blitz/tensor_tutorial.py
@@ -1,195 +1,200 @@
-# -*- coding: utf-8 -*-
 """
-What is PyTorch?
-================
-
-It’s a Python-based scientific computing package targeted at two sets of
-audiences:
-
--  A replacement for NumPy to use the power of GPUs
--  a deep learning research platform that provides maximum flexibility
-   and speed
+Tensors
+--------------------------------------------
 
-Getting Started
----------------
+Tensors are a specialized data structure that is very similar to arrays
+and matrices. In PyTorch, we use tensors to encode the inputs and
+outputs of a model, as well as the model’s parameters.
 
-Tensors
-^^^^^^^
+Tensors are similar to NumPy’s ndarrays, except that tensors can run on
+GPUs or other specialized hardware to accelerate computing. If you’re familiar with ndarrays, you’ll
+be right at home with the Tensor API. If not, follow along in this quick
+API walkthrough.
 
-Tensors are similar to NumPy’s ndarrays, with the addition being that
-Tensors can also be used on a GPU to accelerate computing.
 """
 
-from __future__ import print_function
 import torch
+import numpy as np
 
-###############################################################
-# .. note::
-#     An uninitialized matrix is declared,
-#     but does not contain definite known
-#     values before it is used. When an
-#     uninitialized matrix is created,
-#     whatever values were in the allocated
-#     memory at the time will appear as the initial values.
-
-###############################################################
-# Construct a 5x3 matrix, uninitialized:
 
-x = torch.empty(5, 3)
-print(x)
- 
-###############################################################
-# Construct a randomly initialized matrix:
+######################################################################
+# Tensor Initialization
+# ~~~~~~~~~~~~~~~~~~~~~
+#
+# Tensors can be initialized in various ways. Take a look at the following examples:
+#
+# **Directly from data**
+#
+# Tensors can be created directly from data. The data type is automatically inferred.
 
-x = torch.rand(5, 3)
-print(x)
+data = [[1, 2],[3, 4]]
+x_data = torch.tensor(data)
 
-###############################################################
-# Construct a matrix filled zeros and of dtype long:
+######################################################################
+# **From a NumPy array**
+#
+# Tensors can be created from NumPy arrays (and vice versa - see :ref:`bridge-to-np-label`).
+np_array = np.array(data)
+x_np = torch.from_numpy(np_array)
 
-x = torch.zeros(5, 3, dtype=torch.long)
-print(x)
 
 ###############################################################
-# Construct a tensor directly from data:
-
-x = torch.tensor([5.5, 3])
-print(x)
+# **From another tensor:**
+#
+# The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.
 
-###############################################################
-# or create a tensor based on an existing tensor. These methods
-# will reuse properties of the input tensor, e.g. dtype, unless
-# new values are provided by user
+x_ones = torch.ones_like(x_data) # retains the properties of x_data
+print(f"Ones Tensor: \n {x_ones} \n")
 
-x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
-print(x)
+x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
+print(f"Random Tensor: \n {x_rand} \n")
 
-x = torch.randn_like(x, dtype=torch.float)    # override dtype!
-print(x)                                      # result has the same size
 
-###############################################################
-# Get its size:
+######################################################################
+# **With random or constant values:**
+#
+# ``shape`` is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.
 
-print(x.size())
+shape = (2,3,)
+rand_tensor = torch.rand(shape)
+ones_tensor = torch.ones(shape)
+zeros_tensor = torch.zeros(shape)
 
-###############################################################
-# .. note::
-#     ``torch.Size`` is in fact a tuple, so it supports all tuple operations.
-#
-# Operations
-# ^^^^^^^^^^
-# There are multiple syntaxes for operations. In the following
-# example, we will take a look at the addition operation.
-#
-# Addition: syntax 1
-y = torch.rand(5, 3)
-print(x + y)
+print(f"Random Tensor: \n {rand_tensor} \n")
+print(f"Ones Tensor: \n {ones_tensor} \n")
+print(f"Zeros Tensor: \n {zeros_tensor}")
 
-###############################################################
-# Addition: syntax 2
 
-print(torch.add(x, y))
 
-###############################################################
-# Addition: providing an output tensor as argument
-result = torch.empty(5, 3)
-torch.add(x, y, out=result)
-print(result)
 
-###############################################################
-# Addition: in-place
+######################################################################
+# --------------
+#
 
-# adds x to y
-y.add_(x)
-print(y)
 
-###############################################################
-# .. note::
-#     Any operation that mutates a tensor in-place is post-fixed with an ``_``.
-#     For example: ``x.copy_(y)``, ``x.t_()``, will change ``x``.
+######################################################################
+# Tensor Attributes
+# ~~~~~~~~~~~~~~~~~
 #
-# You can use standard NumPy-like indexing with all bells and whistles!
+# Tensor attributes describe their shape, datatype, and the device on which they are stored.
 
-print(x[:, 1])
+tensor = torch.rand(3,4)
 
-###############################################################
-# Resizing: If you want to resize/reshape tensor, you can use ``torch.view``:
-x = torch.randn(4, 4)
-y = x.view(16)
-z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
-print(x.size(), y.size(), z.size())
+print(f"Shape of tensor: {tensor.shape}")
+print(f"Datatype of tensor: {tensor.dtype}")
+print(f"Device tensor is stored on: {tensor.device}")
 
-###############################################################
-# If you have a one element tensor, use ``.item()`` to get the value as a
-# Python number
-x = torch.randn(1)
-print(x)
-print(x.item())
 
-###############################################################
-# **Read later:**
+######################################################################
+# --------------
 #
+
+
+######################################################################
+# Tensor Operations
+# ~~~~~~~~~~~~~~~~~
 #
-#   100+ Tensor operations, including transposing, indexing, slicing,
-#   mathematical operations, linear algebra, random numbers, etc.,
-#   are described
-#   `here <https://pytorch.org/docs/torch>`_.
+# Over 100 tensor operations, including transposing, indexing, slicing,
+# mathematical operations, linear algebra, random sampling, and more are
+# comprehensively described
+# `here <https://pytorch.org/docs/stable/torch.html>`__.
 #
-# NumPy Bridge
-# ------------
+# Each of them can be run on the GPU (at typically higher speeds than on a
+# CPU). If you’re using Colab, allocate a GPU by going to Edit > Notebook
+# Settings.
 #
-# Converting a Torch Tensor to a NumPy array and vice versa is a breeze.
+
+# We move our tensor to the GPU if available
+if torch.cuda.is_available():
+    tensor = tensor.to('cuda')
+
+
+######################################################################
+# Try out some of the operations from the list.
+# If you're familiar with the NumPy API, you'll find the Tensor API a breeze to use.
 #
-# The Torch Tensor and NumPy array will share their underlying memory
-# locations (if the Torch Tensor is on CPU), and changing one will change
-# the other.
+
+###############################################################
+# **Standard NumPy-like indexing and slicing:**
+
+tensor = torch.ones(4, 4)
+tensor[:,1] = 0
+print(tensor)
+
+######################################################################
+# **Joining tensors** You can use ``torch.cat`` to concatenate a sequence of tensors along a given dimension.
+# See also `torch.stack <https://pytorch.org/docs/stable/generated/torch.stack.html>`__,
+# another tensor joining op that is subtly different from ``torch.cat``.
+t1 = torch.cat([tensor, tensor, tensor], dim=1)
+print(t1)
+
+######################################################################
+# **Multiplying tensors**
+
+# This computes the element-wise product
+print(f"tensor.mul(tensor) \n {tensor.mul(tensor)} \n")
+# Alternative syntax:
+print(f"tensor * tensor \n {tensor * tensor}")
+
+######################################################################
 #
-# Converting a Torch Tensor to a NumPy Array
-# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+# This computes the matrix multiplication between two tensors
+print(f"tensor.matmul(tensor.T) \n {tensor.matmul(tensor.T)} \n")
+# Alternative syntax:
+print(f"tensor @ tensor.T \n {tensor @ tensor.T}")
 
-a = torch.ones(5)
-print(a)
 
-###############################################################
+######################################################################
+# **In-place operations**
+# Operations that have a ``_`` suffix are in-place. For example: ``x.copy_(y)``, ``x.t_()``, will change ``x``.
+
+print(tensor, "\n")
+tensor.add_(5)
+print(tensor)
+
+######################################################################
+# .. note::
+#      In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss
+#      of history. Hence, their use is discouraged.
+
+######################################################################
+# --------------
 #
 
-b = a.numpy()
-print(b)
 
-###############################################################
-# See how the numpy array changed in value.
+######################################################################
+# .. _bridge-to-np-label:
+#
+# Bridge with NumPy
+# ~~~~~~~~~~~~~~~~~
+# Tensors on the CPU and NumPy arrays can share their underlying memory
+# locations, and changing one will change the other.
 
-a.add_(1)
-print(a)
-print(b)
 
-###############################################################
-# Converting NumPy Array to Torch Tensor
+######################################################################
+# Tensor to NumPy array
 # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-# See how changing the np array changed the Torch Tensor automatically
+t = torch.ones(5)
+print(f"t: {t}")
+n = t.numpy()
+print(f"n: {n}")
 
-import numpy as np
-a = np.ones(5)
-b = torch.from_numpy(a)
-np.add(a, 1, out=a)
-print(a)
-print(b)
+######################################################################
+# A change in the tensor is reflected in the NumPy array.
 
-###############################################################
-# All the Tensors on the CPU except a CharTensor support converting to
-# NumPy and back.
-#
-# CUDA Tensors
-# ------------
-#
-# Tensors can be moved onto any device using the ``.to`` method.
+t.add_(1)
+print(f"t: {t}")
+print(f"n: {n}")
 
-# let us run this cell only if CUDA is available
-# We will use ``torch.device`` objects to move tensors in and out of GPU
-if torch.cuda.is_available():
-    device = torch.device("cuda")          # a CUDA device object
-    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
-    x = x.to(device)                       # or just use strings ``.to("cuda")``
-    z = x + y
-    print(z)
-    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
+
+######################################################################
+# NumPy array to Tensor
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+n = np.ones(5)
+t = torch.from_numpy(n)
+
+######################################################################
+# Changes in the NumPy array are reflected in the tensor.
+np.add(n, 1, out=n)
+print(f"t: {t}")
+print(f"n: {n}")
diff --git a/beginner_source/deep_learning_60min_blitz.rst b/beginner_source/deep_learning_60min_blitz.rst
index d07d34c0077..4fc156c08ce 100644
--- a/beginner_source/deep_learning_60min_blitz.rst
+++ b/beginner_source/deep_learning_60min_blitz.rst
@@ -8,13 +8,18 @@ Deep Learning with PyTorch: A 60 Minute Blitz
      <iframe width="560" height="315" src="https://www.youtube.com/embed/u7x8RXwLKcA" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
    </div>
 
-Goal of this tutorial:
+What is PyTorch?
+~~~~~~~~~~~~~~~~~~~~~
+PyTorch is a Python-based scientific computing package serving two broad purposes:
+
+-  A replacement for NumPy to use the power of GPUs and other accelerators.
+-  An automatic differentiation library that is useful for implementing neural networks.
 
--  Understand PyTorch’s Tensor library and neural networks at a high
-   level.
--  Train a small neural network to classify images
+Goal of this tutorial:
+~~~~~~~~~~~~~~~~~~~~~~~~
+- Understand PyTorch’s Tensor library and neural networks at a high level.
+- Train a small neural network to classify images.
 
-*This tutorial assumes that you have a basic familiarity of numpy*
 
 .. Note::
     Make sure you have the `torch`_ and `torchvision`_ packages installed.