
Commit 9514424

Merge branch 'main' into aws_graviton

2 parents 7081f4b + 219a9e3

83 files changed: +281 additions, -304 deletions


.devcontainer/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ ipython
 # to run examples
 pandas
 scikit-image
-pillow==10.0.1
+pillow==10.2.0
 wget
 
 # for codespaces env

_templates/layout.html

Lines changed: 0 additions & 10 deletions
@@ -112,14 +112,4 @@
 </script>
 
 <img height="1" width="1" style="border-style:none;" alt="" src="https://www.googleadservices.com/pagead/conversion/795629140/?label=txkmCPmdtosBENSssfsC&amp;guid=ON&amp;script=0"/>
-
-//temporarily add a link to survey
-<script>
-var survey = '<div class="survey-banner"><p><i class="fas fa-poll" aria-hidden="true">&nbsp </i> Take the annual <a href="https://forms.gle/jdNexNU6eZ8mCGDY7">PyTorch Tutorials survey</a>.</p></div>'
-if ($(".pytorch-call-to-action-links").length) {
-$(".pytorch-call-to-action-links").before(survey);
-} else {
-$("#pytorch-article").prepend(survey);
-}
-</script>
 {% endblock %}

advanced_source/ddp_pipeline.py

Lines changed: 1 addition & 1 deletion
@@ -439,7 +439,7 @@ def evaluate(eval_model, data_source):
 
 ######################################################################
 # Evaluate the model with the test dataset
-# -------------------------------------
+# ----------------------------------------
 #
 # Apply the best model to check the result with the test dataset.
 

advanced_source/dispatcher.rst

Lines changed: 1 addition & 1 deletion
@@ -129,7 +129,7 @@ for debugging in larger models where previously it can be hard to pin-point
 exactly where the ``requires_grad``-ness is lost during the forward pass.
 
 In-place or view ops
-^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^
 
 To ensure correctness and best possible performance, if your op mutates an input
 in-place or returns a tensor that aliases with one of the inputs, two additional

advanced_source/neural_style_tutorial.py

Lines changed: 3 additions & 3 deletions
@@ -87,7 +87,7 @@
 # to 255 tensor images.
 #
 #
-# .. Note::
+# .. note::
 #    Here are links to download the images required to run the tutorial:
 #    `picasso.jpg <https://pytorch.org/tutorials/_static/img/neural-style/picasso.jpg>`__ and
 #    `dancing.jpg <https://pytorch.org/tutorials/_static/img/neural-style/dancing.jpg>`__.
@@ -183,7 +183,7 @@ def forward(self, input):
         return input
 
 ######################################################################
-# .. Note::
+# .. note::
 #    **Important detail**: although this module is named ``ContentLoss``, it
 #    is not a true PyTorch Loss function. If you want to define your content
 #    loss as a PyTorch Loss function, you have to create a PyTorch autograd function
@@ -372,7 +372,7 @@ def get_style_model_and_losses(cnn, normalization_mean, normalization_std,
 input_img = content_img.clone()
 # if you want to use white noise by using the following code:
 #
-# ::
+# .. code-block:: python
 #
 #    input_img = torch.randn(content_img.data.size())
 

advanced_source/usb_semisup_learn.py

Lines changed: 1 addition & 1 deletion
@@ -157,7 +157,7 @@
 
 ######################################################################
 # Use USB to Train ``SoftMatch`` with specific imbalanced algorithm on imbalanced CIFAR-10
-# ------------------------------------------------------------------------------------
+# ----------------------------------------------------------------------------------------
 #
 # Now let's say we have imbalanced labeled set and unlabeled set of CIFAR-10,
 # and we want to train a ``SoftMatch`` model on it.

beginner_source/basics/autogradqs_tutorial.py

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@
 `Save & Load Model <saveloadrun_tutorial.html>`_
 
 Automatic Differentiation with ``torch.autograd``
-=======================================
+=================================================
 
 When training neural networks, the most frequently used algorithm is
 **back propagation**. In this algorithm, parameters (model weights) are
@@ -170,7 +170,7 @@
 
 ######################################################################
 # Optional Reading: Tensor Gradients and Jacobian Products
-# --------------------------------------
+# --------------------------------------------------------
 #
 # In many cases, we have a scalar loss function, and we need to compute
 # the gradient with respect to some parameters. However, there are cases
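The second hunk retitles the Jacobian-products section. For readers skimming the diff, a minimal sketch of the vector-Jacobian product that section covers, assuming only a working torch install (tensor names and shapes here are illustrative, not the tutorial's):

import torch

x = torch.randn(3, requires_grad=True)
y = 2 * x  # vector-valued output: y.backward() alone would fail

# For non-scalar outputs, pass a vector v to compute the
# vector-Jacobian product v^T @ J instead of a plain gradient.
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(gradient=v)

print(x.grad)  # equals 2 * v, since dy_i/dx_i = 2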

beginner_source/basics/buildmodel_tutorial.py

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@
 `Save & Load Model <saveloadrun_tutorial.html>`_
 
 Build the Neural Network
-===================
+========================
 
 Neural networks comprise of layers/modules that perform operations on data.
 The `torch.nn <https://pytorch.org/docs/stable/nn.html>`_ namespace provides all the building blocks you need to
@@ -197,5 +197,5 @@ def forward(self, x):
 
 #################################################################
 # Further Reading
-# --------------
+# -----------------
 # - `torch.nn API <https://pytorch.org/docs/stable/nn.html>`_
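Both hunks here only lengthen underlines; for context, the tutorial they touch builds a model by subclassing nn.Module. A minimal sketch of that pattern, with layer sizes that are illustrative assumptions rather than the tutorial's exact model:

import torch
from torch import nn

class TinyNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        # a small stack of fully connected layers; sizes are arbitrary
        self.stack = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.stack(self.flatten(x))

model = TinyNetwork()
logits = model(torch.rand(1, 28, 28))  # one fake 28x28 "image"
print(logits.shape)  # torch.Size([1, 10])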

beginner_source/basics/data_tutorial.py

Lines changed: 7 additions & 7 deletions
@@ -10,7 +10,7 @@
 `Save & Load Model <saveloadrun_tutorial.html>`_
 
 Datasets & DataLoaders
-===================
+======================
 
 """
 
@@ -69,7 +69,7 @@
 
 #################################################################
 # Iterating and Visualizing the Dataset
-# -----------------
+# -------------------------------------
 #
 # We can index ``Datasets`` manually like a list: ``training_data[index]``.
 # We use ``matplotlib`` to visualize some samples in our training data.
@@ -144,7 +144,7 @@ def __getitem__(self, idx):
 
 
 #################################################################
-# __init__
+# ``__init__``
 # ^^^^^^^^^^^^^^^^^^^^
 #
 # The __init__ function is run once when instantiating the Dataset object. We initialize
@@ -167,7 +167,7 @@ def __init__(self, annotations_file, img_dir, transform=None, target_transform=N
 
 
 #################################################################
-# __len__
+# ``__len__``
 # ^^^^^^^^^^^^^^^^^^^^
 #
 # The __len__ function returns the number of samples in our dataset.
@@ -180,7 +180,7 @@ def __len__(self):
 
 
 #################################################################
-# __getitem__
+# ``__getitem__``
 # ^^^^^^^^^^^^^^^^^^^^
 #
 # The __getitem__ function loads and returns a sample from the dataset at the given index ``idx``.
@@ -220,7 +220,7 @@ def __getitem__(self, idx):
 
 ###########################
 # Iterate through the DataLoader
-# --------------------------
+# -------------------------------
 #
 # We have loaded that dataset into the ``DataLoader`` and can iterate through the dataset as needed.
 # Each iteration below returns a batch of ``train_features`` and ``train_labels`` (containing ``batch_size=64`` features and labels respectively).
@@ -243,5 +243,5 @@ def __getitem__(self, idx):
 
 #################################################################
 # Further Reading
-# --------------
+# ----------------
 # - `torch.utils.data API <https://pytorch.org/docs/stable/data.html>`_
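These hunks retitle the three methods every custom Dataset implements. A minimal sketch of that contract, using a hypothetical in-memory dataset rather than the tutorial's CSV-plus-images version:

import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __init__(self, n=256):
        # runs once at instantiation; here we fabricate features and labels
        self.features = torch.randn(n, 8)
        self.labels = torch.randint(0, 2, (n,))

    def __len__(self):
        # number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, idx):
        # load and return the sample at the given index idx
        return self.features[idx], self.labels[idx]

# as in the "Iterate through the DataLoader" hunk, each iteration
# yields a batch of train_features and train_labels
loader = DataLoader(ToyDataset(), batch_size=64, shuffle=True)
train_features, train_labels = next(iter(loader))
print(train_features.shape, train_labels.shape)  # [64, 8] and [64]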

beginner_source/basics/intro.py

Lines changed: 2 additions & 2 deletions
@@ -31,15 +31,15 @@
 
 
 Running the Tutorial Code
-------------------
+-------------------------
 You can run this tutorial in a couple of ways:
 
 - **In the cloud**: This is the easiest way to get started! Each section has a "Run in Microsoft Learn" and "Run in Google Colab" link at the top, which opens an integrated notebook in Microsoft Learn or Google Colab, respectively, with the code in a fully-hosted environment.
 - **Locally**: This option requires you to setup PyTorch and TorchVision first on your local machine (`installation instructions <https://pytorch.org/get-started/locally/>`_). Download the notebook or copy the code into your favorite IDE.
 
 
 How to Use this Guide
------------------
+---------------------
 If you're familiar with other deep learning frameworks, check out the `0. Quickstart <quickstart_tutorial.html>`_ first
 to quickly familiarize yourself with PyTorch's API.
 

beginner_source/basics/tensorqs_tutorial.py

Lines changed: 2 additions & 2 deletions
@@ -80,7 +80,7 @@
 
 ######################################################################
 # Attributes of a Tensor
-# ~~~~~~~~~~~~~~~~~
+# ~~~~~~~~~~~~~~~~~~~~~~
 #
 # Tensor attributes describe their shape, datatype, and the device on which they are stored.
 
@@ -97,7 +97,7 @@
 
 ######################################################################
 # Operations on Tensors
-# ~~~~~~~~~~~~~~~~~
+# ~~~~~~~~~~~~~~~~~~~~~~~
 #
 # Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing,
 # indexing, slicing), sampling and more are
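The three attributes the first hunk's section describes fit in a few lines; a sketch assuming nothing beyond torch itself:

import torch

tensor = torch.rand(3, 4)
print(f"Shape of tensor: {tensor.shape}")              # torch.Size([3, 4])
print(f"Datatype of tensor: {tensor.dtype}")           # torch.float32
print(f"Device tensor is stored on: {tensor.device}")  # cpu, unless moved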

beginner_source/blitz/autograd_tutorial.py

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 # -*- coding: utf-8 -*-
 """
 A Gentle Introduction to ``torch.autograd``
----------------------------------
+===========================================
 
 ``torch.autograd`` is PyTorch’s automatic differentiation engine that powers
 neural network training. In this section, you will get a conceptual
@@ -149,7 +149,7 @@
 
 ######################################################################
 # Optional Reading - Vector Calculus using ``autograd``
-# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
 # Mathematically, if you have a vector valued function
 # :math:`\vec{y}=f(\vec{x})`, then the gradient of :math:`\vec{y}` with

beginner_source/blitz/cifar10_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ def imshow(img):
 
 ########################################################################
 # 2. Define a Convolutional Neural Network
-# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 # Copy the neural network from the Neural Networks section before and modify it to
 # take 3-channel images (instead of 1-channel images as it was defined).
 

beginner_source/blitz/neural_networks_tutorial.py

Lines changed: 2 additions & 2 deletions
@@ -161,7 +161,7 @@ def forward(self, x):
 # ``.grad_fn`` attribute, you will see a graph of computations that looks
 # like this:
 #
-# ::
+# .. code-block:: sh
 #
 #    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
 #    -> flatten -> linear -> relu -> linear -> relu -> linear
@@ -253,7 +253,7 @@ def forward(self, x):
 
 
 ###############################################################
-# .. Note::
+# .. note::
 #
 #    Observe how gradient buffers had to be manually set to zero using
 #    ``optimizer.zero_grad()``. This is because gradients are accumulated
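The note the second hunk touches is about manually zeroing gradient buffers. A minimal sketch of that pattern inside one training step, with a hypothetical model and batch (not the tutorial's network):

import torch
from torch import nn, optim

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
inputs, target = torch.randn(4, 10), torch.randn(4, 1)

optimizer.zero_grad()   # clear buffers; gradients otherwise accumulate
loss = nn.functional.mse_loss(model(inputs), target)
loss.backward()         # accumulate fresh gradients into .grad
optimizer.step()        # apply the update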

beginner_source/blitz/tensor_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 """
 Tensors
---------------------------------------------
+========
 
 Tensors are a specialized data structure that are very similar to arrays
 and matrices. In PyTorch, we use tensors to encode the inputs and

beginner_source/data_loading_tutorial.py

Lines changed: 9 additions & 5 deletions
@@ -50,9 +50,9 @@
 # estimation <https://blog.dlib.net/2014/08/real-time-face-pose-estimation.html>`__
 # on a few images from imagenet tagged as 'face'.
 #
-# Dataset comes with a csv file with annotations which looks like this:
+# Dataset comes with a ``.csv`` file with annotations which looks like this:
 #
-# ::
+# .. code-block:: sh
 #
 #    image_name,part_0_x,part_0_y,part_1_x,part_1_y,part_2_x, ... ,part_67_x,part_67_y
 #    0805personali01.jpg,27,83,27,98, ... 84,134
@@ -196,7 +196,7 @@ def __getitem__(self, idx):
 # called. For this, we just need to implement ``__call__`` method and
 # if required, ``__init__`` method. We can then use a transform like this:
 #
-# ::
+# .. code-block:: python
 #
 #    tsfm = Transform(params)
 #    transformed_sample = tsfm(sample)
@@ -421,7 +421,9 @@ def show_landmarks_batch(sample_batched):
 # and dataloader. ``torchvision`` package provides some common datasets and
 # transforms. You might not even have to write custom classes. One of the
 # more generic datasets available in torchvision is ``ImageFolder``.
-# It assumes that images are organized in the following way: ::
+# It assumes that images are organized in the following way:
+#
+# .. code-block:: sh
 #
 #    root/ants/xxx.png
 #    root/ants/xxy.jpeg
@@ -435,7 +437,9 @@ def show_landmarks_batch(sample_batched):
 #
 # where 'ants', 'bees' etc. are class labels. Similarly generic transforms
 # which operate on ``PIL.Image`` like ``RandomHorizontalFlip``, ``Scale``,
-# are also available. You can use these to write a dataloader like this: ::
+# are also available. You can use these to write a dataloader like this:
+#
+# .. code-block:: pytorch
 #
 #    import torch
 #    from torchvision import transforms, datasets
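The last two hunks describe the ImageFolder layout and the dataloader built on it. A runnable sketch of that combination: the root path is a placeholder for any directory organized as root/<class>/<image>, and Resize stands in for the deprecated Scale the context mentions:

import torch
from torchvision import datasets, transforms

data_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # one of the generic PIL transforms
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder(root="path/to/root", transform=data_transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=4,
                                     shuffle=True, num_workers=2)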

beginner_source/dcgan_faces_tutorial.py

Lines changed: 2 additions & 2 deletions
@@ -226,7 +226,7 @@
 # the ``celeba`` directory you just created. The resulting directory
 # structure should be:
 #
-# ::
+# .. code-block:: sh
 #
 #    /path/to/celeba
 #        -> img_align_celeba
@@ -265,7 +265,7 @@
 plt.axis("off")
 plt.title("Training Images")
 plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=2, normalize=True).cpu(),(1,2,0)))
-
+plt.show()
 
 
 ######################################################################

beginner_source/ddp_series_fault_tolerance.rst

Lines changed: 0 additions & 4 deletions
@@ -93,11 +93,7 @@ In elastic training, whenever there are any membership changes (adding or removi
 on available devices. Having this structure ensures your training job can continue without manual intervention.
 
 
-
-
-
 Diff for `multigpu.py <https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/multigpu.py>`__ v/s `multigpu_torchrun.py <https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/multigpu_torchrun.py>`__
------------------------------------------------------------
 
 Process group initialization
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

beginner_source/ddp_series_multigpu.rst

Lines changed: 0 additions & 1 deletion
@@ -52,7 +52,6 @@ Along the way, we will talk through important concepts in distributed training w
 
 
 Diff for `single_gpu.py <https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/single_gpu.py>`__ v/s `multigpu.py <https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/multigpu.py>`__
-----------------------------------------------------
 
 These are the changes you typically make to a single-GPU training script to enable DDP.
 

beginner_source/dist_overview.rst

Lines changed: 17 additions & 2 deletions
@@ -74,7 +74,10 @@ common development trajectory would be:
 4. Use multi-machine `DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
    and the `launching script <https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md>`__,
    if the application needs to scale across machine boundaries.
-5. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
+5. Use multi-GPU `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
+   training on a single-machine or multi-machine when the data and model cannot
+   fit on one GPU.
+6. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
    to launch distributed training if errors (e.g., out-of-memory) are expected or if
    resources can join and leave dynamically during training.
 
@@ -134,6 +137,18 @@ DDP materials are listed below:
 5. The `Distributed Training with Uneven Inputs Using the Join Context Manager <../advanced/generic_join.html>`__
    tutorial walks through using the generic join context for distributed training with uneven inputs.
 
+
+``torch.distributed.FullyShardedDataParallel``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
+(FSDP) is a type of data parallelism paradigm which maintains a per-GPU copy of a model’s
+parameters, gradients and optimizer states, it shards all of these states across
+data-parallel workers. The support for FSDP was added starting PyTorch v1.11. The tutorial
+`Getting Started with FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
+provides in depth explanation and example of how FSDP works.
+
+
 torch.distributed.elastic
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -150,7 +165,7 @@ throws an exception, it is likely to lead to desynchronization (mismatched
 adds fault tolerance and the ability to make use of a dynamic pool of machines (elasticity).
 
 RPC-Based Distributed Training
-----------------------------
+------------------------------
 
 Many training paradigms do not fit into data parallelism, e.g.,
 parameter server paradigm, distributed pipeline parallelism, reinforcement
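The new FSDP subsection above is prose-only; a minimal wrapping sketch to go with it, assuming a CUDA machine and a launch via torchrun (which sets the environment variables init_process_group reads) — the linked FSDP tutorial covers real configurations:

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")  # one process per GPU under torchrun
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model
model = FSDP(model)  # shards params, grads, and optimizer state across ranks
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # after wrapping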

beginner_source/hyperparameter_tuning_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -462,7 +462,7 @@ def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
 ######################################################################
 # If you run the code, an example output could look like this:
 #
-# ::
+# .. code-block:: sh
 #
 #    Number of trials: 10/10 (10 TERMINATED)
 #    +-----+--------------+------+------+-------------+--------+---------+------------+
