Commit 35e2cb0

Ported torchvision detection tutorial into sphinx gallery format (#2540)
* Port torchvision tutorial to gallery and remove previous files
* Updated torchvision_tutorial.py content: links, irrelevant sections etc
* Few more rendering fixes and reduced num epochs to run less than 3 minutes
1 parent fca99dd commit 35e2cb0

9 files changed: +57 -673 lines changed

Makefile

Lines changed: 5 additions & 0 deletions
@@ -86,6 +86,9 @@ download:
 	wget -nv -N https://www.manythings.org/anki/deu-eng.zip -P $(DATADIR)
 	unzip -o $(DATADIR)/deu-eng.zip -d beginner_source/data/
 
+	# Download PennFudanPed dataset for intermediate_source/torchvision_tutorial.py
+	wget https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip -P $(DATADIR)
+	unzip -o $(DATADIR)/PennFudanPed.zip -d intermediate_source/data/
 
 docs:
 	make download
@@ -103,3 +106,5 @@ html-noplot:
 clean-cache:
 	make clean
 	rm -rf advanced beginner intermediate recipes
+	# remove additional python files downloaded for torchvision_tutorial.py
+	rm -rf intermediate_source/engine.py intermediate_source/utils.py intermediate_source/transforms.py intermediate_source/coco_eval.py intermediate_source/coco_utils.py
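
To reproduce the added download step without `make`, the two new recipe lines can be approximated in Python. This is a minimal sketch and not part of the commit. The function names `fetch_once` and `extract_overwriting` are hypothetical, the directory names in the usage comment are placeholders, and skipping the download when the archive already exists is an assumption added here (the Makefile line calls plain `wget`).

```python
import urllib.request
import zipfile
from pathlib import Path

# URL copied from the Makefile hunk; everything else is illustrative.
PENNFUDAN_URL = "https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip"


def fetch_once(url, dest_dir):
    """Download url into dest_dir unless a file with that name already exists."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]
    if not archive.exists():  # assumption: an existing archive is reused as-is
        urllib.request.urlretrieve(url, archive)
    return archive


def extract_overwriting(archive, target_dir):
    """Extract every member, overwriting existing files (similar to `unzip -o`)."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()


# Example (performs a real download, so it is left commented out):
# archive = fetch_once(PENNFUDAN_URL, "downloads")
# extract_overwriting(archive, "intermediate_source/data")
```

Keeping the download and the extraction in separate functions mirrors the two recipe lines and lets the extraction be exercised on a local archive without network access.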
-612 KB
Binary file not shown.
-12.4 KB
Binary file not shown.
-418 KB
Binary file not shown.
-849 KB
Binary file not shown.

en-wordlist.txt

Lines changed: 10 additions & 0 deletions
@@ -16,6 +16,7 @@ RRef
 OOM
 subfolder
 Dialogs
+PennFudan
 performant
 multithreading
 linearities
@@ -36,6 +37,8 @@ breakpoint
 MobileNet
 DeepLabV
 Resampling
+RCNN
+RPN
 APIs
 ATen
 AVX
@@ -145,6 +148,7 @@ LRSchedulers
 Lua
 Luong
 macos
+mAP
 MLP
 MLPs
 MNIST
@@ -178,10 +182,12 @@ OU
 PIL
 PPO
 Plotly
+pre
 Prec
 Profiler
 PyTorch's
 RGB
+RGBA
 RL
 RNN
 RNNs
@@ -345,6 +351,7 @@ jit
 jitter
 jpg
 judgements
+keypoint
 kwargs
 labelled
 learnable
@@ -425,6 +432,7 @@ reinitializes
 relu
 reproducibility
 rescale
+rescaling
 resnet
 restride
 rewinded
@@ -476,10 +484,12 @@ torchscriptable
 torchtext
 torchtext's
 torchvision
+TorchVision
 torchviz
 traceback
 tradeoff
 tradeoffs
+uint
 uncomment
 uncommented
 underflowing

_static/tv-training-code.py renamed to intermediate_source/torchvision_tutorial.py

Lines changed: 41 additions & 33 deletions
@@ -6,17 +6,10 @@
 
 ######################################################################
 #
-# .. tip::
-#
-#    To get the most of this tutorial, we suggest using this
-#    `Colab Version <https://colab.research.google.com/github/pytorch/tutorials/blob/gh-pages/_downloads/torchvision_finetuning_instance_segmentation.ipynb>`__.
-#    This will allow you to experiment with the information presented below.
-#
-#
 # For this tutorial, we will be finetuning a pre-trained `Mask
-# R-CNN <https://arxiv.org/abs/1703.06870>`__ model on the `Penn-Fudan
+# R-CNN <https://arxiv.org/abs/1703.06870>`_ model on the `Penn-Fudan
 # Database for Pedestrian Detection and
-# Segmentation <https://www.cis.upenn.edu/~jshi/ped_html/>`__. It contains
+# Segmentation <https://www.cis.upenn.edu/~jshi/ped_html/>`_. It contains
 # 170 images with 345 instances of pedestrians, and we will use it to
 # illustrate how to use the new features in torchvision in order to train
 # an object detection and instance segmentation model on a custom dataset.
@@ -35,7 +28,7 @@
 # The reference scripts for training object detection, instance
 # segmentation and person keypoint detection allows for easily supporting
 # adding new custom datasets. The dataset should inherit from the standard
-# ``torch.utils.data.Dataset`` class, and implement ``__len__`` and
+# :class:`torch.utils.data.Dataset` class, and implement ``__len__`` and
 # ``__getitem__``.
 #
 # The only specificity that we require is that the dataset ``__getitem__``
@@ -65,7 +58,7 @@
 # ``pycocotools`` which can be installed with ``pip install pycocotools``.
 #
 # .. note ::
-#    For Windows, please install ``pycocotools`` from `gautamchitnis <https://github.com/gautamchitnis/cocoapi>`__ with command
+#    For Windows, please install ``pycocotools`` from `gautamchitnis <https://github.com/gautamchitnis/cocoapi>`_ with command
 #
 #    ``pip install git+https://github.com/gautamchitnis/cocoapi.git@cocodataset-master#subdirectory=PythonAPI``
 #
@@ -85,10 +78,16 @@
 # Writing a custom dataset for PennFudan
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
-# Let’s write a dataset for the PennFudan dataset. After `downloading and
-# extracting the zip
-# file <https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip>`__, we
-# have the following folder structure:
+# Let’s write a dataset for the PennFudan dataset. First, let's download the dataset and
+# extract the `zip file <https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip>`_:
+#
+# .. code:: python
+#
+#     wget https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip -P data
+#     cd data && unzip PennFudanPed.zip
+#
+#
+# We have the following folder structure:
 #
 # ::
 #
@@ -106,21 +105,33 @@
 #       FudanPed00004.png
 #
 # Here is one example of a pair of images and segmentation masks
-#
-# .. image:: ../../_static/img/tv_tutorial/tv_image01.png
-#
-# .. image:: ../../_static/img/tv_tutorial/tv_image02.png
-#
+
+import matplotlib.pyplot as plt
+from torchvision.io import read_image
+
+
+image = read_image("data/PennFudanPed/PNGImages/FudanPed00046.png")
+mask = read_image("data/PennFudanPed/PedMasks/FudanPed00046_mask.png")
+
+plt.figure(figsize=(16, 8))
+plt.subplot(121)
+plt.title("Image")
+plt.imshow(image.permute(1, 2, 0))
+plt.subplot(122)
+plt.title("Mask")
+plt.imshow(mask.permute(1, 2, 0))
+
+######################################################################
 # So each image has a corresponding
 # segmentation mask, where each color correspond to a different instance.
 # Let’s write a :class:`torch.utils.data.Dataset` class for this dataset.
 # In the code below, we are wrapping images, bounding boxes and masks into
-# ``torchvision.TVTensor`` classes so that we will be able to apply torchvision
+# :class:`torchvision.tv_tensors.TVTensor` classes so that we will be able to apply torchvision
 # built-in transformations (`new Transforms API <https://pytorch.org/vision/stable/transforms.html>`_)
 # for the given object detection and segmentation task.
 # Namely, image tensors will be wrapped by :class:`torchvision.tv_tensors.Image`, bounding boxes into
 # :class:`torchvision.tv_tensors.BoundingBoxes` and masks into :class:`torchvision.tv_tensors.Mask`.
-# As ``torchvision.TVTensor`` are :class:`torch.Tensor` subclasses, wrapped objects are also tensors and inherit the plain
+# As :class:`torchvision.tv_tensors.TVTensor` are :class:`torch.Tensor` subclasses, wrapped objects are also tensors and inherit the plain
 # :class:`torch.Tensor` API. For more information about torchvision ``tv_tensors`` see
 # `this documentation <https://pytorch.org/vision/main/auto_examples/transforms/plot_transforms_getting_started.html#what-are-tvtensors>`_.
 
@@ -196,8 +207,8 @@ def __len__(self):
 # -------------------
 #
 # In this tutorial, we will be using `Mask
-# R-CNN <https://arxiv.org/abs/1703.06870>`__, which is based on top of
-# `Faster R-CNN <https://arxiv.org/abs/1506.01497>`__. Faster R-CNN is a
+# R-CNN <https://arxiv.org/abs/1703.06870>`_, which is based on top of
+# `Faster R-CNN <https://arxiv.org/abs/1506.01497>`_. Faster R-CNN is a
 # model that predicts both bounding boxes and class scores for potential
 # objects in the image.
 #
@@ -345,6 +356,7 @@ def get_model_instance_segmentation(num_classes):
 os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/coco_eval.py")
 os.system("wget https://raw.githubusercontent.com/pytorch/vision/main/references/detection/transforms.py")
 
+######################################################################
 # Since v0.15.0 torchvision provides `new Transforms API <https://pytorch.org/vision/stable/transforms.html>`_
 # to easily write data augmentation pipelines for Object Detection and Segmentation tasks.
 #
@@ -362,7 +374,7 @@ def get_transform(train):
     transforms.append(T.ToPureTensor())
     return T.Compose(transforms)
 
-
+######################################################################
 # Testing ``forward()`` method (Optional)
 # ---------------------------------------
 #
@@ -455,8 +467,8 @@ def get_transform(train):
     gamma=0.1
 )
 
-# let's train it for 5 epochs
-num_epochs = 5
+# let's train it just for 2 epochs
+num_epochs = 2
 
 for epoch in range(num_epochs):
     # train for one epoch, printing every 10 iterations
@@ -477,14 +489,12 @@ def get_transform(train):
 # But what do the predictions look like? Let’s take one image in the
 # dataset and verify
 #
-# .. image:: ../../_static/img/tv_tutorial/tv_image05.png
-#
 import matplotlib.pyplot as plt
 
 from torchvision.utils import draw_bounding_boxes, draw_segmentation_masks
 
 
-image = read_image("../_static/img/tv_tutorial/tv_image05.png")
+image = read_image("data/PennFudanPed/PNGImages/FudanPed00046.png")
 eval_transform = get_transform(train=False)
 
 model.eval()
@@ -517,7 +527,7 @@ def get_transform(train):
 #
 # In this tutorial, you have learned how to create your own training
 # pipeline for object detection models on a custom dataset. For
-# that, you wrote a ``torch.utils.data.Dataset`` class that returns the
+# that, you wrote a :class:`torch.utils.data.Dataset` class that returns the
 # images and the ground truth boxes and segmentation masks. You also
 # leveraged a Mask R-CNN model pre-trained on COCO train2017 in order to
 # perform transfer learning on this new dataset.
@@ -526,5 +536,3 @@ def get_transform(train):
 # training, check ``references/detection/train.py``, which is present in
 # the torchvision repository.
 #
-# You can download a full source file for this tutorial
-# `here <https://pytorch.org/tutorials/_static/tv-training-code.py>`__.
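
The tutorial text in this diff notes that each color in a mask image corresponds to a different pedestrian instance. The sketch below illustrates that idea in pure Python by deriving one bounding box per instance id. It is an illustration, not the tutorial's code, which operates on tensors. The helper name `instance_boxes`, the toy mask, and the convention that id 0 marks the background are assumptions made for this example.

```python
def instance_boxes(mask):
    """Return {instance_id: [xmin, ymin, xmax, ymax]} for a 2-D integer mask.

    mask is a list of rows of instance ids; id 0 is assumed to be background.
    Box coordinates are inclusive pixel indices.
    """
    boxes = {}
    for y, row in enumerate(mask):
        for x, obj_id in enumerate(row):
            if obj_id == 0:
                continue  # skip background pixels
            if obj_id not in boxes:
                boxes[obj_id] = [x, y, x, y]
            else:
                box = boxes[obj_id]
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)
    return boxes


# Toy 3x5 mask with two instances (ids 1 and 2) on a background of zeros.
toy_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
print(instance_boxes(toy_mask))  # {1: [1, 0, 2, 1], 2: [3, 1, 4, 2]}
```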
