Commit 25efd19

Addressed review comments

1 parent 8437043 commit 25efd19

intermediate_source/torchvision_tutorial.rst

Lines changed: 20 additions & 15 deletions
@@ -12,7 +12,7 @@ Database for Pedestrian Detection and
 Segmentation <https://www.cis.upenn.edu/~jshi/ped_html/>`__. It contains
 170 images with 345 instances of pedestrians, and we will use it to
 illustrate how to use the new features in torchvision in order to train
-an object detection model on a custom dataset.
+an object detection and instance segmentation model on a custom dataset.
 
 Defining the Dataset
 --------------------
@@ -26,22 +26,23 @@ adding new custom datasets. The dataset should inherit from the standard
 The only specificity that we require is that the dataset ``__getitem__``
 should return a tuple:
 
-- image: ``torchvision.datapoints.Image[3, H, W]`` or a PIL Image of size ``(H, W)``
+- image: :class:`torchvision.datapoints.Image` of shape ``[3, H, W]`` or a PIL Image of size ``(H, W)``
- target: a dict containing the following fields
 
-  - ``boxes (torchvision.datapoints.BoundingBoxes[N, 4])``: the coordinates of the ``N``
-    bounding boxes in ``[x0, y0, x1, y1]`` format, ranging from ``0``
+  - ``boxes``, :class:`torchvision.datapoints.BoundingBoxes` of shape ``[N, 4]``:
+    the coordinates of the ``N`` bounding boxes in ``[x0, y0, x1, y1]`` format, ranging from ``0``
     to ``W`` and ``0`` to ``H``
-  - ``labels (Int64Tensor[N])``: the label for each bounding box. ``0`` represents always the background class.
-  - ``image_id (int)``: an image identifier. It should be
+  - ``labels``, integer :class:`torch.Tensor` of shape ``[N]``: the label for each bounding box.
+    ``0`` always represents the background class.
+  - ``image_id``, int: an image identifier. It should be
     unique between all the images in the dataset, and is used during
     evaluation
-  - ``area (Float32Tensor[N])``: The area of the bounding box. This is used
+  - ``area``, float :class:`torch.Tensor` of shape ``[N]``: the area of the bounding box. This is used
     during evaluation with the COCO metric, to separate the metric
     scores between small, medium and large boxes.
-  - ``iscrowd (UInt8Tensor[N])``: instances with iscrowd=True will be
+  - ``iscrowd``, uint8 :class:`torch.Tensor` of shape ``[N]``: instances with ``iscrowd=True`` will be
     ignored during evaluation.
-  - (optionally) ``masks (torchvision.datapoints.Mask[N, H, W])``: The segmentation
+  - (optionally) ``masks``, :class:`torchvision.datapoints.Mask` of shape ``[N, H, W]``: the segmentation
     masks for each one of the objects
 
 If your dataset is compliant with above requirements then it will work for both
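The ``__getitem__`` contract spelled out in this hunk can be sketched with plain tensors. This is a minimal illustration with hypothetical shapes and values (the tutorial itself additionally wraps the image, boxes, and masks in ``torchvision.datapoints`` classes):

```python
import torch

# Hypothetical example: one 300x400 image containing N = 2 pedestrians.
N, H, W = 2, 300, 400
image = torch.rand(3, H, W)  # stands in for a datapoints.Image / PIL Image

target = {
    # [x0, y0, x1, y1] boxes, coordinates within [0, W] and [0, H]
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0],
                           [150.0, 30.0, 250.0, 280.0]]),
    # label 1 = pedestrian; 0 is reserved for the background class
    "labels": torch.ones(N, dtype=torch.int64),
    "image_id": 0,
    "iscrowd": torch.zeros(N, dtype=torch.uint8),
}
# area is derived from the boxes: (x1 - x0) * (y1 - y0)
b = target["boxes"]
target["area"] = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])

print(target["area"])  # tensor([20000., 25000.])
```

A ``__getitem__`` would return ``image, target``; the evaluation code relies on exactly these keys and dtypes.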
@@ -97,12 +98,16 @@ Here is one example of a pair of images and segmentation masks
 
 So each image has a corresponding
 segmentation mask, where each color correspond to a different instance.
-Let’s write a ``torch.utils.data.Dataset`` class for this dataset.
+Let’s write a :class:`torch.utils.data.Dataset` class for this dataset.
 In the code below, we are wrapping images, bounding boxes and masks into
-``torchvision.datapoints`` structures so that we will be able to apply torchvision
+``torchvision.datapoints`` classes so that we will be able to apply torchvision
 built-in transformations (`new Transforms API <https://pytorch.org/vision/stable/transforms.html>`_)
-that cover the object detection and segmentation tasks.
-For more information about torchvision datapoints see `this documentation <https://pytorch.org/vision/stable/datapoints.html>`_.
+for the given object detection and segmentation task.
+Namely, image tensors will be wrapped by :class:`torchvision.datapoints.Image`, bounding boxes into
+:class:`torchvision.datapoints.BoundingBoxes` and masks into :class:`torchvision.datapoints.Mask`.
+As datapoints are :class:`torch.Tensor` subclasses, wrapped objects are also tensors and inherit the plain
+:class:`torch.Tensor` API. For more information about torchvision datapoints see
+`this documentation <https://pytorch.org/vision/main/auto_examples/v2_transforms/plot_transforms_v2.html#sphx-glr-auto-examples-v2-transforms-plot-transforms-v2-py>`_.
 
 .. code:: python
 
@@ -264,8 +269,8 @@ way of doing it:
        rpn_anchor_generator=anchor_generator,
        box_roi_pool=roi_pooler)
 
-Object detection model for PennFudan Dataset
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Object detection and instance segmentation model for PennFudan Dataset
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 In our case, we want to finetune from a pre-trained model, given that
 our dataset is very small, so we will be following approach number 1.
