@@ -12,7 +12,7 @@ Database for Pedestrian Detection and
Segmentation <https://www.cis.upenn.edu/~jshi/ped_html/>`__. It contains
170 images with 345 instances of pedestrians, and we will use it to
illustrate how to use the new features in torchvision in order to train
- an object detection model on a custom dataset.
+ an object detection and instance segmentation model on a custom dataset.

Defining the Dataset
--------------------
@@ -26,22 +26,23 @@ adding new custom datasets. The dataset should inherit from the standard
The only specificity that we require is that the dataset ``__getitem__``
should return a tuple:

- - image: ``torchvision.datapoints.Image[3, H, W]`` or a PIL Image of size ``(H, W)``
+ - image: :class:`torchvision.datapoints.Image` of shape ``[3, H, W]`` or a PIL Image of size ``(H, W)``
- target: a dict containing the following fields

- - ``boxes (torchvision.datapoints.BoundingBoxes[N, 4])``: the coordinates of the ``N``
-   bounding boxes in ``[x0, y0, x1, y1]`` format, ranging from ``0``
+ - ``boxes``, :class:`torchvision.datapoints.BoundingBoxes` of shape ``[N, 4]``:
+   the coordinates of the ``N`` bounding boxes in ``[x0, y0, x1, y1]`` format, ranging from ``0``
  to ``W`` and ``0`` to ``H``
- - ``labels (Int64Tensor[N])``: the label for each bounding box. ``0`` represents always the background class.
- - ``image_id (int)``: an image identifier. It should be
+ - ``labels``, integer :class:`torch.Tensor` of shape ``[N]``: the label for each bounding box.
+   ``0`` always represents the background class.
+ - ``image_id``, int: an image identifier. It should be
  unique between all the images in the dataset, and is used during
  evaluation
- - ``area (Float32Tensor[N])``: The area of the bounding box. This is used
+ - ``area``, float :class:`torch.Tensor` of shape ``[N]``: the area of the bounding box. This is used
  during evaluation with the COCO metric, to separate the metric
  scores between small, medium and large boxes.
- - ``iscrowd (UInt8Tensor[N])``: instances with iscrowd=True will be
+ - ``iscrowd``, uint8 :class:`torch.Tensor` of shape ``[N]``: instances with iscrowd=True will be
  ignored during evaluation.
- - (optionally) ``masks (torchvision.datapoints.Mask[N, H, W])``: The segmentation
+ - (optionally) ``masks``, :class:`torchvision.datapoints.Mask` of shape ``[N, H, W]``: the segmentation
  masks for each one of the objects

If your dataset is compliant with the above requirements then it will work for both
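To make the contract above concrete, here is a minimal runnable sketch of the target dict it describes. Plain Python lists stand in for the tensors and datapoints so the sketch has no dependencies, and ``make_target`` is a hypothetical helper for illustration, not part of torchvision:

```python
# Hypothetical helper sketching the per-image target dict described above.
# Plain Python lists stand in for torch.Tensor / datapoints objects.

def make_target(boxes, labels, image_id):
    """Build a target dict for one image.

    boxes  -- list of [x0, y0, x1, y1] in pixel coordinates
    labels -- one integer class per box (0 is reserved for background)
    """
    assert len(boxes) == len(labels)
    # COCO-style area, used to split metric scores by box size.
    areas = [(x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes]
    return {
        "boxes": boxes,               # BoundingBoxes of shape [N, 4]
        "labels": labels,             # int64 tensor of shape [N]
        "image_id": image_id,         # unique int across the dataset
        "area": areas,                # float tensor of shape [N]
        "iscrowd": [0] * len(boxes),  # uint8 tensor of shape [N]
    }

target = make_target(boxes=[[10, 20, 50, 80]], labels=[1], image_id=0)
print(target["area"])  # -> [2400]
```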
@@ -97,12 +98,16 @@ Here is one example of a pair of images and segmentation masks

So each image has a corresponding
segmentation mask, where each color corresponds to a different instance.
- Let’s write a ``torch.utils.data.Dataset`` class for this dataset.
+ Let’s write a :class:`torch.utils.data.Dataset` class for this dataset.

In the code below, we are wrapping images, bounding boxes and masks into
- ``torchvision.datapoints`` structures so that we will be able to apply torchvision
+ ``torchvision.datapoints`` classes so that we will be able to apply torchvision
built-in transformations (`new Transforms API <https://pytorch.org/vision/stable/transforms.html>`_)
- that cover the object detection and segmentation tasks.
- For more information about torchvision datapoints see `this documentation <https://pytorch.org/vision/stable/datapoints.html>`_.
+ for the given object detection and segmentation task.
+ Namely, image tensors will be wrapped by :class:`torchvision.datapoints.Image`, bounding boxes into
+ :class:`torchvision.datapoints.BoundingBoxes` and masks into :class:`torchvision.datapoints.Mask`.
+ As datapoints are :class:`torch.Tensor` subclasses, wrapped objects are also tensors and inherit the plain
+ :class:`torch.Tensor` API. For more information about torchvision datapoints see
+ `this documentation <https://pytorch.org/vision/main/auto_examples/v2_transforms/plot_transforms_v2.html#sphx-glr-auto-examples-v2-transforms-plot-transforms-v2-py>`_.

.. code:: python
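The "subclass" point can be illustrated with a plain-Python analogy. The toy ``BoundingBoxesLike`` class below is hypothetical and not torchvision code; it only shows how a subclass of a base type keeps the full base API while carrying extra metadata, which is why wrapped datapoints remain usable as ordinary tensors:

```python
# Toy stand-in (hypothetical, not torchvision code) illustrating why a
# datapoint stays usable as a plain tensor: it subclasses the base type
# and only adds metadata on top.

class BoundingBoxesLike(list):
    """A list subclass that also records the size of its image canvas."""

    def __init__(self, data, canvas_size):
        super().__init__(data)          # full list API is inherited
        self.canvas_size = canvas_size  # extra (H, W) metadata

boxes = BoundingBoxesLike([[10, 20, 50, 80]], canvas_size=(536, 559))
assert isinstance(boxes, list)          # still the base type...
assert len(boxes) == 1                  # ...with its API available
assert boxes.canvas_size == (536, 559)  # plus the extra metadata
```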
@@ -264,8 +269,8 @@ way of doing it:
                      rpn_anchor_generator = anchor_generator,
                      box_roi_pool = roi_pooler)

- Object detection model for PennFudan Dataset
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Object detection and instance segmentation model for PennFudan Dataset
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In our case, we want to finetune from a pre-trained model, given that
our dataset is very small, so we will be following approach number 1.