From 53c85b994a606dd21f43299cb166970d6ac98571 Mon Sep 17 00:00:00 2001
From: zuppif
Date: Sat, 28 Mar 2020 09:15:10 +0100
Subject: [PATCH 1/2] improve doc for mask rcnn

- add small description on the labels classes
---
 intermediate_source/torchvision_tutorial.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/intermediate_source/torchvision_tutorial.rst b/intermediate_source/torchvision_tutorial.rst
index c82b8097e93..596a20bcdc0 100644
--- a/intermediate_source/torchvision_tutorial.rst
+++ b/intermediate_source/torchvision_tutorial.rst
@@ -32,7 +32,7 @@ should return:
   - ``boxes (FloatTensor[N, 4])``: the coordinates of the ``N``
     bounding boxes in ``[x0, y0, x1, y1]`` format, ranging from ``0``
     to ``W`` and ``0`` to ``H``
-  - ``labels (Int64Tensor[N])``: the label for each bounding box
+  - ``labels (Int64Tensor[N])``: the label for each bounding box. ``0`` always represents the background class.
   - ``image_id (Int64Tensor[1])``: an image identifier. It should be
     unique between all the images in the dataset, and is used during
     evaluation
@@ -56,6 +56,8 @@ If your model returns the above methods, they will make it work for both
 training and evaluation, and will use the evaluation scripts from
 ``pycocotools``.
 
+One note on the ``labels``. The model expects class ``0`` to be always the background. If your dataset does not contain the background class, you should not have ``0`` in your ``labels``. For example, assuming you have just two classes, *cat* and *dog*, you can define ``1`` (not ``0``) to represent *cats* and ``2`` to represent *dogs*. If in your image you have booth classes, your ``labels`` tensor should look like ``[1,2]``.
+
 Additionally, if you want to use aspect ratio grouping during training
 (so that each batch only contains images with similar aspect ratio),
 then it is recommended to also implement a ``get_height_and_width``

From 6f8606e3250fd6cdb042a1677c47751e8384e132 Mon Sep 17 00:00:00 2001
From: zuppif
Date: Sat, 28 Mar 2020 09:18:06 +0100
Subject: [PATCH 2/2] minor changes

---
 intermediate_source/torchvision_tutorial.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/intermediate_source/torchvision_tutorial.rst b/intermediate_source/torchvision_tutorial.rst
index 596a20bcdc0..93fcfd3d247 100644
--- a/intermediate_source/torchvision_tutorial.rst
+++ b/intermediate_source/torchvision_tutorial.rst
@@ -56,7 +56,7 @@ If your model returns the above methods, they will make it work for both
 training and evaluation, and will use the evaluation scripts from
 ``pycocotools``.
 
-One note on the ``labels``. The model expects class ``0`` to be always the background. If your dataset does not contain the background class, you should not have ``0`` in your ``labels``. For example, assuming you have just two classes, *cat* and *dog*, you can define ``1`` (not ``0``) to represent *cats* and ``2`` to represent *dogs*. If in your image you have booth classes, your ``labels`` tensor should look like ``[1,2]``.
+One note on the ``labels``. The model considers class ``0`` to be the background. If your dataset does not contain the background class, you should not have ``0`` in your ``labels``. For example, assuming you have just two classes, *cat* and *dog*, you can define ``1`` (not ``0``) to represent *cats* and ``2`` to represent *dogs*. So, for instance, if one of the images has both classes, your ``labels`` tensor should look like ``[1, 2]``.
 
 Additionally, if you want to use aspect ratio grouping during training
 (so that each batch only contains images with similar aspect ratio),
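The labels convention described by these patches can be sketched as follows. This is a minimal illustration, not part of the patch itself: the box coordinates and the ``image_id`` value are made up, and only the keys mentioned in the tutorial excerpt are shown (a full target dict for Mask R-CNN would also carry masks, areas, etc.).

```python
import torch

# Hypothetical target dict for one image that contains both a cat and
# a dog, following the convention from the patch: class 0 is reserved
# for background, so the two real classes are 1 (cat) and 2 (dog).
target = {
    # Two boxes in [x0, y0, x1, y1] format (coordinates are made up).
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0],
                           [130.0, 40.0, 260.0, 300.0]]),
    # 1 = cat, 2 = dog -- note that 0 never appears here.
    "labels": torch.tensor([1, 2], dtype=torch.int64),
    # Unique identifier for this image within the dataset.
    "image_id": torch.tensor([0]),
}
```

With two foreground classes plus background, the model head would be constructed with ``num_classes=3``, since class ``0`` counts toward the total even though it never appears in ``labels``.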