From 53c85b994a606dd21f43299cb166970d6ac98571 Mon Sep 17 00:00:00 2001
From: zuppif
Date: Sat, 28 Mar 2020 09:15:10 +0100
Subject: [PATCH 1/2] improve doc for mask rcnn

- add small description on the labels classes
---
 intermediate_source/torchvision_tutorial.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/intermediate_source/torchvision_tutorial.rst b/intermediate_source/torchvision_tutorial.rst
index c82b8097e93..596a20bcdc0 100644
--- a/intermediate_source/torchvision_tutorial.rst
+++ b/intermediate_source/torchvision_tutorial.rst
@@ -32,7 +32,7 @@ should return:
   - ``boxes (FloatTensor[N, 4])``: the coordinates of the ``N``
     bounding boxes in ``[x0, y0, x1, y1]`` format, ranging from ``0``
     to ``W`` and ``0`` to ``H``
-  - ``labels (Int64Tensor[N])``: the label for each bounding box
+  - ``labels (Int64Tensor[N])``: the label for each bounding box. ``0`` always represents the background class.
   - ``image_id (Int64Tensor[1])``: an image identifier. It should be
     unique between all the images in the dataset, and is used during
     evaluation
@@ -56,6 +56,8 @@ If your model returns the above methods, they will make it work for both
 training and evaluation, and will use the evaluation scripts from
 ``pycocotools``.
 
+One note on the ``labels``. The model expects class ``0`` to be always the background. If your dataset does not contain the background class, you should not have ``0`` in your ``labels``. For example, assuming you have just two classes, *cat* and *dog*, you can define ``1`` (not ``0``) to represent *cats* and ``2`` to represent *dogs*. If in your image you have booth classes, your ``labels`` tensor should look like ``[1,2]``.
+
 Additionally, if you want to use aspect ratio grouping during training
 (so that each batch only contains images with similar aspect ratio),
 then it is recommended to also implement a ``get_height_and_width``

From 6f8606e3250fd6cdb042a1677c47751e8384e132 Mon Sep 17 00:00:00 2001
From: zuppif
Date: Sat, 28 Mar 2020 09:18:06 +0100
Subject: [PATCH 2/2] minor changes

---
 intermediate_source/torchvision_tutorial.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/intermediate_source/torchvision_tutorial.rst b/intermediate_source/torchvision_tutorial.rst
index 596a20bcdc0..93fcfd3d247 100644
--- a/intermediate_source/torchvision_tutorial.rst
+++ b/intermediate_source/torchvision_tutorial.rst
@@ -56,7 +56,7 @@ If your model returns the above methods, they will make it work for both
 training and evaluation, and will use the evaluation scripts from
 ``pycocotools``.
 
-One note on the ``labels``. The model expects class ``0`` to be always the background. If your dataset does not contain the background class, you should not have ``0`` in your ``labels``. For example, assuming you have just two classes, *cat* and *dog*, you can define ``1`` (not ``0``) to represent *cats* and ``2`` to represent *dogs*. If in your image you have booth classes, your ``labels`` tensor should look like ``[1,2]``.
+One note on the ``labels``. The model considers class ``0`` to be the background. If your dataset does not contain the background class, you should not have ``0`` in your ``labels``. For example, assuming you have just two classes, *cat* and *dog*, you can define ``1`` (not ``0``) to represent *cats* and ``2`` to represent *dogs*. So, for instance, if one of the images has both classes, your ``labels`` tensor should look like ``[1, 2]``.
 
 Additionally, if you want to use aspect ratio grouping during training
 (so that each batch only contains images with similar aspect ratio),
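The labels convention described by these patches can be sketched as follows. This is a minimal illustration, not part of the patch itself: the box coordinates and the ``image_id`` value are made up, and only the keys mentioned in the tutorial excerpt are shown (a full target dict for Mask R-CNN would also carry masks, areas, etc.).

```python
import torch

# Hypothetical target dict for one image that contains both a cat and
# a dog, following the convention from the patch: class 0 is reserved
# for background, so the two real classes are 1 (cat) and 2 (dog).
target = {
    # Two boxes in [x0, y0, x1, y1] format (coordinates are made up).
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0],
                           [130.0, 40.0, 260.0, 300.0]]),
    # 1 = cat, 2 = dog -- note that 0 never appears here.
    "labels": torch.tensor([1, 2], dtype=torch.int64),
    # Unique identifier for this image within the dataset.
    "image_id": torch.tensor([0]),
}
```

With two foreground classes plus background, the model head would be constructed with ``num_classes=3``, since class ``0`` counts toward the total even though it never appears in ``labels``.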