MaskedTensor Overview
=====================

This tutorial is designed to serve as a starting point for using MaskedTensors
and discusses their masking semantics.

Using MaskedTensor
++++++++++++++++++

Construction
------------

There are a few different ways to construct a MaskedTensor:

* The first way is to directly invoke the MaskedTensor class
* The second (and our recommended way) is to use the :func:`masked.masked_tensor` and :func:`masked.as_masked_tensor` factory functions,
  which are analogous to :func:`torch.tensor` and :func:`torch.as_tensor`

    .. autosummary::
        :toctree: generated
        :nosignatures:

        masked.masked_tensor
        masked.as_masked_tensor

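As a minimal sketch of the two routes (assuming a PyTorch build where the prototype ``torch.masked`` factory functions are importable):

```python
import torch
from torch.masked import masked_tensor, as_masked_tensor

data = torch.arange(6, dtype=torch.float)
mask = torch.tensor([True, False, True, False, True, False])

# Recommended: the factory function, analogous to torch.tensor
mt = masked_tensor(data, mask)

# Analogous to torch.as_tensor: preserves autograd history through `data`
mt2 = as_masked_tensor(data, mask)

print(mt)
```

Both calls take a data Tensor and a boolean mask of the same shape; the difference mirrors the copy-vs-alias distinction between :func:`torch.tensor` and :func:`torch.as_tensor`.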
Accessing the data and mask
---------------------------

The underlying fields in a MaskedTensor can be accessed through:

* the :meth:`MaskedTensor.get_data` function
* the :meth:`MaskedTensor.get_mask` function. Recall that ``True`` indicates "specified" or "valid" while ``False`` indicates
  "unspecified" or "invalid".

In general, the underlying data that is returned may not be valid in the unspecified entries, so when users
require a Tensor without any masked entries, we recommend that they use :meth:`MaskedTensor.to_tensor` to
return a Tensor with the masked values filled in.

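A short sketch tying these accessors together (the fill value passed to ``to_tensor`` here is an arbitrary choice):

```python
import torch
from torch.masked import masked_tensor

mt = masked_tensor(torch.tensor([1., 2., 3.]),
                   torch.tensor([True, False, True]))

mt.get_data()    # the raw storage; entries where the mask is False are not meaningful
mt.get_mask()    # tensor([ True, False,  True])
mt.to_tensor(0)  # a plain Tensor with the unspecified entry filled with 0
```
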
Indexing and slicing
--------------------

:class:`MaskedTensor` is a Tensor subclass, which means that it inherits the same semantics for indexing and slicing
as :class:`torch.Tensor`. Below are some examples of common indexing and slicing patterns:

    >>> data = torch.arange(60).reshape(3, 4, 5)
    >>> mask = data % 2 == 0
    >>> mt = masked_tensor(data.float(), mask)
    >>> mt[0]
    MaskedTensor(
      [
        [ 0.0000, --, 2.0000, --, 4.0000],
        [ --, 6.0000, --, 8.0000, --],
        [ 10.0000, --, 12.0000, --, 14.0000],
        [ --, 16.0000, --, 18.0000, --]
      ]
    )
    >>> mt[[0, 2]]
    MaskedTensor(
      [
        [
          [ 0.0000, --, 2.0000, --, 4.0000],
          [ --, 6.0000, --, 8.0000, --],
          [ 10.0000, --, 12.0000, --, 14.0000],
          [ --, 16.0000, --, 18.0000, --]
        ],
        [
          [ 40.0000, --, 42.0000, --, 44.0000],
          [ --, 46.0000, --, 48.0000, --],
          [ 50.0000, --, 52.0000, --, 54.0000],
          [ --, 56.0000, --, 58.0000, --]
        ]
      ]
    )
    >>> mt[:, :2]
    MaskedTensor(
      [
        [
          [ 0.0000, --, 2.0000, --, 4.0000],
          [ --, 6.0000, --, 8.0000, --]
        ],
        [
          [ 20.0000, --, 22.0000, --, 24.0000],
          [ --, 26.0000, --, 28.0000, --]
        ],
        [
          [ 40.0000, --, 42.0000, --, 44.0000],
          [ --, 46.0000, --, 48.0000, --]
        ]
      ]
    )

Semantics
+++++++++

MaskedTensor vs NumPy's MaskedArray
-----------------------------------

NumPy's ``MaskedArray`` has a few fundamental semantics differences from MaskedTensor.

1. Their factory function and basic definition invert the mask (similar to ``torch.nn.MultiheadAttention``); that is, MaskedTensor
   uses ``True`` to denote "specified" or "valid" and ``False`` to denote "unspecified" or "invalid", whereas NumPy does the
   opposite.
2. Intersection semantics. In NumPy, if one of the two elements is masked out, the resulting element will be
   masked out as well -- in practice, they
   `apply the logical_or operator <https://github.com/numpy/numpy/blob/68299575d8595d904aff6f28e12d21bf6428a4ba/numpy/ma/core.py#L1016-L1024>`__.

    >>> data = torch.arange(5.)
    >>> mask = torch.tensor([True, True, False, True, False])
    >>> npm0 = np.ma.masked_array(data.numpy(), (~mask).numpy())
    >>> npm1 = np.ma.masked_array(data.numpy(), (mask).numpy())
    >>> npm0
    masked_array(data=[0.0, 1.0, --, 3.0, --],
                 mask=[False, False, True, False, True],
                 fill_value=1e+20,
                 dtype=float32)
    >>> npm1
    masked_array(data=[--, --, 2.0, --, 4.0],
                 mask=[ True, True, False, True, False],
                 fill_value=1e+20,
                 dtype=float32)
    >>> npm0 + npm1
    masked_array(data=[--, --, --, --, --],
                 mask=[ True, True, True, True, True],
                 fill_value=1e+20,
                 dtype=float32)

Meanwhile, MaskedTensor does not support addition or binary operators with masks that don't match -- to understand why,
please see the section on reduction semantics below.

    >>> mt0 = masked_tensor(data, mask)
    >>> mt1 = masked_tensor(data, ~mask)
    >>> mt0
    MaskedTensor(
      [ 0.0000, 1.0000, --, 3.0000, --]
    )
    >>> mt1
    MaskedTensor(
      [ --, --, 2.0000, --, 4.0000]
    )
    >>> mt0 + mt1
    ValueError: Input masks must match. If you need support for this, please open an issue on Github.

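By contrast, binary operators behave elementwise as usual when the masks do match; a quick sketch (the variable names are our own):

```python
import torch
from torch.masked import masked_tensor

data = torch.arange(5.)
mask = torch.tensor([True, True, False, True, False])

# Identical masks: the result keeps the shared mask and adds the valid entries.
a = masked_tensor(data, mask)
b = masked_tensor(data * 10, mask)
c = a + b

print(c)  # valid entries: 0.0, 11.0, --, 33.0, --
```
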
However, if this behavior is desired, MaskedTensor does support these semantics by giving access to the data and masks
and conveniently converting a MaskedTensor to a Tensor with masked values filled in using :func:`to_tensor`.

    >>> t0 = mt0.to_tensor(0)
    >>> t1 = mt1.to_tensor(0)
    >>> mt2 = masked_tensor(t0 + t1, mt0.get_mask() & mt1.get_mask())
    >>> t0
    tensor([0., 1., 0., 3., 0.])
    >>> t1
    tensor([0., 0., 2., 0., 4.])
    >>> mt2
    MaskedTensor(
      [ --, --, --, --, --]
    )

.. _reduction-semantics:

Reduction semantics
-------------------

The basis for reduction semantics `has been documented and discussed at length <https://github.com/pytorch/rfcs/pull/27>`__,
but again, by way of example (note that the mask below is generated randomly, so your exact values will differ):

    >>> data = torch.arange(12, dtype=torch.float).reshape(3, 4)
    >>> mask = torch.randint(2, (3, 4), dtype=torch.bool)
    >>> mt = masked_tensor(data, mask)
    >>> mt
    MaskedTensor(
      [
        [ --, 1.0000, --, --],
        [ --, 5.0000, 6.0000, 7.0000],
        [ 8.0000, 9.0000, --, 11.0000]
      ]
    )

    >>> torch.sum(mt, 1)
    MaskedTensor(
      [ 1.0000, 18.0000, 28.0000]
    )
    >>> torch.mean(mt, 1)
    MaskedTensor(
      [ 1.0000, 6.0000, 9.3333]
    )
    >>> torch.prod(mt, 1)
    MaskedTensor(
      [ 1.0000, 210.0000, 792.0000]
    )
    >>> torch.amin(mt, 1)
    MaskedTensor(
      [ 1.0000, 5.0000, 8.0000]
    )
    >>> torch.amax(mt, 1)
    MaskedTensor(
      [ 1.0000, 7.0000, 11.0000]
    )

Now we can revisit the question: why do we enforce the invariant that masks must match for binary operators?
In other words, why don't we use the same semantics as ``np.ma.masked_array``? Consider the following example:

    >>> data0 = torch.arange(10.).reshape(2, 5)
    >>> data1 = torch.arange(10.).reshape(2, 5) + 10
    >>> mask0 = torch.tensor([[True, True, False, False, False], [False, False, False, True, True]])
    >>> mask1 = torch.tensor([[False, False, False, True, True], [True, True, False, False, False]])

    >>> npm0 = np.ma.masked_array(data0.numpy(), (mask0).numpy())
    >>> npm1 = np.ma.masked_array(data1.numpy(), (mask1).numpy())
    >>> npm0
    masked_array(
      data=[[--, --, 2.0, 3.0, 4.0],
            [5.0, 6.0, 7.0, --, --]],
      mask=[[ True, True, False, False, False],
            [False, False, False, True, True]],
      fill_value=1e+20,
      dtype=float32)
    >>> npm1
    masked_array(
      data=[[10.0, 11.0, 12.0, --, --],
            [--, --, 17.0, 18.0, 19.0]],
      mask=[[False, False, False, True, True],
            [ True, True, False, False, False]],
      fill_value=1e+20,
      dtype=float32)
    >>> (npm0 + npm1).sum(0)
    masked_array(data=[--, --, 38.0, --, --],
                 mask=[ True, True, False, True, True],
                 fill_value=1e+20,
                 dtype=float32)
    >>> npm0.sum(0) + npm1.sum(0)
    masked_array(data=[15.0, 17.0, 38.0, 21.0, 23.0],
                 mask=[False, False, False, False, False],
                 fill_value=1e+20,
                 dtype=float32)

Sum and addition should clearly be associative, but with NumPy's semantics they are allowed not to be,
which can certainly be confusing for the user. That being said, if the user wishes, there are ways around this
(e.g. filling in the MaskedTensor's undefined elements with 0 values using :func:`to_tensor`, as shown in a previous
example), but the user must now be more explicit with their intentions.
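
That workaround for the example above can be sketched as follows; note the ``~``, which converts NumPy's invalid-mask convention into MaskedTensor's valid-mask convention:

```python
import torch
from torch.masked import masked_tensor

data0 = torch.arange(10.).reshape(2, 5)
data1 = torch.arange(10.).reshape(2, 5) + 10
mask0 = torch.tensor([[True, True, False, False, False],
                      [False, False, False, True, True]])
mask1 = torch.tensor([[False, False, False, True, True],
                      [True, True, False, False, False]])

# mask0/mask1 mark invalid entries (NumPy-style), so flip them for MaskedTensor.
mt0 = masked_tensor(data0, ~mask0)
mt1 = masked_tensor(data1, ~mask1)

# Being explicit: fill the undefined elements with 0 first. Addition and
# reduction then commute, so the order of operations no longer matters.
t0 = mt0.to_tensor(0)
t1 = mt1.to_tensor(0)
assert torch.equal((t0 + t1).sum(0), t0.sum(0) + t1.sum(0))
```

The cost of this explicitness is that the result carries no mask at all: every output element is treated as valid, matching ``npm0.sum(0) + npm1.sum(0)`` above.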