
Commit 33fc9d9

[maskedtensor] Add safe softmax tutorial

ghstack-source-id: 1e75739
Pull Request resolved: #2045

1 parent 7c643ad

File tree

2 files changed: +34 −0
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
Safe Softmax
------------

One of the issues that frequently comes up is the need for a safe softmax -- that is, if an entire
batch is "masked out" or consists entirely of padding (which, in the softmax case, translates to being set
to ``-inf``), then this will result in NaNs, which can lead to training divergence. For more detail on why
this functionality is necessary, please refer to
`Issue 55056 - Feature Request for Safe Softmax <https://github.com/pytorch/pytorch/issues/55056>`__.
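
To see where the NaNs come from, note that a fully masked slice contains only ``-inf`` values, so the
softmax reduces to a 0/0 computation (every ``exp(-inf)`` term is 0, as is their sum):

>>> torch.tensor([float('-inf'), float('-inf')]).softmax(0)
tensor([nan, nan])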

Luckily, :class:`MaskedTensor` has solved this issue:

>>> data = torch.randn(3, 3)
>>> mask = torch.tensor([[True, False, False], [True, False, True], [False, False, False]])
>>> x = data.masked_fill(~mask, float('-inf'))
>>> mt = masked_tensor(data, mask)

PyTorch result:

>>> x.softmax(0)
tensor([[0.3548,    nan, 0.0000],
        [0.6452,    nan, 1.0000],
        [0.0000,    nan, 0.0000]])

:class:`MaskedTensor` result:

>>> mt.softmax(0)
MaskedTensor(
  [
    [  0.3548,      --,      --],
    [  0.6452,      --,  1.0000],
    [      --,      --,      --]
  ]
)
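
If downstream code expects a regular :class:`torch.Tensor` again, the masked-out probabilities can be
filled with a neutral value such as 0 -- a minimal sketch, assuming the prototype's
``MaskedTensor.to_tensor(value)`` helper, which fills masked-out elements with the given value:

>>> mt.softmax(0).to_tensor(0.0)
tensor([[0.3548, 0.0000, 0.0000],
        [0.6452, 0.0000, 1.0000],
        [0.0000, 0.0000, 0.0000]])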

index.rst

Lines changed: 1 addition & 0 deletions
@@ -813,6 +813,7 @@ Additional Resources
  beginner/maskedtensor_overview
  beginner/maskedtensor_sparsity
  beginner/maskedtensor_distinguish_gradient
+ beginner/maskedtensor_safe_softmax


.. toctree::

0 commit comments