diff --git a/index.rst b/index.rst
index df01394eded..bb7a019c83b 100644
--- a/index.rst
+++ b/index.rst
@@ -813,6 +813,7 @@ Additional Resources
    beginner/maskedtensor_overview
    beginner/maskedtensor_sparsity
    beginner/maskedtensor_distinguish_gradient
+   beginner/maskedtensor_safe_softmax
 
 .. toctree::
diff --git a/beginner_source/maskedtensor_safe_softmax.rst b/beginner_source/maskedtensor_safe_softmax.rst
new file mode 100644
index 00000000000..0f34b479caf
--- /dev/null
+++ b/beginner_source/maskedtensor_safe_softmax.rst
@@ -0,0 +1,48 @@
+Safe Softmax
+------------
+
+One of the issues that frequently comes up is the need for a safe softmax -- that is, if an entire batch
+is "masked out" or consists entirely of padding (which, in the softmax case, translates to being set to
+``-inf``), then the softmax will produce NaNs, which can lead to training divergence. For more detail on
+why this functionality is necessary, please refer to
+`Issue 55056 - Feature Request for Safe Softmax <https://github.com/pytorch/pytorch/issues/55056>`__.
+
+Luckily, :class:`MaskedTensor` has solved this issue:
+
+    >>> import torch
+    >>> from torch.masked import masked_tensor
+    >>> data = torch.randn(3, 3)
+    >>> mask = torch.tensor([[True, False, False], [True, False, True], [False, False, False]])
+    >>> x = data.masked_fill(~mask, float('-inf'))
+    >>> mt = masked_tensor(data, mask)
+
+PyTorch result:
+
+    >>> x.softmax(0)
+    tensor([[0.3548,    nan, 0.0000],
+            [0.6452,    nan, 1.0000],
+            [0.0000,    nan, 0.0000]])
+
+:class:`MaskedTensor` result:
+
+    >>> mt.softmax(0)
+    MaskedTensor(
+      [
+        [ 0.3548,     --,     --],
+        [ 0.6452,     --, 1.0000],
+        [     --,     --,     --]
+      ]
+    )
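+
+As a point of comparison, one eager-mode workaround that works without :class:`MaskedTensor` -- sketched
+here purely as an illustration, not as a recommended pattern -- is to run the regular softmax and then
+replace the resulting NaNs with :meth:`~torch.Tensor.nan_to_num`:
+
+    >>> # Replace the NaNs produced by the fully masked (all -inf) column with 0.0
+    >>> x.softmax(0).nan_to_num(0.0)
+    tensor([[0.3548, 0.0000, 0.0000],
+            [0.6452, 0.0000, 1.0000],
+            [0.0000, 0.0000, 0.0000]])
+
+Note that this silently turns "masked out" into a probability of exactly zero, whereas the
+:class:`MaskedTensor` result above keeps those entries masked instead of inventing a value.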