Commit 111843b: Fix wording
Parent: 71c1bac

1 file changed: 6 additions & 4 deletions

intermediate_source/transformer_building_blocks.py

@@ -570,14 +570,16 @@ def forward(self, x):
 print(f"Total sequence length in nested key/value {kv_len.sum().item()}, max sequence length {kv_len.max().item()}")
 out = new_mha_layer(query, key, value, is_causal=False)
 
+# TODO: anything else I can add here?
+
 ################################################################################
 # Fully masked rows no longer cause NaNs
 # --------------------------------------
 #
-# There has been a long standing issue with ``nn.MultiheadAttention`` where if a row was
-# fully masked by the key_padding_mask, the output of the attention layer would be NaN
-# See `issue <https://github.com/pytorch/pytorch/issues/41508>`_. This is because
-# the softmax operation would divide by zero.
+# There has been a long standing issue with ``nn.MultiheadAttention`` and
+# ``scaled_dot_product_attention`` where if a row was fully masked, the output
+# of the attention layer would be NaN. See `issue <https://github.com/pytorch/pytorch/issues/41508>`_.
+# This is because the softmax operation would divide by zero.
 #
 # Thanks to `this PR <https://github.com/pytorch/pytorch/pull/133882>`_
 # this is no longer the case. Instead, fully masked rows will be set to zero. More
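
Not part of the commit, but for readers of this diff: a minimal sketch of the behavior the reworded passage describes. It assumes a PyTorch build recent enough to include the fix from the PR linked above; on older releases the fully masked row comes back as NaNs instead of zeros.

import torch
import torch.nn.functional as F

# One batch, one head, two query/key positions, head dim 8.
q = torch.randn(1, 1, 2, 8)
k = torch.randn(1, 1, 2, 8)
v = torch.randn(1, 1, 2, 8)

# Boolean attention mask (True = attend). The second query row masks out
# every key, so its softmax row would otherwise divide by zero.
mask = torch.tensor([[True, True],
                     [False, False]])

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out[0, 0, 1])  # zeros with the fix; NaNs on older PyTorch releases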
