
vulkan: fix NaN issue in flash attention shader #12776


Merged
1 commit merged into ggml-org:master on Apr 6, 2025

Conversation

jeffbolznv (Collaborator)

I was seeing corruption when using llama-3 with flash attention (started with #12627):

llama-cli -no-cnv -p "The Peninsular War (1807–1814) was fought in the Iberian Peninsula by Portugal, Spain and the United Kingdom against the invading and occupying forces of the First French Empire during the Napoleonic Wars." -c 2048 -n 150 --ignore-eos -ngl 99 -fa -m C:\models\meta-llama-3-8b-instruct.Q4_K_M.gguf
...
The Peninsular War (1807�1814) was fought in the Iberian Peninsula by Portugal, Spain and the United Kingdom against the invading and occupying forces of the First French Empire during the Napoleonic Wars. TheGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

The corruption was caused by NaNs. The fix is to use -FLT_MAX/2 rather than -inf as the initial value when computing the maximum, which is consistent with what the CUDA shader does.
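For context, one way a -inf initial maximum can produce NaNs in this style of kernel is the rescale step of the streaming softmax: if an entire block of scores is masked to -inf, the running maximum stays -inf, and exp(m_old - m_new) evaluates to exp(-inf - (-inf)) = NaN. Below is a minimal host-side sketch of that failure mode in C++; it is illustrative only, not the Vulkan shader code, and the variable names are made up for the example:

```cpp
// Sketch of the streaming-softmax rescale step used by flash attention
// kernels, showing why seeding the running max with -inf can yield NaN.
#include <algorithm>
#include <cfloat>
#include <cmath>
#include <cstdio>

int main() {
    // A fully masked block: every attention score is -inf.
    const float masked_score = -INFINITY;

    // Seeding the running maximum with -inf: the max stays -inf, so the
    // rescale factor exp(m_old - m_new) is exp(-inf - (-inf)) = NaN, and
    // the NaN propagates into the accumulated output.
    float m_old = -INFINITY;
    float m_new = std::max(m_old, masked_score);              // still -inf
    printf("init -inf      : rescale = %f\n", expf(m_old - m_new));   // nan

    // Seeding with -FLT_MAX/2 keeps the subtraction finite: the rescale
    // factor is exp(0) = 1, and a masked score still contributes
    // exp(-inf - m_new) = 0 rather than NaN.
    m_old = -FLT_MAX / 2.0f;
    m_new = std::max(m_old, masked_score);                    // -FLT_MAX/2
    printf("init -FLT_MAX/2: rescale = %f, masked p = %f\n",
           expf(m_old - m_new), expf(masked_score - m_new));  // 1, 0
}
```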

Use -FLT_MAX/2 rather than -inf as the initial value for computing the maximum.
jeffbolznv requested a review from 0cc4m on Apr 6, 2025 06:33
The github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Apr 6, 2025
0cc4m (Collaborator) left a comment:


I can confirm that this fixes it.

0cc4m merged commit 0c74b04 into ggml-org:master on Apr 6, 2025
47 checks passed
colout pushed a commit to colout/llama.cpp that referenced this pull request Apr 29, 2025
Use -FLT_MAX/2 rather than -inf as the initial value for computing the maximum.
timwu pushed a commit to timwu/llama.cpp that referenced this pull request May 5, 2025
Use -FLT_MAX/2 rather than -inf as the initial value for computing the maximum.
Labels: ggml (changes relating to the ggml tensor library for machine learning), Vulkan (Issues specific to the Vulkan backend)
2 participants