
vulkan: fix NaN issue in flash attention shader #12776


Merged
1 commit merged into ggml-org:master on Apr 6, 2025

Conversation

jeffbolznv (Collaborator)

I was seeing corruption when using llama-3 with flash attention (started with #12627):

llama-cli -no-cnv -p "The Peninsular War (1807–1814) was fought in the Iberian Peninsula by Portugal, Spain and the United Kingdom against the invading and occupying forces of the First French Empire during the Napoleonic Wars." -c 2048 -n 150 --ignore-eos -ngl 99 -fa -m C:\models\meta-llama-3-8b-instruct.Q4_K_M.gguf
...
The Peninsular War (1807�1814) was fought in the Iberian Peninsula by Portugal, Spain and the United Kingdom against the invading and occupying forces of the First French Empire during the Napoleonic Wars. TheGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

The corruption was caused by NaNs. The fix is to use -FLT_MAX/2 rather than -inf as the initial value when computing the maximum, which is consistent with what the CUDA shader does.
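For context, one way a -inf initial maximum can produce NaNs in this style of kernel is the rescale step of the streaming softmax: if an entire block of scores is masked to -inf, the running maximum stays -inf, and exp(m_old - m_new) evaluates to exp(-inf - (-inf)) = NaN. Below is a minimal host-side sketch of that failure mode in C++; it is illustrative only, not the Vulkan shader code, and the variable names are made up for the example:

```cpp
// Sketch of the streaming-softmax rescale step used by flash attention
// kernels, showing why seeding the running max with -inf can yield NaN.
#include <algorithm>
#include <cfloat>
#include <cmath>
#include <cstdio>

int main() {
    // A fully masked block: every attention score is -inf.
    const float masked_score = -INFINITY;

    // Seeding the running maximum with -inf: the max stays -inf, so the
    // rescale factor exp(m_old - m_new) is exp(-inf - (-inf)) = NaN, and
    // the NaN propagates into the accumulated output.
    float m_old = -INFINITY;
    float m_new = std::max(m_old, masked_score);              // still -inf
    printf("init -inf      : rescale = %f\n", expf(m_old - m_new));   // nan

    // Seeding with -FLT_MAX/2 keeps the subtraction finite: the rescale
    // factor is exp(0) = 1, and a masked score still contributes
    // exp(-inf - m_new) = 0 rather than NaN.
    m_old = -FLT_MAX / 2.0f;
    m_new = std::max(m_old, masked_score);                    // -FLT_MAX/2
    printf("init -FLT_MAX/2: rescale = %f, masked p = %f\n",
           expf(m_old - m_new), expf(masked_score - m_new));  // 1, 0
}
```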

Use -FLT_MAX/2 rather than -inf as the initial value for computing the maximum.
jeffbolznv requested a review from 0cc4m on Apr 6, 2025 06:33
The github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Apr 6, 2025
0cc4m (Collaborator) left a comment:


I can confirm that this fixes it.

0cc4m merged commit 0c74b04 into ggml-org:master on Apr 6, 2025
47 checks passed
colout pushed a commit to colout/llama.cpp that referenced this pull request Apr 29, 2025
Use -FLT_MAX/2 rather than -inf as the initial value for computing the maximum.
timwu pushed a commit to timwu/llama.cpp that referenced this pull request May 5, 2025
Use -FLT_MAX/2 rather than -inf as the initial value for computing the maximum.
Labels: ggml (changes relating to the ggml tensor library for machine learning), Vulkan (Issues specific to the Vulkan backend)
2 participants