[SYCL] Fix WARP_SIZE=16 bug of Intel GPU #8266
Tested Meta-Llama-3-8B-Instruct-Q4_K_S.gguf and llama-2-7b.Q4_0.gguf.
It passed on MTL in my testing.
Tested iq4_XS, Q4_K_S. LGTM
@qnixsynapse Thanks for your test! Q4_K models still use WARP_SIZE=32, so they won't benefit from this PR.
@luoyu-intel Yes, I am aware. I am testing IQ4 models currently.
@joeatodd @OuadiElfarouki Performance of the SYCL branch using an NVIDIA A100 with Q4_K has no regressions.
build: 4887fdc (3293)
[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (ggml-org#8266)
* fix group_norm ut
* split softmax
* fix softmax
* add concat support condition
* revert debug code
* move QK_WARP_SIZE to presets.hpp
Fix issue in above PR:
fix norm() nullptr lead to crash on iGPU.
use WARP_32_SIZE replace QK_WARP_SIZE
optimize dmmv.cpp for iGPU.
add sycl_hw.cpp to detect Hardware info.
[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (ggml-org#8266) cherry-pick b549a1b
This PR fixes several bugs that occur with WARP_SIZE=16 on Intel GPUs. All warp-related unit tests (UTs) pass.
WARP_SIZE=16 has the same output as WARP_SIZE=32 on Intel GPUs.
NOTE: QX_K kernels are specialized for WARP_SIZE=32, so they keep a fixed WARP_SIZE of 32.
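To illustrate why a warp-level reduction can give the same output for WARP_SIZE=16 and WARP_SIZE=32, here is a minimal host-side sketch in plain C++. It is not code from this PR: `subgroup_reduce_sum` and `row_sum` are hypothetical names, and the butterfly (XOR partner) pattern is only assumed to resemble what the SYCL kernels do with sub-group shuffles. The point is that the first shuffle offset is derived from the width rather than hard-coded for 32 lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only (not taken from the PR).
// Emulates on the host a butterfly sum over one sub-group of `width` lanes.
// After log2(width) steps every lane holds the total.
inline std::vector<float> subgroup_reduce_sum(std::vector<float> lanes) {
    const std::size_t width = lanes.size();
    // The XOR pattern only works for power-of-two widths such as 16 or 32.
    assert(width > 0 && (width & (width - 1)) == 0);
    // Start at width / 2, not at a fixed 16: a fixed offset is correct only
    // for 32 lanes.
    for (std::size_t mask = width / 2; mask > 0; mask /= 2) {
        std::vector<float> next(width);
        for (std::size_t lane = 0; lane < width; ++lane) {
            // Each lane adds the value held by its XOR partner.
            next[lane] = lanes[lane] + lanes[lane ^ mask];
        }
        lanes = next;
    }
    return lanes;
}

// Hypothetical helper: sums a row by letting each lane accumulate a strided
// partial sum, then combining the lanes with the reduction above.
inline float row_sum(const std::vector<float>& row, std::size_t width) {
    std::vector<float> lanes(width, 0.0f);
    for (std::size_t i = 0; i < row.size(); ++i) {
        lanes[i % width] += row[i];
    }
    return subgroup_reduce_sum(lanes)[0];
}
```

Calling `row_sum(row, 16)` and `row_sum(row, 32)` on the same row returns the same total when the inputs are small integers. With general floating-point data the two widths add in a different order, so results may differ in the last bits.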
Performance change
llama-2-7b-chat-hf-q4_0.gguf, 32 tokens in and 32 tokens out, on ARC A770: from 40 tokens/s to 44 tokens/s (about a 10% improvement)
Master Branch and PR Branch benchmark results (figures not reproduced)