ggml : riscv: add xtheadvector support #13720
@ggerganov Gentle ping on this for review when you get a chance.
We should figure out a way to rework these ifdef branches so the code is more readable. Not for this PR, just a general note that it is something we should do.
I plan to split the arch-dependent implementations in |
Let me first sync Btw, I wonder if we should first start with renaming the "aarch64" misnomer in the codebase. The code in |
Yes, I think it would be a good idea to rename it to something like |
@xctan Sync is complete. We could use some help with reorganizing the source tree, so feel free to help out. About the |
This PR builds upon #12530 to introduce k-quant support for the older RVV v0.7.1 implementation (xtheadvector).
Additionally, it updates zfh extension detection to use the built-in compiler macro, eliminating the need for an extra definition.
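For illustration, compile-time detection via built-in macros could look roughly like the sketch below. It is not the code from this PR: the helper names `riscv_vector_flavor` and `riscv_has_zfh` are hypothetical, and it assumes the toolchain follows the RISC-V C API convention of defining `__riscv_<ext>` test macros (`__riscv_zfh`, `__riscv_v`, and, on compilers with XTheadVector support, `__riscv_xtheadvector`).

```c
#include <stdbool.h>

/* Hypothetical helper: reports which vector flavor the compiler targets.
 * xtheadvector (RVV v0.7.1) is checked first so that it is never
 * mistaken for the ratified RVV 1.0 extension. */
const char *riscv_vector_flavor(void) {
#if defined(__riscv_xtheadvector)
    return "xtheadvector";   /* assumed macro name, set by the compiler for -march=..._xtheadvector */
#elif defined(__riscv_v)
    return "rvv-1.0";
#else
    return "none";           /* non-RISC-V target or no vector extension enabled */
#endif
}

/* Hypothetical helper: true when the compiler itself reports the zfh
 * (half-precision float) extension, so no extra -D definition is needed. */
bool riscv_has_zfh(void) {
#if defined(__riscv_zfh)
    return true;
#else
    return false;
#endif
}
```

Because both checks rely only on macros the compiler sets from `-march`, the build system does not have to pass its own feature definitions, and the sketch compiles unchanged on non-RISC-V hosts.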
Evaluation
Build instructions
Verification
Test model: gemma-3-4b-it-GGUF, Q4_K_M quantization. The results of llama-perplexity are:
Performance
Using the same model as above on SG2042.