# Description
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [Yes] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [Yes] I carefully followed the README.md.
- [Yes] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [Yes] I reviewed the Discussions, and have a new bug or useful enhancement to share.
# Expected Behavior
When running a converted ggml model, the eps used in RMSNorm should be consistent with the original model definition.
# Current Behavior
The eps used in RMSNorm is hardcoded to 1e-6 in all backends: x86, CUDA, and Metal.
Related commit: Change RMSNorm eps to 1e-6 (#173, 22213a1)
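For context, RMSNorm places eps inside the square root of the mean of squares, so the chosen value directly affects the normalization scale. Below is a minimal sketch of the computation; the function name and signature are illustrative, not the actual ggml kernel, where eps is currently the hardcoded 1e-6 rather than a parameter:

```c
#include <math.h>
#include <stddef.h>

/* Minimal RMSNorm sketch (illustrative, not the actual ggml kernel):
 * y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i], where w is the learned gain. */
void rms_norm(const float *x, const float *w, float *y, size_t n, float eps) {
    float sum_sq = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum_sq += x[i] * x[i];
    }
    const float scale = 1.0f / sqrtf(sum_sq / (float)n + eps);
    for (size_t i = 0; i < n; ++i) {
        y[i] = x[i] * scale * w[i];
    }
}
```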
# Environment and Context
I recently wanted to evaluate LLaMA-1 and LLaMA-2 models on the MMLU test set (Measuring Massive Multitask Language Understanding, https://github.com/hendrycks/test), and I chose llama.cpp as the inference engine.
The LLaMA-1 models score nearly the same as reported in the paper, but the LLaMA-2 7B and 13B models only reach LLaMA-1 7B-level scores.
I then checked the model definitions of LLaMA-2 7B and 13B and found that `rms_norm_eps` in config.json is 1e-5 instead of 1e-6.
After recompiling the source with eps changed to 1e-5, the LLaMA-2 test results finally look as expected.
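To see why the mismatch is enough to hurt benchmark scores, note that the normalization scale is 1/sqrt(ms + eps), where ms is the mean square of the activations, so the two eps values diverge most when ms is small. A self-contained check with illustrative ms values (hypothetical, not measured from a model):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Compare the RMSNorm scale 1/sqrt(ms + eps) for the two eps values
     * over a few illustrative mean-square magnitudes. */
    const float ms_values[] = { 1e-4f, 1e-2f, 1.0f };
    for (int i = 0; i < 3; ++i) {
        const float ms = ms_values[i];
        printf("ms=%-8g scale(eps=1e-6)=%-10g scale(eps=1e-5)=%g\n",
               ms, 1.0 / sqrt(ms + 1e-6f), 1.0 / sqrt(ms + 1e-5f));
    }
    return 0;
}
```

For small activations the two scales differ by several percent, and that error can compound across the many RMSNorm layers in the network.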
Related issue:
- GGML model showing noticeable quality issues when compared to HF model #2354

Affected discussions:
- LLaMA-2 Perplexities #2352
- Presentation on llama.cpp on 25.07.2023 at karlsruhe.ai #2281