# Description
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [Yes] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [Yes] I carefully followed the README.md.
- [Yes] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [Yes] I reviewed the Discussions, and have a new bug or useful enhancement to share.
# Expected Behavior
When running a converted ggml model, the eps used in RMSNorm should be consistent with the original model definition.
# Current Behavior
The eps used in RMSNorm is hardcoded to 1e-6 in all backends: x86, CUDA, and Metal.
Related commit: Change RMSNorm eps to 1e-6 (#173, 22213a1)
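For context, RMSNorm places eps inside the square root of the mean of squares, so the chosen value directly affects the normalization scale. Below is a minimal sketch of the computation; the function name and signature are illustrative, not the actual ggml kernel, where eps is currently the hardcoded 1e-6 rather than a parameter:

```c
#include <math.h>
#include <stddef.h>

/* Minimal RMSNorm sketch (illustrative, not the actual ggml kernel):
 * y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i], where w is the learned gain. */
void rms_norm(const float *x, const float *w, float *y, size_t n, float eps) {
    float sum_sq = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum_sq += x[i] * x[i];
    }
    const float scale = 1.0f / sqrtf(sum_sq / (float)n + eps);
    for (size_t i = 0; i < n; ++i) {
        y[i] = x[i] * scale * w[i];
    }
}
```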
# Environment and Context
I recently wanted to evaluate LLaMA-1 and LLaMA-2 models on the MMLU test set (Measuring Massive Multitask Language Understanding, https://github.com/hendrycks/test), and I chose llama.cpp as the inference engine.
The LLaMA-1 models score nearly the same as reported in the paper, but the LLaMA-2 7B and 13B models only reach LLaMA-1 7B-level scores.
I then checked the model definitions of LLaMA-2 7B and 13B and found that `rms_norm_eps` in config.json is 1e-5 instead of 1e-6.
After recompiling the source with eps changed to 1e-5, the LLaMA-2 test results finally look as expected.
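To see why the mismatch is enough to hurt benchmark scores, note that the normalization scale is 1/sqrt(ms + eps), where ms is the mean square of the activations, so the two eps values diverge most when ms is small. A self-contained check with illustrative ms values (hypothetical, not measured from a model):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Compare the RMSNorm scale 1/sqrt(ms + eps) for the two eps values
     * over a few illustrative mean-square magnitudes. */
    const float ms_values[] = { 1e-4f, 1e-2f, 1.0f };
    for (int i = 0; i < 3; ++i) {
        const float ms = ms_values[i];
        printf("ms=%-8g scale(eps=1e-6)=%-10g scale(eps=1e-5)=%g\n",
               ms, 1.0 / sqrt(ms + 1e-6f), 1.0 / sqrt(ms + 1e-5f));
    }
    return 0;
}
```

For small activations the two scales differ by several percent, and that error can compound across the many RMSNorm layers in the network.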
Related issue:
- GGML model showing noticeable quality issues when compared to HF model #2354

Affected discussions:
- LLaMA-2 Perplexities #2352
- Presentation on llama.cpp on 25.07.2023 at karlsruhe.ai #2281