The original paper and the reference implementation [1] use RMSNorm. However, llama.cpp uses ggml_norm(), which looks like layer norm.
The difference between the two may not be obvious in practice, because the mean of the activations is probably close to 0. Still, we should follow the original design.
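To illustrate the distinction: layer norm subtracts the mean and divides by the standard deviation, while RMSNorm skips the mean subtraction and divides only by the root mean square. A minimal NumPy sketch (the learned scale/bias parameters are omitted; `eps` is an assumed small constant, not the exact value used in either codebase):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm: center by the mean, then scale by the standard deviation
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # RMSNorm: no centering; scale by the root mean square only
    return x / np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)

# When the mean is exactly 0, the two coincide:
x0 = np.array([1.0, -1.0, 2.0, -2.0])
print(np.allclose(layer_norm(x0), rms_norm(x0)))  # True

# With a nonzero mean they diverge:
x1 = np.array([1.0, 2.0, 3.0, 4.0])
print(np.allclose(layer_norm(x1), rms_norm(x1)))  # False
```

This is why the bug can go unnoticed: if activations happen to be roughly zero-mean, the outputs of the two norms are nearly identical.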
[1] https://github.com/facebookresearch/llama/blob/main/llama/model.py