
Support MiniCPM-2B-128k #6602


Closed · wants to merge 8 commits into from

Conversation

@zkh2016 (Contributor) commented Apr 11, 2024

Comment on lines +4383 to +4385
if (!hparams.tie_lm_head) {
    model.output = ml.create_tensor(ctx_output_split, tn(LLM_TENSOR_OUTPUT, "weight"), {n_embd, n_vocab}, false);
}
Member:

We already handle tied tensors a few lines below. Maybe you simply have to remove the if (model.arch != LLM_ARCH_MINICPM) { check and this model would work?
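
For context, the generic fallback referred to follows roughly this pattern (a simplified sketch of llama.cpp's tied-embedding handling at the time, not a verbatim quote):

    // try to load a dedicated output head; the final `false` marks it optional
    model.output = ml.create_tensor(ctx_output_split, tn(LLM_TENSOR_OUTPUT, "weight"), {n_embd, n_vocab}, false);
    if (model.output == NULL) {
        // tied embeddings: no separate output head, so reuse token_embd.weight
        model.output = ml.create_tensor(ctx_output, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab});
    }

With that generic path in place, a per-architecture tie_lm_head check may be unnecessary.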

Contributor Author:

Done.

Contributor Author:

Hello @ggerganov, we are currently hitting a problem while adapting our new model: for contexts shorter than 4k the evaluation results are consistent with vllm, but beyond 4k the results are worse than vllm's. What could the reason be? We evaluate through example/server, with the following startup command and request parameters:

./server -m MiniCPM-2B-128k/ggml-model-f16.gguf --chat-template chatml --rope-freq-base 4129032.258 --host 0.0.0.0 -c 12000

request data:

   data = {"stream": False,
            "n_predict": max_token,
            "temperature": 0.3,
            "stop": ["<|im_end|>", "</s>"],
            "repeat_last_n": 256,
            "repeat_penalty": 1.0,
            "top_k": 40,
            "top_p": 0.5,
            "min_p": 0.05,
            "tfs_z": 1,
            "typical_p": 1,
            "presence_penalty": 0,
            "frequency_penalty": 0,
            "mirostat": 0,
            "mirostat_tau": 5,
            "mirostat_eta": 0.1,
            "grammar": "", "n_probs": 0, "min_keep": 0, "image_data": [], "cache_prompt": True,
            "api_key": "",
            "prompt": f"<|im_start|>user{prompt}<|im_end|><|im_start|>assistant\n"
            }

vllm params:

params_dict = {
    "n": 1,
    "best_of": None,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "repetition_penalty": 1.0,
    "temperature": 0.3,
    "top_p": 0.5,
    "top_k": -1,
    "use_beam_search": False,
    "length_penalty": 1.0,
    "early_stopping": False,
    # the original dict listed "stop" twice ("stop": None and this one);
    # the duplicate key is dropped since the later entry wins anyway
    "stop": ["<|im_end|>", "</s>"],
    "stop_token_ids": None,
    "ignore_eos": False,
    "logprobs": None,
    "prompt_logprobs": None,
    "skip_special_tokens": False,
}

Member:

Why do you use --rope-freq-base 4129032.258 when the config specifies 1e6:

https://huggingface.co/openbmb/MiniCPM-2B-128k/blob/main/config.json#L34

Also, this model seems to use some rope scaling:

https://huggingface.co/openbmb/MiniCPM-2B-128k/blob/main/config.json#L25

You need to apply the same scaling when starting the server.
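
For example, something along these lines (an illustration only: the flag values are placeholders to adapt, not verified settings for this model, and example/server exposes linear and YaRN scaling rather than the model's dynamic NTK):

./server -m MiniCPM-2B-128k/ggml-model-f16.gguf --chat-template chatml --rope-freq-base 1000000 --rope-scaling linear --rope-freq-scale 0.5 --host 0.0.0.0 -c 12000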

Contributor Author:

The model currently uses DynamicNTKScalingRotaryEmbedding. How should I pass parameters?
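
For reference, transformers' DynamicNTKScalingRotaryEmbedding rescales the RoPE base on the fly once the sequence outgrows the training context; a C++ sketch of that formula (transcribed from the HF implementation as an assumption, since llama.cpp has no flag for this mode):

    #include <cmath>

    // base' = base * ((factor * seq_len / max_pos) - (factor - 1)) ^ (dim / (dim - 2)),
    // applied only once seq_len exceeds max_position_embeddings
    float dynamic_ntk_base(float base, int dim, int seq_len, int max_pos, float factor) {
        if (seq_len <= max_pos) {
            return base; // inside the training context: no rescaling
        }
        const float t = factor * (float) seq_len / (float) max_pos - (factor - 1.0f);
        return base * std::pow(t, (float) dim / (float) (dim - 2));
    }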

Contributor:

It can run up to 64k without NTK scaling.

Contributor:

@foldl @zkh2016 hi, do we support Dynamic NTK scaling in llama.cpp? Thanks.

Contributor:

OK, thanks for your quick reply.

zhangkaihuo added 2 commits April 16, 2024 20:29
@@ -1548,6 +1548,8 @@ def set_gguf_parameters(self):
        self.gguf_writer.add_head_count_kv(self.hparams["num_key_value_heads"])
        self.gguf_writer.add_layer_norm_rms_eps(self.hparams["rms_norm_eps"])
        self.gguf_writer.add_file_type(self.ftype)
        if "tie_lm_head" in self.hparams:

The key is named tie_word_embeddings in huggingface's config, not tie_lm_head.
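
A corrected check could read (a sketch mirroring the diff above; "minicpm.tie_lm_head" is a hypothetical KV name standing in for whatever this PR actually writes):

        # read the standard HF config key instead of "tie_lm_head"
        if "tie_word_embeddings" in self.hparams:
            self.gguf_writer.add_bool("minicpm.tie_lm_head", self.hparams["tie_word_embeddings"])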

@mofosyne added the enhancement (New feature or request), model (Model specific), and Review Complexity : Medium labels May 10, 2024
Contributor:

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 548 iterations 🚀

Details (performance run):
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8562.98ms p(95)=20759.53ms fails=, finish reason: stop=484 truncated=64
  • Prompt processing (pp): avg=94.97tk/s p(95)=397.46tk/s
  • Token generation (tg): avg=33.01tk/s p(95)=47.34tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=new_minicpm commit=f63f147471a2c45ba3c331d5f3578141243d3553

[Benchmark charts: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 548 iterations]

@zkh2016 (Contributor Author) commented May 13, 2024

Moved to #6919.

@zkh2016 closed this May 13, 2024
Labels
enhancement (New feature or request) · model (Model specific) · Review Complexity : Medium (generally requires more time to grok but manageable by beginner to medium expertise level)

6 participants