# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
# Description
(This is specifically for the latest 72B models. I have never tried the smaller ones).
I'm using this model: https://huggingface.co/Qwen/Qwen-72B-Chat
Commit: 33e171d1e9fc4903f9314b490d77fb8d58331b63
I think the current `convert-hf-to-gguf.py` does not produce a `.gguf` file that treats the two tokens `<|im_start|>` and `<|im_end|>` correctly.
The prompt I used for the examples below is `<|im_start|>system`.
Following the steps in #4281 to produce some `.gguf` files (I personally used the Q6_K on a Mac Studio), I tried the `tokenize` tool:
```
27 -> '<'
91 -> '|'
318 -> 'im'
4906 -> '_start'
91 -> '|'
29 -> '>'
8948 -> 'system'
```
Compare this to a Yi model with the exact same prompt:
```
6 -> '<|im_start|>'
10707 -> 'system'
```
Looking at the Qwen tokenizer code (https://huggingface.co/Qwen/Qwen-72B/blob/main/tokenization_qwen.py#L37), I think these are intended to be single tokens, but the current conversion script does not handle them properly.
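For comparison, here is a minimal check of what the upstream HF tokenizer does with the same prompt. This assumes `transformers` is installed and that Qwen's remote tokenizer code accepts special tokens in plain text, which tokenization_qwen.py appears to allow by default:

```python
# Minimal verification sketch: compare the upstream HF tokenizer's output
# against the character-level split seen from the converted .gguf above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)
ids = tok.encode("<|im_start|>system")
print(ids)
# Expected: two ids, e.g. [151644, 8948] -- one for the whole '<|im_start|>'
# special token (id per tokenization_qwen.py) and one for 'system' -- rather
# than the seven-token split the converted .gguf produces.
```

That two-token result is what I would expect the converted `.gguf` to reproduce.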
# Steps to Reproduce
- Download the Qwen models (https://huggingface.co/Qwen/Qwen-72B-Chat).
- Use the `convert-hf-to-gguf.py` script to convert one into a `.gguf` file. (This is the exact command I ran on my Mac Studio: `python3 convert-hf-to-gguf.py --outfile /Volumes/T9/qwen_72b_chat_v3_f16.gguf --outtype f16 ~/text-generation-webui/models/Qwen_Qwen-72B-Chat`.)
- Run `tokenize` on the result to see how the tokens are interpreted.
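For reference, here is a minimal sketch of the kind of vocab export I would expect, not the actual `convert-hf-to-gguf.py` code: the point is just that `<|im_start|>` and `<|im_end|>` should be written to the `.gguf` as single CONTROL-type entries instead of being left to byte-pair splitting. The `gguf.GGUFWriter` methods are from gguf-py; `base_vocab` and the exact id layout are placeholder assumptions based on tokenization_qwen.py:

```python
# Hypothetical sketch, not the real converter code: append the Qwen special
# tokens as single CONTROL-type vocab entries after the ordinary BPE tokens.
import gguf

# Ids as defined in tokenization_qwen.py (assumption: the base BPE vocab
# occupies ids 0..151642, so the specials follow contiguously).
SPECIAL_TOKENS = {"<|endoftext|>": 151643, "<|im_start|>": 151644, "<|im_end|>": 151645}

def write_vocab(writer: gguf.GGUFWriter, base_vocab: list[str]) -> None:
    tokens = list(base_vocab)                      # ordinary BPE tokens first
    types = [gguf.TokenType.NORMAL] * len(tokens)
    for text, tok_id in sorted(SPECIAL_TOKENS.items(), key=lambda kv: kv[1]):
        assert tok_id == len(tokens), "special token ids must stay contiguous"
        tokens.append(text)                        # one vocab entry per special token
        types.append(gguf.TokenType.CONTROL)       # so it is matched as a whole
    writer.add_tokenizer_model("gpt2")             # BPE-style tokenizer
    writer.add_token_list(tokens)
    writer.add_token_types(types)
```

With the specials written this way, the `tokenize` tool should report `<|im_start|>system` as two tokens, the same as the Yi model above.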
Honestly, I'm not sure whether this is a bug for the llama.cpp repository or something the Qwen team might want to fix on their side, but I'm submitting it here for awareness.
Also, the model seems to work fine despite this. But maybe it would work better if the tokens were interpreted correctly? No idea.