Qwen-72B-Chat conversion script does not treat <|im_start|> and <|im_end|> correctly. #4331

Closed
@Noeda

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Description

(This is specific to the latest 72B models; I have never tried the smaller ones.)

I'm using this model: https://huggingface.co/Qwen/Qwen-72B-Chat

Commit: 33e171d1e9fc4903f9314b490d77fb8d58331b63

I think the current convert-hf-to-gguf.py does not produce a .gguf file that treats the two tokens <|im_start|> and <|im_end|> correctly.

The prompt I used is "<|im_start|>system" for the examples below.

Following the steps in #4281 to produce some .gguf files (I personally used the Q6_K quant on a Mac Studio), I tried the tokenize tool:

    27 -> '<'
    91 -> '|'
   318 -> 'im'
  4906 -> '_start'
    91 -> '|'
    29 -> '>'
  8948 -> 'system'

Compare this to a Yi model with the exact same prompt:

     6 -> '<|im_start|>'
 10707 -> 'system'

I looked at the Qwen tokenizer code (https://huggingface.co/Qwen/Qwen-72B/blob/main/tokenization_qwen.py#L37) and I think these are intended to be single special tokens, but the current conversion script does not handle them that way.
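
As a quick cross-check, the upstream tokenizer (loaded via transformers with trust_remote_code=True) should encode <|im_start|> as a single id. This is only a sketch of how one could verify that; the allowed_special keyword is my reading of tokenization_qwen.py and may be redundant:

    from transformers import AutoTokenizer

    # Load the original Qwen tokenizer (tiktoken-based, via trust_remote_code).
    tok = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)

    # allowed_special="all" lets the special tokens through; tokenization_qwen.py
    # appears to allow them by default, so this kwarg may not be needed.
    ids = tok.encode("<|im_start|>system", allowed_special="all")
    print(ids)  # expected: two ids, with <|im_start|> as a single id (151644 per tokenization_qwen.py)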

Steps to Reproduce

  1. Download the Qwen models. (https://huggingface.co/Qwen/Qwen-72B-Chat)
  2. Use the convert-hf-to-gguf.py script to convert one into a .gguf file. (This is the exact command I used on my Mac Studio: python3 convert-hf-to-gguf.py --outfile /Volumes/T9/qwen_72b_chat_v3_f16.gguf --outtype f16 ~/text-generation-webui/models/Qwen_Qwen-72B-Chat)
  3. Run the tokenize tool on the result to see how the prompt is tokenized (there is also a small vocab check sketched after this list).
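
As an extra check beyond step 3, you can look at whether the special tokens even made it into the converted file's vocabulary. A rough sketch using gguf-py's GGUFReader; the field decoding follows my reading of gguf_reader.py and may need adjusting:

    from gguf import GGUFReader

    # Open the converted file (path from step 2) and pull the token-list field.
    reader = GGUFReader("/Volumes/T9/qwen_72b_chat_v3_f16.gguf")
    field = reader.fields["tokenizer.ggml.tokens"]

    # For string-array fields, field.data holds indices into field.parts,
    # one per token string (my reading of gguf_reader.py).
    vocab = [bytes(field.parts[i]).decode("utf-8", errors="replace") for i in field.data]
    print("<|im_start|>" in vocab, "<|im_end|>" in vocab)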

To be honest, I'm not sure whether this is a bug for the llama.cpp repository or something the Qwen team might want to fix in their repo, but I'm submitting it here for awareness.

Also, the model seems to work fine despite this, but maybe it would work better if these tokens were interpreted correctly? I don't know.
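
For what it's worth, my rough understanding is that fixing this on the conversion side would mean appending the tokenizer's special tokens to the GGUF vocab and marking them as control tokens, so they can be matched as single pieces. The sketch below is only illustrative: the helper is hypothetical, the special_tokens attribute is how tokenization_qwen.py appears to expose the mapping, and the actual integration into convert-hf-to-gguf.py would look different.

    import gguf

    def append_qwen_special_tokens(tokenizer, tokens: list[bytes], toktypes: list[int]) -> None:
        # tokenization_qwen.py keeps a mapping of special-token string -> id
        # (e.g. "<|im_start|>" -> 151644); the attribute name is an assumption.
        for text, token_id in sorted(tokenizer.special_tokens.items(), key=lambda kv: kv[1]):
            assert token_id == len(tokens), "special ids should directly follow the BPE vocab"
            tokens.append(text.encode("utf-8"))
            toktypes.append(gguf.TokenType.CONTROL)  # mark as control so it stays a single special token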
