Qwen-72B-Chat conversion script does not treat <|im_start|> and <|im_end|> correctly. #4331

Closed
@Noeda

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Description

(This is specific to the latest 72B models; I have never tried the smaller ones.)

I'm using this model: https://huggingface.co/Qwen/Qwen-72B-Chat

Commit: 33e171d1e9fc4903f9314b490d77fb8d58331b63

I think the current convert-hf-to-gguf.py does not produce a .gguf file that treats the two tokens <|im_start|> and <|im_end|> correctly.

The prompt I used is "<|im_start|>system" for the examples below.

Following the steps in #4281 to produce some .gguf files (I personally used the Q6_K quant on a Mac Studio), I tried the tokenize tool:

    27 -> '<'
    91 -> '|'
   318 -> 'im'
  4906 -> '_start'
    91 -> '|'
    29 -> '>'
  8948 -> 'system'

Compare this to a Yi model with the exact same prompt:

     6 -> '<|im_start|>'
 10707 -> 'system'

I looked at the Qwen tokenizer code (https://huggingface.co/Qwen/Qwen-72B/blob/main/tokenization_qwen.py#L37) and I think these are intended to be single special tokens, but the current conversion script does not handle them that way.
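
As a quick cross-check, the upstream tokenizer (loaded via transformers with trust_remote_code=True) should encode <|im_start|> as a single id. This is only a sketch of how one could verify that; the allowed_special keyword is my reading of tokenization_qwen.py and may be redundant:

    from transformers import AutoTokenizer

    # Load the original Qwen tokenizer (tiktoken-based, via trust_remote_code).
    tok = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)

    # allowed_special="all" lets the special tokens through; tokenization_qwen.py
    # appears to allow them by default, so this kwarg may not be needed.
    ids = tok.encode("<|im_start|>system", allowed_special="all")
    print(ids)  # expected: two ids, with <|im_start|> as a single id (151644 per tokenization_qwen.py)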

Steps to Reproduce

  1. Download the Qwen models. (https://huggingface.co/Qwen/Qwen-72B-Chat)
  2. Use the convert-hf-to-gguf.py script to convert one into a .gguf file. (This is the exact command I used on my Mac Studio: python3 convert-hf-to-gguf.py --outfile /Volumes/T9/qwen_72b_chat_v3_f16.gguf --outtype f16 ~/text-generation-webui/models/Qwen_Qwen-72B-Chat)
  3. Run the tokenize tool on the result to see how the prompt is tokenized (there is also a small vocab check sketched after this list).
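
As an extra check beyond step 3, you can look at whether the special tokens even made it into the converted file's vocabulary. A rough sketch using gguf-py's GGUFReader; the field decoding follows my reading of gguf_reader.py and may need adjusting:

    from gguf import GGUFReader

    # Open the converted file (path from step 2) and pull the token-list field.
    reader = GGUFReader("/Volumes/T9/qwen_72b_chat_v3_f16.gguf")
    field = reader.fields["tokenizer.ggml.tokens"]

    # For string-array fields, field.data holds indices into field.parts,
    # one per token string (my reading of gguf_reader.py).
    vocab = [bytes(field.parts[i]).decode("utf-8", errors="replace") for i in field.data]
    print("<|im_start|>" in vocab, "<|im_end|>" in vocab)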

To be honest, I'm not sure whether this is a bug for the llama.cpp repository or something the Qwen team might want to fix in their repo, but I'm submitting it here for awareness.

Also, the model seems to work fine despite this, but maybe it would work better if these tokens were interpreted correctly? I don't know.
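
For what it's worth, my rough understanding is that fixing this on the conversion side would mean appending the tokenizer's special tokens to the GGUF vocab and marking them as control tokens, so they can be matched as single pieces. The sketch below is only illustrative: the helper is hypothetical, the special_tokens attribute is how tokenization_qwen.py appears to expose the mapping, and the actual integration into convert-hf-to-gguf.py would look different.

    import gguf

    def append_qwen_special_tokens(tokenizer, tokens: list[bytes], toktypes: list[int]) -> None:
        # tokenization_qwen.py keeps a mapping of special-token string -> id
        # (e.g. "<|im_start|>" -> 151644); the attribute name is an assumption.
        for text, token_id in sorted(tokenizer.special_tokens.items(), key=lambda kv: kv[1]):
            assert token_id == len(tokens), "special ids should directly follow the BPE vocab"
            tokens.append(text.encode("utf-8"))
            toktypes.append(gguf.TokenType.CONTROL)  # mark as control so it stays a single special token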
