### Description

### Name and Version

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

NVIDIA

### Models

Qwen3

### Problem description & steps to reproduce
I don't know if it's too soon, but I'm opening this to keep track of the issue.
The original Qwen3 template is not yet supported, but the bug can be reproduced with a modified template.
The Qwen3 template contains the following check (stripped down to the relevant section):

```jinja
{%- if loop.index0 > ns.last_query_index %}
    {%- if loop.last or (not loop.last and reasoning_content) %}
        {# KEEP REASONING TOKENS #}
```
This means that, in the common case, the reasoning tokens are kept when the last role is assistant and discarded when the last role is user.
The problem is that at the start of each turn, the following pseudo-code is executed:

```
messages.append(user_message)
fmt_past_msg = apply_chat_template(messages)
messages.append(assistant_message)
fmt_new_msg = apply_chat_template(messages)
diff = fmt_new_msg - fmt_past_msg
```
The diff is not computed correctly: the same assistant message is rendered with its thinking tokens preserved in one call to `apply_chat_template` and with them removed in the other, so `fmt_past_msg` is no longer a prefix of `fmt_new_msg`.
Relevant section of the code (line 320 at commit 611aa91):
### First Bad Commit

_No response_

### Relevant log output
```cpp
std::string common_chat_format_single(...) {
    [...]
    fmt_past_msg = common_chat_templates_apply(tmpls, inputs).prompt;
    [...]
    inputs.messages.push_back(new_msg);
    [...]
    auto fmt_new_msg = common_chat_templates_apply(tmpls, inputs).prompt;
    // get the diff part
    ss << fmt_new_msg.substr(fmt_past_msg.size(), fmt_new_msg.size() - fmt_past_msg.size());
```
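One way to make the diff defensive (a sketch only, not the project's actual fix; `prompt_diff` is an illustrative name, not a real llama.cpp helper) is to verify the prefix assumption before slicing:

```cpp
#include <string>

// Sketch: compute the incremental part only when the prefix assumption
// actually holds; otherwise fall back to the full new prompt.
static std::string prompt_diff(const std::string &fmt_past_msg,
                               const std::string &fmt_new_msg) {
    const bool is_prefix =
        fmt_new_msg.size() >= fmt_past_msg.size() &&
        fmt_new_msg.compare(0, fmt_past_msg.size(), fmt_past_msg) == 0;
    // When the template re-renders earlier messages differently (as the
    // Qwen3 reasoning stripping does), is_prefix is false and substr()
    // would slice at an arbitrary offset, or throw std::out_of_range.
    return is_prefix ? fmt_new_msg.substr(fmt_past_msg.size()) : fmt_new_msg;
}
```

The fallback is not a fix for the Qwen3 case (it re-sends the whole prompt), but it avoids emitting a garbage diff.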