Name and Version
version: 5327 (27ebfca)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
nvidia
Models
all
Problem description & steps to reproduce
After the user prompt is provided, the code enters the branch at line 716 (commit 0cf6725). In this iteration, no new token is generated.
However, the code at line 824 (commit 0cf6725) assumes that a new token was generated and inserts it into the assistant response.
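For illustration, here is a minimal, self-contained C++ sketch of the suspected pattern, assuming both line references point into tools/main/main.cpp at the linked commit; all identifiers below (`last_sampled`, `token_to_piece`, `assistant_ss`, `user_turn`) are hypothetical stand-ins, not the actual names in main.cpp:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for sampler state and token-to-text conversion;
// the real code keeps equivalent state in the sampler/context.
static int last_sampled = -1; // last token accepted by the sampler
static std::string token_to_piece(int tok) { return "<tok" + std::to_string(tok) + ">"; }

int main() {
    std::ostringstream assistant_ss; // accumulates the assistant reply
    bool user_turn = true;           // right after the user prompt, no token is sampled

    for (int iter = 0; iter < 2; ++iter) {
        if (user_turn) {
            // Branch taken after the user prompt (cf. line 716): the prompt
            // tokens are consumed, but NO new token is generated here.
            user_turn = false;
        } else {
            last_sampled = 42; // a token actually generated by the model
        }

        // Suspected bug (cf. line 824): this append runs unconditionally, so on
        // the user-input iteration it inserts a stale token -- the last token
        // of the PREVIOUS turn -- into the assistant response.
        assistant_ss << token_to_piece(last_sampled);
    }

    // Prints "<tok-1><tok42>": the first piece is the stale token.
    std::cout << assistant_ss.str() << "\n";
}
```

In this sketch the fix would be to append only when a token was actually sampled in the current iteration, so the user-input branch never touches the assistant response.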
First Bad Commit
No response
Relevant log output
The easiest way to observe the problem is to set a breakpoint here and wait for the assistant message:
https://github.com/ggml-org/llama.cpp/blob/0cf6725e9f9a164c39f7a87214d60342f7f946d8/tools/main/main.cpp#L270
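For example, with a debug build, a session along these lines should hit it (the model path is a placeholder; `-cnv` enables conversation mode in llama-cli):

```sh
# assumes a debug build of llama-cli; adjust the model path
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build --target llama-cli
gdb --args ./build/bin/llama-cli -m ./models/model.gguf -cnv
(gdb) break main.cpp:270
(gdb) run
# enter a user prompt, then inspect the assistant message when the
# breakpoint is hit and check whether a stale token was inserted
```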