Eval bug: llama-cli, spurious token added to assistant response #13402

@matteoserva

Description

Name and Version

version: 5327 (27ebfca)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

nvidia

Models

all

Problem description & steps to reproduce

After the user prompt is provided, the code enters the branch that consumes the remaining prompt tokens:

LOG_DBG("embd_inp.size(): %d, n_consumed: %d\n", (int) embd_inp.size(), n_consumed);

No new tokens are generated.

However, the following code assumes that a new token has been generated and appends it to the assistant response:

assistant_ss << common_token_to_piece(ctx, id, false);
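
A minimal sketch of one possible guard, assuming the surrounding variables of main.cpp's generation loop (embd, embd_inp, n_consumed, is_interacting, smpl, ctx, assistant_ss); the token_generated flag is hypothetical and not part of the current code, and this is only meant to illustrate where the two paths diverge, not the actual fix:

```cpp
// Sketch (not the actual fix): remember whether this iteration really
// sampled a token, and only then append it to the assistant message.
bool        token_generated = false;   // hypothetical flag, not in main.cpp
llama_token id              = 0;

if ((int) embd_inp.size() <= n_consumed && !is_interacting) {
    // generation path: a fresh token is sampled here
    id = common_sampler_sample(smpl, ctx, -1);
    common_sampler_accept(smpl, id, /* accept_grammar */ true);
    embd.push_back(id);
    token_generated = true;
} else {
    // prompt-ingestion path: tokens are copied from embd_inp, nothing is sampled
    LOG_DBG("embd_inp.size(): %d, n_consumed: %d\n", (int) embd_inp.size(), n_consumed);
}

// ... later, when the assistant message is assembled ...
if (token_generated) {
    assistant_ss << common_token_to_piece(ctx, id, false);
}
```

With the current code, the append runs even when only the prompt-ingestion path executed, so a stale token from the prompt ends up in the assistant message.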

First Bad Commit

No response

Relevant log output

The easiest way to observe the problem is to set a breakpoint here and wait for the assistant message:

https://github.com/ggml-org/llama.cpp/blob/0cf6725e9f9a164c39f7a87214d60342f7f946d8/tools/main/main.cpp#L270
