Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code: cafcd4f
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
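According to the documentation (https://github.com/ggerganov/llama.cpp/blob/cafcd4f89500b8afef722cdb08088eceb8a22572/examples/server/README.md?plain=1#L117), the prompt can be provided as a string "or as an array of strings or numbers representing tokens".
A prompt supplied to /completion as an array of token identifiers should therefore be processed as a single prompt.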
Current Behavior
When the prompt is supplied as an array of token identifiers, the server instead calls split_multiprompt_task and the request hangs.
Steps to Reproduce
- Call /tokenize with a text prompt in the "content" field.
- Add BOS if needed.
- Call /completion with the resulting array in the "prompt" field.
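The steps above can be sketched as a small script. This is a minimal reproduction sketch, not the exact client I used. It assumes the server listens on http://localhost:8080, that /tokenize returns its result under a "tokens" key, and that the BOS token id is 1 (true for LLaMA-family models, but check your model). The helper names post_json, with_bos, completion_payload and reproduce are made up for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: local server address
BOS_ID = 1  # assumption: BOS id for LLaMA-family models; check your model


def post_json(path, payload, timeout=30):
    """POST a JSON body to the server and return the decoded JSON reply."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def with_bos(tokens, bos_id=BOS_ID):
    """Prepend BOS unless the token list already starts with it."""
    if tokens and tokens[0] == bos_id:
        return list(tokens)
    return [bos_id] + list(tokens)


def completion_payload(tokens, n_predict=16):
    """Build a /completion body whose prompt is an array of token ids."""
    return {"prompt": with_bos(tokens), "n_predict": n_predict}


def reproduce(text):
    """Tokenize the text, then send the token ids back as the prompt."""
    tokens = post_json("/tokenize", {"content": text})["tokens"]
    # On the affected commit this request is expected to hang; the timeout
    # in post_json turns the hang into an exception instead of blocking.
    return post_json("/completion", completion_payload(tokens))


if __name__ == "__main__":
    print(reproduce("Hello, world"))
```

Running the script against a server built from the commit above should show the second request timing out, while sending the same text as a plain string in "prompt" completes normally.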
Failure Logs
slot 0 is processing [task id: 2]
slot unavailable
print_timings: prompt eval time = 0.00 ms / 0 tokens ( -nan ms per token, -nan tokens per second)
print_timings: eval time = -94366367288.92 ms / 0 runs ( -inf ms per token, -0.00 tokens per second)
print_timings: total time = -94366367288.92 ms
slot unavailable