
server: Completion of pre-tokenized prompt is broken #4476

Closed
@shibe2

Description



Expected Behavior

According to the documentation (https://github.com/ggerganov/llama.cpp/blob/cafcd4f89500b8afef722cdb08088eceb8a22572/examples/server/README.md?plain=1#L117), the prompt may be supplied

or as an array of strings or numbers representing tokens
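For reference, the README's wording implies both of the request bodies below should be accepted by /completion (the token ids are illustrative, not taken from a real vocabulary):

```json
{ "prompt": "Hello, world" }
{ "prompt": [1, 15043, 29892, 3186] }
```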

Current Behavior

When the prompt is supplied as an array of token identifiers, the server instead calls split_multiprompt_task and the request hangs.

Steps to Reproduce

  1. Call /tokenize with a text prompt in content.
  2. Add BOS if needed.
  3. Call /completion with the resulting array in prompt.
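A minimal sketch of the payloads involved in those steps, assuming the /tokenize response shape documented in the server README ({"tokens": [...]}) and a LLaMA-style BOS token id of 1 (an assumption; check your model). No HTTP request is actually sent here, only the request bodies are built:

```python
import json

# Step 1: body sent to /tokenize for a text prompt.
tokenize_request = {"content": "Hello, world"}

# Suppose /tokenize returned these token ids (illustrative values only).
tokens = [15043, 29892, 3186]

# Step 2: prepend BOS if the model expects it (id 1 assumed here).
BOS = 1
if not tokens or tokens[0] != BOS:
    tokens = [BOS] + tokens

# Step 3: body sent to /completion with the pre-tokenized prompt --
# this is the request that hangs.
completion_request = {"prompt": tokens, "n_predict": 16}
print(json.dumps(completion_request))
```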

Failure Logs

all slots are idle and system prompt is empty, clear the KV cache

slot 0 is processing [task id: 2]
slot unavailable

print_timings: prompt eval time = 0.00 ms / 0 tokens ( -nan ms per token, -nan tokens per second)
print_timings: eval time = -94366367288.92 ms / 0 runs ( -inf ms per token, -0.00 tokens per second)
print_timings: total time = -94366367288.92 ms
slot unavailable


Labels: bug (Something isn't working), stale
