Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
The example in #4232 should work: when the request carries an array of prompts, the server returns a completion for each prompt.
Current Behavior
Running the example in #4232 hangs: the server processes both prompts (see the log below), but no response ever reaches the client.
$ ./server -m models/mistral-7b-instruct-v0.2.Q8_0.gguf -c 32768 -t 1 -ngl 1 -np 2
{"timestamp":1703215447,"level":"INFO","function":"main","line":2668,"message":"build info","build":1680,"commit":"afefa319"}
{"timestamp":1703215447,"level":"INFO","function":"main","line":2675,"message":"system info","n_threads":1,"n_threads_batch":-1,"total_threads":8,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | "}
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from models/mistral-7b-instruct-v0.2.Q8_0.gguf (version GGUF V3 (latest))
[... omit ...]
Available slots:
-> Slot 0 - max context: 16384
-> Slot 1 - max context: 16384
llama server listening at http://127.0.0.1:8080
{"timestamp":1703215448,"level":"INFO","function":"main","line":3097,"message":"HTTP server listening","port":"8080","hostname":"127.0.0.1"}
all slots are idle and system prompt is empty, clear the KV cache
slot 0 is processing [task id: 2]
slot 1 is processing [task id: 3]
slot 0 : kv cache rm - [0, end)
slot 1 : kv cache rm - [0, end)
print_timings: prompt eval time = 888.72 ms / 17 tokens ( 52.28 ms per token, 19.13 tokens per second)
print_timings: eval time = 16917.36 ms / 85 runs ( 199.03 ms per token, 5.02 tokens per second)
print_timings: total time = 17806.08 ms
slot 0 released (103 tokens in cache)
print_timings: prompt eval time = 888.64 ms / 16 tokens ( 55.54 ms per token, 18.01 tokens per second)
print_timings: eval time = 19226.04 ms / 111 runs ( 173.21 ms per token, 5.77 tokens per second)
print_timings: total time = 20114.68 ms
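As an aside, the per-slot context in the log looks consistent with -c being split evenly across the -np parallel slots (assuming that is how the server divides it), so the slot sizing itself seems fine:

$ echo $((32768 / 2))   # -c divided across -np slots
16384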
On the client side, this is the example from #4232; the request just hangs with nothing coming back, even though the timings above show both slots finished generating.
$ curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": ["<s>[INST] What is the capital of the US? [/INST]", "<s>[INST] What is the capital of France? [/INST]"], "n_predict": 2048}'
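For comparison, here is an untested sketch that sends the same two prompts as separate single-prompt /completion requests in parallel (the pre-#4232 form of the API), with a client-side timeout so a stalled request cannot hang the shell:

$ for q in "What is the capital of the US?" "What is the capital of France?"; do
    # one classic single-prompt request per question, run in the background
    curl --max-time 60 --request POST --url http://localhost:8080/completion \
      --header "Content-Type: application/json" \
      --data "{\"prompt\": \"<s>[INST] $q [/INST]\", \"n_predict\": 2048}" &
  done; wait

If those return normally, the hang is specific to the prompt-array path added in #4232.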