server: process prompt fairly across slots #6607

### Context

At the moment, the server batches prompt tokens in FIFO order, so a large prompt blocks all other slots while it is being processed.

Proposal: share the batch fairly among all slots that have prompt tokens pending.
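To make the proposal concrete, here is a minimal sketch of one possible fair policy: split the batch token budget evenly among the slots that still have prompt tokens pending, then hand any unused share to the slots that want more. This is not the current server code. `fair_prompt_share`, `pending`, and `n_batch` are hypothetical names used only for illustration, and the even split is an assumption about what "fair" could mean here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of llama.cpp.
// pending[i] = number of prompt tokens slot i still has to process.
// n_batch    = total number of tokens that fit into the next batch.
// Returns how many prompt tokens each slot may add to the next batch.
std::vector<int> fair_prompt_share(const std::vector<int> & pending, int n_batch) {
    assert(n_batch >= 0);

    std::vector<int> share(pending.size(), 0);
    int budget = n_batch;

    while (budget > 0) {
        // Count the slots that still want more tokens than they were given.
        int n_active = 0;
        for (std::size_t i = 0; i < pending.size(); ++i) {
            if (pending[i] > share[i]) {
                n_active++;
            }
        }
        if (n_active == 0) {
            break; // every pending prompt fits, part of the budget stays unused
        }

        // Equal quota per active slot; at least 1 so each pass makes progress.
        const int quota = std::max(1, budget / n_active);

        for (std::size_t i = 0; i < pending.size() && budget > 0; ++i) {
            const int want = pending[i] - share[i];
            if (want <= 0) {
                continue;
            }
            const int give = std::min({want, quota, budget});
            share[i] += give;
            budget   -= give;
        }
    }

    return share;
}
```

With `pending = {1000, 10, 20}` and `n_batch = 512`, this sketch returns `{482, 10, 20}`: the two short prompts are fully processed in the same batch as the first 482 tokens of the large one. A FIFO policy would spend all 512 tokens on the first slot.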

References:
- https://github.com/ggerganov/llama.cpp/issues/4216#issuecomment-2043558080
- https://github.com/ggerganov/llama.cpp/issues/5851#issuecomment-1975120585

