Is your feature request related to a problem? Please describe.
During token generation with `stream=True`, I would like to stop when some condition that changes at runtime is met. For example, I would like to stop generation after 5 lines of output.
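To make the condition concrete, here is a minimal sketch of the 5-line case using the streaming completion API (the model path and prompt are placeholders):

```python
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf")  # placeholder path

lines_seen = 0
output = []
# Consume the stream and stop once the output contains 5 lines.
for chunk in llm("Write a poem:", max_tokens=256, stream=True):
    text = chunk["choices"][0]["text"]
    output.append(text)
    lines_seen += text.count("\n")
    if lines_seen >= 5:
        break  # abandon the generator once the runtime condition is met
print("".join(output))
```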
Describe the solution you'd like
I would like a method on `llm` called `stop()` or `interrupt()` that forces the model to stop after the next token is generated, similar to pressing CTRL+C in regular llama.cpp.
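A hypothetical usage sketch of the proposed API; neither `stop()` nor `interrupt()` exists in llama-cpp-python today, so the method name here is purely illustrative:

```python
import threading

# Hypothetical: interrupt() is the proposed method, not an existing one.
# Stop generation from another thread after 10 seconds.
timer = threading.Timer(10.0, lambda: llm.interrupt())
timer.start()

for chunk in llm("Write a long story:", stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
# After llm.interrupt() fires, the model would stop right after
# the next token, like CTRL+C in regular llama.cpp.
```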
Describe alternatives you've considered
I have considered adding a newline as a stop token, but that ends the completion at every line and requires re-prompting once per line, which is not performant. Another option I can think of is mutating the `stop` list after passing it to the generation method, but that feels hacky.
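For completeness, a sketch of the stop-token alternative; the per-line re-prompting is the performance concern:

```python
# Alternative: stop at each newline and re-prompt, once per line.
lines = []
prompt = "Write a poem:"
for _ in range(5):
    result = llm(prompt + "".join(lines), stop=["\n"], max_tokens=256)
    lines.append(result["choices"][0]["text"] + "\n")
print("".join(lines))
```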