Is your feature request related to a problem? Please describe.
During token generation with `stream=True`, I would like to stop when some condition that changes at runtime is met. For example, I would like to stop generation after 5 lines of output.
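To make the condition concrete, here is a minimal sketch of the 5-line case using the streaming completion API (the model path and prompt are placeholders):

```python
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf")  # placeholder path

lines_seen = 0
output = []
# Consume the stream and stop once the output contains 5 lines.
for chunk in llm("Write a poem:", max_tokens=256, stream=True):
    text = chunk["choices"][0]["text"]
    output.append(text)
    lines_seen += text.count("\n")
    if lines_seen >= 5:
        break  # abandon the generator once the runtime condition is met
print("".join(output))
```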
Describe the solution you'd like
I would like a method on `llm` called `stop()` or `interrupt()` that forces the model to stop after the next token is generated, similar to pressing CTRL+C in regular llama.cpp.
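A hypothetical usage sketch of the proposed API; neither `stop()` nor `interrupt()` exists in llama-cpp-python today, so the method name here is purely illustrative:

```python
import threading

# Hypothetical: interrupt() is the proposed method, not an existing one.
# Stop generation from another thread after 10 seconds.
timer = threading.Timer(10.0, lambda: llm.interrupt())
timer.start()

for chunk in llm("Write a long story:", stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
# After llm.interrupt() fires, the model would stop right after
# the next token, like CTRL+C in regular llama.cpp.
```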
Describe alternatives you've considered
I have considered adding a newline as a stop token, but that ends the completion at every line and requires re-prompting once per line, which is not performant. Another option I can think of is mutating the `stop` list after passing it to the generation method, but that feels hacky.
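For completeness, a sketch of the stop-token alternative; the per-line re-prompting is the performance concern:

```python
# Alternative: stop at each newline and re-prompt, once per line.
lines = []
prompt = "Write a poem:"
for _ in range(5):
    result = llm(prompt + "".join(lines), stop=["\n"], max_tokens=256)
    lines.append(result["choices"][0]["text"] + "\n")
print("".join(lines))
```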