129c7d1 (#20) added a repetition penalty that prevents the model from running into loops.
Here are a few suggestions for possible enhancements:
- One issue with the interactive mode is that the repetition penalty also affects the anti-prompt and response prefix, causing the model to generate unnecessarily long responses. One solution could be to exclude these tokens from the penalty,
- It would be possible to exempt stop words, punctuation characters, and newlines from the penalty, or to reduce it for them; alternatively, a frequency-based penalty could be applied,
- Using an exponential decay, such that recent tokens are penalized more than older ones, would cause fewer issues with large `repeat_last_n` windows,
- Token repetition is only an approximation of sub-string or word repetition, but it seems difficult to do otherwise without backtracking the inference.
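To make the decay and exclusion ideas concrete, here is a minimal sketch of how a decayed repetition penalty with an exclusion set could be applied to the logits. All names (`apply_decayed_repeat_penalty`, the parameter names) are illustrative, not part of the actual implementation:

```python
def apply_decayed_repeat_penalty(logits, last_tokens, penalty=1.3,
                                 decay=0.9, exclude=frozenset()):
    """Penalize recently generated tokens, with the penalty decaying
    exponentially with token age. Tokens in `exclude` (e.g. anti-prompt
    or newline tokens) are left untouched. Illustrative sketch only."""
    logits = list(logits)
    # Keep only the most recent occurrence of each token (age 0 = newest),
    # so a token is penalized once, at its strongest applicable penalty.
    most_recent_age = {}
    for age, tok in enumerate(reversed(last_tokens)):
        if tok not in most_recent_age and tok not in exclude:
            most_recent_age[tok] = age
    for tok, age in most_recent_age.items():
        # Effective penalty shrinks toward 1.0 as the token gets older.
        p = 1.0 + (penalty - 1.0) * (decay ** age)
        # Same sign convention as a classic repetition penalty:
        # divide positive logits, multiply negative ones.
        if logits[tok] > 0:
            logits[tok] /= p
        else:
            logits[tok] *= p
    return logits
```

With `decay=1.0` this reduces to a flat penalty over the whole window; smaller values make a large `repeat_last_n` window behave closer to a short one for old tokens.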