Description
Prerequisites
Context length limits are an issue on all LLMs. The repository and associated paper below demonstrate that keeping the initial 4 tokens ("attention sinks") together with a sliding window of the most recent tokens lets most common LLMs handle effectively unbounded input lengths without sacrificing performance or efficiency (a minimal sketch of the idea follows the links below).
Code: https://github.com/mit-han-lab/streaming-llm
The paper referenced in the repo demonstrates the attention-sink effect in LLMs and shows how to take advantage of it.
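To make the idea concrete, here is a minimal sketch of the cache-eviction policy the paper describes: keep the first few "attention sink" tokens plus a sliding window of recent tokens, and drop everything in between. The function name, parameter values, and the (batch, heads, seq_len, head_dim) tensor layout are illustrative assumptions, not the repo's actual API.

```python
import torch

def evict_kv_cache(past_key_values, num_sink_tokens=4, window_size=1020):
    """Trim each layer's (key, value) cache to sink tokens + a recent window.

    Illustrative sketch only: assumes tensors shaped (batch, heads, seq_len, head_dim)
    and default sizes chosen arbitrarily, not taken from the repo.
    """
    trimmed = []
    for key, value in past_key_values:
        seq_len = key.shape[2]  # number of cached positions
        if seq_len <= num_sink_tokens + window_size:
            trimmed.append((key, value))  # nothing to evict yet
            continue
        # Keep the first num_sink_tokens positions and the last window_size positions,
        # evicting the middle of the cache so its size stays bounded.
        keep_key = torch.cat(
            [key[:, :, :num_sink_tokens], key[:, :, -window_size:]], dim=2
        )
        keep_value = torch.cat(
            [value[:, :, :num_sink_tokens], value[:, :, -window_size:]], dim=2
        )
        trimmed.append((keep_key, keep_value))
    return tuple(trimmed)
```

Applied after each generation step, this keeps the KV cache at a fixed size while preserving the sink tokens the paper identifies as critical for stable attention.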
Current Behavior
There is a limit on context length, determined mostly by pre-training. Other approaches such as RoPE scaling or sliding-window attention have their own pros and cons, but none of them extends usable context length as far as this approach.
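For reference, the pre-training limit mentioned above is usually visible in the model's config as the positional-embedding range. A quick way to inspect it with the Hugging Face transformers library (the model id is just an example, and the attribute name varies by architecture):

```python
# Hedged example: reading a model's pre-training context limit from its config.
# Assumes a Llama-style architecture; other model families may expose this limit
# under a different attribute name.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")  # example model id
print(config.max_position_embeddings)  # number of positions seen during pre-training
```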