kv-cells : track min/max used cells and per-sequence positions #13808
cont #13706
The `llama_kv_cells_unified` now tracks 2 new min/max quantities:

- The min/max indices of used cells
  Mainly needed for determining the range of KV cells `[0, n_kv)` which are considered during the attention computation with unified KV cache (a.k.a. the old `cell_max()`).
- The min/max positions for each sequence currently present in the KV cache
  Will be used in #13746 (kv-cache : refactor + add llama_memory_state_i) to improve the `find_slot()` logic in SWA cases.

To amortize the cost, we utilize `std::set`.
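As a rough illustration of the idea (not the code in this PR), the two quantities can be kept in ordered sets so that the min/max are always available at the ends of the container. The class `kv_cells_sketch` and all of its member names are hypothetical, and the sketch assumes that each cell holds at most one `(seq_id, pos)` pair and that a position appears at most once per sequence, which is simpler than the real `llama_kv_cells_unified`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for llama_kv_cells_unified.
// Assumptions: one (seq_id, pos) pair per cell, unique positions per sequence.
class kv_cells_sketch {
public:
    explicit kv_cells_sketch(uint32_t n) : pos(n, -1), seq(n, -1) {}

    // Mark cell i as used by sequence seq_id at position p.
    void set(uint32_t i, int seq_id, int p) {
        assert(i < pos.size() && pos[i] == -1);
        pos[i] = p;
        seq[i] = seq_id;
        used.insert(i);             // O(log n)
        seq_pos[seq_id].insert(p);  // O(log n)
    }

    // Free cell i and update both tracking structures.
    void rm(uint32_t i) {
        assert(i < pos.size() && pos[i] != -1);
        auto it = seq_pos.find(seq[i]);
        it->second.erase(pos[i]);
        if (it->second.empty()) {
            seq_pos.erase(it);      // sequence no longer present in the cache
        }
        used.erase(i);
        pos[i] = -1;
        seq[i] = -1;
    }

    // Lowest used cell index (0 if no cell is used).
    uint32_t used_min() const { return used.empty() ? 0 : *used.begin(); }

    // One past the highest used cell index, i.e. n_kv in [0, n_kv).
    uint32_t used_end() const { return used.empty() ? 0 : *used.rbegin() + 1; }

    // Min/max position of a sequence, or -1 if the sequence is not present.
    int seq_pos_min(int seq_id) const {
        auto it = seq_pos.find(seq_id);
        return it == seq_pos.end() ? -1 : *it->second.begin();
    }
    int seq_pos_max(int seq_id) const {
        auto it = seq_pos.find(seq_id);
        return it == seq_pos.end() ? -1 : *it->second.rbegin();
    }

private:
    std::vector<int> pos;                  // position per cell, -1 if empty
    std::vector<int> seq;                  // sequence id per cell, -1 if empty
    std::set<uint32_t> used;               // indices of used cells, ordered
    std::map<int, std::set<int>> seq_pos;  // ordered positions per sequence
};
```

In this sketch every update costs O(log n), while the min/max queries only read the first and last elements of an ordered set, so no scan over all cells is needed when a cell is freed.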