llama : switch to floating-point token positions #5679


Draft · ggerganov wants to merge 4 commits into master

Conversation

ggerganov
Member

Change llama_pos from int32_t to float

This change might seem unnecessary at first, since we are used to thinking about token positions as integers, but technically nothing prevents them from being floats. Also, I have some ideas for KV cache compression / context extension tricks where float positions could turn out to be useful.
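A minimal sketch of the core type change in llama.h (the surrounding declarations are assumed to stay unchanged):

```cpp
// llama.h (sketch of the core change in this draft)
// before: token positions are 32-bit integers
// typedef int32_t llama_pos;

// after: token positions are single-precision floats,
// so a position can take non-integer values
typedef float llama_pos;
```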

Still contemplating whether we should merge this, so for now it remains a draft.

@ngxson
Collaborator

ngxson commented Feb 23, 2024

+1 for this. I'm wondering if it helps simplify the code for group attention (self-extend).

@ggerganov
Member Author

Not sure if it will become simpler, but one of the things I want to investigate is applying floating-point division in llama_kv_cache_seq_div() instead of the current integer division. Intuitively, I expect it to improve recall quality.
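For illustration, a hedged sketch of what the division over cached positions could look like once positions are floats. The cell layout and names below are simplified placeholders, not the actual llama.cpp internals:

```cpp
#include <vector>

// Hypothetical, simplified KV cell for illustration only.
struct kv_cell {
    float pos;   // with llama_pos = float, positions can hold fractional values
};

// Divide positions in the range [p0, p1) by d, as done for self-extend / group attention.
// With int32_t positions this truncates (e.g. 7 / 4 -> 1);
// with float positions the relative spacing is preserved (7.0f / 4 -> 1.75f).
static void kv_cache_seq_div_sketch(std::vector<kv_cell> & cells, float p0, float p1, int d) {
    for (auto & cell : cells) {
        if (cell.pos >= p0 && cell.pos < p1) {
            cell.pos /= d;   // floating-point division, no truncation
        }
    }
}
```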

The other idea I want to explore is merging KV cells into one another by averaging both their positions and their KV values. I'm wondering if this can be used to compress the KV cache data into fewer cells.
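A rough sketch of that second idea, again with simplified, hypothetical structures: two cells are merged by averaging their positions and their K/V vectors, which only makes sense once positions can be non-integer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cell holding a float position plus its K and V vectors.
struct kv_cell_full {
    float pos;
    std::vector<float> k;
    std::vector<float> v;
};

// Merge two cells into one by averaging position and KV values,
// halving the number of occupied cells for the merged pair.
static kv_cell_full kv_cell_merge_sketch(const kv_cell_full & a, const kv_cell_full & b) {
    kv_cell_full out;
    out.pos = 0.5f * (a.pos + b.pos);        // averaged position requires float positions
    out.k.resize(a.k.size());
    out.v.resize(a.v.size());
    for (size_t i = 0; i < a.k.size(); ++i) {
        out.k[i] = 0.5f * (a.k[i] + b.k[i]); // average keys
        out.v[i] = 0.5f * (a.v[i] + b.v[i]); // average values
    }
    return out;
}
```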

@mofosyne mofosyne added refactoring Refactoring Review Complexity : High Generally require indepth knowledge of LLMs or GPUs labels May 10, 2024
@ggerganov ggerganov added the demo Demonstrate some concept or idea, not intended to be merged label Jul 22, 2024
3 participants