llama : support RWKV v6 models #8980
Merged
53 commits
- `8d2eca3` convert_hf_to_gguf: Add support for RWKV v6 (MollySophia)
- `dc0767f` Add RWKV tokenization (LaylBongers)
- `865167d` Fix build (MollySophia)
- `7cac72a` Do not use special tokens when matching in RWKV tokenizer (LaylBongers)
- `e92c74f` Fix model loading (LaylBongers)
- `a0aae8d` Add (broken) placeholder graph builder for RWKV (LaylBongers)
- `a866789` Add workaround for kv cache (LaylBongers)
- `4e23d97` Add logits conversion to rwkv5 (LaylBongers)
- `5479588` Add rwkv5 layer norms (LaylBongers)
- `dd3aa3d` Add time mix KVRG & correct merge mistake (LaylBongers)
- `b409fd8` Add remaining time mix parameters (LaylBongers)
- `3cbeffc` Add time mix output loading (LaylBongers)
- `b3b17e0` Add placeholder llm_build_time_mix (LaylBongers)
- `700dad1` Fix build (MollySophia)
- `a180b63` Load more tensors for rwkv v6 (MollySophia)
- `0e5ac34` Fix rwkv tokenizer (MollySophia)
- `5732de8` ggml: Add unary operator Exp (MollySophia)
- `0784a0c` RWKV v6 graph building (MollySophia)
- `8d498c7` Add ``rescale_every_n_layers`` parameter (MollySophia)
- `903089b` Add ``wkv.head_size`` key for RWKV (MollySophia)
- `98ce5f4` Fix offloading layers to CUDA (MollySophia)
- `01dcf4b` Fix parallel inferencing for RWKV (MollySophia)
- `6ae2f48` Remove trailing whitespaces (MollySophia)
- `8bc1f9a` build_rwkv: Avoid using inplace operations (MollySophia)
- `18decea` convert_hf_to_gguf: rwkv: Avoid using ``eval`` (MollySophia)
- `7f2e370` convert_hf_to_gguf: rwkv tokenizer: Don't escape sequences manually (MollySophia)
- `c695552` Update convert_hf_to_gguf.py (MollySophia)
- `8aa711a` ggml: Add backward computation for unary op ``exp`` (MollySophia)
- `ae9936a` Update convert_hf_to_gguf.py (MollySophia)
- `5afa3ef` Update convert_hf_to_gguf.py (MollySophia)
- `12fbe1a` Use MODEL_ARCH.RWKV6 instead of MODEL_ARCH.RWKV (MollySophia)
- `276d53b` build_rwkv6: Simplify graph (MollySophia)
- `b0f4fe5` llama: rwkv6: Detect model.type (MollySophia)
- `683d70c` llama: rwkv6: Fix tensor loading for 7B/14B models (MollySophia)
- `ee1b78c` llama: rwkv6: Fix group_norm assertion failure with Metal (MollySophia)
- `c165e34` llama: rwkv6: Clean up (MollySophia)
- `6da6aa4` llama: rwkv6: Add quantization tensor exclusion (MollySophia)
- `f5d955d` llama: rwkv6: Use the new advanced batch splits (MollySophia)
- `57decb4` Update src/llama.cpp (MollySophia)
- `e94778a` llama: rwkv6: Use ``ggml_norm`` instead of ``ggml_group_norm`` (MollySophia)
- `7756afd` llama: rwkv6: Apply code style and misc changes (MollySophia)
- `87a2901` converter: Use class name ``Rwkv6Model`` (MollySophia)
- `c414a24` llama: rwkv6: Make use of key ``feed_forward_length`` (MollySophia)
- `6d69fd7` llama: rwkv6: Add kv ``time_mix_extra_dim`` and ``time_decay_extra_dim`` (MollySophia)
- `601b592` converter: Match ``new_name`` instead of ``name`` for float32 explici… (MollySophia)
- `e0ea511` llama: rwkv6: Keep ``time_mix_w1/w2`` as F32 (MollySophia)
- `5f00c52` llama: rwkv6: Remove unused nodes (MollySophia)
- `7444046` llama: rwkv6: Apply code format changes (MollySophia)
- `7f2ef56` llama: rwkv6: Add lora for some supported tensors (MollySophia)
- `7004323` rwkv : speed-up tokenization using trie (ggerganov)
- `59dc2e7` minor : style + indentation (ggerganov)
- `5175375` llama: rwkv6: Avoid division by zero (MollySophia)
- `846358d` ggml: rwkv_wkv: Avoid copying the state (MollySophia)
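Commit `7004323` ("rwkv : speed-up tokenization using trie") names a classic technique: index the vocabulary in a byte trie and tokenize by greedy longest match, so each emitted token costs one trie walk instead of scanning the whole vocabulary. A minimal sketch of that idea, with a hypothetical toy vocabulary; this is not the actual llama.cpp implementation, which also handles bytes with no vocabulary match:

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # byte value -> TrieNode
        self.token_id = None  # set when a vocab entry ends at this node

def build_trie(vocab):
    """vocab: dict mapping token bytes -> token id."""
    root = TrieNode()
    for token, tid in vocab.items():
        node = root
        for b in token:  # iterating bytes yields ints
            node = node.children.setdefault(b, TrieNode())
        node.token_id = tid
    return root

def tokenize(text: bytes, root: TrieNode):
    """Greedy longest-match: walk the trie from each position,
    remember the deepest node that ends a token, emit it, repeat."""
    ids, pos = [], 0
    while pos < len(text):
        node, best_id, best_len = root, None, 0
        for i in range(pos, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.token_id is not None:
                best_id, best_len = node.token_id, i - pos + 1
        if best_id is None:
            # real tokenizers fall back to single-byte tokens here
            raise ValueError("no vocabulary entry matches at position %d" % pos)
        ids.append(best_id)
        pos += best_len
    return ids

# Toy usage with an invented 3-entry vocabulary:
vocab = {b"a": 1, b"ab": 2, b"b": 3}
root = build_trie(vocab)
print(tokenize(b"abb", root))  # longest match "ab" wins over "a"
```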
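Commits `5732de8` and `8aa711a` add an elementwise `Exp` operator and its backward pass to ggml. The math behind the pair is simply that exp is its own derivative, so the backward pass scales the upstream gradient by the forward output. A hedged sketch in plain Python (function names are illustrative, not the ggml API):

```python
import math

def exp_forward(xs):
    """Elementwise exp, as a unary op would compute it."""
    return [math.exp(x) for x in xs]

def exp_backward(xs, grad_out):
    """Since d/dx exp(x) = exp(x), the input gradient is the
    upstream gradient scaled by the forward result."""
    ys = exp_forward(xs)  # a real kernel would reuse the cached output
    return [g * y for g, y in zip(grad_out, ys)]

print(exp_forward([0.0]))          # exp(0) = 1
print(exp_backward([0.0], [3.0]))  # gradient 3 * exp(0) = 3
```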