feat: add bpe #119
Conversation
I don't think it's necessary to load the merges file; the vocab is already ready, just use it. The only thing we need to do is implement the actual BPE function.
Okay, I'll adjust it.
I actually want to implement customized CLIP loading, for example chinese-clip. I wonder if this idea is feasible? Edit: a possible obstacle is that the tokenizer it uses is not BPE at all.
A CLIP model works if it can produce output with a distribution similar to the CLIP model used by Stable Diffusion, or if there is a dedicated UNet model for it. I don't know whether chinese-clip is suitable for Stable Diffusion, because I haven't studied its training process in detail.
@leejet I just checked the built-in vocab.json. The vocab.json cannot replace the function of merges.txt because it does not provide the subword merge order. Another problem is that the token ids in the BPE algorithm are derived from word-occurrence frequency, not a fixed mapping, so I am a bit worried that unexpected errors may occur when using the built-in vocabulary list.
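As an illustration of why the merge order matters (a minimal sketch with assumed names, not this PR's actual code): the line order in merges.txt is what assigns each pair its BPE rank, and a plain token-to-id map such as vocab.json cannot recover that ordering.

```cpp
// Sketch: read merges.txt and record each pair's rank from its line position.
// The path, the map type, and the header-skip assumption are illustrative only.
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

std::map<std::pair<std::string, std::string>, int> load_merge_ranks(const std::string& path) {
    std::map<std::pair<std::string, std::string>, int> ranks;
    std::ifstream file(path);
    std::string line;
    int rank = 0;
    std::getline(file, line);  // skip the version/header line
    while (std::getline(file, line)) {
        std::istringstream ss(line);
        std::string first, second;
        if (ss >> first >> second) {
            ranks[{first, second}] = rank++;  // earlier line = lower rank = merged first
        }
    }
    return ranks;
}
```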
I've improved the tokenizer's handling of Unicode and it now generates the vocab from the merges information, similar to https://github.com/openai/CLIP/blob/main/clip/simple_tokenizer.py. You can now merge the latest master branch into your PR, which should make it convenient to implement BPE. Please refer to that simple_tokenizer.py for guidance.
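For reference, here is a rough C++ sketch of the core loop in simple_tokenizer.py's bpe(): split a word into symbols, then repeatedly merge the adjacent pair with the lowest rank. It omits the byte-to-unicode remapping and caching that the real tokenizer uses, and none of the names are taken from this PR.

```cpp
// Greedy BPE sketch (assumed helper names, simplified to byte-level symbols).
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

using MergeRanks = std::map<std::pair<std::string, std::string>, int>;

std::vector<std::string> bpe(const std::string& word, const MergeRanks& ranks) {
    // Start from single-character symbols and mark the end of the word.
    std::vector<std::string> symbols;
    for (char c : word) symbols.push_back(std::string(1, c));
    if (symbols.empty()) return symbols;
    symbols.back() += "</w>";

    while (symbols.size() > 1) {
        // Find the adjacent pair with the lowest merge rank.
        int best_rank = std::numeric_limits<int>::max();
        size_t best_i  = 0;
        for (size_t i = 0; i + 1 < symbols.size(); ++i) {
            auto it = ranks.find({symbols[i], symbols[i + 1]});
            if (it != ranks.end() && it->second < best_rank) {
                best_rank = it->second;
                best_i    = i;
            }
        }
        if (best_rank == std::numeric_limits<int>::max()) break;  // nothing left to merge
        // Merge that pair and drop the second symbol.
        symbols[best_i] += symbols[best_i + 1];
        symbols.erase(symbols.begin() + best_i + 1);
    }
    return symbols;
}
```

The resulting symbols are then looked up in the vocab built from the same merges, which is why the two files have to stay consistent.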
I already have an implementation of embeddings (textual inversion) ready, and it works fine; however, it requires a specific way of loading the embedding from the prompt. Another issue is that it is unclear how other projects handle very long prompts of more than 77 tokens. Do they only use the first 77 and ignore the rest? Some embeddings require up to 75 tokens, which covers almost the entirety of the available context.
Thx |
# Conflicts:
#   model.cpp
#   model.h
#   stable-diffusion.cpp
Now we have completed all the logic, but I feel that the performance is not very good. After investigation, I found that if we want to improve the speed, we can add a built-in vocab.json, which will increase memory usage by about 1 MB. If we want to use less memory, we can keep the current code.
@FSSRepo Please check out the link below. In summary, the prompt will be split into multiple chunks.
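A hypothetical sketch of that chunking idea (the function name, token ids, and padding policy are illustrative assumptions, not this project's actual code): tokens beyond the 77-token CLIP context are carried into additional chunks, each wrapped with BOS/EOS.

```cpp
// Split a long token list into 77-token chunks, each with BOS at the front,
// EOS padding, and EOS at the end. Purely a sketch of the idea.
#include <algorithm>
#include <cstdint>
#include <vector>

std::vector<std::vector<int32_t>> chunk_tokens(const std::vector<int32_t>& tokens,
                                               int32_t bos_id, int32_t eos_id,
                                               size_t context = 77) {
    const size_t payload = context - 2;  // room left after BOS and EOS
    std::vector<std::vector<int32_t>> chunks;
    for (size_t i = 0; i < tokens.size() || chunks.empty(); i += payload) {
        std::vector<int32_t> chunk;
        chunk.push_back(bos_id);
        size_t end = std::min(tokens.size(), i + payload);
        chunk.insert(chunk.end(), tokens.begin() + i, tokens.begin() + end);
        chunk.resize(context - 1, eos_id);  // pad up to context - 1 with EOS
        chunk.push_back(eos_id);
        chunks.push_back(std::move(chunk));
    }
    return chunks;
}
```

Each chunk would then be encoded separately and the text-encoder outputs concatenated, so no part of the prompt is simply dropped.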
On my computer, load_merges takes 82 ms and load_vocab takes 53 ms, so there isn't a significant difference. These operations, like loading the model weights, run only once, and the time they take is negligible compared to loading the weights. I believe there's no need to do this.
We can merge this PR.
My pull request is ready; however, a user is experiencing issues with the VAE tiling function for reasons unknown to me. I have tested it with images of various resolutions and haven't encountered problems with incomplete images.
I tested on Mac and on Windows with CUDA and didn't encounter this problem.
I've finished reviewing your PR and I'm ready to merge it, but I'm unsure whether Green-Sky will still encounter the issue.
Once I've reviewed the changes in this PR, I might merge it first.
I am finding that this PR produces images that have very little relation to the prompt, running on Linux and using CPU only. Even with a simple prompt like "photo of a cat next to a christmas tree, wearing a christmas hat", master will generate an image of a cat generally sitting next to a Christmas tree and wearing a red Santa-type hat, while this PR will generate a woman wearing a woolly hat, generally a close-up head shot of her.
I just tried it and it happened to me too. Let me see what caused it.
@Cyberhan123 Is this a complete implementation of BPE, or are there still things missing? If it's complete, we could remove "Implement BPE Tokenizer" from the TODO section of the README.
@FSSRepo This is the complete implementation; I just made a low-level mistake, which has been fixed now.
@mwt11a ./sd -m ../../models/sd-v1-4.ckpt -p "photo of a cat next to a christmas tree, wearing a christmas hat" --type q8_0
emmm, the cat generated by this model looks a bit like Gargamel...
@leejet So I changed the code to:
auto end = merges.begin() + 49152 - 256 - 2 + 1; // the exception is raised here
merges = std::vector<std::u32string>(merges.begin() + 1, end);
Then I stepped into the exception; from there we can see which macro caused it.
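As a hedged illustration (not the PR's actual fix), clamping the end offset to the real vector size would keep the iterator arithmetic above from stepping past end() when merges has fewer lines than expected; the constant is the same bound used in simple_tokenizer.py.

```cpp
// Sketch of a defensive slice of the merges list (names assumed).
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::u32string> slice_merges(const std::vector<std::u32string>& merges) {
    const size_t expected_end = 49152 - 256 - 2 + 1;       // header line + 48894 merge lines
    if (merges.size() <= 1) return {};                     // nothing after the header line
    const size_t end_offset = std::min(expected_end, merges.size());
    return std::vector<std::u32string>(merges.begin() + 1, merges.begin() + end_offset);
}
```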
I am more accustomed to using JetBrains products, such as WebStorm, GoLand, and CLion. Maybe due to lack of experience, I didn't find a suitable way to format the code, which caused extra work for you. I'm sorry.
I'm working with VSCode 😂😂😂 and a lot of extensions.
@Cyberhan123 I've completed reviewing the code in this PR, and it looks good to me. I'm planning to merge it now. Thank you for your contribution.
fix: #85