HiDream Image #11231
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@kebe7jun What is your …
Thanks, this can work.
```python
class HiDreamImageTransformer2DModel(ModelMixin, ConfigMixin, PeftAdapterMixin, FromOriginalModelMixin):
```
`FromOriginalModelMixin` shouldn't be needed here, I think, since the weights are in diffusers format?
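For reference, a sketch of the suggested declaration with the mixin dropped (assuming the other base classes stay as-is):

```python
class HiDreamImageTransformer2DModel(ModelMixin, ConfigMixin, PeftAdapterMixin):
    ...
```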
Nice catch, thanks; spotted a couple of other things in 9d43a32.
Looking good 👍🏽. Could we add fast tests for the Pipeline and Model?
@bot /style
Style fixes have been applied. View the workflow run here.
```python
_, seq_len, _ = prompt_embeds.shape

# duplicate text embeddings and attention mask for each generation per prompt, using mps friendly method
prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
```
Just nits, but we discussed doing these repeat/expand parts in `encode_prompt`. Not a blocker to merging to main atm, so feel free to take it up in a follow-up PR. Same comment for the other similar repeats.
Agree here; it should just return a single `prompt_embeds`.
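A minimal sketch of that refactor (hypothetical helper name; the PR's `encode_prompt` handles multiple encoders, so the real change touches each of them):

```python
def _duplicate_prompt_embeds(prompt_embeds, num_images_per_prompt):
    # duplicate text embeddings for each generation per prompt, using an mps-friendly method
    batch_size, seq_len, _ = prompt_embeds.shape
    prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
    return prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)
```

With this living inside `encode_prompt`, the pipeline's `__call__` receives a single, already-expanded `prompt_embeds`.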
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
@a-r-r-o-w Removing …
I think the test slice differences are expected, no? It just changes the random initialization of the matrices, so if we're loading pretrained weights, it wouldn't cause a perceptual difference.
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Hi @hlky, thank you for your contribution and effort in integrating our HiDream-I1 model. We'd love to collaborate with you to refine this PR, whether by reviewing the implementation, adding missing components (e.g., docs, tests), or assisting with upstream merging. Let us know how you'd prefer to proceed (e.g., we can co-author this PR or build upon your work). Again, we appreciate your initiative! Looking forward to your thoughts. Best,
@ShuyUSTC feel free to give the PR a review and help test it :)
Move the operation `noise_pred = -noise_pred` in `pipeline_hidream_image` to `transformer_hidream_image`.
```python
hidden_states = ff_output_i + hidden_states
encoder_hidden_states = ff_output_t + encoder_hidden_states
return hidden_states, encoder_hidden_states
```
```python
class NegateLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return -x
```

Add a `NegateLayer` to convert the input `x` to `-x`.
@ShuyUSTC The suggestions about returning the negative value don't really fit the diffusers coding style for modeling code, so we won't be able to add those changes here and will have to keep the negation in the pipeline.
Feel free to let us know if there's anything else you'd like us to change.
```python
caption_projection = []
for caption_channel in caption_channels:
    caption_projection.append(TextProjection(in_features=caption_channel, hidden_size=self.inner_dim))
self.caption_projection = nn.ModuleList(caption_projection)
```
```python
self.caption_projection = nn.ModuleList(caption_projection)
self.negate_layer = NegateLayer()
```

Initialize a `negate_layer`.
```python
hidden_states = hidden_states[:, :image_tokens_seq_len, ...]
output = self.final_layer(hidden_states, adaln_input)
output = self.unpatchify(output, img_sizes, self.training)
```
```python
output = self.unpatchify(output, img_sizes, self.training)
output = self.negate_layer(output)
```

Convert the `output` to `-output`.
```python
    img_ids=img_ids,
    return_dict=False,
)[0]
noise_pred = -noise_pred
```
```python
noise_pred = -noise_pred
```

Remove this operation and add a `negate_layer` to `HiDreamImageTransformer2DModel` to convert the output.
`NegateLayer` does not fit with the coding style; there are other cases of `-noise_pred` in the codebase.
Will this implementation work with NF4 and sequential offload?
@yiyixuxu Addressed some of the review comments. LMK if any further changes are required (we can do follow-up PRs too). I've only tested with the Full model for now, and will do the other two tomorrow unless someone else can finish up the PR.

Update: all models seem to be working, but the outputs of "Fast" feel a bit off. It might be scheduler related; looking into it.
Also tested that `num_images_per_prompt > 1` works with the changes to `encode_prompt`. PR LGTM to merge for a first pass 👍
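For reference, a hypothetical smoke check along those lines (assumes `pipe` is an already-assembled `HiDreamImagePipeline`):

```python
# each prompt should yield num_images_per_prompt images
images = pipe("a cat wearing a tiny hat", num_images_per_prompt=2).images
assert len(images) == 2
```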
FYI, I've tested hidream-i1-fast and it works fine with this PR, but there is one issue...
@vladmandic Did you try with the latest changes? I encountered the issue too, but now I don't get it anymore after the refactor.
Just updated the codebase; I can't reproduce it at the moment anymore either. I'll run a few more tests, as it was pretty random to start with.
Apparently, here are a couple of improvements:
Update: no issues with offloading using the latest codebase.
Merged the PR; we can add any follow-up changes in a new one.
Would this work for the NF4 quantized 4-bit models? I had already implemented this using the fork https://github.com/hykilpikonna/HiDream-I1-nf4 and the azaneko/HiDream-I1-Dev-nf4 models, because it didn't run on less than 24GB otherwise. I'm transitioning to this implementation from that GitHub repo and just want to make sure I can keep the code mostly the same. Wouldn't mind seeing an example of memory-optimized code that runs in <=16GB...
@Skquark On-the-fly quantization using bitsandbytes and/or optimum.quanto together with the diffusers implementation works just fine; you don't need random unofficial fixed quants. With bnb-nf4 it works with 16GB VRAM, and with quanto-int4 it works even with 12GB.
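For what it's worth, a hedged sketch of the bnb-nf4 route for the transformer (the kwargs follow diffusers' standard bitsandbytes integration; if the checkpoint doesn't bundle the Llama text encoder, it has to be loaded and passed in separately, as in the assembly sketch further down):

```python
import torch
from diffusers import BitsAndBytesConfig, HiDreamImagePipeline, HiDreamImageTransformer2DModel

# quantize the transformer to NF4 on the fly while loading
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offloading keeps peak VRAM near the figures above
```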
@vladmandic It'd still be nice to have a working example of quantization in the docs for this one, since it takes more than 24GB and is a bit more complicated. Do we just run the BitsAndBytesConfig 4-bit quant on the Transformer or the Tokenizer, and can we optimize the Llama encoder with nf4 too? Could it also use group offloading? Thanks.
@nitinmukesh Interesting, but not what I was expecting for loading those models. That's similar to what I was originally doing, but he was saying on-the-fly quantization is better than using the modded int4 models. Since there seem to be about 4 different ways to optimize this, I'll just add an Optimization Mode option in my app for which one to try and figure it out from there. Any better ways?
On-the-fly quantization is very time-consuming; each launch will quantize again. I have also added a GGUF version, if you know how to use it (same topic).
You just pass them in: when you load the individual components yourself, you assemble the pipeline from them. A sketch follows.
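A hedged sketch of that assembly (the Llama checkpoint id, the `text_encoder_4`/`tokenizer_4` argument names, and the 4-bit config are assumptions based on the surrounding discussion; note that transformers has its own `BitsAndBytesConfig`, separate from diffusers'):

```python
import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM, PreTrainedTokenizerFast
from diffusers import HiDreamImagePipeline

# load the Llama text encoder separately, quantized to NF4 on the fly
text_encoder_4 = LlamaForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # assumed checkpoint id
    output_hidden_states=True,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4"),
    torch_dtype=torch.bfloat16,
)
tokenizer_4 = PreTrainedTokenizerFast.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# assemble the pipeline from the individually loaded components
pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    text_encoder_4=text_encoder_4,
    tokenizer_4=tokenizer_4,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
```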
What does this PR do?
Original code
Weights
Code
Output
NOTES
`HiDream-ai/HiDream-I1-Full` is using the existing `UniPCMultistepScheduler` with `prediction_type` and `use_flow_sigmas`.
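A sketch of that scheduler setup (the parameter values here are assumptions based on how UniPC is configured for flow-matching models elsewhere in diffusers, not confirmed by this PR):

```python
from diffusers import UniPCMultistepScheduler

# flow-matching configuration of the existing UniPC scheduler
scheduler = UniPCMultistepScheduler(prediction_type="flow_prediction", use_flow_sigmas=True)
```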
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.