HiDream Image #11231

Merged: 25 commits merged into huggingface:main on Apr 11, 2025

Conversation

@hlky (Contributor) commented Apr 8, 2025

What does this PR do?

Original code

Weights

Code

import torch
from transformers import PreTrainedTokenizerFast, LlamaForCausalLM
from diffusers import (
    UniPCMultistepScheduler,
    HiDreamImagePipeline,
    HiDreamImageTransformer2DModel,
)

scheduler = UniPCMultistepScheduler(
    flow_shift=3.0,
    prediction_type="flow_prediction",
    use_flow_sigmas=True,
)

tokenizer_4 = PreTrainedTokenizerFast.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct"
)

text_encoder_4 = LlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    output_hidden_states=True,
    output_attentions=True,
    torch_dtype=torch.bfloat16,
)

transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16
)

pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    scheduler=scheduler,
    tokenizer_4=tokenizer_4,
    text_encoder_4=text_encoder_4,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe(
    'A cat holding a sign that says "Hi-Dreams.ai".',
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=50,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]

image.save("hidream.png")

Output

[output image: hidream.png]

NOTES

  • Scheduler changes are not necessarily required; the test above of HiDream-ai/HiDream-I1-Full uses the existing UniPCMultistepScheduler with prediction_type="flow_prediction" and use_flow_sigmas=True (see the sketch below).
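For reference, a minimal sketch (not part of the PR) of the same scheduler configuration applied after loading the pipeline, using from_config so the repo's saved scheduler config is kept and only the named fields are overridden. It assumes the pipe loaded in the snippet above.

from diffusers import UniPCMultistepScheduler

# Sketch: switch an already-loaded pipe to flow-matching UniPC.
# from_config keeps the saved scheduler config and overrides only the fields passed here.
pipe.scheduler = UniPCMultistepScheduler.from_config(
    pipe.scheduler.config,
    prediction_type="flow_prediction",
    use_flow_sigmas=True,
    flow_shift=3.0,
)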

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky (Contributor, Author) commented Apr 9, 2025

@kebe7jun What is your transformers version? Can you do pip install -U transformers and try with latest?

@kebe7jun commented Apr 9, 2025

> @kebe7jun What is your transformers version? Can you do pip install -U transformers and try with latest?

Thanks, this works.

@hlky hlky marked this pull request as ready for review April 10, 2025 13:31
)


class HiDreamImageTransformer2DModel(ModelMixin, ConfigMixin, PeftAdapterMixin, FromOriginalModelMixin):
Collaborator:

FromOriginalModelMixin shouldn't be needed here I think since the weights are diffusers format?

hlky (Contributor, Author):

Nice catch, thanks, spotted a couple other things in 9d43a32

@DN6 (Collaborator) left a comment

Looking good 👍🏽. Could we add fast tests for the Pipeline and Model?

@hlky (Contributor, Author) commented Apr 10, 2025

@bot /style

Style fixes have been applied. View the workflow run here.

_, seq_len, _ = prompt_embeds.shape

# duplicate text embeddings and attention mask for each generation per prompt, using mps friendly method
prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
Member:

Just nits, but we discussed doing these repeat/expand parts in encode_prompt. Not a blocker for merging to main atm, so feel free to take it up in a follow-up PR. Same comment for the other similar repeats.

Collaborator:
Agree here, it should just return a single prompt_embeds.
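For context, a minimal illustrative sketch of the repeat/reshape pattern under discussion (generic names, not the exact PR code): the embeddings are tiled along the sequence axis, then the copies are folded into the batch axis, once per requested image.

import torch

def duplicate_prompt_embeds(prompt_embeds: torch.Tensor, num_images_per_prompt: int) -> torch.Tensor:
    # prompt_embeds: (batch_size, seq_len, dim)
    batch_size, seq_len, dim = prompt_embeds.shape
    # repeat along the sequence axis (mps friendly), then fold the copies into the batch axis
    prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
    return prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, dim)

# e.g. a (1, 128, 4096) tensor becomes (4, 128, 4096) for num_images_per_prompt=4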

@hlky (Contributor, Author) commented Apr 10, 2025

@a-r-r-o-w Removing _init_weights changes the slice in local tests; I need to test whether the real output is perceptually different. There are a few other cases where we have these functions.

@a-r-r-o-w (Member)
I think the test slice differences are expected, no? It just changes the random initialization of the matrices, so if we're loading pretrained weights it wouldn't cause a perceptual difference.

Co-authored-by: Aryan <contact.aryanvs@gmail.com>
@ShuyUSTC commented Apr 11, 2025

Hi @hlky,

Thank you for your contribution and effort in integrating our HiDream-I1 into the diffusers library! We’re the official team behind this model (HiDream-I1), and we’re currently working on its official integration.

We’d love to collaborate with you to refine this PR—whether by reviewing the implementation, adding missing components (e.g., docs, tests), or assisting with upstream merging. Let us know how you’d prefer to proceed (e.g., we can co-author this PR or build upon your work).

Again, we appreciate your initiative! Looking forward to your thoughts.

Best,
HiDream.ai

@yiyixuxu (Collaborator) commented Apr 11, 2025

@ShuyUSTC
Thanks for the message, and congrats on such great work!

Feel free to give the PR a review and help test it :)

@ShuyUSTC left a comment

Move the operation noise_pred = -noise_pred from pipeline_hidream_image to transformer_hidream_image.

hidden_states = ff_output_i + hidden_states
encoder_hidden_states = ff_output_t + encoder_hidden_states
return hidden_states, encoder_hidden_states


Suggested change
class NegateLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return -x

Add a NegateLayer to convert the input x to -x

Member:

@ShuyUSTC The comments about returning the negative value are not really in diffusers coding style for how we write the modeling code. So, we will be unable to add those changes here, and will have to keep the negation in the pipeline.

Feel free to let us know if there's anything else that you'd like us to change

caption_projection = []
for caption_channel in caption_channels:
    caption_projection.append(TextProjection(in_features=caption_channel, hidden_size=self.inner_dim))
self.caption_projection = nn.ModuleList(caption_projection)


Suggested change:
self.caption_projection = nn.ModuleList(caption_projection)
self.negate_layer = NegateLayer()

Initialize a negate_layer


hidden_states = hidden_states[:, :image_tokens_seq_len, ...]
output = self.final_layer(hidden_states, adaln_input)
output = self.unpatchify(output, img_sizes, self.training)


Suggested change:
output = self.unpatchify(output, img_sizes, self.training)
output = self.negate_layer(output)

Convert the output to -output

img_ids=img_ids,
return_dict=False,
)[0]
noise_pred = -noise_pred


Suggested change (remove this line):
noise_pred = -noise_pred

Remove this operation and add a negate_layer to HiDreamImageTransformer2DModel to convert the output

@hlky hlky dismissed ShuyUSTC’s stale review April 11, 2025 08:47

NegateLayer does not fit with the coding style; there are other cases of -noise_pred in the codebase.

@a-r-r-o-w (Member) left a comment

@yiyixuxu Addressed some of the review comments. LMK if any further changes are required (we can do follow-up PRs too). I've only tested with the Full model for now and will do the other two tomorrow unless someone else can finish up the PR. All models seem to be working, but the outputs of "Fast" feel a bit off. It might be scheduler related; looking into it.

Also tested that num_images_per_prompt > 1 works with the changes to encode_prompt. PR LGTM to merge for a first pass 👍

@vladmandic (Contributor) commented Apr 11, 2025

fyi, i've tested hidream-i1-fast and it works fine with this PR, but there is one issue:
using low_cpu_mem_usage=True sometimes/often breaks offloading, as the model fails with:

ValueError: weight is on the meta device, we need a value to put in on 0.

@a-r-r-o-w (Member)

@vladmandic Did you try with the latest changes? I encountered the issue too, but I don't get it anymore after the refactor.

@vladmandic (Contributor)

just updated the codebase; i can't reproduce it anymore at the moment either - will run a few more tests, as it was pretty random to start with.

@tin2tin commented Apr 11, 2025

Apparently, here are a couple of improvements:
https://github.com/lum3on/comfyui_HiDream-Sampler

@vladmandic (Contributor)
update: no issues with offloading using the latest codebase.
also works fine with both bnb and optimum.quanto quantization_config.
regarding the llama replacement, yes, that's totally ok, but imo that's not really up to diffusers to provide, other than a one-liner "here is how you load te4", which is already in the docs.

@yiyixuxu yiyixuxu merged commit 0ef2935 into huggingface:main Apr 11, 2025
12 checks passed
@yiyixuxu (Collaborator)

merged PR - we can add any follow-up changes in a new one

@Skquark commented Apr 11, 2025

Would this work for the NF4 quantized 4-bit models? I had already implemented this using the fork https://github.com/hykilpikonna/HiDream-I1-nf4 and the azaneko/HiDream-I1-Dev-nf4 models, because it didn't run in less than 24GB otherwise. I'm transitioning to this implementation from the GitHub repo and just want to make sure I can keep the code mostly the same. Wouldn't mind seeing an example of memory-optimized code that runs in <=16GB...

@vladmandic (Contributor)
@Skquark on-the-fly quantization using bitsandbytes and/or optimum.quanto together with the diffusers implementation works just fine; you don't need random unofficial fixed quants. with bnb-nf4 it works with 16gb vram, and with quanto-int4 it works even with 12gb.

@Skquark commented Apr 16, 2025

@vladmandic It'd still be nice to have a working example of quantization in the docs for this one, since it takes more than 24GB and is a bit more complicated. Do we just run a BitsAndBytesConfig 4-bit quant on the Transformer or the Tokenizer, and can we optimize the Llama encoder with nf4 too? Could it also use group offloading? Thanks.

@nitinmukesh
@Skquark

See if this helps
#11337

@Skquark commented Apr 16, 2025

@nitinmukesh Interesting, but not what I was expecting for loading those models. That's similar to what I was originally doing, but he was saying on-the-fly quantization is better than using the modded int4 models. Since there seem to be about 4 different ways to optimize this, I'll just add an Optimization Mode option in my app for which one to try and figure it out from there. Any better ways?

@nitinmukesh commented Apr 16, 2025

On-the-fly quantization is very time consuming; each launch will quantize again.
You can create your own repo using whatever settings you prefer and then save_pretrained it. Put it on HF (locally also works) and use it; see the sketch below.

Also, I have added a GGUF version if you know how to use it (same topic).
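As a rough sketch of that workflow (the local path and NF4 settings are assumptions, and it presumes your bitsandbytes version supports 4-bit serialization): quantize the transformer once, persist it with save_pretrained, and load the saved copy on later launches.

import torch
from diffusers import BitsAndBytesConfig, HiDreamImageTransformer2DModel

# one-time: quantize on the fly, then persist the quantized weights (local path is illustrative)
transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    subfolder="transformer",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
    ),
    torch_dtype=torch.bfloat16,
)
transformer.save_pretrained("./hidream-i1-full-transformer-nf4")

# later launches: load the prequantized copy directly and skip the quantization step
transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "./hidream-i1-full-transformer-nf4", torch_dtype=torch.bfloat16
)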

@vladmandic (Contributor) commented Apr 16, 2025

> @vladmandic It'd still be nice to have a working example of quantization in the docs for this one, since it takes more than 24GB and is a bit more complicated. Do we just run a BitsAndBytesConfig 4-bit quant on the Transformer or the Tokenizer, and can we optimize the Llama encoder with nf4 too? Could it also use group offloading? Thanks.

you just pass quantization_config when loading the transformer, text_encoder_3 (t5), and text_encoder_4 (llama).
you can quantize any of them or all 3 (not the tokenizer, and not te1/te2).

and quantization_config can be any valid bitsandbytes or optimum.quanto config.
it should also work with torchao and layerwise methods, but i didn't test those.
you can also mix & match, e.g. you can run the transformer in nf4 and te4 in fp8.

and when you load the individual components, you assemble the pipeline.
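To make the component-wise assembly above concrete, a rough sketch (untested here; the quantization settings are assumptions, and bnb NF4 for the transformer with quanto float8 for te4 is just one of the combinations described): load the two large components with their own quantization_config, then assemble the pipeline and enable offloading.

import torch
from transformers import LlamaForCausalLM, PreTrainedTokenizerFast, QuantoConfig
from diffusers import BitsAndBytesConfig, HiDreamImagePipeline, HiDreamImageTransformer2DModel

# transformer quantized on the fly with bitsandbytes NF4
transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    subfolder="transformer",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
    ),
    torch_dtype=torch.bfloat16,
)

# text_encoder_4 (llama) quantized with optimum.quanto float8 via transformers
text_encoder_4 = LlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    output_hidden_states=True,
    output_attentions=True,
    quantization_config=QuantoConfig(weights="float8"),
    torch_dtype=torch.bfloat16,
)
tokenizer_4 = PreTrainedTokenizerFast.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

# assemble the pipeline from the individually loaded components
pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    transformer=transformer,
    text_encoder_4=text_encoder_4,
    tokenizer_4=tokenizer_4,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

te1/te2 (the CLIP encoders) and the tokenizers are left untouched, matching the note above.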
