Fix Graph Breaks When Compiling CogView4 #10959
Conversation
Eliminate this:

```
V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles] Recompiling function forward in /home/zeyi/repos/diffusers/src/diffusers/models/transformers/transformer_cogview4.py:374
V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles]     triggered by the following guard failure(s):
V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles]     - 0/3: ___check_obj_id(L['self'].rope.freqs_h, 139976127328032)
V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles]     - 0/2: ___check_obj_id(L['self'].rope.freqs_h, 139976107780960)
V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles]     - 0/1: ___check_obj_id(L['self'].rope.freqs_h, 140022511848960)
V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles]     - 0/0: ___check_obj_id(L['self'].rope.freqs_h, 140024081342416)
```
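For reference, a minimal sketch of how these recompile logs can be surfaced while compiling the transformer; the pipeline call and checkpoint id here are illustrative, not taken from this PR:

```python
# Sketch: surface TorchDynamo recompile reasons while running a compiled CogView4
# transformer. Pipeline/checkpoint names and call arguments are illustrative.
import torch
from diffusers import CogView4Pipeline

torch._logging.set_logs(recompiles=True)  # prints the [__recompiles] guard-failure lines

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16).to("cuda")
pipe.transformer = torch.compile(pipe.transformer)

# With freqs_h/freqs_w stored as plain tensor attributes, Dynamo guards on their
# object ids (___check_obj_id), and a new tensor object triggers a recompile.
image = pipe("a photo of a cat", num_inference_steps=4).images[0]
```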
```diff
@@ -252,20 +252,18 @@ def __init__(self, dim: int, patch_size: int, rope_axes_dim: Tuple[int, int], th
         w_inv_freq = 1.0 / (theta ** (torch.arange(0, dim_w, 2, dtype=torch.float32)[: (dim_w // 2)].float() / dim_w))
         h_seq = torch.arange(self.rope_axes_dim[0])
         w_seq = torch.arange(self.rope_axes_dim[1])
-        self.freqs_h = torch.outer(h_seq, h_inv_freq)
-        self.freqs_w = torch.outer(w_seq, w_inv_freq)
+        self.freqs_h = torch.nn.Buffer(torch.outer(h_seq, h_inv_freq))
```
I think these need a persistent=False to not be part of state dict, no? Other than that, changes LGTM!
Also, would maybe prefer using `register_buffer` here, to be consistent with some other implementations.
@a-r-r-o-w Yep, just fixed!
```diff
@@ -252,20 +252,18 @@ def __init__(self, dim: int, patch_size: int, rope_axes_dim: Tuple[int, int], th
         w_inv_freq = 1.0 / (theta ** (torch.arange(0, dim_w, 2, dtype=torch.float32)[: (dim_w // 2)].float() / dim_w))
         h_seq = torch.arange(self.rope_axes_dim[0])
         w_seq = torch.arange(self.rope_axes_dim[1])
-        self.freqs_h = torch.outer(h_seq, h_inv_freq)
-        self.freqs_w = torch.outer(w_seq, w_inv_freq)
+        self.freqs_h = self.register_buffer("freqs_h", torch.outer(h_seq, h_inv_freq), persistent=False)
```
Sorry about the back and forth, but I just remembered that we did it this way so that freqs was always in `float32`. Using a buffer makes the tensor part of the module's modifiable parameters, so if someone were to load the model with `torch_dtype=bfloat16` or do `model.to(some_other_dtype)`, it would change the dtype of freqs. I believe that's problematic since RoPE must be in fp32.
Do you think there's another way around this to avoid recompiling?
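For context, a minimal sketch of the concern (the module here is hypothetical, not the actual CogView4 RoPE class): floating-point buffers follow module-wide dtype casts.

```python
# Minimal sketch of the dtype concern: registered floating-point buffers follow
# module-wide casts, so an fp32 frequency table would be downcast by .to(bfloat16).
import torch

class ToyRope(torch.nn.Module):
    def __init__(self):
        super().__init__()
        inv_freq = 1.0 / (10000.0 ** (torch.arange(0, 8, 2, dtype=torch.float32) / 8))
        self.register_buffer("freqs_h", torch.outer(torch.arange(16.0), inv_freq), persistent=False)

m = ToyRope().to(torch.bfloat16)
print(m.freqs_h.dtype)  # torch.bfloat16 -- no longer fp32, which is the worry for RoPE precision
```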
@a-r-r-o-w I see. I am thinking about a proper workaround.
> Sorry about the back and forth, but I just remembered that we did it this way so that freqs was always in `float32`. Using a buffer makes the tensor part of the module's modifiable parameters, so if someone were to load the model with `torch_dtype=bfloat16` or do `model.to(some_other_dtype)`, it would change the dtype of freqs. I believe that's problematic since RoPE must be in fp32.
>
> Do you think there's another way around this to avoid recompiling?
I have made these indexing tensors get generated on the fly during inference, and doing so even seems to make inference a bit faster.
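Roughly, the on-the-fly approach looks like the sketch below, based on the `__init__` shown in the diffs above; the function name, signature, and placement are illustrative, and the exact code in this PR may differ.

```python
# Hedged sketch: recompute the RoPE frequency tables at call time instead of caching
# them as module attributes, so there is no stored tensor for Dynamo to guard on by
# object id. The tables stay fp32 because they are recreated fresh each call.
import torch

def compute_freqs(rope_axes_dim, dim_h, dim_w, theta=10000.0, device=None):
    h_inv_freq = 1.0 / (theta ** (torch.arange(0, dim_h, 2, dtype=torch.float32)[: (dim_h // 2)] / dim_h))
    w_inv_freq = 1.0 / (theta ** (torch.arange(0, dim_w, 2, dtype=torch.float32)[: (dim_w // 2)] / dim_w))
    h_seq = torch.arange(rope_axes_dim[0])
    w_seq = torch.arange(rope_axes_dim[1])
    freqs_h = torch.outer(h_seq, h_inv_freq)  # always fp32, recreated every call
    freqs_w = torch.outer(w_seq, w_inv_freq)
    if device is not None:
        freqs_h, freqs_w = freqs_h.to(device), freqs_w.to(device)
    return freqs_h, freqs_w
```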
interesting! thanks!
> ...and doing so even seems to make inference a bit faster.

Is this true for both compiled and non-compiled?
It goes against intuition for me, tbh (so I will try to dive in and understand over the weekend). Previously we were only computing the freqs once at initialization, but now we do it at each inference step, so my first thought is that it must be slower.
Since the computations here are running on CPU, including the freqs-related matmul, it is understandable that this might not cause additional slowdown compared to before, since the CPU schedules instructions much faster than the GPU processes them.
But it is followed by indexing a CPU tensor with a GPU tensor, so it should introduce a CPU sync here, I think. At least the same/similar thing was happening with our schedulers. A CPU sync would drastically slow down inference, but it seems like that's not the case... which is interesting...
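If it helps, one way to check the sync concern empirically (assuming a CUDA build) is PyTorch's sync debug mode, which flags implicit device synchronizations while a step runs:

```python
# Sketch: ask PyTorch to warn (or error) whenever an op triggers an implicit CUDA
# synchronization, e.g. an indexing op or copy that forces the CPU to wait on the GPU.
import torch

torch.cuda.set_sync_debug_mode("warn")     # "error" hard-fails on implicit syncs
# ... run one denoising step here and watch for warnings ...
torch.cuda.set_sync_debug_mode("default")  # restore normal behavior
```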
@yiyixuxu @a-r-r-o-w I think generating either CPU tensors or CUDA tensors on the fly is faster than reading from the saved ones, at least when compiling. 🧐
great! thanks @chengzeyi !
@bot /style
Style fixes have been applied. View the workflow run here.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.