
Update ptxla training #9864


Merged
20 commits merged into huggingface:main on Dec 6, 2024

Conversation

@entrpn (Contributor) commented Nov 4, 2024

  • Updates TPU benchmark numbers.
  • Updates the ptxla training example code.
  • Adds flash attention to ptxla code running on TPUs.

@sayakpaul, can you please review? This new PR supersedes the other one I opened a while back, which I have now closed. Thank you.


@sayakpaul (Member)

Cc: @yiyixuxu could you review the changes made to attention_processor.py?

@yiyixuxu (Collaborator) commented Nov 5, 2024

@entrpn can you use a custom attention instead? (without updating our default attention processor)

@zpcore (Contributor) commented Nov 5, 2024

@entrpn can you use a custom attention instead? (without updating our default attention processor)

Hi @yiyixuxu, we wrapped the flash attention kernel call behind an if XLA_AVAILABLE condition, so this shouldn't touch the default attention processor behavior. Can you give more details about using a custom attention? Thanks
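
(For context, a minimal sketch of the guard pattern being described here, based on the torch_xla import shown later in this thread; it is not the exact PR diff.)

import torch.nn.functional as F

from diffusers.utils import is_torch_xla_available

if is_torch_xla_available():
    from torch_xla.experimental.custom_kernel import flash_attention

XLA_AVAILABLE = is_torch_xla_available()


def attention(query, key, value):
    # Use the pallas flash attention kernel only when torch_xla is present;
    # otherwise fall back to PyTorch's SDPA.
    if XLA_AVAILABLE:
        return flash_attention(query, key, value, causal=False)
    return F.scaled_dot_product_attention(query, key, value)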

@yiyixuxu (Collaborator) commented Nov 5, 2024

I'm just wondering if it makes sense for flash attention to have its own attention processor, since this one is meant for SDPA.

cc @DN6 here too

@entrpn (Contributor, Author) commented Nov 5, 2024

@yiyixuxu this makes sense.

@zpcore do you think you can implement it?

@zpcore (Contributor) commented Nov 5, 2024

@yiyixuxu this makes sense.

@zpcore do you think you can implement it?

Yes, I can follow up with the code change.

@zpcore (Contributor) commented Nov 5, 2024

Hi @yiyixuxu, what about creating another AttnProcessor with flash attention, in parallel with AttnProcessor2_0? My concern is that the majority of the code will be the same as AttnProcessor2_0.

@yiyixuxu (Collaborator) commented Nov 6, 2024

@zpcore
That should not be a problem. A lot of our attention processors share the majority of the same code, e.g. https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py#L732 and https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py#L2443

This way users can explicitly opt in to flash attention if they want to.
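
(For illustration, a sketch of what that explicit opt-in could look like with a dedicated processor, assuming the XLAFlashAttnProcessor2_0 name discussed below and the existing set_attn_processor API; the model id is a placeholder.)

import torch

from diffusers import DiffusionPipeline
from diffusers.models.attention_processor import XLAFlashAttnProcessor2_0

# Placeholder model id; any pipeline whose UNet uses the standard attention processors works the same way.
pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.bfloat16)

# Explicitly switch every attention layer in the UNet to the XLA flash attention processor.
pipe.unet.set_attn_processor(XLAFlashAttnProcessor2_0())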

@miladm commented Nov 6, 2024

@yiyixuxu - to better understand: why does wrapping the flash attention kernel call under the XLA_AVAILABLE condition cause trouble? Do you want this functionality to be more generalized?

@yiyixuxu (Collaborator) commented Nov 6, 2024

Is it not possible that XLA is available but the user does not want to use flash attention?
Our attention processors are designed to be very easy to switch, and each one corresponds to a very specific method: it could be xformers, SDPA, or even a special method like fused attention, which has its own processor.

@sayakpaul (Member)

@miladm @zpcore a gentle ping

@zpcore (Contributor) commented Nov 28, 2024

Thanks for the review feedback. We split the XLA flash attention processor out of AttnProcessor2_0 as requested in the review. PTAL

@sayakpaul sayakpaul requested a review from yiyixuxu November 29, 2024 02:02
@sayakpaul (Member) left a comment

Thanks for working on this and for being patient with our feedback.

I left a few minor comments.

The other reviewer, @yiyixuxu, will review this soon. Please allow for some time because of Thanksgiving week.

Comment on lines 2787 to 2789
if len(args) > 0 or kwargs.get("scale", None) is not None:
    deprecation_message = "The `scale` argument is deprecated and will be ignored. Please remove it, as passing it will raise an error in the future. `scale` should directly be passed while calling the underlying pipeline component i.e., via `cross_attention_kwargs`."
    deprecate("scale", "1.0.0", deprecation_message)
Member

Since this is a new attention processor, I think we can safely remove this.

Contributor

Removed.

@@ -2750,6 +2763,117 @@ def __call__(
return hidden_states


class XLAFlashAttnProcessor2_0:
Member

So, this will be automatically used when using the compatible models under an XLA environment, right?

Contributor

Yes, AttnProcessor2_0 will be replaced with XLAFlashAttnProcessor2_0 if the XLA version condition is satisfied.

Comment on lines 39 to 40
if is_torch_xla_available():
    from torch_xla.experimental.custom_kernel import flash_attention
Member

Does this need to go through any version-check guards too, i.e., a minimum version known to have flash_attention?

Contributor

Introduced the version check function is_torch_xla_version in import_utils.py. Added the version check for torch_xla here.

AttnProcessor2_0() if hasattr(F, "scaled_dot_product_attention") and self.scale_qk else AttnProcessor()
)
if hasattr(F, "scaled_dot_product_attention") and self.scale_qk:
if is_torch_xla_available:
Member

Same here too. Does this need to be guarded with a version check too?

Contributor

Added the version check for torch_xla here too.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@DN6 (Collaborator) commented Dec 2, 2024

I think @yiyixuxu's point here is valid:

is it not possible that XLA_AVAILABLE but the user does not want to use flash attention?

IMO it's better to use a similar API to xformers to enable the XLA processor.

def enable_xformers_memory_efficient_attention(self, attention_op: Optional[Callable] = None) -> None:

@zpcore (Contributor) commented Dec 3, 2024

I think @yiyixuxu's point here is valid:

is it not possible that XLA_AVAILABLE but the user does not want to use flash attention?

IMO it's better to use a similar API to xformers to enable the XLA processor.

def enable_xformers_memory_efficient_attention(self, attention_op: Optional[Callable] = None) -> None:

OK, now I get it! We have added functions like enable_xla_flash_attention, similar to enable_xformers_memory_efficient_attention, to give users the option to enable XLA flash attention or not. In the example we provide (train_text_to_image_xla.py), we apply the kernel to the diffusion model's UNet. Thanks!
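
(A minimal usage sketch of the opt-in helper described above, mirroring the xformers API; anything beyond the enable_xla_flash_attention name stated in this thread is an assumption, and the model id is a placeholder.)

from diffusers import UNet2DConditionModel

# Placeholder model id; load only the UNet, as in the training example.
unet = UNet2DConditionModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet")

# Opt in to the XLA flash attention kernel on TPU, analogous to
# enable_xformers_memory_efficient_attention(); skip this call to keep the default SDPA processor.
unet.enable_xla_flash_attention()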

@sayakpaul sayakpaul requested a review from DN6 December 4, 2024 01:27
@yiyixuxu (Collaborator) left a comment

Thanks! I think we can merge this soon!

Comment on lines 297 to 303
if (
    use_xla_flash_attention
    and is_torch_xla_available
    and is_torch_xla_version('>', '2.2')
    and (not is_spmd() or is_torch_xla_version('>', '2.3'))
):
    processor = XLAFlashAttnProcessor2_0(partition_spec)
Collaborator

Suggested change

- if (
-     use_xla_flash_attention
-     and is_torch_xla_available
-     and is_torch_xla_version('>', '2.2')
-     and (not is_spmd() or is_torch_xla_version('>', '2.3'))
- ):
-     processor = XLAFlashAttnProcessor2_0(partition_spec)
+ if use_xla_flash_attention:
+     if is_torch_xla_version("<", "2.3"):
+         raise ...
+     elif is_spmd() and is_torch_xla_version("<", "2.4"):
+         raise ...
+     else:
+         processor = XLAFlashAttnProcessor2_0(partition_spec)

If the user explicitly sets xla_flash_attention, we want to give a very explicit warning/error message when the conditions aren't met so they can take action accordingly - we don't want to silently switch to something else just because the right version wasn't installed.

Contributor

Sounds good, updated!

    partition_spec = self.partition_spec if is_spmd() else None
    hidden_states = flash_attention(query, key, value, causal=False, partition_spec=partition_spec)
else:
    hidden_states = F.scaled_dot_product_attention(
Collaborator

We don't need to support SDPA in this XLA flash attention processor! We can remove all the logic related to it!

Contributor

There is a constraint when using the pallas kernel: we need all(tensor.shape[2] >= 4096 for tensor in [query, key, value]), or XLA will error out.

However, we added a new error message when it falls back to scaled_dot_product_attention, to avoid silently skipping the kernel.
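
(A sketch of the fallback being described, based on the quoted snippet above; names outside that snippet, such as the helper's name and the exact warning text, are assumptions.)

import logging

import torch.nn.functional as F
from torch_xla.experimental.custom_kernel import flash_attention

logger = logging.getLogger(__name__)


def xla_flash_or_sdpa(query, key, value, partition_spec=None):
    # The pallas kernel requires a sequence length of at least 4096 on query/key/value;
    # below that, warn and fall back to SDPA rather than silently skipping the kernel.
    if all(tensor.shape[2] >= 4096 for tensor in [query, key, value]):
        return flash_attention(query, key, value, causal=False, partition_spec=partition_spec)
    logger.warning("Sequence length below 4096; falling back to F.scaled_dot_product_attention.")
    return F.scaled_dot_product_attention(query, key, value)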

Collaborator

OK, thank you for explaining it to me!

Comment on lines 2792 to 2793
if not hasattr(F, "scaled_dot_product_attention"):
    raise ImportError("XLAFlashAttnProcessor2_0 requires PyTorch 2.0, to use it, please upgrade PyTorch to 2.0.")
Collaborator

Suggested change

- if not hasattr(F, "scaled_dot_product_attention"):
-     raise ImportError("XLAFlashAttnProcessor2_0 requires PyTorch 2.0, to use it, please upgrade PyTorch to 2.0.")

I think we don't need to support SDPA in the XLA flash attention processor! Let's remove all the logic related to that to simplify things a bit!

Contributor

Please check my comment above for why we still keep it here. Thanks

@@ -700,6 +700,19 @@ def is_torch_version(operation: str, version: str):
    return compare_versions(parse(_torch_version), operation, version)


def is_torch_xla_version(operation: str, version: str):
Collaborator

Can we make sure we can call is_torch_xla_version() when torch_xla is not installed? Currently, I think you will have to run it together with is_torch_xla_available(), because _torch_xla_version is not defined otherwise.

We can do it like this, following the existing pattern:

_torch_version = "N/A"
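
(A sketch of that pattern, assuming the same approach used for _torch_version in import_utils.py; the real helper may differ in details, e.g. by reusing compare_versions.)

from packaging.version import parse

try:
    import torch_xla

    _torch_xla_version = torch_xla.__version__
except ImportError:
    # Mirror the _torch_version = "N/A" fallback so the helper is safe to call
    # even when torch_xla is not installed.
    _torch_xla_version = "N/A"


def is_torch_xla_version(operation: str, version: str) -> bool:
    import operator

    ops = {"<": operator.lt, "<=": operator.le, "==": operator.eq, "!=": operator.ne, ">": operator.gt, ">=": operator.ge}
    if _torch_xla_version == "N/A":
        # torch_xla is not installed, so no version comparison can succeed.
        return False
    return ops[operation](parse(_torch_xla_version), parse(version))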

Contributor

Nice catch, updated.

@yiyixuxu (Collaborator) left a comment

thank you!

@yiyixuxu yiyixuxu merged commit 3cb7b86 into huggingface:main Dec 6, 2024
15 checks passed
@entrpn (Contributor, Author) commented Dec 9, 2024

Thank you all!

sayakpaul added a commit that referenced this pull request Dec 23, 2024
* update ptxla example

---------

Co-authored-by: Juan Acevedo <jfacevedo@google.com>
Co-authored-by: Pei Zhang <zpcore@gmail.com>
Co-authored-by: Pei Zhang <piz@google.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Pei Zhang <pei@Peis-MacBook-Pro.local>
Co-authored-by: hlky <hlky@hlky.ac>