
enable unidiffuser test cases on xpu #11444


Merged
merged 6 commits on Apr 30, 2025
10 changes: 5 additions & 5 deletions tests/pipelines/test_pipelines_common.py
@@ -1485,8 +1485,8 @@ def test_to_device(self):
model_devices = [component.device.type for component in components.values() if hasattr(component, "device")]
self.assertTrue(all(device == torch_device for device in model_devices))

output_cuda = pipe(**self.get_dummy_inputs(torch_device))[0]
self.assertTrue(np.isnan(to_np(output_cuda)).sum() == 0)
output_device = pipe(**self.get_dummy_inputs(torch_device))[0]
self.assertTrue(np.isnan(to_np(output_device)).sum() == 0)

def test_to_dtype(self):
components = self.get_dummy_components()
@@ -1677,11 +1677,11 @@ def test_cpu_offload_forward_pass_twice(self, expected_max_diff=2e-4):

pipe.set_progress_bar_config(disable=None)

pipe.enable_model_cpu_offload(device=torch_device)
pipe.enable_model_cpu_offload()
Member

Was going to merge, but I just realized this was removed. The change looks good, but is there a particular reason for this to be removed for the XPU case? The enable_model_cpu_offload method correctly gets passed torch_device="xpu" here, no?

@yao-matrix (Contributor Author) · Apr 29, 2025

I changed it because the explicit torch_device argument to enable_model_cpu_offload was introduced in PR #9399 as an attempt to make offloading work on devices other than CUDA (XPU in that case). That alone doesn't work, because diffusers' internal logic also calls enable_model_cpu_offload, which cannot be fixed just by setting the device in application code; I fixed that issue in PR #11288, which has been merged.

So with the latest code, application/test code no longer needs to set the device explicitly, and I reverted these calls in the test back to their original form. Another motivation is that this is the recommended usage in the diffusers docs, so the revert aligns the test with the recommendation and exercises the most common pattern.

In short, with the current codebase both forms (with and without torch_device) work; I simply restored the form used in the original test code.

Hope that explains it. Thanks.

Member

Thanks for the explanation. I've triggered the tests and will merge once they pass
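
To make the exchange above concrete, here is a minimal sketch of the two call patterns under discussion. It assumes a diffusers build that already contains the fix from PR #11288; the checkpoint id and prompt are illustrative placeholders rather than values taken from this test file.

```python
import torch
from diffusers import UniDiffuserPipeline

# Illustrative checkpoint id; substitute whatever UniDiffuser weights you use.
pipe = UniDiffuserPipeline.from_pretrained("thu-ml/unidiffuser-v1", torch_dtype=torch.float16)

# Recommended pattern: let diffusers resolve the accelerator (CUDA, XPU, ...) on its own.
pipe.enable_model_cpu_offload()

# Also valid with the current codebase: name the device explicitly, e.g. on an Intel GPU.
# pipe.enable_model_cpu_offload(device="xpu")

sample = pipe(prompt="an astronaut riding a horse", num_inference_steps=20)
image = sample.images[0]
```

Either form ends up on the same offload hooks after PR #11288; the parameterless call is the one the diffusers docs recommend, which is why the test reverts to it.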

inputs = self.get_dummy_inputs(generator_device)
output_with_offload = pipe(**inputs)[0]

pipe.enable_model_cpu_offload(device=torch_device)
pipe.enable_model_cpu_offload()
inputs = self.get_dummy_inputs(generator_device)
output_with_offload_twice = pipe(**inputs)[0]

@@ -2226,7 +2226,7 @@ def create_pipe():

def enable_group_offload_on_component(pipe, group_offloading_kwargs):
# We intentionally don't test VAE's here. This is because some tests enable tiling on the VAE. If
# tiling is enabled and a forward pass is run, when cuda streams are used, the execution order of
# tiling is enabled and a forward pass is run, when accelerator streams are used, the execution order of
# the layers is not traced correctly. This causes errors. For apply group offloading to VAE, a
# warmup forward pass (even with dummy small inputs) is recommended.
for component_name in [
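The VAE caveat in the comment above comes down to this: when group offloading runs with an accelerator stream, the first forward pass is what records the layer execution order, so the recommended warmup is a throwaway pass on small dummy inputs before any tiled decoding. Below is a rough sketch of that idea, assuming diffusers' apply_group_offloading helper; the checkpoint, device selection, and input shape are illustrative choices, not what the test suite does.

```python
import torch
from diffusers import AutoencoderKL
from diffusers.hooks import apply_group_offloading

# Illustrative VAE checkpoint; any AutoencoderKL demonstrates the warmup idea.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
vae.enable_tiling()

# Pick whichever accelerator is present (XPU or CUDA) as the onload device.
onload_device = torch.device("xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cuda")

# Stream-based group offloading infers layer execution order from the first
# forward pass, which is why tiling without a warmup can confuse it.
apply_group_offloading(
    vae,
    onload_device=onload_device,
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    use_stream=True,
)

# Warmup: one throwaway pass on tiny inputs before real, tiled usage.
with torch.no_grad():
    dummy = torch.randn(1, 3, 64, 64, dtype=torch.float16, device=onload_device)
    vae.encode(dummy)
```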
16 changes: 8 additions & 8 deletions tests/pipelines/unidiffuser/test_unidiffuser.py
@@ -22,13 +22,13 @@
UniDiffuserTextDecoder,
)
from diffusers.utils.testing_utils import (
backend_empty_cache,
enable_full_determinism,
floats_tensor,
load_image,
nightly,
require_torch_2,
require_torch_accelerator,
require_torch_gpu,
run_test_in_subprocess,
torch_device,
)
@@ -577,24 +577,24 @@ def test_unidiffuser_default_img2text_v1_fp16(self):
assert text[0][: len(expected_text_prefix)] == expected_text_prefix

@unittest.skip(
"Test not supported becauseit has a bunch of direct configs at init and also, this pipeline isn't used that much now."
"Test not supported because it has a bunch of direct configs at init and also, this pipeline isn't used that much now."
)
def test_encode_prompt_works_in_isolation():
pass


@nightly
@require_torch_gpu
@require_torch_accelerator
class UniDiffuserPipelineSlowTests(unittest.TestCase):
def setUp(self):
super().setUp()
gc.collect()
torch.cuda.empty_cache()
backend_empty_cache(torch_device)

def tearDown(self):
super().tearDown()
gc.collect()
torch.cuda.empty_cache()
backend_empty_cache(torch_device)

def get_inputs(self, device, seed=0, generate_latents=False):
generator = torch.manual_seed(seed)
@@ -705,17 +705,17 @@ def test_unidiffuser_compile(self, seed=0):


@nightly
@require_torch_gpu
@require_torch_accelerator
class UniDiffuserPipelineNightlyTests(unittest.TestCase):
def setUp(self):
super().setUp()
gc.collect()
torch.cuda.empty_cache()
backend_empty_cache(torch_device)

def tearDown(self):
super().tearDown()
gc.collect()
torch.cuda.empty_cache()
backend_empty_cache(torch_device)

def get_inputs(self, device, seed=0, generate_latents=False):
generator = torch.manual_seed(seed)