Description
Describe the bug
UniDiffuser's model CPU offload fails with the log shown in the Reproduction section below.
I took a deeper look, and it seems that in this case self.text_decoder.encode is called after text_encoder and before image_encoder. The problem is that the module this call runs is just a submodule of text_decoder and is not itself in model_cpu_offload_seq, so no hook gets registered for it during enable_model_cpu_offload. It becomes an orphan: its weights stay on the CPU while its inputs are already on the accelerator. I don't have a good idea for a fix yet, since it is an embedded submodule inside a sub-model, and whether it runs at all is a runtime decision based on reduce_text_emb_dim. But I'm willing to contribute the fix.
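To make the failure mode easier to see outside the pipeline, here is a minimal, self-contained sketch built on accelerate's cpu_offload_with_hook (the mechanism enable_model_cpu_offload builds on). ToyDecoder, its encode_prefix layer, and the device choice are made up for illustration and only loosely mirror the real text_decoder:

```python
# Minimal sketch of the orphan-submodule problem: the offload hook is attached
# to the parent's forward(), so a helper method (or a direct submodule call)
# never triggers it and runs with CPU weights against accelerator inputs.
import torch
from accelerate import cpu_offload_with_hook


class ToyDecoder(torch.nn.Module):
    """Stand-in for text_decoder; encode_prefix mimics the Linear in the log."""

    def __init__(self):
        super().__init__()
        self.encode_prefix = torch.nn.Linear(32, 32)

    def encode(self, x):
        # Helper path that bypasses forward(), like text_decoder.encode
        return self.encode_prefix(x)

    def forward(self, x):
        return self.encode_prefix(x)


device = torch.device("cuda")  # "xpu" in the original log; needs an accelerator
decoder, hook = cpu_offload_with_hook(ToyDecoder(), device)  # weights parked on CPU

x = torch.randn(1, 3, 32, device=device)
decoder(x)         # OK: the pre-forward hook moves the weights to `device` first
hook.offload()     # weights go back to CPU
decoder.encode(x)  # RuntimeError: ... two devices, cpu and cuda:0 (mat1 mismatch)
```

The UniDiffuser failure looks the same in spirit: by the time text_decoder.encode runs, nothing has moved its weights to the execution device.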
Reproduction
pytest -rA tests/pipelines/unidiffuser/test_unidiffuser.py::UniDiffuserPipelineFastTests::test_model_cpu_offload_forward_pass
You can see the error log below. The same issue happens on CUDA too.
```
self = Linear(in_features=32, out_features=32, bias=True)
input = tensor([[[-0.8407, -0.3964, -0.6832, ..., -0.2908,  0.1523, -1.0043],
         [-0.8155, -0.1579,  0.6659, ...,  1.4...375, -0.4626, -0.3352],
         [-1.2005, -0.1820,  0.4218, ..., -0.3822, -0.5105, -0.2234]]],
       device='xpu:0')

    def forward(self, input: Tensor) -> Tensor:
        # print(f"input.device: {input.device}, weight device: {self.weight.device}")
        return F.linear(input, self.weight, self.bias)
E   RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and xpu:0! (when checking argument for argument mat1 in method wrapper_XPU_addmm)
```
Logs
System Info
N/A
Who can help?
No response