Description
Can anyone help me?
I am using WanX's image-to-video pipeline in diffusers (WanImageToVideoPipeline) and applied group offloading with apply_group_offloading, following the guide at https://huggingface.co/docs/diffusers/main/en/optimization/memory.
The code is as follows:
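import torch
from transformers import CLIPVisionModel
from diffusers import AutoencoderKLWan, UniPCMultistepScheduler, WanImageToVideoPipeline
from diffusers.hooks import apply_group_offloading

local_model_path = "/path/to/local/Wan-I2V-diffusers-model"  # placeholder for the local checkpoint directory
image_encoder = CLIPVisionModel.from_pretrained(local_model_path, subfolder="image_encoder", torch_dtype=torch.float32)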
vae = AutoencoderKLWan.from_pretrained(local_model_path, subfolder="vae", torch_dtype=torch.float32)
scheduler_b = UniPCMultistepScheduler(prediction_type="flow_prediction", use_flow_sigmas=True, flow_shift=5.0)
pipe = WanImageToVideoPipeline.from_pretrained(local_model_path, vae=vae, image_encoder=image_encoder, scheduler=scheduler_b, torch_dtype=torch.bfloat16)
pipe.transformer.enable_group_offload(onload_device=torch.device("cuda"), offload_device=torch.device("cpu"), offload_type="block_level", num_blocks_per_group=1, use_stream=True)
apply_group_offloading(pipe.text_encoder, onload_device=torch.device("cuda"), offload_type="block_level", num_blocks_per_group=1, use_stream=True)
apply_group_offloading(pipe.vae, onload_device=torch.device("cuda"), offload_type="block_level", num_blocks_per_group=1, use_stream=True)
apply_group_offloading(pipe.image_encoder, onload_device=torch.device("cuda"), offload_type="block_level", num_blocks_per_group=1, use_stream=True)
I then printed the device of each component before and after applying the offloading:
Before apply_offload:
text_encoder device: cpu
transformer device: cpu
vae device: cpu
image_encoder device: cpu
start to group_offload_block_1_stream
After apply_offload:
text_encoder device: cpu
transformer device: cpu
vae device: cpu
image_encoder device: cpu
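The issue does not show how the devices were printed. A minimal sketch, assuming each component exposes .parameters() the way a torch.nn.Module does, is the hypothetical helper describe_devices below, which reports the device of each component's first parameter. Note that with group offloading the parameters are expected to stay on the offload device (here the CPU) until the offloaded module is actually called, so seeing cpu in both printouts is not by itself a sign that offloading failed.

```python
from typing import Any, Mapping


def describe_devices(components: Mapping[str, Any]) -> str:
    """Hypothetical helper: report '<name> device: <device>' for each component.

    Assumes each value has a .parameters() method (e.g. a torch.nn.Module)
    whose items carry a .device attribute. Only the first parameter is checked.
    """
    parts = []
    for name, module in components.items():
        # Take the first parameter, or None if the module has no parameters.
        first_param = next(iter(module.parameters()), None)
        device = first_param.device if first_param is not None else "no parameters"
        parts.append(f"{name} device: {device}")
    return " ".join(parts)
```

For example: print(describe_devices({"text_encoder": pipe.text_encoder, "transformer": pipe.transformer, "vae": pipe.vae, "image_encoder": pipe.image_encoder})). Checking only the first parameter keeps the output short, but a module whose parameters are split across devices would need a per-parameter check instead.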
When the pipeline runs, it fails with this exception:
    return F.conv3d(
           ^^^^^^^^^
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
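One explanation worth checking (not confirmed against the diffusers source): with offload_type="block_level", only ModuleList/Sequential children become offload groups, so the VAE's other top-level submodules may be grouped together with their onload hook tied to the VAE's forward, while the pipeline calls vae.encode()/vae.decode() directly. The weights would then stay on the CPU while the input is already on CUDA. The plain-Python toy below (ToyOffloadedVAE is hypothetical, needs no torch, and uses strings for devices) shows that pattern: a wrapper that onloads weights around __call__ is bypassed when encode() is called directly, giving the same kind of mismatch F.conv3d reports.

```python
class ToyOffloadedVAE:
    """Hypothetical toy: weights are onloaded only around __call__ (i.e. forward)."""

    def __init__(self):
        self.weight_device = "cpu"  # weights start on the offload device

    def _onload(self):
        self.weight_device = "cuda"

    def _offload(self):
        self.weight_device = "cpu"

    def __call__(self, x_device):
        # Mimics a pre/post-forward hook: onload, run forward, then offload again.
        self._onload()
        try:
            return self.forward(x_device)
        finally:
            self._offload()

    def forward(self, x_device):
        return self.encode(x_device)

    def encode(self, x_device):
        # Mimics the conv check that input and weight live on the same device.
        if x_device != self.weight_device:
            raise RuntimeError(
                f"Input on {x_device} but weight on {self.weight_device}"
            )
        return "ok"


vae = ToyOffloadedVAE()
print(vae("cuda"))  # goes through the hook, so weights are onloaded first
try:
    vae.encode("cuda")  # bypasses the hook, weights are still on "cpu"
except RuntimeError as err:
    print("RuntimeError:", err)
```

If this is the cause, two things that may avoid it (untested here) are applying offload_type="leaf_level" to the VAE, which hooks each leaf layer individually, or skipping offloading for the VAE and moving it to the GPU with pipe.vae.to("cuda").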
Does anyone know how to fix this? Thanks a lot.