Describe the bug
The prepare_latents method of the WAN 2.1 I2V pipeline fails with a RuntimeError for some values of num_frames other than the default of 81 frames (for example, 15).
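One plausible reading is that frame counts must have the form 4k + 1 (81 = 4 * 20 + 1, while 15 is not). A minimal sketch of that constraint and a round-down workaround follows. It assumes the WAN VAE's temporal compression factor is 4, which is the 4 in the failing view() shape in the logs. The helper names are hypothetical and are not part of diffusers.

```python
# Hypothetical helpers (not part of diffusers). Assumption: the WAN VAE
# compresses time by a factor of 4 and keeps the first frame on its own,
# so only frame counts of the form k * 4 + 1 map cleanly onto latent frames.
VAE_SCALE_FACTOR_TEMPORAL = 4  # assumed; matches the "4" in the failing view() shape


def is_valid_num_frames(num_frames: int, temporal_factor: int = VAE_SCALE_FACTOR_TEMPORAL) -> bool:
    """Return True if num_frames - 1 is divisible by the temporal factor."""
    return num_frames >= 1 and (num_frames - 1) % temporal_factor == 0


def round_down_num_frames(num_frames: int, temporal_factor: int = VAE_SCALE_FACTOR_TEMPORAL) -> int:
    """Round num_frames down to the nearest k * temporal_factor + 1 (at least 1)."""
    return max((num_frames - 1) // temporal_factor * temporal_factor + 1, 1)


if __name__ == "__main__":
    for n in (81, 15, 17):
        # prints: "81 True 81", "15 False 13", "17 True 17"
        print(n, is_valid_num_frames(n), round_down_num_frames(n))
```

Under this assumption, passing round_down_num_frames(15) == 13 instead of 15 should avoid the reshape error. Rounding down rather than up keeps the request within the number of frames the caller asked for.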
Reproduction
Call the I2V pipeline with width=832, height=480, num_frames=15.
Logs
/home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/pipelines/wan/pipeline_wan_i2v.py:611 in __call__

  610         image = self.video_processor.preprocess(image, height=height, width=width).to(device, dtype=torch.float32)
❱ 611         latents, condition = self.prepare_latents(
  612             image,

/home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/pipelines/wan/pipeline_wan_i2v.py:424 in prepare_latents

❱ 424         mask_lat_size = mask_lat_size.view(batch_size, -1, self.vae_scale_factor_temporal, latent_height, latent_width)
  425         mask_lat_size = mask_lat_size.transpose(1, 2)

RuntimeError: shape '[1, -1, 4, 60, 104]' is invalid for input of size 112320
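The reported size 112320 can be reproduced with plain arithmetic. The sketch below reconstructs the mask shape from the traceback and rests on three assumptions. The spatial VAE factor is 8 (480 / 8 = 60 and 832 / 8 = 104, as in the error shape). The temporal factor is 4. The first frame's mask is repeated 4 times and concatenated with the remaining num_frames - 1 frames before the view() call. The function names are hypothetical.

```python
# Reconstruction of the latent-mask shape arithmetic before the failing view().
# Assumptions (not taken from the diffusers source): spatial VAE factor 8,
# temporal factor 4, first-frame mask repeated 4x and concatenated with the
# remaining num_frames - 1 frames along the time axis.
VAE_SCALE_FACTOR_SPATIAL = 8   # assumed: gives latent 60 x 104 for 480 x 832
VAE_SCALE_FACTOR_TEMPORAL = 4  # the "4" in shape [1, -1, 4, 60, 104]


def mask_numel(batch_size: int, num_frames: int, height: int, width: int) -> int:
    """Element count of the (batch, 1, frames, h, w) mask before view()."""
    latent_height = height // VAE_SCALE_FACTOR_SPATIAL
    latent_width = width // VAE_SCALE_FACTOR_SPATIAL
    # repeated first frame + the other num_frames - 1 frames
    mask_frames = VAE_SCALE_FACTOR_TEMPORAL + (num_frames - 1)
    return batch_size * 1 * mask_frames * latent_height * latent_width


def view_is_possible(batch_size: int, num_frames: int, height: int, width: int) -> bool:
    """view(batch, -1, 4, h, w) only works if numel divides evenly by the fixed dims."""
    latent_height = height // VAE_SCALE_FACTOR_SPATIAL
    latent_width = width // VAE_SCALE_FACTOR_SPATIAL
    fixed = batch_size * VAE_SCALE_FACTOR_TEMPORAL * latent_height * latent_width
    return mask_numel(batch_size, num_frames, height, width) % fixed == 0


if __name__ == "__main__":
    print(mask_numel(1, 15, 480, 832))  # 112320, the size in the RuntimeError
    print(view_is_possible(1, 15, 480, 832), view_is_possible(1, 81, 480, 832))  # False True
```

Under these assumptions, the mask has 4 + (num_frames - 1) frames. With num_frames=15 that gives 18 frames, and 18 cannot be split into groups of 4, so view() rejects the shape. With 81 frames the mask has 84 frames, which splits evenly.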
System Info
diffusers==main