WanImageToVideoPipeline broken math when preparing latents

### Describe the bug

The `prepare_latents` method of `WanImageToVideoPipeline` (Wan 2.1 I2V) fails with a shape error when `num_frames` is changed from the default of 81, for example to 15.



### Reproduction

Run the pipeline with `width=832`, `height=480`, and `num_frames=15`.


### Logs

```shell
/home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/pipelines/wan/pipeline_wan_i2v.py:611 in __call__

  610 │   │   image = self.video_processor.preprocess(image, height=height, width=width).to(device, dtype=torch.float32)
❱ 611 │   │   latents, condition = self.prepare_latents(
  612 │   │   │   image,

/home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/pipelines/wan/pipeline_wan_i2v.py:424 in prepare_latents

❱ 424 │   │   mask_lat_size = mask_lat_size.view(batch_size, -1, self.vae_scale_factor_temporal, latent_height, latent_width)
  425 │   │   mask_lat_size = mask_lat_size.transpose(1, 2)
RuntimeError: shape '[1, -1, 4, 60, 104]' is invalid for input of size 112320
```
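The numbers in the error can be checked by hand. The sketch below is plain Python, not the pipeline code. It assumes the mask holds the first frame repeated `vae_scale_factor_temporal` (4) times followed by the remaining `num_frames - 1` frames, and that the spatial downscale factor is 8. Both are assumptions, but they match the `60 x 104` shape and the input size of 112320 reported above. The helper names are hypothetical.

```python
# Assumed layout: first frame repeated `temporal_factor` times,
# then the remaining (num_frames - 1) frames appended.
def mask_frame_count(num_frames: int, temporal_factor: int = 4) -> int:
    return temporal_factor + (num_frames - 1)


def can_reshape(num_frames: int, temporal_factor: int = 4) -> bool:
    # view(batch, -1, temporal_factor, h, w) needs the frame count
    # to be divisible by temporal_factor.
    return mask_frame_count(num_frames, temporal_factor) % temporal_factor == 0


# Assumed spatial downscale factor of 8: 480 -> 60, 832 -> 104.
latent_height, latent_width = 480 // 8, 832 // 8

for n in (15, 17, 81):
    frames = mask_frame_count(n)
    size = frames * latent_height * latent_width
    print(f"num_frames={n}: mask frames={frames}, size={size}, ok={can_reshape(n)}")
```

With `num_frames=15` this gives 18 mask frames and a size of 112320, which is not divisible into groups of 4. Under these assumptions the reshape only works when `(num_frames - 1) % 4 == 0`, which holds for 17 and 81 but not for 15.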

### System Info

diffusers==main

### Who can help?

@DN6 @a-r-r-o-w @hlky 
