WanImageToVideoPipeline broken math when preparing latents #11163

Closed
@vladmandic

Describe the bug

The prepare_latents method of the WAN 2.1 I2V pipeline (WanImageToVideoPipeline) fails with a shape error when num_frames is changed from its default of 81, for example to 15.

Reproduction

Call the pipeline with width=832, height=480, num_frames=15.

Logs

/home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/pipelines/wan/pipeline_wan_i2v.py:611 in __call__

  610         image = self.video_processor.preprocess(image, height=height, width=width).to(device, dtype=torch.float32)
❱ 611         latents, condition = self.prepare_latents(
  612             image,

/home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/pipelines/wan/pipeline_wan_i2v.py:424 in prepare_latents

❱ 424         mask_lat_size = mask_lat_size.view(batch_size, -1, self.vae_scale_factor_temporal, latent_height, latent_width)
  425         mask_lat_size = mask_lat_size.transpose(1, 2)

RuntimeError: shape '[1, -1, 4, 60, 104]' is invalid for input of size 112320
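
The numbers in the error message can be checked by hand. The following is a minimal sketch in plain Python that uses only values from the reproduction and the traceback. The explanation of where the 18 mask frames come from (the first frame repeated 4 times plus the 14 remaining frames) is an assumption that happens to match the reported size, not something confirmed in this report, and both helper functions are hypothetical.

```python
# Values taken from the reproduction and the error message in this report.
num_frames = 15
temporal_factor = 4                      # third dim of the failing view: [1, -1, 4, 60, 104]
latent_height, latent_width = 60, 104    # last two dims of the failing view
reported_numel = 112320                  # "input of size 112320"

frame_size = latent_height * latent_width      # elements per latent-sized frame
mask_frames = reported_numel // frame_size     # frames actually present in the mask

# Assumption (not stated in the report): the mask holds the first frame
# repeated `temporal_factor` times plus one entry per remaining frame.
assumed_mask_frames = temporal_factor + (num_frames - 1)

# The view needs the element count to be a multiple of 4 * 60 * 104.
view_is_possible = reported_numel % (temporal_factor * frame_size) == 0


def is_valid_num_frames(n: int, factor: int = temporal_factor) -> bool:
    """Hypothetical helper: True when (n - 1) is divisible by the temporal factor."""
    return n >= 1 and (n - 1) % factor == 0


def round_down_num_frames(n: int, factor: int = temporal_factor) -> int:
    """Hypothetical helper: largest frame count <= n that passes the check above."""
    return max(1, (n - 1) // factor * factor + 1)


print(mask_frames, assumed_mask_frames)   # 18 18
print(view_is_possible)                   # False
print(is_valid_num_frames(15), is_valid_num_frames(81))   # False True
print(round_down_num_frames(15))          # 13
```

Under that assumption, num_frames=13 or num_frames=17 would satisfy the reshape where 15 does not, which is also why the default of 81 works.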

System Info

diffusers==main

Who can help?

@DN6 @a-r-r-o-w @hlky

Labels: bug
