Description
Newer models like Mochi-1 run the text encoder and VAE decoding in FP32 while keeping the denoising loop under torch.bfloat16 autocast.
Currently, our pipelines cannot run the different models involved in different precisions, because we set a single global torch_dtype when initializing the pipeline.
Some pipelines, like SDXL, have a VAE with a config attribute called force_upcast, and it's handled within the pipeline implementation like so:
diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py
Lines 1264 to 1275 in cfdeebd
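For context, the pattern referenced above boils down to: check whether the VAE is in FP16 and its config requests upcasting, and if so, cast both the VAE and the latents to FP32 before decoding. A simplified, self-contained sketch of that pattern (`ToyVAE` and `decode_latents` are illustrative stand-ins, not the actual diffusers implementation):

```python
import types

import torch
import torch.nn as nn


class ToyVAE(nn.Module):
    """Toy stand-in for a VAE: just enough surface to exercise the pattern."""

    def __init__(self):
        super().__init__()
        # scaling_factor value here is illustrative.
        self.config = types.SimpleNamespace(force_upcast=True, scaling_factor=0.13025)
        self.proj = nn.Linear(4, 4)

    @property
    def dtype(self):
        return self.proj.weight.dtype

    def decode(self, z):
        return types.SimpleNamespace(sample=self.proj(z))


def decode_latents(vae, latents):
    # Mirrors the SDXL-style check: upcast only when the VAE is running
    # in fp16 *and* its config asks for it.
    needs_upcasting = vae.dtype == torch.float16 and vae.config.force_upcast
    if needs_upcasting:
        vae.to(torch.float32)
        latents = latents.to(torch.float32)
    return vae.decode(latents / vae.config.scaling_factor).sample


vae = ToyVAE().to(torch.float16)
latents = torch.randn(1, 4, dtype=torch.float16)
image = decode_latents(vae, latents)
```

The key point is that the precision decision lives inside the pipeline's decoding step, hard-coded per pipeline, rather than being something the user controls.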
Another way to achieve this could be to decouple the major computation stages of the pipeline so that users can choose whatever supported torch_dtype they want for each stage. Here is an example.
But this is an involved process and a power-user thing, IMO. What if we could allow users to pass a torch_dtype map like so:
```python
{"unet": torch.bfloat16, "vae": torch.float32, "text_encoder": torch.float32}
```
This along with @a-r-r-o-w's idea of an upcast marker could really benefit the pipelines that are not resilient to precision changes.
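A minimal sketch of how a pipeline could consume such a map (`apply_dtype_map` and the toy components below are hypothetical, not part of the diffusers API; any component not named in the map falls back to a default dtype):

```python
import torch
import torch.nn as nn


def apply_dtype_map(components, dtype_map, default=torch.float32):
    """Cast each named component to its requested dtype, falling back to `default`."""
    for name, module in components.items():
        module.to(dtype=dtype_map.get(name, default))
    return components


# Toy stand-ins for the pipeline's models.
components = {
    "unet": nn.Linear(4, 4),
    "vae": nn.Linear(4, 4),
    "text_encoder": nn.Linear(4, 4),
}

dtype_map = {"unet": torch.bfloat16, "vae": torch.float32, "text_encoder": torch.float32}
apply_dtype_map(components, dtype_map)
```

With something like this, `from_pretrained` could accept either a single dtype (current behavior) or a map, keeping the simple path unchanged for most users.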