add OnnxStableDiffusionUpscalePipeline pipeline #2158
Conversation
The documentation is not available anymore as the PR was closed or merged.
I added a basic test, which is passing locally ([…]).
@ssube
@ForserX I'm using this script: https://github.com/ssube/onnx-web/blob/main/api/onnx_web/convert.py#L206
This is all quite difficult... I'll try; if it doesn't work out, I'll ask for a ready-made model))
Using that script. Some logs from that: […] and […]
The Vulkan variant of ESRGAN works even faster. Check your mail, please.
I pushed a copy of the model that I have been using to https://huggingface.co/ssube/stable-diffusion-x4-upscaler-onnx and updated the tests accordingly 🤞
Cool, cc @anton-l @echarlaix for review
Now it remains to wait for custom VAE and LoRA support for ONNX))
I added another, longer test and fixed up a few of the TODOs. The remaining ones are all related to hard-coded channel counts and the […]. I also tried adding […]
cc @anton-l
Is there anything else I can/should add to this? I'm not sure where to look up the […]
@anton-l can you take a look here?
I've been using and testing this pipeline more, with more schedulers, and fixed a couple of issues related to the mix of numpy and torch types. There was an […]. There were a few […]. The last issue I'm aware of is a slight difference between the parameter types to the […]. I did run into one issue with int32 vs int64 types, but that appears to be related to how the model is trained or serialized, and exporting it again with the 4th input as a `torch.long` tensor fixed it:

```diff
 # UNET
 if single_vae:
     unet_inputs = ["sample", "timestep", "encoder_hidden_states", "class_labels"]
-    unet_scale = torch.tensor(4).to(device=ctx.training_device, dtype=torch.int)
+    unet_scale = torch.tensor(4).to(
+        device=ctx.training_device, dtype=torch.long
+    )
```
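For context on the numpy/torch mix: the ONNX models return numpy arrays, while the diffusers schedulers operate on torch tensors, so each denoising step round-trips between the two. A minimal sketch of that pattern, with illustrative variable names taken from inside the denoising loop of the existing ONNX pipelines:

```python
import torch

# noise_pred and latents are numpy arrays coming back from the ONNX
# session; the scheduler's step() expects torch tensors, so convert on
# the way in and back to numpy on the way out.
step_output = scheduler.step(
    torch.from_numpy(noise_pred), t, torch.from_numpy(latents)
)
latents = step_output.prev_sample.numpy()
```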
Very impressive work @ssube, thank you so much for contributing!
Overall your implementation looks good to me, just left a couple of minor comments :)
For the int32 vs int64 issue: maybe it would be possible to infer the type at runtime, similar to `timestep_dtype = next(...)` in `src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion.py` (line 260 at 568b73f).
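A hedged sketch of how that inference could look for the `class_labels` input, assuming `unet` is an `OnnxRuntimeModel` whose `.model` attribute is an `onnxruntime.InferenceSession` (the `ORT_TO_NP_TYPE` mapping shown is a subset, for illustration):

```python
import numpy as np

# Map ONNX Runtime tensor type strings to numpy dtypes (subset).
ORT_TO_NP_TYPE = {
    "tensor(int32)": np.int32,
    "tensor(int64)": np.int64,
    "tensor(float)": np.float32,
    "tensor(float16)": np.float16,
}

# Read the declared type of the "class_labels" input from the exported
# graph, falling back to int64 if the input is not found.
class_labels_type = next(
    (i.type for i in unet.model.get_inputs() if i.name == "class_labels"),
    "tensor(int64)",
)
class_labels_dtype = ORT_TO_NP_TYPE[class_labels_type]
```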
```python
NUM_LATENT_CHANNELS = 4
NUM_UNET_INPUT_CHANNELS = 7
```
Yes, this works 👍
```python
NUM_UNET_INPUT_CHANNELS = 7

# TODO: should this be a lookup? it needs to match the conversion script
class_labels_dtype = np.int64
```
The integer types stay the same even in fp16 mode, so you can safely move it inline
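A sketch of what moving it inline could look like; hypothetical, with `noise_level` standing in for the value fed to the UNet as `class_labels`:

```python
import numpy as np

# int64 regardless of fp16/fp32 export, so no module-level constant needed
class_labels = np.array([noise_level], dtype=np.int64)
```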
```python
# 5. Add noise to image
noise_level = torch.tensor([noise_level], dtype=torch.long, device=device)
noise = torch.randn(image.shape, generator=generator, device=device, dtype=text_embeddings_dtype)
```
`text_embeddings_dtype` can be inferred from `text_embeddings` (fp32 or fp16), so this shouldn't be a constant.
I thought so, let me fix that up
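One possible way to infer it, as a sketch: assume `text_embeddings` is the numpy array returned by the ONNX text encoder (the merged commit list mentions lookup tables for these types, so the final code may differ):

```python
import torch

# torch.from_numpy gives a zero-copy view whose .dtype is the torch
# equivalent of the numpy dtype (fp32 or fp16), so the noise matches
# whatever precision the text encoder produced.
text_embeddings_dtype = torch.from_numpy(text_embeddings).dtype
noise = torch.randn(
    image.shape, generator=generator, device=device, dtype=text_embeddings_dtype
)
```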
```python
###
# This is based on a combination of the ONNX img2img pipeline and the PyTorch upscale pipeline:
# https://github.com/huggingface/diffusers/blob/v0.11.1/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion_img2img.py
# https://github.com/huggingface/diffusers/blob/v0.11.1/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py
###
```
Probably ok to remove this disclaimer now 😄
```python
if hasattr(vae, "config"):
    # check if vae has a config attribute `scaling_factor` and if it is set to 0.08333, else set it to 0.08333 and deprecate
    is_vae_scaling_factor_set_to_0_08333 = (
        hasattr(vae.config, "scaling_factor") and vae.config.scaling_factor == 0.08333
    )
```
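For reference, a hedged sketch of how the complete guard could read; the message and version strings here are illustrative, not the PR's exact wording, while `deprecate` and `register_to_config` are existing diffusers/ConfigMixin utilities:

```python
from diffusers.utils import deprecate

if hasattr(vae, "config"):
    # check if vae has a config attribute `scaling_factor` set to 0.08333;
    # if not, set it and warn instead of raising
    is_vae_scaling_factor_set_to_0_08333 = (
        hasattr(vae.config, "scaling_factor") and vae.config.scaling_factor == 0.08333
    )
    if not is_vae_scaling_factor_set_to_0_08333:
        deprecation_message = (
            "The VAE config does not set `scaling_factor` to 0.08333; "
            "overriding it for this upscale checkpoint."
        )
        deprecate("wrong scaling_factor", "1.0.0", deprecation_message, standard_warn=False)
        vae.register_to_config(scaling_factor=0.08333)
```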
cc @patrickvonplaten @patil-suraj for this change
I wasn't sure about this part, but if the VAE doesn't have `.config`, the current implementation will throw without logging much.
Ok for me!
I inlined the integer type and put in lookups for the other two. One of them needed to go from numpy to the torch dtype, since that's what the […] expects.

For the int32/64 issue that I mentioned, I tested that a little bit more, and everything seems to work as long as the type in the convert/export code and the pipeline match. Is there any reason not to use int64 there? For more context, this is my convert script, and the relevant part is:

```python
# UNET
if single_vae:  # upscale pipeline
    unet_inputs = ["sample", "timestep", "encoder_hidden_states", "class_labels"]
    unet_scale = torch.tensor(4).to(device=ctx.training_device, dtype=torch.long)  # <- this is the type that needs to match
else:
    unet_inputs = ["sample", "timestep", "encoder_hidden_states", "return_dict"]
    unet_scale = torch.tensor(False).to(
        device=ctx.training_device, dtype=torch.bool
    )

unet_in_channels = pipeline.unet.config.in_channels
unet_sample_size = pipeline.unet.config.sample_size
unet_path = output_path / "unet" / "model.onnx"
onnx_export(
    pipeline.unet,
    model_args=(
        torch.randn(2, unet_in_channels, unet_sample_size, unet_sample_size).to(
            device=ctx.training_device, dtype=dtype
        ),
        torch.randn(2).to(device=ctx.training_device, dtype=dtype),
        torch.randn(2, num_tokens, text_hidden_size).to(
            device=ctx.training_device, dtype=dtype
        ),
        unet_scale,
    ),
    output_path=unet_path,
    ordered_input_names=unet_inputs,
    # has to be different from "sample" for correct tracing
    output_names=["out_sample"],
    dynamic_axes={
        "sample": {0: "batch", 1: "channels", 2: "height", 3: "width"},
        "timestep": {0: "batch"},
        "encoder_hidden_states": {0: "batch", 1: "sequence"},
    },
    opset=ctx.opset,
    use_external_data_format=True,  # UNet is > 2GB, so the weights need to be split
)
```
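One way to sanity-check what the export actually produced; a sketch using onnxruntime, with `unet_path` as defined above:

```python
import onnxruntime as ort

# List the exported UNet's inputs; for the upscale variant, the fourth
# entry should be "class_labels" with type tensor(int64).
session = ort.InferenceSession(str(unet_path), providers=["CPUExecutionProvider"])
for inp in session.get_inputs():
    print(inp.name, inp.type, inp.shape)
```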
Looks good to me - thanks for checking the PR @anton-l :-) cc @williamberman could you also take a quick look?
Merging to not block the community contributor here
* [Onnx] add Stable Diffusion Upscale pipeline
* add a test for the OnnxStableDiffusionUpscalePipeline
* check for VAE config before adjusting scaling factor
* update test assertions, lint fixes
* run fix-copies target
* switch test checkpoint to one hosted on huggingface
* partially restore attention mask
* reshape embeddings after running text encoder
* add longer nightly test for ONNX upscale pipeline
* use package import to fix tests
* fix scheduler compatibility and class labels dtype
* use more precise type
* remove LMS from fast tests
* lookup latent and timestamp types
* add docs for ONNX upscaling, rename lookup table
* replace deprecated pipeline names in ONNX docs
Hello. On version […]
Thanks for the ping @zetyquickly! Would you like to open an issue to fix it?
I think I have a working implementation of an `OnnxStableDiffusionUpscalePipeline`, which extends `StableDiffusionUpscalePipeline` to be compatible with `OnnxRuntimeModel`. I'm hoping to get some feedback on whether this is the right approach, and if so, what else I need to do before this can be merged besides writing tests. There are a few spots in the code that I have questions about, marked with `# TODO`s and noted at the bottom here.

Motivation
Running the current `StableDiffusionUpscalePipeline` on a machine without CUDA acceleration can be pretty slow, even with relatively small 128x128 input images. I am writing a web UI for running ONNX pipelines that allows you to run a series of upscaling models (or one model repeatedly), but running `StableDiffusionUpscalePipeline` on a 1024px square input (split into 128px tiles) can easily take 60+ minutes on a 16 core CPU. Using the ONNX runtime is much faster, but that combination was not available, so I wrote this pipeline.

* `StableDiffusionUpscalePipeline`: `2.98s/it` or `02:28` per tile
* `OnnxStableDiffusionUpscalePipeline` w/ `ROCmExecutionProvider`: `6.46it/s` or `00:07` per tile
* `OnnxStableDiffusionUpscalePipeline` w/ `DMLExecutionProvider`: `1.17it/s` or `00:42` per tile

* `StableDiffusionUpscalePipeline`: `finished pipeline in 0:41:00.270845`
* `OnnxStableDiffusionUpscalePipeline` w/ `ROCmExecutionProvider`: `finished pipeline in 0:02:10.359478`

* `StableDiffusionUpscalePipeline`: still running
* `OnnxStableDiffusionUpscalePipeline` w/ `ROCmExecutionProvider`: `finished pipeline in 0:05:53.323918`
I have only tested this using the `CPUExecutionProvider` and `ROCmExecutionProvider` so far, but I have machines set up for testing the `CUDAExecutionProvider` and `DMLExecutionProvider` and will check on them as well.

I tried to make the least-necessary changes and ended up only overriding a few methods. It looks like the preference in some of the other pipelines is to copy methods, which I can also do, but I wanted to find the minimum viable diff. Most of the changes are around passing named parameters to the models and replacing `.sample` with `[0]`, but there are a few `ndarray.int()` calls that I'm not sure about, and the `StableDiffusionUpscalePipeline` code used some `config` values that do not appear to exist on `OnnxRuntimeModel`.

Example
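A minimal usage sketch (hedged: the argument names follow the PyTorch upscale pipeline, the checkpoint is the test copy pushed earlier in this thread, and the provider string can be swapped depending on the onnxruntime build):

```python
from diffusers import OnnxStableDiffusionUpscalePipeline
from PIL import Image

# Load the ONNX upscale pipeline with the CPU execution provider.
pipeline = OnnxStableDiffusionUpscalePipeline.from_pretrained(
    "ssube/stable-diffusion-x4-upscaler-onnx",
    provider="CPUExecutionProvider",
)

# Upscale a 128x128 tile by 4x to 512x512.
low_res = Image.open("input.png").convert("RGB").resize((128, 128))
result = pipeline(prompt="a photo of a cat", image=low_res, num_inference_steps=75)
result.images[0].save("output.png")
```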
TODOs

* `vae.config.latent_channels`: https://github.com/huggingface/diffusers/pull/2158/files#diff-3815a0888bb607ca69fe4022fa3b4a809687fe2b3ae4d0ea0397288fac3c920bR18
* `unet.config.in_channels`: https://github.com/huggingface/diffusers/pull/2158/files#diff-3815a0888bb607ca69fe4022fa3b4a809687fe2b3ae4d0ea0397288fac3c920bR21
* converting `text_embeddings.dtype` to torch: https://github.com/huggingface/diffusers/pull/2158/files#diff-3815a0888bb607ca69fe4022fa3b4a809687fe2b3ae4d0ea0397288fac3c920bR97
* whether `text_input_ids.int()` is safe: https://github.com/huggingface/diffusers/pull/2158/files#diff-3815a0888bb607ca69fe4022fa3b4a809687fe2b3ae4d0ea0397288fac3c920bR226, compared to the `.astype(np.int32)` in https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion.py#L150
* whether `attention_mask` is needed: https://github.com/huggingface/diffusers/pull/2158/files#diff-3815a0888bb607ca69fe4022fa3b4a809687fe2b3ae4d0ea0397288fac3c920bR228
* whether `text_embeddings.view` is needed: https://github.com/huggingface/diffusers/pull/2158/files#diff-3815a0888bb607ca69fe4022fa3b4a809687fe2b3ae4d0ea0397288fac3c920bR235