[SDXL] Partial diffusion support for Text2Img and Img2Img Pipelines #4015
Conversation
The documentation is not available anymore as the PR was closed or merged.
This looks pretty good to me. For me, the following things are TODO:
- Ensure the CI is green.
- Docs -- let's make sure to add an elaborate section about these things in our docs here: https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_xl.
- Add test cases to ensure feature robustness and parity.
More than happy to help if things are unclear.
@patrickvonplaten WDYT?
Hey @bghira, thanks for the PR! Could you maybe post a couple of examples? Copy-pasting it here:

from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda") # OR, pipe.enable_sequential_cpu_offload() OR, pipe.enable_model_cpu_offload()
# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()
prompt = "An astronaut riding a green horse"
# We're going to schedule 20 steps, and complete 10 of them.
image = pipe(prompt=prompt, output_type="latent",
num_inference_steps=20, max_inference_steps=10).images
# If you have low VRAM:
# del pipe
# import gc
# gc.collect()
# torch.cuda.empty_cache()
# torch.clear_autocast_cache()
# Put through the refiner now.
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda") # OR, pipe.enable_sequential_cpu_offload() OR, pipe.enable_model_cpu_offload()
# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()
# If add_noise is False, we will no longer add noise during Img2Img.
# This is useful since the base details will be preserved. However,
# switching add_noise to True can yield truly interesting results.
images = pipe(prompt=prompt, image=image,
num_inference_steps=20, first_inference_step=10, add_noise=False).images
Hey @bghira, very cool PR, I think I've understood the full workflow now. So essentially what we're doing here is that we use the SD-XL base model to denoise the image for high-noise timesteps, and we use the refiner to denoise the image for low-noise timesteps. In that sense your approach here is very similar to the idea of eDiffi - it's very cool that you made it work for SD-XL! From the eDiffi paper:

[quoted passage from the eDiffi paper omitted]

=> So in a sense the "base" is used here mostly to generate text-aligned content, and the refiner is used for high-visual-fidelity results. Think we need to add tests and docs here as well :-)
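To make the division of labour concrete, here is a minimal sketch of the idea; the 50% boundary and the pipe.scheduler.timesteps access are illustrative assumptions, not the pipelines' exact internals:

# Illustrative sketch of the eDiffi-style split: the base handles the
# high-noise timesteps, the refiner the low-noise remainder.
timesteps = pipe.scheduler.timesteps       # descending, e.g. [980, 960, ..., 0]
boundary = int(round(0.5 * len(timesteps)))
base_timesteps = timesteps[:boundary]      # high-noise portion -> SD-XL base
refiner_timesteps = timesteps[boundary:]   # low-noise portion  -> SD-XL refiner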
@@ -619,6 +638,8 @@ def __call__(
        ] = None,
        strength: float = 0.3,
        num_inference_steps: int = 50,
        max_inference_steps: Optional[int] = None,
Suggested change:
-        max_inference_steps: Optional[int] = None,
+        denoising_end: Optional[float] = None,
Same as: https://github.com/huggingface/diffusers/pull/4015/files#r1257982804 => can we switch to % here maybe?
So if you use percentages, the rounding involved in converting the fraction to an integer step count can misalign timesteps, which results in artifacting.
Fair point!
I have an idea: I will implement a helper method that returns the pair of step values from the two inputs.
Hmm, can you explain how you might accidentally misalign?
Actually I think this can't really happen if we always round, e.g.:

max_inference_steps = int(round(denoising_end * num_inference_steps))
first_inference_steps = int(round(denoising_start * num_inference_steps))

denoising_end should always be equal to 1 - denoising_start, and in this case we will always have:

num_inference_steps = max_inference_steps + first_inference_steps

no?
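A quick sketch checking this rounding claim with the numbers from the example discussed below (27 steps, strength 0.73):

num_inference_steps = 27
denoising_end = 0.73
denoising_start = 1 - denoising_end                                         # 0.27
max_inference_steps = int(round(denoising_end * num_inference_steps))      # round(19.71) -> 20
first_inference_steps = int(round(denoising_start * num_inference_steps))  # round(7.29)  -> 7
assert max_inference_steps + first_inference_steps == num_inference_steps  # 27: aligned

Note, though, that round() is not a blanket guarantee: when both fractional parts land exactly on .5, Python's round-half-to-even can round both halves the same way and break the invariant.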
It happens when you have precise values for strength that leave remainders which the int() cast discards. I have tested it: in this test, num_inference_steps is set to 27 and strength is set to 0.73. The calculation says the model should run for about 19.71 steps, but since we can't run a fractional step, int() truncates this to 19; the value for begin_inference_step, however, is 7.29, which truncates to 7.

This results in a misalignment of one step, which is, apparently, enough to disrupt the output altogether. We could use round(), but instead I have opted for what I consider a more robust approach.

Furthermore, there are some prompts/seeds/subject material where these artifacts prove useful in making the output more variable and creative. Having the option to set those values and override our best judgement seems like one of the things that makes Diffusers great, and I wanted to retain that.
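A minimal sketch reproducing the truncation misalignment described above (the variable names follow the thread and are illustrative only):

num_inference_steps = 27
strength = 0.73

final_inference_step = int(num_inference_steps * strength)        # int(19.71) -> 19
begin_inference_step = int(num_inference_steps * (1 - strength))  # int(7.29)  -> 7

# 19 + 7 = 26, one step short of 27: the base and refiner schedules
# no longer meet at the same timestep, which triggers the artifacting.
print(final_inference_step + begin_inference_step)  # 26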
+1 on @patrickvonplaten's suggestion of using a percentage.

It seems to me we will just have to set denoising_end = denoising_start, and default it to (1 - strength)?

Then:

final_inference_steps = int(denoising_end * num_inference_steps)
begin_inference_steps = int(denoising_start * num_inference_steps)

So in the example num_inference_steps = 27, strength = 0.73, we will have final_inference_steps = begin_inference_steps = 7, i.e. run 7 steps in the base model and 20 in the refiner - is this what's intended?
I've tested this, and while it seems like it should work, it has really broken the results for me overall. I am digging into it.
Very excited about this PR. This makes a lot of sense. From a user perspective I think it would be nicer to provide fractions/percentages, such as denoising_end=0.5. So the API could look as follows:

from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda") # OR, pipe.enable_sequential_cpu_offload() OR, pipe.enable_model_cpu_offload()
# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()
prompt = "An astronaut riding a green horse"
# We're going to schedule 20 steps, and complete 10 of them.
image = pipe(prompt=prompt, output_type="latent", num_inference_steps=20, denoising_end=0.5).images
# If you have low VRAM:
# del pipe
# import gc
# gc.collect()
# torch.cuda.empty_cache()
# torch.clear_autocast_cache()
# Put through the refiner now.
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda") # OR, pipe.enable_sequential_cpu_offload() OR, pipe.enable_model_cpu_offload()
# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()
# If add_noise is False, we will no longer add noise during Img2Img.
# This is useful since the base details will be preserved. However,
# switching add_noise to True can yield truly interesting results.
images = pipe(prompt=prompt, image=image, num_inference_steps=20, denoising_start=0.5).images
Here is an updated example:
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda") # OR, pipe.enable_sequential_cpu_offload() OR, pipe.enable_model_cpu_offload()
# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()
prompt = "An astronaut riding a green horse"
# We're going to schedule 20 steps, and complete 50% of them using either model.
total_num_steps = 20
refiner_strength = 0.5
# These values can be manipulated manually to get different results.
final_inference_step, begin_inference_step = pipe.timesteps_from_strength(refiner_strength, total_num_steps)
image = pipe(prompt=prompt, output_type="latent",
num_inference_steps=total_num_steps, final_inference_step=final_inference_step).images
# If you have low VRAM:
# del pipe
# import gc
# gc.collect()
# torch.cuda.empty_cache()
# torch.clear_autocast_cache()
# Put through the refiner now.
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda") # OR, pipe.enable_sequential_cpu_offload() OR, pipe.enable_model_cpu_offload()
# if using torch < 2.0
# pipe.enable_xformers_memory_efficient_attention()
images = pipe(prompt=prompt, image=image,
num_inference_steps=total_num_steps, begin_inference_step=begin_inference_step).images
Force-pushed from a7f47cd to 942c6a5.
Force-pushed from 942c6a5 to 062afbb.
I took a crack at that today, for a few hours. I couldn't make the percentages work reliably; the risk of artifacting is pretty great. If anyone else would like to take a swing, I welcome more commits and attention on this work. Here's a comparison grid of the demo prompt, with the fully denoised base model samples in the left column and the partially denoised base model samples in the right column. The image at the top centre is what the base model did on its own. The results use DDIM with 40 steps divided between the models, from 10% up to 90% refiner strength. https://tripleback.net/public/partial-diffusion-comparison.png
Looks fantastic!

May I also have a brief explanation of the two separate test cases being added? IIUC, their purpose is to:
- Make sure this feature works for a set of schedulers
- Make sure denoising_end and denoising_start behaviour is robust
- Make sure the internal computations for steps within the schedulers are as expected
Failing test is unrelated - merging
The docs for this seem to be wrong. It should actually be [...]
I was wrong. However, I don't see the point of this method. Here is fraction 0.7 (top) vs 1.0 (bottom):

[comparison images omitted]

Frac 1.0 also runs faster than 0.7. I feel like I'm missing something here. Frac 1.0 should skip the refiner entirely, right? But then why is my result so, so much sharper than the 70/30 split of base/refiner?
After hours of debugging I found the issue: THIS METHOD DOES NOT WORK WITH KARRAS SCHEDULERS! This should either be fixed (if possible) or mentioned in the docs.
Yes, I already opened an issue and told them about this problem while they were implementing the feature, but it was still merged.
The Karras sigmas pattern is generally less tested and sees more issues like this. I ceased using them a month or more ago, in favour of the rescaled DDIM timesteps.
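If you are affected, one possible workaround (a sketch only; it assumes your scheduler exposes the use_karras_sigmas config flag, as DPMSolverMultistepScheduler does) is to rebuild the scheduler without Karras sigmas before using the split workflow:

from diffusers import DPMSolverMultistepScheduler

# Sketch: swap in a non-Karras sigma schedule; assumes `pipe` is already loaded.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=False
)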
DPM++ 2M (without Karras) is also very washed out and less detailed than with the regular method.
Can you open a new issue report if you're seeing issues you'd like resolved? They're unlikely to receive the proper attention in a comment thread on a merged PR.
I guess we're done here, as #4060 already mentions improvements to this approach.
Let's maybe continue the discussion in #4115 :-)
[SDXL] Partial diffusion support for Text2Img and Img2Img Pipelines (huggingface#4015)

* diffusers#4003 - initial implementation of max_inference_steps
* diffusers#4003 - initial implementation of max_inference_steps and first_inference_step for img2img
* diffusers#4003 - use first_inference_step as an input arg for get_timestamps in img2img
* diffusers#4003 Do not add noise during img2img when we have a defined first timestep
* diffusers#4003 Mild updates after revert
* diffusers#4003 Missing change
* Show implementation with denoising_start and end
* Apply suggestions from code review
* Update src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>)
* move to 0.19.0dev
* Apply suggestions from code review
* add exhaustive tests
* add docs
* finish
* Apply suggestions from code review (Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>)
* make style

Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
What does this PR do?
Fixes #4003
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.