
Commit 322c8a4

Merge branch 'main' into memory-optims
2 parents eb8971f + 23c9802

86 files changed (+8628, -391 lines)


.github/workflows/nightly_tests.yml

Lines changed: 49 additions & 0 deletions
@@ -180,6 +180,55 @@ jobs:
           pip install slack_sdk tabulate
           python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
 
+  run_torch_compile_tests:
+    name: PyTorch Compile CUDA tests
+
+    runs-on:
+      group: aws-g4dn-2xlarge
+
+    container:
+      image: diffusers/diffusers-pytorch-compile-cuda
+      options: --gpus 0 --shm-size "16gb" --ipc host
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test,training]
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Run torch compile tests on GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          RUN_COMPILE: yes
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: cat reports/tests_torch_compile_cuda_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_compile_test_reports
+          path: reports
+
+      - name: Generate Report and Notify Channel
+        if: always()
+        run: |
+          pip install slack_sdk tabulate
+          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+
   run_big_gpu_torch_tests:
     name: Torch tests on big GPU
     strategy:
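The new job gates the compile tests behind the `RUN_COMPILE` environment variable, so they only exercise `torch.compile` when it is set. The same selection can be reproduced locally from a diffusers checkout with pytest's programmatic entry point; this is a rough sketch, minus the xdist and report flags the workflow adds:

```py
import os
import sys

import pytest

# The workflow exports RUN_COMPILE=yes so tests matching "compile"
# actually run torch.compile instead of skipping themselves.
os.environ["RUN_COMPILE"] = "yes"

# Same test selection as the workflow's pytest invocation.
sys.exit(pytest.main(["-s", "-v", "-k", "compile", "tests/"]))
```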

.github/workflows/release_tests_fast.yml

Lines changed: 1 addition & 1 deletion
@@ -335,7 +335,7 @@ jobs:
       - name: Environment
         run: |
           python utils/print_env.py
-      - name: Run example tests on GPU
+      - name: Run torch compile tests on GPU
         env:
           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
           RUN_COMPILE: yes

docs/source/en/api/pipelines/flux.md

Lines changed: 1 addition & 1 deletion
@@ -347,7 +347,7 @@ image = pipe(
     height=1024,
     prompt="wearing sunglasses",
     negative_prompt="",
-    true_cfg=4.0,
+    true_cfg_scale=4.0,
     generator=torch.Generator().manual_seed(4444),
     ip_adapter_image=image,
 ).images[0]
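The doc fix above tracks a keyword rename on the Flux IP-Adapter flow: the true classifier-free-guidance strength is now passed as `true_cfg_scale`. A minimal sketch of a call using the corrected keyword; the checkpoint and adapter repo ids follow the surrounding flux.md example but are assumptions here, not part of this diff:

```py
import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
# Assumed IP-Adapter weights; substitute whatever adapter you actually use.
pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14",
)

image = load_image("https://example.com/reference.png")  # placeholder image URL
result = pipe(
    prompt="wearing sunglasses",
    negative_prompt="",
    true_cfg_scale=4.0,  # formerly `true_cfg`
    ip_adapter_image=image,
    height=1024,
    width=1024,
    generator=torch.Generator().manual_seed(4444),
).images[0]
result.save("sunglasses.png")
```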

docs/source/en/api/pipelines/wan.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
 
 ## Generating Videos with Wan 2.1
 
-We will first need to install some addtional dependencies.
+We will first need to install some additional dependencies.
 
 ```shell
 pip install -u ftfy imageio-ffmpeg imageio

docs/source/en/training/cogvideox.md

Lines changed: 1 addition & 1 deletion
@@ -216,7 +216,7 @@ Setting the `<ID_TOKEN>` is not necessary. From some limited experimentation, we
 > - The original repository uses a `lora_alpha` of `1`. We found this not suitable in many runs, possibly due to difference in modeling backends and training settings. Our recommendation is to set to the `lora_alpha` to either `rank` or `rank // 2`.
 > - If you're training on data whose captions generate bad results with the original model, a `rank` of 64 and above is good and also the recommendation by the team behind CogVideoX. If the generations are already moderately good on your training captions, a `rank` of 16/32 should work. We found that setting the rank too low, say `4`, is not ideal and doesn't produce promising results.
 > - The authors of CogVideoX recommend 4000 training steps and 100 training videos overall to achieve the best result. While that might yield the best results, we found from our limited experimentation that 2000 steps and 25 videos could also be sufficient.
-> - When using the Prodigy opitimizer for training, one can follow the recommendations from [this](https://huggingface.co/blog/sdxl_lora_advanced_script) blog. Prodigy tends to overfit quickly. From my very limited testing, I found a learning rate of `0.5` to be suitable in addition to `--prodigy_use_bias_correction`, `prodigy_safeguard_warmup` and `--prodigy_decouple`.
+> - When using the Prodigy optimizer for training, one can follow the recommendations from [this](https://huggingface.co/blog/sdxl_lora_advanced_script) blog. Prodigy tends to overfit quickly. From my very limited testing, I found a learning rate of `0.5` to be suitable in addition to `--prodigy_use_bias_correction`, `prodigy_safeguard_warmup` and `--prodigy_decouple`.
 > - The recommended learning rate by the CogVideoX authors and from our experimentation with Adam/AdamW is between `1e-3` and `1e-4` for a dataset of 25+ videos.
 >
 > Note that our testing is not exhaustive due to limited time for exploration. Our recommendation would be to play around with the different knobs and dials to find the best settings for your data.
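For orientation, the flags named in that note map onto the `prodigyopt` optimizer roughly as follows. This is a hedged sketch assuming the `prodigyopt` package's API; the `Linear` module stands in for the LoRA parameters the training script would actually pass:

```py
import torch
from prodigyopt import Prodigy

lora_params = torch.nn.Linear(16, 16).parameters()  # stand-in for LoRA weights

# Prodigy adapts its own step size; `lr` scales the adapted step.
optimizer = Prodigy(
    lora_params,
    lr=0.5,                    # the value the note found suitable
    use_bias_correction=True,  # --prodigy_use_bias_correction
    safeguard_warmup=True,     # --prodigy_safeguard_warmup
    decouple=True,             # --prodigy_decouple (decoupled weight decay)
)
```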

docs/source/en/training/dreambooth.md

Lines changed: 1 addition & 1 deletion
@@ -589,7 +589,7 @@ For stage 2 of DeepFloyd IF with DreamBooth, pay attention to these parameters:
 
 * `--learning_rate=5e-6`, use a lower learning rate with a smaller effective batch size
 * `--resolution=256`, the expected resolution for the upscaler
-* `--train_batch_size=2` and `--gradient_accumulation_steps=6`, to effectively train on images wiht faces requires larger batch sizes
+* `--train_batch_size=2` and `--gradient_accumulation_steps=6`, to effectively train on images with faces requires larger batch sizes
 
 ```bash
 export MODEL_NAME="DeepFloyd/IF-II-L-v1.0"
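As a quick sanity check on the "effective batch size" wording in that bullet, gradient accumulation multiplies the per-step batch; a tiny worked example, assuming the single-GPU case:

```py
train_batch_size = 2
gradient_accumulation_steps = 6
num_gpus = 1  # assumption; scale up for multi-GPU runs

effective_batch_size = train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 12 images contribute to each optimizer update
```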

docs/source/en/training/t2i_adapters.md

Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ Many of the basic and important parameters are described in the [Text-to-image](
 
 As with the script parameters, a walkthrough of the training script is provided in the [Text-to-image](text2image#training-script) training guide. Instead, this guide takes a look at the T2I-Adapter relevant parts of the script.
 
-The training script begins by preparing the dataset. This incudes [tokenizing](https://github.com/huggingface/diffusers/blob/aab6de22c33cc01fb7bc81c0807d6109e2c998c9/examples/t2i_adapter/train_t2i_adapter_sdxl.py#L674) the prompt and [applying transforms](https://github.com/huggingface/diffusers/blob/aab6de22c33cc01fb7bc81c0807d6109e2c998c9/examples/t2i_adapter/train_t2i_adapter_sdxl.py#L714) to the images and conditioning images.
+The training script begins by preparing the dataset. This includes [tokenizing](https://github.com/huggingface/diffusers/blob/aab6de22c33cc01fb7bc81c0807d6109e2c998c9/examples/t2i_adapter/train_t2i_adapter_sdxl.py#L674) the prompt and [applying transforms](https://github.com/huggingface/diffusers/blob/aab6de22c33cc01fb7bc81c0807d6109e2c998c9/examples/t2i_adapter/train_t2i_adapter_sdxl.py#L714) to the images and conditioning images.
 
 ```py
 conditioning_image_transforms = transforms.Compose(
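The context line above cuts off at the start of the transform pipeline; for reference, a generic torchvision sketch of what such a conditioning-image pipeline typically looks like (illustrative, not the exact code from `train_t2i_adapter_sdxl.py`):

```py
from torchvision import transforms

resolution = 1024  # assumed SDXL training resolution

# Conditioning images are resized, cropped, and tensorized; unlike the target
# images they are typically left in [0, 1] rather than normalized to [-1, 1].
conditioning_image_transforms = transforms.Compose(
    [
        transforms.Resize(resolution, interpolation=transforms.InterpolationMode.BILINEAR),
        transforms.CenterCrop(resolution),
        transforms.ToTensor(),
    ]
)
```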

examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Lines changed: 1 addition & 1 deletion
@@ -2181,7 +2181,7 @@ def get_sigmas(timesteps, n_dim=4, dtype=torch.float32):
                 # Predict the noise residual
                 model_pred = transformer(
                     hidden_states=packed_noisy_model_input,
-                    # YiYi notes: divide it by 1000 for now because we scale it by 1000 in the transforme rmodel (we should not keep it but I want to keep the inputs same for the model for testing)
+                    # YiYi notes: divide it by 1000 for now because we scale it by 1000 in the transformer model (we should not keep it but I want to keep the inputs same for the model for testing)
                     timestep=timesteps / 1000,
                     guidance=guidance,
                     pooled_projections=pooled_prompt_embeds,
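The corrected comment describes a unit convention: the Flux transformer multiplies its `timestep` input by 1000 internally, so the caller divides first and the two cancel. A trivial illustration:

```py
import torch

# Scheduler timesteps live in [0, 1000); the transformer rescales its input
# by 1000, so passing timesteps / 1000 leaves the effective value unchanged.
timesteps = torch.tensor([250.0, 500.0, 750.0])
print(timesteps / 1000)  # tensor([0.2500, 0.5000, 0.7500])
```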

examples/community/README.md

Lines changed: 49 additions & 2 deletions
@@ -86,6 +86,7 @@ PIXART-α Controlnet pipeline | Implementation of the controlnet model for pixar
 | Perturbed-Attention Guidance |StableDiffusionPAGPipeline is a modification of StableDiffusionPipeline to support Perturbed-Attention Guidance (PAG).|[Perturbed-Attention Guidance](#perturbed-attention-guidance)|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/perturbed_attention_guidance.ipynb)|[Hyoungwon Cho](https://github.com/HyoungwonCho)|
 | CogVideoX DDIM Inversion Pipeline | Implementation of DDIM inversion and guided attention-based editing denoising process on CogVideoX. | [CogVideoX DDIM Inversion Pipeline](#cogvideox-ddim-inversion-pipeline) | - | [LittleNyima](https://github.com/LittleNyima) |
 | FaithDiff Stable Diffusion XL Pipeline | Implementation of [(CVPR 2025) FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolutionUnleashing Diffusion Priors for Faithful Image Super-resolution](https://arxiv.org/abs/2411.18824) - FaithDiff is a faithful image super-resolution method that leverages latent diffusion models by actively adapting the diffusion prior and jointly fine-tuning its components (encoder and diffusion model) with an alignment module to ensure high fidelity and structural consistency. | [FaithDiff Stable Diffusion XL Pipeline](#faithdiff-stable-diffusion-xl-pipeline) | [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/jychen9811/FaithDiff) | [Junyang Chen, Jinshan Pan, Jiangxin Dong, IMAG Lab, (Adapted by Eliseu Silva)](https://github.com/JyChen9811/FaithDiff) |
+| Stable Diffusion 3 InstructPix2Pix Pipeline | Implementation of Stable Diffusion 3 InstructPix2Pix Pipeline | [Stable Diffusion 3 InstructPix2Pix Pipeline](#stable-diffusion-3-instructpix2pix-pipeline) | [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/BleachNick/SD3_UltraEdit_freeform) [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/CaptainZZZ/sd3-instructpix2pix) | [Jiayu Zhang](https://github.com/xduzhangjiayu) and [Haozhe Zhao](https://github.com/HaozheZhao)|
 To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines, we will merge them quickly.
 
 ```py
@@ -5381,7 +5382,7 @@ pipe = DiffusionPipeline.from_pretrained(
 # Here we need use pipeline internal unet model
 pipe.unet = pipe.unet_model.from_pretrained(model_id, subfolder="unet", variant="fp16", use_safetensors=True)
 
-# Load aditional layers to the model
+# Load additional layers to the model
 pipe.unet.load_additional_layers(weight_path="proc_data/faithdiff/FaithDiff.bin", dtype=dtype)
 
 # Enable vae tiling
@@ -5432,4 +5433,50 @@ cropped_image = gen_image.crop((0, 0, width_init, height_init))
 cropped_image.save("data/result.png")
 ````
 ### Result
-[<img src="https://huggingface.co/datasets/DEVAIEXP/assets/resolve/main/faithdiff_restored.PNG" width="512px" height="512px"/>](https://imgsli.com/MzY1NzE2)
+[<img src="https://huggingface.co/datasets/DEVAIEXP/assets/resolve/main/faithdiff_restored.PNG" width="512px" height="512px"/>](https://imgsli.com/MzY1NzE2)
+
+
+# Stable Diffusion 3 InstructPix2Pix Pipeline
+This the implementation of the Stable Diffusion 3 InstructPix2Pix Pipeline, based on the HuggingFace Diffusers.
+
+## Example Usage
+This pipeline aims to edit image based on user's instruction by using SD3
+````py
+import torch
+from diffusers import SD3Transformer2DModel
+from diffusers import DiffusionPipeline
+from diffusers.utils import load_image
+
+
+resolution = 512
+image = load_image("https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png").resize(
+    (resolution, resolution)
+)
+edit_instruction = "Turn sky into a sunny one"
+
+
+pipe = DiffusionPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-3-medium-diffusers", custom_pipeline="pipeline_stable_diffusion_3_instruct_pix2pix", torch_dtype=torch.float16).to('cuda')
+
+pipe.transformer = SD3Transformer2DModel.from_pretrained("CaptainZZZ/sd3-instructpix2pix",torch_dtype=torch.float16).to('cuda')
+
+edited_image = pipe(
+    prompt=edit_instruction,
+    image=image,
+    height=resolution,
+    width=resolution,
+    guidance_scale=7.5,
+    image_guidance_scale=1.5,
+    num_inference_steps=30,
+).images[0]
+
+edited_image.save("edited_image.png")
+````
+|Original|Edited|
+|---|---|
+|![Original image](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/StableDiffusion3InstructPix2Pix/mountain.png)|![Edited image](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/StableDiffusion3InstructPix2Pix/edited.png)
+
+### Note
+This model is trained on 512x512, so input size is better on 512x512.
+For better editing performance, please refer to this powerful model https://huggingface.co/BleachNick/SD3_UltraEdit_freeform and Paper "UltraEdit: Instruction-based Fine-Grained Image
+Editing at Scale", many thanks to their contribution!

examples/community/dps_pipeline.py

Lines changed: 3 additions & 3 deletions
@@ -312,9 +312,9 @@ def contributions(self, in_length, out_length, scale, kernel, kernel_width, anti
         # These are the coordinates of the output image
         out_coordinates = np.arange(1, out_length + 1)
 
-        # since both scale-factor and output size can be provided simulatneously, perserving the center of the image requires shifting
-        # the output coordinates. the deviation is because out_length doesn't necesary equal in_length*scale.
-        # to keep the center we need to subtract half of this deivation so that we get equal margins for boths sides and center is preserved.
+        # since both scale-factor and output size can be provided simultaneously, preserving the center of the image requires shifting
+        # the output coordinates. the deviation is because out_length doesn't necessary equal in_length*scale.
+        # to keep the center we need to subtract half of this deviation so that we get equal margins for both sides and center is preserved.
         shifted_out_coordinates = out_coordinates - (out_length - in_length * scale) / 2
 
         # These are the matching positions of the output-coordinates on the input image coordinates.
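To see what the corrected comment describes, here is a small standalone check of the center-preserving shift; the values are made up for illustration:

```py
import numpy as np

in_length, scale = 10, 1.5
out_length = 16  # caller-chosen output size; note 16 != 10 * 1.5

out_coordinates = np.arange(1, out_length + 1)
# Half the deviation between the requested and the implied output length,
# subtracted so both margins are equal and the image center stays fixed.
deviation = out_length - in_length * scale  # 16 - 15 = 1
shifted_out_coordinates = out_coordinates - deviation / 2
print(shifted_out_coordinates[:3])  # [0.5 1.5 2.5]
```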

examples/community/fresco_v2v.py

Lines changed: 4 additions & 4 deletions
@@ -351,7 +351,7 @@ def forward(
         cross_attention_kwargs (`dict`, *optional*):
             A kwargs dictionary that if specified is passed along to the [`AttnProcessor`].
         added_cond_kwargs: (`dict`, *optional*):
-            A kwargs dictionary containin additional embeddings that if specified are added to the embeddings that
+            A kwargs dictionary containing additional embeddings that if specified are added to the embeddings that
             are passed along to the UNet blocks.
 
         Returns:
@@ -864,9 +864,9 @@ def get_flow_and_interframe_paras(flow_model, imgs):
 class AttentionControl:
     """
     Control FRESCO-based attention
-    * enable/diable spatial-guided attention
-    * enable/diable temporal-guided attention
-    * enable/diable cross-frame attention
+    * enable/disable spatial-guided attention
+    * enable/disable temporal-guided attention
+    * enable/disable cross-frame attention
     * collect intermediate attention feature (for spatial-guided attention)
     """
 
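The docstring lists three independent switches; in the pipeline they amount to boolean state that the attention processors consult each step. A minimal illustrative sketch of the pattern (field names assumed, not FRESCO's actual attributes):

```py
class AttentionSwitches:
    """Toy stand-in for the enable/disable switches described above."""

    def __init__(self):
        self.spatial_guided = False   # spatial-guided attention
        self.temporal_guided = False  # temporal-guided attention
        self.cross_frame = False      # cross-frame attention
        self.stored_features = []     # intermediate features for spatial guidance

    def enable(self, *, spatial=None, temporal=None, cross_frame=None):
        # Only update the switches the caller explicitly sets.
        if spatial is not None:
            self.spatial_guided = spatial
        if temporal is not None:
            self.temporal_guided = temporal
        if cross_frame is not None:
            self.cross_frame = cross_frame


# Usage: switches = AttentionSwitches(); switches.enable(cross_frame=True)
```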

examples/community/hd_painter.py

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ def __call__(
         temb: Optional[torch.Tensor] = None,
         scale: float = 1.0,
     ) -> torch.Tensor:
-        # Same as the default AttnProcessor up untill the part where similarity matrix gets saved
+        # Same as the default AttnProcessor up until the part where similarity matrix gets saved
         downscale_factor = self.mask_resoltuion // hidden_states.shape[1]
         residual = hidden_states
 