Commit 591af15

Merge branch 'main' into fix-vae-lora
2 parents ca918e4 + 5588725 commit 591af15

106 files changed (+9835, -3825 lines)


.github/workflows/nightly_tests.yml

Lines changed: 56 additions & 0 deletions
@@ -180,6 +180,62 @@ jobs:
           pip install slack_sdk tabulate
           python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

+  run_big_gpu_torch_tests:
+    name: Torch tests on big GPU
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+    runs-on:
+      group: aws-g6e-xlarge-plus
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+          python -m uv pip install pytest-reportlog
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Selected Torch CUDA Test on big GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+          BIG_GPU_MEMORY: 40
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -m "big_gpu_with_torch_cuda" \
+            --make-reports=tests_big_gpu_torch_cuda \
+            --report-log=tests_big_gpu_torch_cuda.log \
+            tests/
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_big_gpu_torch_cuda_stats.txt
+          cat reports/tests_big_gpu_torch_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_cuda_big_gpu_test_reports
+          path: reports
+      - name: Generate Report and Notify Channel
+        if: always()
+        run: |
+          pip install slack_sdk tabulate
+          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+
   run_flax_tpu_tests:
     name: Nightly Flax TPU Tests
     runs-on: docker-tpu
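
The new job selects tests with `-m "big_gpu_with_torch_cuda"` and exports `BIG_GPU_MEMORY: 40`, but this hunk does not show how the test suite reads that variable. The sketch below is one plausible way such a gate could work, not code from this commit: `has_enough_gpu_memory` is a hypothetical helper, and treating the value as a GiB threshold is an assumption.

```python
import os


# Hypothetical helper, not part of this commit: decide whether a
# "big GPU" test should run, given the device's total memory in GiB.
def has_enough_gpu_memory(total_memory_gib, env=None):
    env = os.environ if env is None else env
    # BIG_GPU_MEMORY is the variable exported by the workflow step; the
    # fallback of 40 mirrors that value and is an assumption.
    required_gib = float(env.get("BIG_GPU_MEMORY", 40))
    return total_memory_gib >= required_gib


# Passing the environment explicitly keeps the check easy to exercise.
print(has_enough_gpu_memory(48, {"BIG_GPU_MEMORY": "40"}))
print(has_enough_gpu_memory(24, {"BIG_GPU_MEMORY": "40"}))
```

In a real suite, a check like this would typically sit behind a skip decorator so that tests carrying the marker are skipped, rather than failed, on smaller runners.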

.github/workflows/ssh-runner.yml

Lines changed: 2 additions & 1 deletion
@@ -4,12 +4,13 @@ on:
   workflow_dispatch:
     inputs:
       runner_type:
-        description: 'Type of runner to test (aws-g6-4xlarge-plus: a10 or aws-g4dn-2xlarge: t4)'
+        description: 'Type of runner to test (aws-g6-4xlarge-plus: a10, aws-g4dn-2xlarge: t4, aws-g6e-xlarge-plus: L40)'
         type: choice
         required: true
         options:
           - aws-g6-4xlarge-plus
           - aws-g4dn-2xlarge
+          - aws-g6e-xlarge-plus
       docker_image:
         description: 'Name of the Docker image'
         required: true

docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
@@ -270,6 +270,8 @@
         title: LatteTransformer3DModel
       - local: api/models/lumina_nextdit2d
         title: LuminaNextDiT2DModel
+      - local: api/models/mochi_transformer3d
+        title: MochiTransformer3DModel
       - local: api/models/pixart_transformer2d
         title: PixArtTransformer2DModel
       - local: api/models/prior_transformer
@@ -306,6 +308,8 @@
         title: AutoencoderKLAllegro
       - local: api/models/autoencoderkl_cogvideox
         title: AutoencoderKLCogVideoX
+      - local: api/models/autoencoderkl_mochi
+        title: AutoencoderKLMochi
       - local: api/models/asymmetricautoencoderkl
         title: AsymmetricAutoencoderKL
       - local: api/models/consistency_decoder_vae
@@ -400,6 +404,8 @@
       title: Lumina-T2X
     - local: api/pipelines/marigold
       title: Marigold
+    - local: api/pipelines/mochi
+      title: Mochi
     - local: api/pipelines/panorama
       title: MultiDiffusion
     - local: api/pipelines/musicldm
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLMochi
+
+The 3D variational autoencoder (VAE) model with KL loss used in [Mochi](https://github.com/genmoai/models) was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.
+
+The model can be loaded with the following code snippet.
+
+```python
+from diffusers import AutoencoderKLMochi
+
+vae = AutoencoderKLMochi.from_pretrained("genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+```
+
+## AutoencoderKLMochi
+
+[[autodoc]] AutoencoderKLMochi
+    - decode
+    - all
+
+## DecoderOutput
+
+[[autodoc]] models.autoencoders.vae.DecoderOutput

docs/source/en/api/models/controlnet.md

Lines changed: 2 additions & 2 deletions
@@ -39,12 +39,12 @@ pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=contro

 ## ControlNetOutput

-[[autodoc]] models.controlnet.ControlNetOutput
+[[autodoc]] models.controlnets.controlnet.ControlNetOutput

 ## FlaxControlNetModel

 [[autodoc]] FlaxControlNetModel

 ## FlaxControlNetOutput

-[[autodoc]] models.controlnet_flax.FlaxControlNetOutput
+[[autodoc]] models.controlnets.controlnet_flax.FlaxControlNetOutput

docs/source/en/api/models/controlnet_sd3.md

Lines changed: 1 addition & 1 deletion
@@ -38,5 +38,5 @@ pipe = StableDiffusion3ControlNetPipeline.from_pretrained("stabilityai/stable-di

 ## SD3ControlNetOutput

-[[autodoc]] models.controlnet_sd3.SD3ControlNetOutput
+[[autodoc]] models.controlnets.controlnet_sd3.SD3ControlNetOutput

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# MochiTransformer3DModel
+
+A Diffusion Transformer model for 3D video-like data was introduced in [Mochi-1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.
+
+The model can be loaded with the following code snippet.
+
+```python
+from diffusers import MochiTransformer3DModel
+
+transformer = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
+```
+
+## MochiTransformer3DModel
+
+[[autodoc]] MochiTransformer3DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

docs/source/en/api/pipelines/mochi.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+-->
+
+# Mochi
+
+[Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) from Genmo.
+
+*Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. This model dramatically closes the gap between closed and open video generation systems. The model is released under a permissive Apache 2.0 license.*
+
+<Tip>
+
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
+
+</Tip>
+
+## MochiPipeline
+
+[[autodoc]] MochiPipeline
+  - all
+  - __call__
+
+## MochiPipelineOutput
+
+[[autodoc]] pipelines.mochi.pipeline_output.MochiPipelineOutput

docs/source/en/training/distributed_inference.md

Lines changed: 1 addition & 1 deletion
@@ -183,7 +183,7 @@ Add the transformer model to the pipeline for denoising, but set the other model

 ```py
 pipeline = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-dev", ,
+    "black-forest-labs/FLUX.1-dev",
     text_encoder=None,
     text_encoder_2=None,
     tokenizer=None,

examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Lines changed: 3 additions & 11 deletions
@@ -1778,15 +1778,10 @@ def load_model_hook(models, input_dir):
         if not args.enable_t5_ti:
             # pure textual inversion - only clip
             if pure_textual_inversion:
-                params_to_optimize = [
-                    text_parameters_one_with_lr,
-                ]
+                params_to_optimize = [text_parameters_one_with_lr]
                 te_idx = 0
             else:  # regular te training or regular pivotal for clip
-                params_to_optimize = [
-                    transformer_parameters_with_lr,
-                    text_parameters_one_with_lr,
-                ]
+                params_to_optimize = [transformer_parameters_with_lr, text_parameters_one_with_lr]
                 te_idx = 1
         elif args.enable_t5_ti:
             # pivotal tuning of clip & t5
@@ -1809,9 +1804,7 @@ def load_model_hook(models, input_dir):
             ]
             te_idx = 1
     else:
-        params_to_optimize = [
-            transformer_parameters_with_lr,
-        ]
+        params_to_optimize = [transformer_parameters_with_lr]

     # Optimizer creation
     if not (args.optimizer.lower() == "prodigy" or args.optimizer.lower() == "adamw"):
@@ -1871,7 +1864,6 @@ def load_model_hook(models, input_dir):
             params_to_optimize[-1]["lr"] = args.learning_rate
         optimizer = optimizer_class(
             params_to_optimize,
-            lr=args.learning_rate,
             betas=(args.adam_beta1, args.adam_beta2),
             beta3=args.prodigy_beta3,
             weight_decay=args.adam_weight_decay,
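
The last hunk drops the constructor-level `lr=args.learning_rate`. The `*_with_lr` names suggest that each entry of `params_to_optimize` already carries its own `"lr"`, although the hunk does not show those dictionaries. As a rough illustration of why a constructor default would then have no effect, the sketch below mimics how PyTorch-style optimizers merge defaults into parameter groups: a default only fills keys the group lacks. `resolve_group_options` is a hypothetical stand-in written for this note, not PyTorch or diffusers code, and the group contents are made up.

```python
# Hypothetical stand-in for the way PyTorch-style optimizers merge
# constructor defaults into parameter groups (group values win).
def resolve_group_options(param_groups, defaults):
    resolved = []
    for group in param_groups:
        merged = dict(group)  # copy so the caller's dicts stay untouched
        for name, value in defaults.items():
            merged.setdefault(name, value)  # only fills missing keys
        resolved.append(merged)
    return resolved


# Illustrative groups; the placeholder strings stand in for real tensors.
groups = [
    {"params": ["transformer_weights"], "lr": 1e-4},
    {"params": ["text_encoder_weights"], "lr": 5e-6},
]

# A constructor-level lr changes nothing when every group sets its own,
# while a key the groups lack (weight_decay) is filled from the defaults.
resolved = resolve_group_options(groups, {"lr": 1.0, "weight_decay": 0.01})
print([g["lr"] for g in resolved])
```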
