Commit 0d3f911 ("feedback")
1 parent: ec5594c

4 files changed, +67 / -29 lines

docs/source/en/api/pipelines/cogvideox.md
Lines changed: 9 additions & 2 deletions
```diff
@@ -15,7 +15,9 @@
 
 <div style="float: right;">
   <div class="flex flex-wrap space-x-1">
-    <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+    <a href="https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference" target="_blank" rel="noopener">
+      <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+    </a>
   </div>
 </div>
 
```

```diff
@@ -90,7 +92,7 @@ export_to_video(video, "output.mp4", fps=8)
 </hfoption>
 <hfoption id="inference speed">
 
-Compilation is slow the first time but subsequent calls to the pipeline are faster.
+[Compilation](../../optimization/fp16#torchcompile) is slow the first time but subsequent calls to the pipeline are faster.
 
 The average inference time with torch.compile on an 80GB A100 is 76.27 seconds compared to 96.89 seconds for an uncompiled model.
```
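For context, a minimal sketch of the compiled setup this hunk links to; the checkpoint id and generation settings below are illustrative assumptions, not part of this commit:

```py
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# assumed checkpoint; any CogVideoX checkpoint should behave the same way
pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
).to("cuda")

# compile the transformer; the first call pays the compilation cost,
# subsequent calls reuse the compiled graph and run faster
pipeline.transformer = torch.compile(pipeline.transformer)

video = pipeline(
    prompt="A panda playing a guitar in a bamboo forest",
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```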

````diff
@@ -132,6 +134,9 @@ export_to_video(video, "output.mp4", fps=8)
 
 - CogVideoX supports LoRAs with [`~loaders.CogVideoXLoraLoaderMixin.load_lora_weights`].
 
+<details>
+<summary>Show example code</summary>
+
 ```py
 import torch
 from diffusers import CogVideoXPipeline
@@ -167,6 +172,8 @@ export_to_video(video, "output.mp4", fps=8)
 export_to_video(video, "output.mp4", fps=16)
 ```
 
+</details>
+
 - The text-to-video (T2V) checkpoints work best with a resolution of 1360x768 because that was the resolution it was pretrained on.
 
 - The image-to-video (I2V) checkpoints work with multiple resolutions. The width can vary from 768 to 1360, but the height must be 768. Both height and width must be divisible by 16.
````
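The LoRA example wrapped in `<details>` above is elided by the diff view; a minimal sketch of the flow it collapses, with a placeholder LoRA repository id:

```py
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
).to("cuda")

# "some-user/cogvideox-lora" is a placeholder, not a real checkpoint
pipeline.load_lora_weights("some-user/cogvideox-lora", adapter_name="custom")
pipeline.set_adapters(["custom"], [0.9])

video = pipeline(
    prompt="A panda playing a guitar in a bamboo forest",
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=16)
```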

docs/source/en/api/pipelines/hunyuan_video.md
Lines changed: 30 additions & 23 deletions
```diff
@@ -14,7 +14,9 @@
 
 <div style="float: right;">
   <div class="flex flex-wrap space-x-1">
-    <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+    <a href="https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference" target="_blank" rel="noopener">
+      <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+    </a>
   </div>
 </div>
 
```

```diff
@@ -46,13 +48,13 @@ from diffusers.utils import export_to_video
 
 # quantize weights to int4 with bitsandbytes
 pipeline_quant_config = PipelineQuantizationConfig(
-  quant_backend="bitsandbytes_4bit",
-  quant_kwargs={
-    "load_in_4bit": True,
-    "bnb_4bit_quant_type": "nf4",
-    "bnb_4bit_compute_dtype": torch.bfloat16
-  },
-  components_to_quantize=["transformer"]
+    quant_backend="bitsandbytes_4bit",
+    quant_kwargs={
+        "load_in_4bit": True,
+        "bnb_4bit_quant_type": "nf4",
+        "bnb_4bit_compute_dtype": torch.bfloat16
+    },
+    components_to_quantize=["transformer"]
 )
 
 pipeline = HunyuanVideoPipeline.from_pretrained(
```
````diff
@@ -73,7 +75,7 @@ export_to_video(video, "output.mp4", fps=15)
 </hfoption>
 <hfoption id="inference speed">
 
-Compilation is slow the first time but subsequent calls to the pipeline are faster.
+[Compilation](../../optimization/fp16#torchcompile) is slow the first time but subsequent calls to the pipeline are faster.
 
 ```py
 import torch
````
```diff
@@ -83,13 +85,13 @@ from diffusers.utils import export_to_video
 
 # quantize weights to int4 with bitsandbytes
 pipeline_quant_config = PipelineQuantizationConfig(
-  quant_backend="bitsandbytes_4bit",
-  quant_kwargs={
-    "load_in_4bit": True,
-    "bnb_4bit_quant_type": "nf4",
-    "bnb_4bit_compute_dtype": torch.bfloat16
-  },
-  components_to_quantize=["transformer"]
+    quant_backend="bitsandbytes_4bit",
+    quant_kwargs={
+        "load_in_4bit": True,
+        "bnb_4bit_quant_type": "nf4",
+        "bnb_4bit_compute_dtype": torch.bfloat16
+    },
+    components_to_quantize=["transformer"]
 )
 
 pipeline = HunyuanVideoPipeline.from_pretrained(
```
````diff
@@ -120,6 +122,9 @@ export_to_video(video, "output.mp4", fps=15)
 
 - HunyuanVideo supports LoRAs with [`~loaders.HunyuanVideoLoraLoaderMixin.load_lora_weights`].
 
+<details>
+<summary>Show example code</summary>
+
 ```py
 import torch
 from diffusers import AutoModel, HunyuanVideoPipeline
````
```diff
@@ -128,13 +133,13 @@ export_to_video(video, "output.mp4", fps=15)
 
 # quantize weights to int4 with bitsandbytes
 pipeline_quant_config = PipelineQuantizationConfig(
-  quant_backend="bitsandbytes_4bit",
-  quant_kwargs={
-    "load_in_4bit": True,
-    "bnb_4bit_quant_type": "nf4",
-    "bnb_4bit_compute_dtype": torch.bfloat16
-  },
-  components_to_quantize=["transformer"]
+    quant_backend="bitsandbytes_4bit",
+    quant_kwargs={
+        "load_in_4bit": True,
+        "bnb_4bit_quant_type": "nf4",
+        "bnb_4bit_compute_dtype": torch.bfloat16
+    },
+    components_to_quantize=["transformer"]
 )
 
 pipeline = HunyuanVideoPipeline.from_pretrained(
```
````diff
@@ -159,6 +164,8 @@ export_to_video(video, "output.mp4", fps=15)
 export_to_video(video, "output.mp4", fps=15)
 ```
 
+</details>
+
 - Refer to the table below for recommended inference values.
 
 | parameter | recommended value |
````
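The collapsed LoRA example above is truncated by the diff view; a minimal sketch of the pattern it wraps, combining the pipeline-level quantization shown in this file with a placeholder LoRA repository id, assuming a recent diffusers release that supports `quantization_config` in `from_pretrained`:

```py
import torch
from diffusers import HunyuanVideoPipeline
from diffusers.quantizers import PipelineQuantizationConfig
from diffusers.utils import export_to_video

# same quantization settings as the hunks above
pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer"],
)

pipeline = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

# "some-user/hunyuanvideo-lora" is a placeholder, not a real checkpoint
pipeline.load_lora_weights("some-user/hunyuanvideo-lora")

video = pipeline(prompt="A cat walks on the grass").frames[0]
export_to_video(video, "output.mp4", fps=15)
```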

docs/source/en/api/pipelines/ltx_video.md
Lines changed: 14 additions & 2 deletions
```diff
@@ -14,7 +14,9 @@
 
 <div style="float: right;">
   <div class="flex flex-wrap space-x-1">
-    <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+    <a href="https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference" target="_blank" rel="noopener">
+      <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+    </a>
   </div>
 </div>
 
```

````diff
@@ -82,7 +84,7 @@ export_to_video(video, "output.mp4", fps=24)
 </hfoption>
 <hfoption id="inference speed">
 
-Compilation is slow the first time but subsequent calls to the pipeline are faster.
+[Compilation](../../optimization/fp16#torchcompile) is slow the first time but subsequent calls to the pipeline are faster.
 
 ```py
 import torch
````
````diff
@@ -124,6 +126,9 @@ export_to_video(video, "output.mp4", fps=24)
 
 - LTX-Video supports LoRAs with [`~loaders.LTXVideoLoraLoaderMixin.load_lora_weights`].
 
+<details>
+<summary>Show example code</summary>
+
 ```py
 import torch
 from diffusers import LTXConditionPipeline
````
````diff
@@ -153,8 +158,13 @@ export_to_video(video, "output.mp4", fps=24)
 export_to_video(video, "output.mp4", fps=26)
 ```
 
+</details>
+
 - LTX-Video supports loading from single files, such as [GGUF checkpoints](../../quantization/gguf), with [`loaders.FromOriginalModelMixin.from_single_file`] or [`loaders.FromSingleFileMixin.from_single_file`].
 
+<details>
+<summary>Show example code</summary>
+
 ```py
 import torch
 from diffusers.utils import export_to_video
````
````diff
@@ -172,6 +182,8 @@ export_to_video(video, "output.mp4", fps=24)
 )
 ```
 
+</details>
+
 ## LTXPipeline
 
 [[autodoc]] LTXPipeline
````
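The collapsed single-file example is likewise truncated above; a minimal sketch of GGUF loading, assuming the community `city96/LTX-Video-gguf` checkpoint and a diffusers release that ships `GGUFQuantizationConfig`:

```py
import torch
from diffusers import GGUFQuantizationConfig, LTXPipeline, LTXVideoTransformer3DModel
from diffusers.utils import export_to_video

# community GGUF checkpoint; substitute any LTX-Video GGUF file
transformer = LTXVideoTransformer3DModel.from_single_file(
    "https://huggingface.co/city96/LTX-Video-gguf/blob/main/ltx-video-2b-v0.9-Q8_0.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipeline = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

video = pipeline(prompt="A woman walks along a beach at sunset").frames[0]
export_to_video(video, "output.mp4", fps=24)
```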

docs/source/en/api/pipelines/wan.md
Lines changed: 14 additions & 2 deletions
```diff
@@ -14,7 +14,9 @@
 
 <div style="float: right;">
   <div class="flex flex-wrap space-x-1">
-    <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+    <a href="https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference" target="_blank" rel="noopener">
+      <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
+    </a>
   </div>
 </div>
 
```

````diff
@@ -100,7 +102,7 @@ export_to_video(output, "output.mp4", fps=16)
 </hfoption>
 <hfoption id="inference speed">
 
-Compilation is slow the first time but subsequent calls to the pipeline are faster.
+[Compilation](../../optimization/fp16#torchcompile) is slow the first time but subsequent calls to the pipeline are faster.
 
 ```py
 # pip install ftfy
````
````diff
@@ -159,6 +161,9 @@ export_to_video(output, "output.mp4", fps=16)
 
 - Wan2.1 supports LoRAs with [`~loaders.WanLoraLoaderMixin.load_lora_weights`].
 
+<details>
+<summary>Show example code</summary>
+
 ```py
 # pip install ftfy
 import torch
````
````diff
@@ -199,8 +204,13 @@ export_to_video(output, "output.mp4", fps=16)
 export_to_video(output, "output.mp4", fps=16)
 ```
 
+</details>
+
 - [`WanTransformer3DModel`] and [`AutoencoderKLWan`] support loading from single files with [`~loaders.FromSingleFileMixin.from_single_file`].
 
+<details>
+<summary>Show example code</summary>
+
 ```py
 # pip install ftfy
 import torch
````
````diff
@@ -221,6 +231,8 @@ export_to_video(output, "output.mp4", fps=16)
 )
 ```
 
+</details>
+
 - Set the [`AutoencoderKLWan`] dtype to `torch.float32` for better decoding quality.
 
 - The number of frames per video (`num_frames`) should follow `4 * k + 1`, where `k` is an integer (for example, 81).
````
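A minimal sketch of the single-file pattern referenced above, also applying the `torch.float32` VAE tip from this hunk; the checkpoint paths are placeholders, and the pretrained repository id is an assumption:

```py
# pip install ftfy
import torch
from diffusers import AutoencoderKLWan, WanPipeline, WanTransformer3DModel
from diffusers.utils import export_to_video

# placeholder paths; point these at real single-file checkpoints
vae = AutoencoderKLWan.from_single_file(
    "path/to/wan_vae.safetensors",
    torch_dtype=torch.float32,  # float32 gives better decoding quality
)
transformer = WanTransformer3DModel.from_single_file(
    "path/to/wan_transformer.safetensors", torch_dtype=torch.bfloat16
)
pipeline = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    vae=vae,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

# num_frames should follow 4 * k + 1; 81 is a common choice
output = pipeline(prompt="A cat walks on the grass", num_frames=81).frames[0]
export_to_video(output, "output.mp4", fps=16)
```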
