A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage
A woman with long brown hair and light skin smiles at another woman with long blonde hair.
The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek.
The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and
natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage
A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage
A woman with long brown hair and light skin smiles at another woman with long blonde hair.
The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek.
The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and
natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage
- Refer to the following recommended settings for generation from the [LTX-Video](https://github.com/Lightricks/LTX-Video) repository.

  - The recommended dtype for the transformer, VAE, and text encoder is `torch.bfloat16`. The VAE and text encoder can also be `torch.float32` or `torch.float16`.
  - For guidance-distilled variants of LTX-Video, set `guidance_scale` to `1.0`. For any other model, set `guidance_scale` higher, for example `5.0`, for good generation quality.
  - For timestep-aware VAE variants (LTX-Video 0.9.1 and above), set `decode_timestep` to `0.05` and `image_cond_noise_scale` to `0.025`.
  - For variants that support interpolation between multiple conditioning images and videos (LTX-Video 0.9.5 and above), use similar images and videos for the best results. Divergence from the conditioning inputs may lead to abrupt transitions in the generated video.
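The settings above can be condensed into a small lookup. The sketch below is illustrative only: `recommended_settings` is a hypothetical helper, not part of diffusers, and it simply encodes the values listed above for a given version tuple and distillation flag.

```python
from typing import Any, Dict, Tuple


def recommended_settings(
    version: Tuple[int, int, int], guidance_distilled: bool = False
) -> Dict[str, Any]:
    """Hypothetical helper: collect the recommended call arguments listed above."""
    # Guidance-distilled variants use 1.0; other models want a higher value such as 5.0.
    settings: Dict[str, Any] = {"guidance_scale": 1.0 if guidance_distilled else 5.0}

    # Timestep-aware VAE variants (LTX-Video 0.9.1 and above).
    if version >= (0, 9, 1):
        settings["decode_timestep"] = 0.05
        settings["image_cond_noise_scale"] = 0.025

    return settings


print(recommended_settings((0, 9, 7)))
```

The returned dictionary could then be unpacked into a pipeline call, for example `pipeline(prompt=prompt, **recommended_settings((0, 9, 7)))`.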

- LTX-Video 0.9.7 includes a spatial latent upscaler and a 13B parameter transformer. During inference, a low-resolution video is generated quickly first, then upscaled and refined.

<details>
<summary>Show example code</summary>

```py
import torch
from diffusers import LTXConditionPipeline, LTXLatentUpsamplePipeline
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import export_to_video, load_video
# 3. Denoise the upscaled video with few steps to improve texture (optional, but recommended)
video = pipeline(
    conditions=[condition1],
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=upscaled_width,
    height=upscaled_height,
    num_frames=num_frames,
    denoise_strength=0.4,  # Effectively, 4 inference steps out of 10
    num_inference_steps=10,
    latents=upscaled_latents,
    decode_timestep=0.05,
    decode_noise_scale=0.025,
    image_cond_noise_scale=0.0,
    guidance_scale=5.0,
    guidance_rescale=0.7,
    generator=torch.Generator().manual_seed(0),
    output_type="pil",
).frames[0]

# 4. Downscale the video to the expected resolution
video = [frame.resize((expected_width, expected_height)) for frame in video]

export_to_video(video, "output.mp4", fps=24)
```

</details>
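The comment on `denoise_strength` in the example ("effectively, 4 inference steps out of 10") can be made explicit. The sketch below assumes the schedule is truncated to `int(num_inference_steps * denoise_strength)` steps. That rule is inferred from the comment rather than quoted from the library, and `effective_steps` is a hypothetical helper.

```python
def effective_steps(num_inference_steps: int, denoise_strength: float) -> int:
    """Hypothetical helper: number of denoising steps actually run.

    Assumes the schedule is truncated to int(num_inference_steps * denoise_strength),
    capped at num_inference_steps.
    """
    if not 0.0 <= denoise_strength <= 1.0:
        raise ValueError("denoise_strength must lie in [0, 1]")
    return min(int(num_inference_steps * denoise_strength), num_inference_steps)


print(effective_steps(10, 0.4))  # 4, matching the comment in the example
print(effective_steps(10, 1.0))  # 10, a full denoising run
```

Under this assumption, a lower `denoise_strength` keeps more of the upscaled latents and spends fewer steps refining texture.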

- The LTX-Video 0.9.7 distilled model is guidance- and timestep-distilled to speed up generation. Set `guidance_scale` to `1.0` and `num_inference_steps` to a value between `4` and `10` for good generation quality. For the best results, also use the following custom timesteps.

  - Base model inference to prepare for upscaling: `[1000, 993, 987, 981, 975, 909, 725, 0.03]`.
  - Upscaling: `[1000, 909, 725, 421, 0]`.
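As a quick sanity check, the two custom schedules can be kept in one place and validated before use. This is a sketch under stated assumptions: `DISTILLED_TIMESTEPS` and `check_schedule` are hypothetical names, and the check treats each listed timestep as one inference step, which is an assumption rather than something stated above.

```python
from typing import Dict, List, Sequence

# Custom timesteps quoted above for the LTX-Video 0.9.7 distilled model.
DISTILLED_TIMESTEPS: Dict[str, List[float]] = {
    "base": [1000, 993, 987, 981, 975, 909, 725, 0.03],
    "upscale": [1000, 909, 725, 421, 0],
}


def check_schedule(
    timesteps: Sequence[float], min_steps: int = 4, max_steps: int = 10
) -> int:
    """Hypothetical check: strictly decreasing, with a step count in the recommended range."""
    if any(a <= b for a, b in zip(timesteps, timesteps[1:])):
        raise ValueError("timesteps must be strictly decreasing")
    if not min_steps <= len(timesteps) <= max_steps:
        raise ValueError(f"expected between {min_steps} and {max_steps} steps")
    return len(timesteps)


for stage, schedule in DISTILLED_TIMESTEPS.items():
    print(stage, check_schedule(schedule))
```

Both schedules fall inside the recommended range of `4` to `10` steps: eight for the base pass and five for upscaling.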

<details>
<summary>Show example code</summary>

```py
import torch
from diffusers import LTXConditionPipeline, LTXLatentUpsamplePipeline
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import export_to_video, load_video