
Commit e5b94b4

a-r-r-o-w and DN6 authored
[core] Move community AnimateDiff ControlNet to core (#8972)
* add animatediff controlnet to core
* make style; remove unused method
* fix copied from comment
* add tests
* changes to make tests work
* add utility function to load videos
* update docs
* update pipeline example
* make style
* update docs with example
* address review comments
* add latest freeinit test from #8969
* LoraLoaderMixin -> StableDiffusionLoraLoaderMixin
* fix docs
* Update src/diffusers/utils/loading_utils.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* fix: variable out of scope

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
1 parent 69e72b1 commit e5b94b4
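
One of the items above, "add utility function to load videos", refers to the `load_video` helper used by the documentation example in this commit: it reads a clip from a URL or local path and returns its frames as PIL images. A minimal usage sketch (the URL is the input clip from the doc example; the print line is illustrative, not output captured from this commit):

```python
from diffusers.utils import load_video

# load_video fetches a video (URL or local path) and returns its frames as a list of
# PIL images, ready to be passed to a pipeline or a ControlNet preprocessor.
frames = load_video(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-1.gif"
)
print(f"Loaded {len(frames)} frames of size {frames[0].size}")
```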

File tree

10 files changed: +1682 -8 lines changed

docs/source/en/api/pipelines/animatediff.md

Lines changed: 86 additions & 0 deletions
@@ -25,6 +25,9 @@ The abstract of the paper is the following:
 | Pipeline | Tasks | Demo
 |---|---|:---:|
 | [AnimateDiffPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff.py) | *Text-to-Video Generation with AnimateDiff* |
+| [AnimateDiffControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_controlnet.py) | *Controlled Video-to-Video Generation with AnimateDiff using ControlNet* |
+| [AnimateDiffSparseControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_sparsectrl.py) | *Controlled Video-to-Video Generation with AnimateDiff using SparseCtrl* |
+| [AnimateDiffSDXLPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_sdxl.py) | *Video-to-Video Generation with AnimateDiff* |
 | [AnimateDiffVideoToVideoPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py) | *Video-to-Video Generation with AnimateDiff* |

 ## Available checkpoints
@@ -100,6 +103,83 @@ AnimateDiff tends to work better with finetuned Stable Diffusion models. If you

 </Tip>

+### AnimateDiffControlNetPipeline
+
+AnimateDiff can also be used with ControlNets. ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide depth maps, the ControlNet model generates a video that preserves the spatial information from the depth maps. It is a more flexible and accurate way to control the video generation process.
+
+```python
+import torch
+from diffusers import AnimateDiffControlNetPipeline, AutoencoderKL, ControlNetModel, MotionAdapter, LCMScheduler
+from diffusers.utils import export_to_gif, load_video
+
+# Additionally, you will need to preprocess videos before they can be used with the ControlNet
+# HF maintains just the right package for it: `pip install controlnet_aux`
+from controlnet_aux.processor import ZoeDetector
+
+# Download controlnets from https://huggingface.co/lllyasviel/ControlNet-v1-1 to use .from_single_file
+# Download Diffusers-format controlnets, such as https://huggingface.co/lllyasviel/sd-controlnet-depth, to use .from_pretrained()
+controlnet = ControlNetModel.from_single_file("control_v11f1p_sd15_depth.pth", torch_dtype=torch.float16)
+
+# We use AnimateLCM for this example but one can use the original motion adapters as well (for example, https://huggingface.co/guoyww/animatediff-motion-adapter-v1-5-3)
+motion_adapter = MotionAdapter.from_pretrained("wangfuyun/AnimateLCM")
+
+vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
+pipe: AnimateDiffControlNetPipeline = AnimateDiffControlNetPipeline.from_pretrained(
+    "SG161222/Realistic_Vision_V5.1_noVAE",
+    motion_adapter=motion_adapter,
+    controlnet=controlnet,
+    vae=vae,
+).to(device="cuda", dtype=torch.float16)
+pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear")
+pipe.load_lora_weights("wangfuyun/AnimateLCM", weight_name="AnimateLCM_sd15_t2v_lora.safetensors", adapter_name="lcm-lora")
+pipe.set_adapters(["lcm-lora"], [0.8])
+
+depth_detector = ZoeDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
+video = load_video("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-1.gif")
+conditioning_frames = []
+
+with pipe.progress_bar(total=len(video)) as progress_bar:
+    for frame in video:
+        conditioning_frames.append(depth_detector(frame))
+        progress_bar.update()
+
+prompt = "a panda, playing a guitar, sitting in a pink boat, in the ocean, mountains in background, realistic, high quality"
+negative_prompt = "bad quality, worst quality"
+
+video = pipe(
+    prompt=prompt,
+    negative_prompt=negative_prompt,
+    num_frames=len(video),
+    num_inference_steps=10,
+    guidance_scale=2.0,
+    conditioning_frames=conditioning_frames,
+    generator=torch.Generator().manual_seed(42),
+).frames[0]
+
+export_to_gif(video, "animatediff_controlnet.gif", fps=8)
+```
+
+Here are some sample outputs:
+
+<table align="center">
+  <tr>
+    <th align="center">Source Video</th>
+    <th align="center">Output Video</th>
+  </tr>
+  <tr>
+    <td align="center">
+      raccoon playing a guitar
+      <br />
+      <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-1.gif" alt="raccoon playing a guitar" />
+    </td>
+    <td align="center">
+      a panda, playing a guitar, sitting in a pink boat, in the ocean, mountains in background, realistic, high quality
+      <br/>
+      <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-controlnet-output.gif" alt="a panda, playing a guitar, sitting in a pink boat, in the ocean, mountains in background, realistic, high quality" />
+    </td>
+  </tr>
+</table>
+
 ### AnimateDiffSparseControlNetPipeline

 [SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://arxiv.org/abs/2311.16933) for achieving controlled generation in text-to-video diffusion models by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai.
@@ -762,6 +842,12 @@ pipe = AnimateDiffPipeline.from_pretrained("emilianJR/epiCRealism", motion_adapt
 - all
 - __call__

+## AnimateDiffControlNetPipeline
+
+[[autodoc]] AnimateDiffControlNetPipeline
+- all
+- __call__
+
 ## AnimateDiffSparseControlNetPipeline

 [[autodoc]] AnimateDiffSparseControlNetPipeline
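
A side note on the documentation example above (not part of this commit): the snippet saves the result with `export_to_gif`, but the same frames can be written to an mp4 with `export_to_video` from `diffusers.utils`. A minimal sketch, assuming `export_to_video` accepts PIL frames and an `fps` argument; the filename and fps are arbitrary choices:

```python
from PIL import Image

from diffusers.utils import export_to_video

# Dummy frames so this snippet runs on its own; in practice, pass the `video` frames
# returned by the AnimateDiffControlNetPipeline call in the example above.
frames = [Image.new("RGB", (512, 512), color=(i * 10, i * 10, i * 10)) for i in range(16)]
export_to_video(frames, "animatediff_controlnet.mp4", fps=8)
```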

src/diffusers/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -232,6 +232,7 @@
     "AmusedImg2ImgPipeline",
     "AmusedInpaintPipeline",
     "AmusedPipeline",
+    "AnimateDiffControlNetPipeline",
     "AnimateDiffPipeline",
     "AnimateDiffSDXLPipeline",
     "AnimateDiffSparseControlNetPipeline",
@@ -652,6 +653,7 @@
     AmusedImg2ImgPipeline,
     AmusedInpaintPipeline,
     AmusedPipeline,
+    AnimateDiffControlNetPipeline,
     AnimateDiffPipeline,
     AnimateDiffSDXLPipeline,
     AnimateDiffSparseControlNetPipeline,
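
With both the `_import_structure` entry and the eager import registered here, the new pipeline is exposed from the top-level `diffusers` namespace alongside the other AnimateDiff pipelines. A quick illustrative check (the printed module path is the expected value, not output captured from this commit):

```python
# After this commit, the ControlNet variant is importable next to the other AnimateDiff pipelines.
from diffusers import AnimateDiffControlNetPipeline, AnimateDiffPipeline

print(AnimateDiffControlNetPipeline.__module__)
# Expected to point at diffusers.pipelines.animatediff.pipeline_animatediff_controlnet
```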

src/diffusers/pipelines/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -118,6 +118,7 @@
 _import_structure["amused"] = ["AmusedImg2ImgPipeline", "AmusedInpaintPipeline", "AmusedPipeline"]
 _import_structure["animatediff"] = [
     "AnimateDiffPipeline",
+    "AnimateDiffControlNetPipeline",
     "AnimateDiffSDXLPipeline",
     "AnimateDiffSparseControlNetPipeline",
     "AnimateDiffVideoToVideoPipeline",
@@ -419,6 +420,7 @@
 else:
     from .amused import AmusedImg2ImgPipeline, AmusedInpaintPipeline, AmusedPipeline
     from .animatediff import (
+        AnimateDiffControlNetPipeline,
         AnimateDiffPipeline,
         AnimateDiffSDXLPipeline,
         AnimateDiffSparseControlNetPipeline,

src/diffusers/pipelines/animatediff/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,7 @@
     _dummy_objects.update(get_objects_from_module(dummy_torch_and_transformers_objects))
 else:
     _import_structure["pipeline_animatediff"] = ["AnimateDiffPipeline"]
+    _import_structure["pipeline_animatediff_controlnet"] = ["AnimateDiffControlNetPipeline"]
     _import_structure["pipeline_animatediff_sdxl"] = ["AnimateDiffSDXLPipeline"]
     _import_structure["pipeline_animatediff_sparsectrl"] = ["AnimateDiffSparseControlNetPipeline"]
     _import_structure["pipeline_animatediff_video2video"] = ["AnimateDiffVideoToVideoPipeline"]
@@ -35,6 +36,7 @@

 else:
     from .pipeline_animatediff import AnimateDiffPipeline
+    from .pipeline_animatediff_controlnet import AnimateDiffControlNetPipeline
     from .pipeline_animatediff_sdxl import AnimateDiffSDXLPipeline
     from .pipeline_animatediff_sparsectrl import AnimateDiffSparseControlNetPipeline
     from .pipeline_animatediff_video2video import AnimateDiffVideoToVideoPipeline
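
The `_import_structure` entries added across these `__init__.py` files feed diffusers' lazy-import machinery: a submodule such as `pipeline_animatediff_controlnet` is only imported the first time one of its exported names is accessed. The snippet below is a simplified, self-contained illustration of that idea using a module-level `__getattr__` (PEP 562); it mirrors the pattern but is not diffusers' actual `_LazyModule` implementation.

```python
# lazy_pipelines.py -- a simplified stand-in for the _import_structure / lazy-module pattern.
import importlib

# Submodule name -> names it exports, mirroring the _import_structure dicts in the diff above.
_import_structure = {
    "pipeline_animatediff": ["AnimateDiffPipeline"],
    "pipeline_animatediff_controlnet": ["AnimateDiffControlNetPipeline"],
}

# Reverse lookup: exported name -> submodule that defines it.
_name_to_module = {name: mod for mod, names in _import_structure.items() for name in names}


def __getattr__(name: str):
    # Invoked only when `name` is not already defined at module level (PEP 562),
    # so the (potentially heavy) submodule import is deferred until first access.
    if name in _name_to_module:
        submodule = importlib.import_module(f".{_name_to_module[name]}", __name__)
        return getattr(submodule, name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```

Placed in a package `__init__.py`, accessing `package.AnimateDiffControlNetPipeline` would then trigger the import of `pipeline_animatediff_controlnet` on first use only.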
