docs/source/en/api/pipelines/animatediff.md
86 lines changed: 86 additions & 0 deletions

@@ -25,6 +25,9 @@ The abstract of the paper is the following:

| Pipeline | Tasks | Demo
|---|---|:---:|
|[AnimateDiffPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff.py)|*Text-to-Video Generation with AnimateDiff*|
|[AnimateDiffControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_controlnet.py)|*Controlled Video-to-Video Generation with AnimateDiff using ControlNet*|
|[AnimateDiffSparseControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_sparsectrl.py)|*Controlled Video-to-Video Generation with AnimateDiff using SparseCtrl*|
|[AnimateDiffSDXLPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_sdxl.py)|*Video-to-Video Generation with AnimateDiff*|
|[AnimateDiffVideoToVideoPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py)|*Video-to-Video Generation with AnimateDiff*|

## Available checkpoints
@@ -100,6 +103,83 @@ AnimateDiff tends to work better with finetuned Stable Diffusion models. If you
</Tip>

### AnimateDiffControlNetPipeline

AnimateDiff can also be used with ControlNets. ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide depth maps, the ControlNet model generates a video that preserves the spatial information from the depth maps. This is a more flexible and accurate way to control the video generation process.

```python
import torch
from diffusers import AnimateDiffControlNetPipeline, AutoencoderKL, ControlNetModel, MotionAdapter, LCMScheduler
from diffusers.utils import export_to_gif, load_video

# Additionally, you will need to preprocess videos before they can be used with the ControlNet
# HF maintains just the right package for it: `pip install controlnet_aux`
from controlnet_aux.processor import ZoeDetector

# Download controlnets from https://huggingface.co/lllyasviel/ControlNet-v1-1 to use .from_single_file
# Download Diffusers-format controlnets, such as https://huggingface.co/lllyasviel/sd-controlnet-depth, to use .from_pretrained()
# We use AnimateLCM for this example but one can use the original motion adapters as well (for example, https://huggingface.co/guoyww/animatediff-motion-adapter-v1-5-3)
```
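
Building on the imports and checkpoint notes above, a minimal sketch of how the full workflow could look is shown below, pairing the AnimateLCM motion adapter with a depth ControlNet. The base checkpoint (`SG161222/Realistic_Vision_V5.1_noVAE`), the LoRA weight name, and the generation parameters are illustrative assumptions rather than required values.

```python
# A sketch of the full workflow, repeating the imports above so it runs standalone.
# Model IDs, the LoRA weight name, and generation parameters are illustrative
# assumptions; substitute the checkpoints you actually want to use.
import torch
from diffusers import AnimateDiffControlNetPipeline, AutoencoderKL, ControlNetModel, MotionAdapter, LCMScheduler
from diffusers.utils import export_to_gif, load_video
from controlnet_aux.processor import ZoeDetector

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
motion_adapter = MotionAdapter.from_pretrained("wangfuyun/AnimateLCM", torch_dtype=torch.float16)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)

pipe = AnimateDiffControlNetPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",  # assumed finetuned SD 1.5 checkpoint
    motion_adapter=motion_adapter,
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

# AnimateLCM pairs with the LCM scheduler and its distillation LoRA
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear")
pipe.load_lora_weights("wangfuyun/AnimateLCM", weight_name="AnimateLCM_sd15_t2v_lora.safetensors", adapter_name="lcm-lora")
pipe.set_adapters(["lcm-lora"], [0.8])

# Turn each frame of the source video into a depth map for the ControlNet
depth_detector = ZoeDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
video = load_video("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-1.gif")
conditioning_frames = [depth_detector(frame) for frame in video]

output = pipe(
    prompt="a panda, playing a guitar, sitting in a pink boat, in the ocean, mountains in background, realistic, high quality",
    negative_prompt="bad quality, worst quality",
    num_frames=len(video),
    num_inference_steps=10,
    guidance_scale=2.0,
    conditioning_frames=conditioning_frames,
    generator=torch.Generator().manual_seed(42),
).frames[0]

export_to_gif(output, "animatediff_controlnet.gif", fps=8)
```

Because AnimateLCM is a distilled model, a low step count (around 10) and a small guidance scale tend to be sufficient.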

<table align="center">
  <tr>
    <td align="center">
      racoon playing a guitar
      <br/>
      <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-1.gif" alt="racoon playing a guitar" />
    </td>
    <td align="center">
      a panda, playing a guitar, sitting in a pink boat, in the ocean, mountains in background, realistic, high quality
      <br/>
      <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-controlnet-output.gif" alt="a panda, playing a guitar, sitting in a pink boat, in the ocean, mountains in background, realistic, high quality" />
    </td>
  </tr>
</table>

### AnimateDiffSparseControlNetPipeline

[SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://arxiv.org/abs/2311.16933) is a method for achieving controlled generation in text-to-video diffusion models by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai.
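
As a rough illustration of the sparse-control idea, the sketch below conditions only the first frame with a scribble image and lets the model fill in the remaining frames. The SparseCtrl and base checkpoints, the conditioning-image path, the prompt, and the call parameters (including `controlnet_frame_indices`) are assumptions for illustration and should be checked against the pipeline's API reference.

```python
# Sketch only: checkpoints, paths, and parameters below are illustrative assumptions.
import torch
from diffusers import AnimateDiffSparseControlNetPipeline, DPMSolverMultistepScheduler, MotionAdapter, SparseControlNetModel
from diffusers.utils import export_to_gif, load_image

# Assumed checkpoints: the v1-5-3 motion adapter and a SparseCtrl scribble ControlNet
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16)
controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-scribble", torch_dtype=torch.float16)

pipe = AnimateDiffSparseControlNetPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",  # assumed finetuned SD 1.5 checkpoint
    motion_adapter=motion_adapter,
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Condition only selected keyframes; SparseCtrl propagates the control to the rest
conditioning_image = load_image("path/to/scribble.png")  # substitute your own sparse conditioning image

frames = pipe(
    prompt="an aerial view of a futuristic city at night, neon lights, high quality",
    negative_prompt="low quality, worst quality",
    num_inference_steps=25,
    conditioning_frames=[conditioning_image],
    controlnet_frame_indices=[0],
    generator=torch.Generator().manual_seed(42),
).frames[0]

export_to_gif(frames, "animatediff_sparsectrl.gif")
```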