Commit 9df566e

[Community] StyleAligned Pipeline (#6489)

* add stylealigned sdxl pipeline
* bugfix
* update docs
* remove einops dependency
* update README
* update example docstring

1 parent: be0b425

File tree: 2 files changed, +2086 −0 lines

examples/community/README.md (61 additions, 0 deletions)
@@ -57,6 +57,7 @@ prompt-to-prompt | change parts of a prompt and retain image structure (see [pap
| DemoFusion Pipeline | Implementation of [DemoFusion: Democratising High-Resolution Image Generation With No $$$](https://arxiv.org/abs/2311.16973) | [DemoFusion Pipeline](#DemoFusion) | - | [Ruoyi Du](https://github.com/RuoyiDu) |
| Null-Text Inversion Pipeline | Implement [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://arxiv.org/abs/2211.09794) as a pipeline. | [Null-Text Inversion](https://github.com/google/prompt-to-prompt/) | - | [Junsheng Luan](https://github.com/Junsheng121) |
| Rerender A Video Pipeline | Implementation of [[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation](https://arxiv.org/abs/2306.07954) | [Rerender A Video Pipeline](#Rerender_A_Video) | - | [Yifan Zhou](https://github.com/SingleZombie) |
+| StyleAligned Pipeline | Implementation of [Style Aligned Image Generation via Shared Attention](https://arxiv.org/abs/2312.02133) | [StyleAligned Pipeline](#stylealigned-pipeline) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/file/d/15X2E0jFPTajUIjS0FzX50OaHsCbP2lQ0/view?usp=sharing) | [Aryan V S](https://github.com/a-r-r-o-w) |

To load a custom pipeline, pass the `custom_pipeline` argument to `DiffusionPipeline`, naming one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines; we will merge them quickly.
@@ -3027,7 +3028,9 @@ export_to_gif(result.frames[0], "result.gif")
<td align=center><img src="https://github.com/huggingface/diffusers/assets/72266394/eb7d2952-72e4-44fa-b664-077c79b4fc70" alt="gif-2"></td>
</tr>
</table>

### DemoFusion

This pipeline is the official implementation of [DemoFusion: Democratising High-Resolution Image Generation With No $$$](https://arxiv.org/abs/2311.16973).
The original repo can be found at [PRIS-CV/DemoFusion](https://github.com/PRIS-CV/DemoFusion).
- `view_batch_size` (`int`, defaults to 16):
@@ -3272,4 +3275,62 @@ output_frames = pipe(

```py
export_to_video(
    output_frames, "/path/to/video.mp4", 5)
```

### StyleAligned Pipeline

This pipeline is the implementation of [Style Aligned Image Generation via Shared Attention](https://arxiv.org/abs/2312.02133).
> Large-scale Text-to-Image (T2I) models have rapidly gained prominence across creative fields, generating visually compelling outputs from textual prompts. However, controlling these models to ensure consistent style remains challenging, with existing methods necessitating fine-tuning and manual intervention to disentangle content and style. In this paper, we introduce StyleAligned, a novel technique designed to establish style alignment among a series of generated images. By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models. This approach allows for the creation of style-consistent images using a reference style through a straightforward inversion operation. Our method's evaluation across diverse styles and text prompts demonstrates high-quality synthesis and fidelity, underscoring its efficacy in achieving consistent style across various inputs.
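The mechanism the abstract describes can be sketched in isolation: during denoising, each image's attention queries and keys are AdaIN-normalized toward the statistics of a reference image, and every image additionally attends to the reference's keys and values. The helpers below (`adain`, `shared_attention`) are illustrative assumptions for exposition, not the pipeline's actual internals — a minimal sketch in plain PyTorch:

```python
import torch
import torch.nn.functional as F

def adain(x, ref):
    # Illustrative AdaIN: align mean/std of x (over tokens) to the reference's
    mu_x, std_x = x.mean(-2, keepdim=True), x.std(-2, keepdim=True) + 1e-6
    mu_r, std_r = ref.mean(-2, keepdim=True), ref.std(-2, keepdim=True) + 1e-6
    return (x - mu_x) / std_x * std_r + mu_r

def shared_attention(q, k, v):
    # q, k, v: (batch, tokens, dim); batch index 0 plays the reference image
    ref_k, ref_v = k[:1], v[:1]
    q = adain(q, q[:1])  # AdaIN queries toward the reference (adain_queries=True)
    k = adain(k, ref_k)  # AdaIN keys toward the reference (adain_keys=True)
    # Every image attends to its own keys/values plus the reference's
    k_shared = torch.cat([k, ref_k.expand_as(k)], dim=1)
    v_shared = torch.cat([v, ref_v.expand_as(v)], dim=1)
    return F.scaled_dot_product_attention(q, k_shared, v_shared)
```

This mirrors the toggles exposed by `enable_style_aligned` below (`share_attention`, `adain_queries`, `adain_keys`), though the real implementation operates inside the SDXL attention processors.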
```python
from typing import List

import torch
from diffusers.pipelines.pipeline_utils import DiffusionPipeline
from PIL import Image

model_id = "a-r-r-o-w/dreamshaper-xl-turbo"
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, variant="fp16", custom_pipeline="pipeline_sdxl_style_aligned")
pipe = pipe.to("cuda")

# Enable memory saving techniques
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

prompt = [
    "a toy train. macro photo. 3d game asset",
    "a toy airplane. macro photo. 3d game asset",
    "a toy bicycle. macro photo. 3d game asset",
    "a toy car. macro photo. 3d game asset",
]
negative_prompt = "low quality, worst quality, "

# Enable StyleAligned
pipe.enable_style_aligned(
    share_group_norm=False,
    share_layer_norm=False,
    share_attention=True,
    adain_queries=True,
    adain_keys=True,
    adain_values=False,
    full_attention_share=False,
    shared_score_scale=1.0,
    shared_score_shift=0.0,
    only_self_level=0.0,
)

# Run inference
images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=2,
    height=1024,
    width=1024,
    num_inference_steps=10,
    generator=torch.Generator().manual_seed(42),
).images

# Disable StyleAligned if you do not wish to use it anymore
pipe.disable_style_aligned()
```
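The example imports `PIL.Image` without using it; one natural use is tiling the four outputs for a side-by-side style comparison. `image_grid` is a hypothetical helper, not part of the pipeline:

```python
from PIL import Image

def image_grid(images, rows, cols):
    # Paste equally sized PIL images into a rows x cols grid (hypothetical helper)
    w, h = images[0].size
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, img in enumerate(images):
        grid.paste(img, ((i % cols) * w, (i // cols) * h))
    return grid

# e.g. with the four style-aligned outputs from the example above:
# image_grid(images, rows=2, cols=2).save("style_aligned_grid.png")
```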
