
Commit 0042efd

nipunjindal and njindal authored
[1929]: Add CLIP guidance for Img2Img stable diffusion pipeline (#2723)
* [Img2Img]: Copyover img2img pipeline
* [Img2Img]: img2img pipeline
* [Img2Img]: img2img pipeline
* [Img2Img]: img2img pipeline

---------

Co-authored-by: njindal <njindal@adobe.com>
1 parent f024e00 commit 0042efd

File tree

2 files changed: +556 -0 lines changed


examples/community/README.md

Lines changed: 56 additions & 0 deletions
@@ -30,6 +30,7 @@ MagicMix | Diffusion Pipeline for semantic mixing of an image and a text prompt
| UnCLIP Text Interpolation Pipeline | Diffusion Pipeline that allows passing two prompts and produces images while interpolating between the text-embeddings of the two prompts | [UnCLIP Text Interpolation Pipeline](#unclip-text-interpolation-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) |
| UnCLIP Image Interpolation Pipeline | Diffusion Pipeline that allows passing two images/image_embeddings and produces images while interpolating between their image-embeddings | [UnCLIP Image Interpolation Pipeline](#unclip-image-interpolation-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) |
| DDIM Noise Comparative Analysis Pipeline | Investigating how the diffusion models learn visual concepts from each noise level (which is a contribution of [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227)) | [DDIM Noise Comparative Analysis Pipeline](#ddim-noise-comparative-analysis-pipeline) | - | [Aengus (Duc-Anh)](https://github.com/aengusng8) |
| CLIP Guided Img2Img Stable Diffusion Pipeline | Doing CLIP guidance for image to image generation with Stable Diffusion | [CLIP Guided Img2Img Stable Diffusion](#clip-guided-img2img-stable-diffusion) | - | [Nipun Jindal](https://github.com/nipunjindal/) |

@@ -1074,3 +1075,58 @@ for strength in np.linspace(0.1, 1, 25):
Here is the result of this pipeline (which is DDIM) on CelebA-HQ dataset.

![noise-comparative-analysis](https://user-images.githubusercontent.com/67547213/224677066-4474b2ed-56ab-4c27-87c6-de3c0255eb9c.jpeg)

### CLIP Guided Img2Img Stable Diffusion

CLIP guided Img2Img stable diffusion can help generate more realistic images from an initial image by guiding stable diffusion at every denoising step with an additional CLIP model.

The following code requires roughly 12GB of GPU RAM.

```python
from io import BytesIO

import requests
import torch
from diffusers import DiffusionPipeline
from PIL import Image
from transformers import CLIPFeatureExtractor, CLIPModel

# CLIP model and feature extractor that provide the guidance signal
feature_extractor = CLIPFeatureExtractor.from_pretrained(
    "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"
)
clip_model = CLIPModel.from_pretrained(
    "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", torch_dtype=torch.float16
)

# Load Stable Diffusion with the CLIP guided community pipeline
guided_pipeline = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="clip_guided_stable_diffusion",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    torch_dtype=torch.float16,
)
guided_pipeline.enable_attention_slicing()
guided_pipeline = guided_pipeline.to("cuda")

prompt = "fantasy book cover, full moon, fantasy forest landscape, golden vector elements, fantasy magic, dark light night, intricate, elegant, sharp focus, illustration, highly detailed, digital painting, concept art, matte, art by WLOP and Artgerm and Albert Bierstadt, masterpiece"

# Download the initial image that will be transformed
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")

image = guided_pipeline(
    prompt=prompt,
    num_inference_steps=30,
    image=init_image,
    strength=0.75,
    guidance_scale=7.5,
    clip_guidance_scale=100,
    num_cutouts=4,
    use_cutouts=False,
).images[0]
display(image)
```

Init Image

![img2img_init_clip_guidance](https://huggingface.co/datasets/njindal/images/resolve/main/clip_guided_img2img_init.jpg)

Output Image

![img2img_clip_guidance](https://huggingface.co/datasets/njindal/images/resolve/main/clip_guided_img2img.jpg)
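
Under the hood, the CLIP guidance is applied at every denoising step: the current latents are decoded, the decoded image is embedded with the CLIP model, and the latents are nudged in the direction that increases similarity with the prompt. The snippet below is only a minimal sketch of that idea, not the community pipeline's actual code: the function name `clip_guidance_step` and the final gradient update are illustrative, and the real implementation in `clip_guided_stable_diffusion.py` additionally handles cutouts (`num_cutouts`, `use_cutouts`) and scheduler-dependent scaling of the gradient.

```python
import torch
import torch.nn.functional as F


def clip_guidance_step(latents, clip_text_embeddings, vae, clip_model, feature_extractor, clip_guidance_scale=100):
    # Make the current latents a leaf tensor so gradients can be taken w.r.t. them.
    latents = latents.detach().requires_grad_()

    # Decode latents into image space (0.18215 is Stable Diffusion's latent scaling factor).
    image = vae.decode(latents / 0.18215).sample
    image = (image / 2 + 0.5).clamp(0, 1)

    # Resize and normalize the decoded image the way CLIP expects, then embed it.
    image = F.interpolate(image, size=(224, 224), mode="bicubic", align_corners=False)
    mean = torch.tensor(feature_extractor.image_mean, device=image.device, dtype=image.dtype).view(1, -1, 1, 1)
    std = torch.tensor(feature_extractor.image_std, device=image.device, dtype=image.dtype).view(1, -1, 1, 1)
    image_embeddings = clip_model.get_image_features((image - mean) / std)

    # Cosine-similarity loss between CLIP image and text embeddings
    # (the community pipeline uses a closely related spherical distance loss).
    image_embeddings = image_embeddings / image_embeddings.norm(dim=-1, keepdim=True)
    text_embeddings = clip_text_embeddings / clip_text_embeddings.norm(dim=-1, keepdim=True)
    loss = (1.0 - (image_embeddings * text_embeddings).sum(dim=-1)).mean() * clip_guidance_scale

    # The gradient of this loss w.r.t. the latents steers the denoising step;
    # the real pipeline rescales it according to the scheduler's current noise level.
    grads = torch.autograd.grad(loss, latents)[0]
    return latents.detach() - grads
```

`clip_guidance_scale` controls how strongly this gradient pulls the sample toward the CLIP text embedding, which is why raising it makes the output follow the prompt more literally at the cost of fidelity to the init image.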
