Commit 43e1f7a

stereo image generation
1 parent f8a600c commit 43e1f7a

2 files changed, 134 additions and 9 deletions

README.md

Lines changed: 10 additions & 1 deletion
@@ -1,12 +1,17 @@
11
# High Resolution Depth Maps for Stable Diffusion WebUI
2-
This script is an addon for [AUTOMATIC1111's Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) that creates `depth maps` from the generated images. The result can be viewed on 3D or holographic devices like VR headsets or [Looking Glass](https://lookingglassfactory.com/) displays, used in Render- or Game- Engines on a plane with a displacement modifier, and maybe even 3D printed.
2+
This script is an addon for [AUTOMATIC1111's Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) that creates `depth maps` from the generated or existing images. The result can be viewed on 3D or holographic devices like VR headsets or [Looking Glass](https://lookingglassfactory.com/) displays, used in Render- or Game- Engines on a plane with a displacement modifier, and maybe even 3D printed.
33

44
To generate realistic depth maps from a single image, this script uses code and models from the [MiDaS](https://github.com/isl-org/MiDaS) repository by Intel ISL (see [https://pytorch.org/hub/intelisl_midas_v2/](https://pytorch.org/hub/intelisl_midas_v2/) for more info), or LeReS from the [AdelaiDepth](https://github.com/aim-uofa/AdelaiDepth) repository by Advanced Intelligent Machines. Multi-resolution merging as implemented by [BoostingMonocularDepth](https://github.com/compphoto/BoostingMonocularDepth) is used to generate high resolution depth maps.
55

6+
3D stereo, and red/cyan anaglyph images are generated using code from the [stereo-image-generation](https://github.com/m5823779/stereo-image-generation) repository. Thanks to [@sina-masoud-ansari](https://github.com/sina-masoud-ansari) for the tip! Discussion [here](https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/45).
7+
68
## Examples
79
[![screenshot](examples.png)](https://raw.githubusercontent.com/thygate/stable-diffusion-webui-depthmap-script/main/examples.png)
810

911
## Changelog
12+
* v0.2.9 new feature
13+
* 3D Stereo (side-by-side) and red/cyan anaglyph image generation.
14+
(Thanks to [@sina-masoud-ansari](https://github.com/sina-masoud-ansari) for the tip! Discussion [here](https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/45))
1015
* v0.2.8 bugfix
1116
* boost (pix2pix) now also able to compute on cpu
1217
* res101 able to compute on cpu
@@ -94,11 +99,15 @@ To see the generated output in the webui `Show DepthMap` should be enabled. When
9499
To make the depthmap easier to analyze for human eyes, `Show HeatMap` shows an extra image in the WebUI that has a color gradient applied. It is not saved.
95100

96101
When `Combine into one image` is enabled, the depthmap will be combined with the original image, the orientation can be selected with `Combine axis`. When disabled, the depthmap will be saved as a 16 bit single channel PNG as opposed to a three channel (RGB), 8 bit per channel image when the option is enabled.
102+
103+
When either `Generate Stereo` or `Generate anaglyph` is enabled, a stereo image will be generated. The `IPD` (interpupillary distance) and the `Screen Width` are both given in centimeters.
104+
97105
> 💡 Saving as any format other than PNG always produces an 8 bit, 3 channel RGB image. A single channel 16 bit image is only supported when saving as PNG.
98106
99107
## FAQ
100108

101109
* `Can I use this on existing images ?`
110+
- Yes, you can now use the Depth tab to easily process existing images.
102111
- Yes, in img2img, set denoising strength to 0. This will effectively skip stable diffusion and use the input image. You will still have to set the correct size, and need to select `Crop and resize` instead of `Just resize` when the input image resolution does not match the set size perfectly.
103112
* `Can I run this on google colab ?`
104113
- You can run the MiDaS network on their colab linked here https://pytorch.org/hub/intelisl_midas_v2/
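
A rough sketch of how the `IPD` and `Screen Width` settings described in the README turn into a pixel shift, following the arithmetic in the `generate_stereo` function that this commit adds to `scripts/depthmap.py`. The names `max_shift_px` and `shift_for_depth` are hypothetical and exist only for this illustration. The constants `0.12` and `1920` are copied from the diff, which does not explain how they were chosen.

```python
def max_shift_px(ipd_cm: float, screen_width_cm: float, image_width_px: int) -> float:
    """Largest horizontal shift in pixels, mirroring `deviation` in the diff."""
    deviation_cm = ipd_cm * 0.12  # constant taken as-is from the commit
    # The commit scales the shift relative to a 1920 pixel wide reference image.
    return deviation_cm * screen_width_cm * (image_width_px / 1920)


def shift_for_depth(depth_norm: float, deviation: float) -> int:
    """Per-pixel shift for a depth value already normalized to [0, 1]."""
    # Same expression as in the diff: depth 0 gets the full shift, depth 1 gets none.
    return int((1 - depth_norm ** 2) * deviation)


if __name__ == "__main__":
    # Slider defaults from the diff: IPD 6.4 cm, screen width 38.5 cm.
    deviation = max_shift_px(6.4, 38.5, 1920)
    print(deviation, shift_for_depth(0.0, deviation), shift_for_depth(1.0, deviation))
```

With the slider defaults and a 1920 pixel wide image, the maximum shift works out to just under 30 pixels; narrower images scale it down proportionally.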

scripts/depthmap.py

Lines changed: 124 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -51,7 +51,7 @@
5151

5252
whole_size_threshold = 1600 # R_max from the paper
5353
pix2pixsize = 1024
54-
scriptname = "DepthMap v0.2.8"
54+
scriptname = "DepthMap v0.2.9"
5555

5656
class Script(scripts.Script):
5757
def title(self):
@@ -78,13 +78,21 @@ def ui(self, is_img2img):
7878
save_depth = gr.Checkbox(label="Save DepthMap",value=True)
7979
show_depth = gr.Checkbox(label="Show DepthMap",value=True)
8080
show_heat = gr.Checkbox(label="Show HeatMap",value=False)
81+
with gr.Group():
82+
with gr.Row():
83+
gen_stereo = gr.Checkbox(label="Generate Stereo side-by-side image",value=False)
84+
gen_anaglyph = gr.Checkbox(label="Generate Stereo anaglyph image (red/cyan)",value=False)
85+
with gr.Row():
86+
stereo_ipd = gr.Slider(minimum=5, maximum=7.5, step=0.1, label='IPD (cm)', value=6.4)
87+
stereo_size = gr.Slider(minimum=20, maximum=100, step=0.5, label='Screen Width (cm)', value=38.5)
88+
8189
with gr.Box():
8290
gr.HTML("Instructions, comment and share @ <a href='https://github.com/thygate/stable-diffusion-webui-depthmap-script'>https://github.com/thygate/stable-diffusion-webui-depthmap-script</a>")
8391

84-
return [compute_device, model_type, net_width, net_height, match_size, invert_depth, boost, save_depth, show_depth, show_heat, combine_output, combine_output_axis]
92+
return [compute_device, model_type, net_width, net_height, match_size, invert_depth, boost, save_depth, show_depth, show_heat, combine_output, combine_output_axis, gen_stereo, gen_anaglyph, stereo_ipd, stereo_size]
8593

8694
# run from script in txt2img or img2img
87-
def run(self, p, compute_device, model_type, net_width, net_height, match_size, invert_depth, boost, save_depth, show_depth, show_heat, combine_output, combine_output_axis):
95+
def run(self, p, compute_device, model_type, net_width, net_height, match_size, invert_depth, boost, save_depth, show_depth, show_heat, combine_output, combine_output_axis, gen_stereo, gen_anaglyph, stereo_ipd, stereo_size):
8896

8997
# sd process
9098
processed = processing.process_images(p)
@@ -98,13 +106,13 @@ def run(self, p, compute_device, model_type, net_width, net_height, match_size,
98106
continue
99107
inputimages.append(processed.images[count])
100108

101-
newmaps = run_depthmap(processed, p.outpath_samples, inputimages, None, compute_device, model_type, net_width, net_height, match_size, invert_depth, boost, save_depth, show_depth, show_heat, combine_output, combine_output_axis)
109+
newmaps = run_depthmap(processed, p.outpath_samples, inputimages, None, compute_device, model_type, net_width, net_height, match_size, invert_depth, boost, save_depth, show_depth, show_heat, combine_output, combine_output_axis, gen_stereo, gen_anaglyph, stereo_ipd, stereo_size)
102110
for img in newmaps:
103111
processed.images.append(img)
104112

105113
return processed
106114

107-
def run_depthmap(processed, outpath, inputimages, inputnames, compute_device, model_type, net_width, net_height, match_size, invert_depth, boost, save_depth, show_depth, show_heat, combine_output, combine_output_axis):
115+
def run_depthmap(processed, outpath, inputimages, inputnames, compute_device, model_type, net_width, net_height, match_size, invert_depth, boost, save_depth, show_depth, show_heat, combine_output, combine_output_axis, gen_stereo, gen_anaglyph, stereo_ipd, stereo_size):
108116

109117
# unload sd model
110118
shared.sd_model.cond_stage_model.to(devices.cpu)
@@ -320,6 +328,30 @@ def run_depthmap(processed, outpath, inputimages, inputnames, compute_device, mo
320328
heatmap = (colormap(img_output2[:,:,0] / 256.0) * 2**16).astype(np.uint16)[:,:,:3]
321329
outimages.append(heatmap)
322330

331+
if gen_stereo or gen_anaglyph:
332+
print("Generating Stereo image..")
333+
#img_output = cv2.blur(img_output, (3, 3))
334+
left_img = np.asarray(inputimages[count])
335+
right_img = generate_stereo(left_img, img_output, stereo_ipd, stereo_size)
336+
stereo_img = np.hstack([right_img, inputimages[count]])
337+
if gen_stereo:
338+
outimages.append(stereo_img)
339+
if gen_anaglyph:
340+
print("Generating Anaglyph image..")
341+
anaglyph_img = overlap(right_img, left_img)
342+
outimages.append(anaglyph_img)
343+
if (processed is not None):
344+
if gen_stereo:
345+
images.save_image(Image.fromarray(stereo_img), outpath, "", processed.all_seeds[count], processed.all_prompts[count], opts.samples_format, info=info, p=processed, suffix="_stereo")
346+
if gen_anaglyph:
347+
images.save_image(Image.fromarray(anaglyph_img), outpath, "", processed.all_seeds[count], processed.all_prompts[count], opts.samples_format, info=info, p=processed, suffix="_anaglyph")
348+
else:
349+
# from tab
350+
if gen_stereo:
351+
images.save_image(Image.fromarray(stereo_img), path=outpath, basename=basename, seed=None, prompt=None, extension=opts.samples_format, info=info, short_filename=True,no_prompt=True, grid=False, pnginfo_section_name="extras", existing_info=None, forced_filename=None, suffix="_stereo")
352+
if gen_anaglyph:
353+
images.save_image(Image.fromarray(anaglyph_img), path=outpath, basename=basename, seed=None, prompt=None, extension=opts.samples_format, info=info, short_filename=True,no_prompt=True, grid=False, pnginfo_section_name="extras", existing_info=None, forced_filename=None, suffix="_anaglyph")
354+
323355
print("Done.")
324356

325357
except RuntimeError as e:
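
The hunk above assembles two outputs: a side-by-side image via `np.hstack` and an anaglyph via `overlap`, which in this commit takes the red channel from its first argument and the green and blue channels from its second. A minimal sketch of the same assembly using NumPy slicing instead of per-pixel loops is shown below. The function names `side_by_side` and `anaglyph` are hypothetical, and the sketch assumes both views are RGB arrays with the same height and width, which the commit does not check explicitly.

```python
import numpy as np


def side_by_side(first, second) -> np.ndarray:
    """Place two equally sized views next to each other, `first` on the left."""
    # np.asarray also accepts PIL images, so both inputs end up as arrays.
    return np.hstack([np.asarray(first), np.asarray(second)])


def anaglyph(red_source, cyan_source) -> np.ndarray:
    """Red channel from `red_source`, green and blue from `cyan_source`."""
    red_source = np.asarray(red_source)
    cyan_source = np.asarray(cyan_source)
    if red_source.shape[:2] != cyan_source.shape[:2]:
        raise ValueError("both views must have the same height and width")
    height, width = cyan_source.shape[:2]
    composite = np.zeros((height, width, 3), np.uint8)
    composite[..., 0] = red_source[..., 0]   # red
    composite[..., 1] = cyan_source[..., 1]  # green
    composite[..., 2] = cyan_source[..., 2]  # blue
    return composite
```

Calling `anaglyph(right_img, left_img)` would follow the argument order used at the call site in the diff; which eye should supply the red channel depends on the glasses and is not settled by the commit itself.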
@@ -343,6 +375,74 @@ def run_depthmap(processed, outpath, inputimages, inputnames, compute_device, mo
343375

344376
return outimages
345377

378+
379+
380+
def generate_stereo(left_img, depth, ipd, monitor_w):
381+
#MONITOR_W = 38.5 #50 #38.5
382+
h, w, c = left_img.shape
383+
384+
depth_min = depth.min()
385+
depth_max = depth.max()
386+
depth = (depth - depth_min) / (depth_max - depth_min)
387+
388+
right = np.zeros_like(left_img)
389+
390+
deviation_cm = ipd * 0.12
391+
deviation = deviation_cm * monitor_w * (w / 1920)
392+
393+
print("deviation:", deviation)
394+
395+
for row in range(h):
396+
for col in range(w):
397+
col_r = col - int((1 - depth[row][col] ** 2) * deviation)
398+
# col_r = col - int((1 - depth[row][col]) * deviation)
399+
if col_r >= 0:
400+
right[row][col_r] = left_img[row][col]
401+
402+
right_fix = np.array(right)
403+
gray = cv2.cvtColor(right_fix, cv2.COLOR_BGR2GRAY)
404+
rows, cols = np.where(gray == 0)
405+
for row, col in zip(rows, cols):
406+
for offset in range(1, int(deviation)):
407+
r_offset = col + offset
408+
l_offset = col - offset
409+
if r_offset < w and not np.all(right_fix[row][r_offset] == 0):
410+
right_fix[row][col] = right_fix[row][r_offset]
411+
break
412+
if l_offset >= 0 and not np.all(right_fix[row][l_offset] == 0):
413+
right_fix[row][col] = right_fix[row][l_offset]
414+
break
415+
416+
return right_fix
417+
418+
def overlap(im1, im2):
419+
width1 = im1.shape[1]
420+
height1 = im1.shape[0]
421+
width2 = im2.shape[1]
422+
height2 = im2.shape[0]
423+
424+
# final image
425+
composite = np.zeros((height2, width2, 3), np.uint8)
426+
427+
# iterate through "left" image, filling in red values of final image
428+
for i in range(height1):
429+
for j in range(width1):
430+
try:
431+
composite[i, j, 0] = im1[i, j, 0]
432+
except IndexError:
433+
pass
434+
435+
# iterate through "right" image, filling in blue/green values of final image
436+
for i in range(height2):
437+
for j in range(width2):
438+
try:
439+
composite[i, j, 1] = im2[i, j, 1]
440+
composite[i, j, 2] = im2[i, j, 2]
441+
except IndexError:
442+
pass
443+
444+
return composite
445+
346446
def run_generate(depthmap_mode,
347447
depthmap_image,
348448
image_batch,
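
The pixel-shifting loop in `generate_stereo` visits every pixel in Python, which can be slow on large images. The sketch below shows one possible way to keep the same write order (ascending columns, so a later column overwrites an earlier one) while handling all rows of a column at once. `forward_shift` is a hypothetical name, the hole-filling pass from the diff is deliberately left out, and treating a flat depth map as all zeros is a choice made here; the committed code would divide by zero in that case.

```python
import numpy as np


def forward_shift(left_img, depth, deviation: float) -> np.ndarray:
    """Shift pixels left by a depth-dependent amount; unfilled pixels stay 0."""
    left = np.asarray(left_img)
    depth = np.asarray(depth, dtype=np.float64)
    h, w = left.shape[:2]
    if depth.shape != (h, w):
        raise ValueError("depth must be a 2D array matching the image size")

    span = depth.max() - depth.min()
    # Assumption for this sketch: a flat depth map is treated as all zeros.
    norm = (depth - depth.min()) / span if span > 0 else np.zeros_like(depth)

    # Same expression as the diff; truncation matches Python's int().
    shift = ((1 - norm ** 2) * deviation).astype(np.int64)

    right = np.zeros_like(left)
    rows = np.arange(h)
    for col in range(w):  # ascending order preserves "last write wins"
        target = col - shift[:, col]
        valid = target >= 0
        # Each row appears at most once per column, so no index is repeated.
        right[rows[valid], target[valid]] = left[valid, col]
    return right
```

Looping over columns rather than assigning everything in one fancy-indexing call avoids relying on the order in which NumPy applies repeated indices during assignment.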
@@ -359,7 +459,11 @@ def run_generate(depthmap_mode,
359459
show_depth,
360460
show_heat,
361461
combine_output,
362-
combine_output_axis
462+
combine_output_axis,
463+
gen_stereo,
464+
gen_anaglyph,
465+
stereo_ipd,
466+
stereo_size
363467
):
364468

365469
imageArr = []
@@ -396,7 +500,7 @@ def run_generate(depthmap_mode,
396500
outpath = opts.outdir_samples or opts.outdir_extras_samples
397501

398502

399-
outputs = run_depthmap(None, outpath, imageArr, imageNameArr, compute_device, model_type, net_width, net_height, match_size, invert_depth, boost, save_depth, show_depth, show_heat, combine_output, combine_output_axis)
503+
outputs = run_depthmap(None, outpath, imageArr, imageNameArr, compute_device, model_type, net_width, net_height, match_size, invert_depth, boost, save_depth, show_depth, show_heat, combine_output, combine_output_axis, gen_stereo, gen_anaglyph, stereo_ipd, stereo_size)
400504

401505
return outputs, plaintext_to_html('info'), ''
402506

@@ -441,6 +545,14 @@ def on_ui_tabs():
441545
save_depth = gr.Checkbox(label="Save DepthMap",value=True)
442546
show_depth = gr.Checkbox(label="Show DepthMap",value=True)
443547
show_heat = gr.Checkbox(label="Show HeatMap",value=False)
548+
with gr.Group():
549+
with gr.Row():
550+
gen_stereo = gr.Checkbox(label="Generate Stereo side-by-side image",value=False)
551+
gen_anaglyph = gr.Checkbox(label="Generate Stereo anaglyph image (red/cyan)",value=False)
552+
with gr.Row():
553+
stereo_ipd = gr.Slider(minimum=5, maximum=7.5, step=0.1, label='IPD (cm)', value=6.4)
554+
stereo_size = gr.Slider(minimum=20, maximum=100, step=0.5, label='Screen Width (cm)', value=38.5)
555+
444556
with gr.Box():
445557
gr.HTML("Instructions, comment and share @ <a href='https://github.com/thygate/stable-diffusion-webui-depthmap-script'>https://github.com/thygate/stable-diffusion-webui-depthmap-script</a>")
446558

@@ -474,7 +586,11 @@ def on_ui_tabs():
474586
show_depth,
475587
show_heat,
476588
combine_output,
477-
combine_output_axis
589+
combine_output_axis,
590+
gen_stereo,
591+
gen_anaglyph,
592+
stereo_ipd,
593+
stereo_size
478594
],
479595
outputs=[
480596
result_images,
