
support for marigold #385


Merged: 8 commits into thygate:marigold on Dec 14, 2023

Conversation

@affromero (Contributor) commented Dec 13, 2023

Discussion: #379
Issue: #383
I continued the implementation and this is an example:

[Comparison images: Input, Leres No Boost, Marigold No Boost, Leres Boosted, Marigold Boosted]

@affromero (Contributor, Author)

[Comparison images: Input, Leres Boosted, Marigold No Boost, Marigold Boosted]


```python
# From Marigold repository run.py
with torch.no_grad():
    image = (image * 255).astype(np.uint8)
```
Contributor:

We should be able to simplify by removing

```python
image = (image * 255).astype(np.uint8)
```

and turning line 431

```python
rgb = np.transpose(image, (2, 0, 1))  # [H, W, rgb] -> [rgb, H, W]
rgb_norm = rgb / 255.0
```

into

```python
rgb_norm = np.transpose(image, (2, 0, 1))  # [H, W, rgb] -> [rgb, H, W]
```

That way we aren't multiplying and then dividing by 255.
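The suggested simplification can be checked with a small numpy sketch (array shapes chosen arbitrarily for illustration); the only difference between the two paths is the 1/255 quantization introduced by the uint8 round-trip:

```python
import numpy as np

# Hypothetical float image in [0, 1], shaped [H, W, rgb]
image = np.random.rand(4, 4, 3).astype(np.float32)

# Current path: scale to uint8, transpose, then normalize back down
rgb = np.transpose((image * 255).astype(np.uint8), (2, 0, 1))
rgb_norm_roundtrip = rgb / 255.0

# Suggested path: skip the multiply/divide entirely
rgb_norm_direct = np.transpose(image, (2, 0, 1))  # [H, W, rgb] -> [rgb, H, W]

# The two agree up to the quantization step of the round-trip
assert np.allclose(rgb_norm_roundtrip, rgb_norm_direct, atol=1 / 255)
```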

Contributor (Author):

The rationale behind that is that Marigold resizes the image, and the input of that function is a PIL.Image, so I just thought of keeping the same option. But yeah, maybe then multiply and divide by 255 in L427?

Contributor:

It does look cleaner to resize the numpy array directly, using something like

```python
image = cv2.resize(image, (w, h), interpolation=cv2.INTER_CUBIC)  # dsize is (width, height)
```

Although I'm not sure if there are any subtle differences between cv2.resize and PIL.Image.resize.
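For comparison, the PIL route (the one Marigold's run.py goes through) can also start and end with numpy arrays; a minimal sketch, with sizes chosen arbitrarily. Note that both PIL's resize and cv2.resize take the target size as (width, height), not (height, width):

```python
import numpy as np
from PIL import Image

image = (np.random.rand(64, 48, 3) * 255).astype(np.uint8)  # H=64, W=48

# PIL.Image.resize takes (width, height); bicubic chosen as an example filter
resized = np.asarray(
    Image.fromarray(image).resize((24, 32), resample=Image.BICUBIC)
)

assert resized.shape == (32, 24, 3)  # back to [H, W, rgb]
```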

Great changes overall!

Collaborator:

I have no strong feeling either way.

@semjon00 (Collaborator)

Hello and thank you so much for this! I will test as soon as I have time. Meanwhile I will ask for some clarifications...

```diff
@@ -94,8 +94,8 @@ def set_input_train(self, input):
         self.real_A = torch.cat((self.outer, self.inner), 1)

     def set_input(self, outer, inner):
-        inner = torch.from_numpy(inner).unsqueeze(0).unsqueeze(0)
-        outer = torch.from_numpy(outer).unsqueeze(0).unsqueeze(0)
+        inner = torch.from_numpy(inner).unsqueeze(0).unsqueeze(0).float()
```
Collaborator:

Are you super-duper sure this does not break anything?

@affromero (Contributor, Author) commented Dec 14, 2023:

Not sure whether this breaks anything, to be honest. Are there tests/unit tests I can run?
I modified this part because, for some reason, selecting Marigold mode raised an error: inner and outer were double tensors while the network weights are float. Not sure why this error is particular to Marigold mode and not the others.
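The error described can be reproduced with a toy network (a minimal sketch, not the actual model used here): torch layers default to float32 weights, while torch.from_numpy on a default float64 numpy array yields a double tensor, and mixing the two raises a dtype mismatch that the .float() cast resolves:

```python
import numpy as np
import torch

net = torch.nn.Conv2d(1, 1, kernel_size=3)  # weights are float32 by default

inner = np.zeros((8, 8))                    # numpy defaults to float64
x = torch.from_numpy(inner).unsqueeze(0).unsqueeze(0)
assert x.dtype == torch.float64             # a "double" tensor

try:
    net(x)                                  # double input vs float weights
except RuntimeError:
    pass                                    # dtype mismatch raises here

y = net(x.float())                          # the .float() cast fixes it
assert y.dtype == torch.float32
```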

@semjon00 (Collaborator)

semjon00 commented Dec 13, 2023

Overall looks good. I almost did not believe that I stopped short of only 16 lines with my branch - let alone adding boost with the same 16 lines.

@semjon00 (Collaborator)

My only concern is that the depthmaps are inverted. In this script, the white=near convention is used. I think the only thing needed to fix that would be adding Marigold (10) to line 298 in depthmap_generation.py: `raw_prediction_invert = self.depth_model_type in [0, 7, 8, 9]`. However, after changing it, please test that it actually works nicely, that it generates sensible stereo images, etc.
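The one-line change being suggested, sketched as a standalone check (the index 10 for Marigold is assumed from the comment above; the real code lives as an attribute assignment in depthmap_generation.py):

```python
# Standalone version of the check at depthmap_generation.py line 298,
# with Marigold's assumed model index (10) appended to the invert list
def raw_prediction_invert(depth_model_type):
    return depth_model_type in [0, 7, 8, 9, 10]

assert raw_prediction_invert(10)      # Marigold output would now be inverted
assert not raw_prediction_invert(1)   # other model types are unaffected
```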

@semjon00 (Collaborator)

Also, would you be so kind as to append the README with Marigold mentions and an acknowledgement? Before pushing Marigold support to main we will also need a version bump (misc.py) and a changelog entry (CHANGELOG.md).

@graemeniedermayer (Contributor)

graemeniedermayer commented Dec 13, 2023

> My only concern is that the depthmaps are inverted. In this script, white=near convention is used. I think the only thing to fix that would be adding Marigold (10) to line 298 in depthmap_generation.py: raw_prediction_invert = self.depth_model_type in [0, 7, 8, 9]. However, after changing, please test if it would actually work nicely, also that it will generate sensible stereoimages, etc.

I have tested this. It works without boost.

Maybe Marigold should be git cloned into "extensions/stable-diffusion-webui-depthmap-script/Marigold" rather than "repositories/Marigold" to avoid future conflicts (this is why MiDaS is installed there). Also, I believe diffusers is now a requirement.

@semjon00 (Collaborator)

@graemeniedermayer Indeed, it needs diffusers>=0.20.1; it should be added to install.py. Thankfully, it seems to be a rather lightweight dependency. About conflicts - I'm not sure about it... maybe?

@graemeniedermayer (Contributor)

> @graemeniedermayer Indeed, it needs diffusers>=0.20.1, should be added to install.py. Thankfully, it seems to be a rather lightweight dependency. About conflicts - I'm not sure about it... maybe?

Also, Marigold standalone mode doesn't function with the current imports.

@semjon00 (Collaborator)

semjon00 commented Dec 13, 2023

Oh, supporting standalone is definitely needed - a change to requirements.txt.

@affromero (Contributor, Author)

Hey folks, I added some of your suggested changes. I am not entirely sure whether there is a protocol for the requirements/install files, so let me know what I should change. In particular, I use this project standalone, so I'm not sure about install.py.

@graemeniedermayer (Contributor)

I think this should be merged onto the marigold branch. There might be some small changes/extra testing to do before merging into main.

Great work!

@semjon00 semjon00 merged commit 128fe5b into thygate:marigold Dec 14, 2023
@semjon00 (Collaborator)

Merged and did some fixes. Now it would be awesome to support automatic repository pulling for standalone. Yes, we could just copy all the files into the root again... but I'd rather not have any new code linked in this way.

@aulerius

aulerius commented Dec 14, 2023

This is really good news!

I also want to draw some attention to the equivalent implementation in ComfyUI, which implements floating-point EXR export:

I added a remap node to see the full range better, and an OpenEXR node to save the full range; it works wonders compared to the default PNG when used in VFX/3D modeling software.

Which relates to #372 and #370
Could it be a nice opportunity to try that as well?

@semjon00 (Collaborator)

semjon00 commented Dec 14, 2023

Not now, I am reeeaaally busy. Should be doing my homework rn, in fact.

@semjon00 (Collaborator)

@affromero @graemeniedermayer I can't get it to work! Please help... :(
I get things like

```
WARNING:xformers:WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for: PyTorch 2.1.1+cu121 with CUDA 1201 (you have 2.1.1+cpu) Python 3.10.11 (you have 3.10.6) Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers) Memory-efficient attention, SwiGLU, sparse and more won't be available. Set XFORMERS_MORE_DETAILS=1 for more details
```

For whatever reason, trying to install xFormers automatically pulls in the wrong torch version.

GPU: 1080Ti
OS: Windows 10

@graemeniedermayer (Contributor)

graemeniedermayer commented Dec 14, 2023

Oh, I just realized I tested it with a different install.py file. Maybe removing all the new requirements besides diffusers would be best to avoid conflicts; the others are very likely to conflict with other repos, and I think they are required by A1111 anyway.

@graemeniedermayer (Contributor)

> I also want do draw some attention to equivalent implementation in ComfyUI, and that it implements floating point EXR export:

It shouldn't be too challenging to save the numpy arrays. Is there a library for converting to EXR?

@aulerius

>> I also want do draw some attention to equivalent implementation in ComfyUI, and that it implements floating point EXR export:
>
> It shouldn't be too challenging to save the numpy arrays. Is there a library for converting to EXR?

Apparently so; it's simply called "OpenEXR".
This is how it's done in the aforementioned ComfyUI implementation.
TIF files also support 32-bit floating-point precision.
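As a sketch of the TIF route (Pillow only; the OpenEXR bindings would be analogous but need the extra dependency), a float32 depth map survives a 32-bit TIFF round-trip losslessly:

```python
import os
import tempfile

import numpy as np
from PIL import Image

depth = np.random.rand(32, 32).astype(np.float32)  # hypothetical raw depth map

path = os.path.join(tempfile.gettempdir(), "depth.tif")
Image.fromarray(depth, mode="F").save(path)        # mode "F" = 32-bit float

reloaded = np.asarray(Image.open(path))
assert reloaded.dtype == np.float32
assert np.array_equal(reloaded, depth)             # lossless round-trip
```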
