
Create videos from the API #378


Merged
merged 1 commit into thygate:main on Dec 9, 2023

Conversation

@davidmartinrius (Contributor) commented Dec 3, 2023

Hello!

How are you doing? 😊 My name is David Martin Rius, and I have added new functionality to create a video from an image with the API.

It does not affect the current extension behaviour. It is just an enhancement.

The endpoint is the same, /depth/generate. I added a new option, "RUN_MAKEVIDEO_API", in common_constants.py (available when calling /depth/get_options).

If this code is OK, I will create a new endpoint in https://github.com/mix1009/sdwebuiapi to integrate it. If you don't know what this other project is: it is currently the most advanced open-source API client for automatic1111.

The "video_parameters" object has these properties:

        "mesh_fi_filename": "/your/stablediffusionui/automatic1111/stable-diffusion-webui/outputs/extras-images/depthmap-0026.obj",
        "vid_numframes": 300,
        "vid_fps": 40,
        "vid_traj": 1,
        "vid_shift": "-0.015, 0.0, -0.05",
        "vid_border": "0.03, 0.03, 0.05, 0.03",
        "dolly": False,
        "vid_format": "mp4",  # vid_format and output_filename extension must match
        "vid_ssaa": 3,
        "output_filename": "/your/desired/output/path/filename.mp4"

Important:

  1. For now, the output filename will have an extra underscore. For example, if you pass /my/folder/video.mp4, the output will be /my/folder/video.mp4_

  2. The property "mesh_fi_filename" is optional; it can be None or the path to a .obj file.
    If you already have a .obj file, the video will be created much faster. Creating the mesh is the slowest part of the process, so I recommend reusing one if you are rendering multiple videos.

If any required property is not passed, an exception inside the run_makevideo_api function will tell you exactly what is missing.
The function run_makevideo_api can be found in src/core.py.
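
To illustrate the idea, here is a minimal sketch of that validation pattern. The list and function below are simplified stand-ins, not the exact code in src/core.py:

# Simplified stand-in for the validation inside run_makevideo_api;
# the real code may differ in names and details.
REQUIRED_VIDEO_PARAMETERS = [
    "vid_numframes", "vid_fps", "vid_traj", "vid_shift",
    "vid_border", "dolly", "vid_format", "vid_ssaa", "output_filename",
]

def check_video_parameters(video_parameters):
    # Collect every required property that is absent and report them all at once.
    missing = [key for key in REQUIRED_VIDEO_PARAMETERS if key not in video_parameters]
    if missing:
        raise ValueError("Missing required video parameters: " + ", ".join(missing))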

You can use this example snippet to create a video from the API:

import base64
import requests

image_path = "/path/to/your/image.jpg"

available_models = {
    'dpt_beit_large_512': 1,  # MiDaS 3.1
    'dpt_beit_large_384': 2,  # MiDaS 3.1
    'dpt_large_384': 3,  # MiDaS 3.0
    'dpt_hybrid_384': 4,  # MiDaS 3.0
    'midas_v21': 5,
    'midas_v21_small': 6,
    'zoedepth_n': 7,  # indoor
    'zoedepth_k': 8,  # outdoor
    'zoedepth_nk': 9,
}

if __name__ == '__main__':
    # Encode the input image as base64 for the JSON payload.
    with open(image_path, "rb") as image_file:
        img = base64.b64encode(image_file.read()).decode()
    url = 'http://127.0.0.1:7860/depth/generate'
    payload = {
        "depth_input_images": [img],
        "options": {
            "compute_device": "GPU",
            "boost": True,
            "model_type": available_models['midas_v21'],  # can be an integer or a string
            "video_parameters": {
                # "mesh_fi_filename" is optional; it can be None or the path to a .obj file.
                # If you already have a .obj file, the video will be created much faster. Creating the mesh
                # is the slowest part of the process, so reuse one if you are rendering multiple videos.
                # "mesh_fi_filename": None,  # optional
                "mesh_fi_filename": "/your/stablediffusionui/automatic1111/stable-diffusion-webui/outputs/extras-images/depthmap-0026.obj",
                "vid_numframes": 300,
                "vid_fps": 40,
                "vid_traj": 1,
                "vid_shift": "-0.015, 0.0, -0.05",
                "vid_border": "0.03, 0.03, 0.05, 0.03",
                "dolly": False,
                "vid_format": "mp4",  # vid_format and output_filename extension must match
                "vid_ssaa": 3,
                "output_filename": "/your/desired/output/path/filename.mp4"
            }
        }
    }

    response = requests.post(url, json=payload).json()
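
You can also query the /depth/get_options endpoint mentioned above to check that RUN_MAKEVIDEO_API is exposed on your install. A minimal sketch; I assume a plain GET here, so inspect the actual response shape yourself:

import requests

options = requests.get('http://127.0.0.1:7860/depth/get_options').json()
print(options)  # RUN_MAKEVIDEO_API should appear among the available options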

Thank you! Enjoy it! 😊

David Martin Rius

@semjon00 (Collaborator) commented Dec 4, 2023

Hello! Overall, quite good! However, for code extendability, a slightly different approach is required. I try to keep the core_generation_funnel as separate from the API as possible. I will point out some specific things to look for in a code review. This functionality is a great addition, and I like that it does not break anything.

Also a question: is it possible to later make the code support strings for model selection? That is, could we have a parameter that could be either an int or a string? If not, do you think it would be reasonable to have it as a string from the beginning and later add support for model names? I'm not sure if anybody wants to add it (it would certainly be welcome), but it would be nice to work around this potential backward-compatibility issue.
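
For example, something roughly like this (just a sketch to illustrate the idea; the mapping is the one from the PR description, the helper itself is hypothetical):

# Hypothetical normalization helper: accept an int or a string and
# return the canonical model name. Not actual project code.
MODEL_NAMES = {
    1: 'dpt_beit_large_512', 2: 'dpt_beit_large_384', 3: 'dpt_large_384',
    4: 'dpt_hybrid_384', 5: 'midas_v21', 6: 'midas_v21_small',
    7: 'zoedepth_n', 8: 'zoedepth_k', 9: 'zoedepth_nk',
}

def normalize_model_type(model_type):
    if isinstance(model_type, int):
        return MODEL_NAMES[model_type]
    if model_type in MODEL_NAMES.values():
        return model_type
    raise ValueError("Unknown model_type: %r" % (model_type,))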

@davidmartinrius (Contributor, Author) commented Dec 4, 2023

Well, if you want, I can create a specific endpoint for this functionality, like /depth/generate/video, that does not use the core_generation_funnel but implements a separate function instead.
Or better: if you could explain exactly what you would want, I could try to program it.

About this question: "is it possible to later make the code support strings for model selection?"

I do not understand what you are referring to. Could you explain in more detail, with examples, please? 🤔

Thank you!

@davidmartinrius (Contributor, Author)

Ah, I suppose you meant this:

available_models = {
    'dpt_beit_large_512': 1,  # MiDaS 3.1
    'dpt_beit_large_384': 2,  # MiDaS 3.1
    'dpt_large_384': 3,  # MiDaS 3.0
    'dpt_hybrid_384': 4,  # MiDaS 3.0
    'midas_v21': 5,
    'midas_v21_small': 6,
    'zoedepth_n': 7,  # indoor
    'zoedepth_k': 8,  # outdoor
    'zoedepth_nk': 9,
}

You mean making these available inside the API, so that one can pass either the model name as a string or the integer that matches that model, is that right?

Like
"model_type": "dpt_beit_large_512"
OR
"model_type": 1

?

@semjon00 (Collaborator) commented Dec 4, 2023

So basically, depthmap_api.py should "wrap around" the core and ask it to do stuff, and then transform the results for API-specific needs. But in the current state of the MR, it's leaking a bit: the core is "aware" of the API stuff happening.
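
In code terms, roughly this shape (an illustrative toy with placeholder functions standing in for the real core and API code):

import base64

def decode_request(request_json):
    # API-specific concern: turn base64 strings into raw bytes for the core.
    return [base64.b64decode(s) for s in request_json["depth_input_images"]]

def core_generation_funnel(inputs):
    # Placeholder for the core: it sees plain data, never HTTP or base64.
    return [img[::-1] for img in inputs]  # dummy "processing"

def encode_response(results):
    # API-specific concern again: serialize results for the HTTP response.
    return {"images": [base64.b64encode(r).decode() for r in results]}

def api_generate(request_json):
    # depthmap_api.py wraps around the core and adapts inputs/outputs.
    return encode_response(core_generation_funnel(decode_request(request_json)))

print(api_generate({"depth_input_images": [base64.b64encode(b"hi").decode()]}))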

@semjon00 (Collaborator) commented Dec 4, 2023

Exactly. Sorry, did not sleep well tonight 🥶

@semjon00 (Collaborator) commented Dec 4, 2023

> if you wanted I can create a specific endpoint for this functionality like /depth/generate/video

Not sure... Full disclosure: I touched most of the original code in this project, but it was some time ago... I can't be very sure of everything right now; I need to think a bit. Looking for maintainers :)

@davidmartinrius (Contributor, Author)

> So basically, depthmap_api.py should "wrap around" the core and ask it to do stuff, and then transform the results for API-specific needs. But in the current state of the MR, it's leaking a bit: the core is "aware" of the API stuff happening.

Yes, I am aware of that. I just followed the structure of what was already programmed. It is clear that the API should wrap around the core and not be mixed into it.

However, I don't know the project well enough to make such big changes, so I continued in the order in which things were already done.

I think restructuring and refactoring the API requires a separate pull request, and I don't know if I would be able to do it without very clear instructions on how you would want it.

@semjon00 (Collaborator) commented Dec 4, 2023

I feel the same way; that is a big task. Then please just try to make the core_generation_funnel not call the API-specific code, and then I think we can call it a day. Parameter names, optimal design decisions, et cetera can wait, I suppose.

@davidmartinrius (Contributor, Author)

> if you wanted I can create a specific endpoint for this functionality like /depth/generate/video
>
> Not sure... Full disclosure: I touched most of the original code in this project, but it was some time ago... I can't be very sure of everything right now; I need to think a bit. Looking for maintainers :)

Absolutely! :D You have a much better vision of it than me. If you want to do it, it will be appreciated.

@davidmartinrius (Contributor, Author)

I have tried to separate the code from lines 345 to 353 into a function outside of the core_generation_funnel, but I haven't found a clean way to do it. In any case, I need variables set in the core funnel to process the video later. So, the only way I found is to yield those variables instead of calling run_makevideo_api, and to call run_makevideo_api in another part of the code. But I think that is not a good approach.

Please, could you suggest how to abstract that part into another function, so that the core doesn't get mixed up with the API tasks?

Thank you!

@graemeniedermayer (Contributor)

You might be able to use a method similar to run_makevideo in common_ui.py. Maybe you could use the "if inp[go.GEN_INPAINTED_MESH]:" check on line 336 to generate a mesh from the core funnel, and then run run_makevideo afterwards in the API code, as sketched below. It does feel like inpainted mesh generation should be extracted into a function separate from core_generation_funnel (but this seems like a separate task). Is that the key issue, the inpainted mesh generation requiring variables from the core funnel?

It also feels like the second part of run_makevideo_api could call run_makevideo directly to reduce code repetition. I might be missing the difference.
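
Something like this, perhaps (a rough sketch with stubs; the real core_generation_funnel and run_makevideo signatures are different):

# Stubs stand in for the project code; only the call order matters here.
def core_generation_funnel(inp):
    # In the real code this generates the inpainted mesh when
    # inp[GEN_INPAINTED_MESH] is set and makes its path available.
    return "/tmp/depthmap-0000.obj"  # placeholder mesh path

def run_makevideo(mesh_path, video_parameters):
    # In the real code this renders the video from the mesh.
    print("rendering", video_parameters["output_filename"], "from", mesh_path)

def api_generate_video(inp, video_parameters):
    mesh_path = core_generation_funnel(inp)      # core step: build the mesh
    run_makevideo(mesh_path, video_parameters)   # API step: render afterwards

api_generate_video({"GEN_INPAINTED_MESH": True},
                   {"output_filename": "/tmp/out.mp4"})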

Great work!

@davidmartinrius (Contributor, Author)

Ok, I'll do it this way. As soon as it's programmed, I'll let you know. Thanks!

@davidmartinrius (Contributor, Author)

I made several changes.

  1. The model_type can be an integer or a string. The API will manage it.
  2. I removed the extra code from the core_generation_funnel.
  3. The function run_makevideo_api has been removed; there is only run_makevideo, and it can now receive 2 extra nullable parameters: outpath and basename. So, when a user calls the API, they can pass the desired output folder and file name. (It adds an extra underscore that I can't control without making too many changes. For example, if I pass /my/folder/video.mp4, the output will be /my/folder/video_.mp4.) This is because output_3d_photo in inpaint/mesh.py adds an extra underscore. I preferred not to touch anything else because it would increase the complexity. See the sketch after this list.
  4. I updated the first message in this thread and changed several parameters, so users can copy-paste the code and it will work.
  5. I only call core_generation_funnel to generate the mesh.
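
A rough sketch of the extended call from item 3 (the real run_makevideo takes more parameters, which I omit here, so treat this as an illustration only):

# Illustrative stub: the two new nullable parameters control where the
# video is written; when they are None, the previous default behaviour applies.
def run_makevideo(video_parameters, outpath=None, basename=None):
    if outpath and basename:
        target = outpath + "/" + basename  # note: an extra underscore is still appended
    else:
        target = "the default output location"
    print("writing", video_parameters["vid_format"], "video to", target)

run_makevideo({"vid_format": "mp4"}, outpath="/my/folder", basename="video")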

So, the core is no longer mixed with the API when generating videos from the API.

What do you think about these changes?

Thank you!

David Martin Rius

@semjon00 self-requested a review on December 9, 2023, at 19:13
@semjon00 (Collaborator) commented Dec 9, 2023

Hello again :)

I did not really get into all the details of the code, but overall (at least architecturally) it looks very, very good 👍
I will give it some more attention and then merge.

This commit adds a new API endpoint for generating depth maps from input images in a video format. The endpoint supports various depth model options, including different pre-trained models. It also validates and processes video parameters such as number of frames, frames per second, trajectory, shift, border, dolly, format, and super-sampling anti-aliasing.

The commit includes error handling for missing input images, invalid model types, and required video parameters. Additionally, it checks if a mesh file already exists, and if not, it generates a new one. The generated mesh is then used to create a depth video based on the specified parameters.

See more information in the pull request description.
@semjon00 merged commit 4887a9c into thygate:main on Dec 9, 2023
@semjon00 (Collaborator) commented Dec 9, 2023

Indeed, very good code! Merging with next to no modifications. There are some security risks from exposing this functionality in the API, but since we never advertised the API as something that can be made accessible from the internet, this is OK.

Thank you so much for contributing this code ❤️
I am happy that now this project can be more useful for people.
I would be glad to collaborate with you more, should you choose to create more code for this project 😊
Feel free to let me know if something does not work right.

@davidmartinrius (Contributor, Author)

> Indeed, very good code! Merging with next to no modifications. There are some security risks from exposing this functionality in the API, but since we never advertised the API as something that can be made accessible from the internet, this is OK.
>
> Thank you so much for contributing this code ❤️ I am happy that now this project can be more useful for people. I would be glad to collaborate with you more, should you choose to create more code for this project 😊 Feel free to let me know if something does not work right.

Hi @semjon00 !!

I am very glad to contribute. Thank you so much for merging the code.

  1. Now I will add a new endpoint to https://github.com/mix1009/sdwebuiapi; this way, the plugin will be easier to use via the API for any user. (For both endpoints available in the API.)

  2. On the other hand, I am trying to make the mesh generation work with PyTorch instead of NumPy, to accelerate the mesh generation process. Some mesh generation steps already use PyTorch, but not in an optimal way, and PyTorch is not used everywhere. (I am referring to the code inside inpaint/mesh.py, inpaint/mesh_tools.py, inpaint/bilateral_filtering.py, etc.)
    Because of that, the mesh generation takes too much time; it uses the CPU because of NumPy. So my idea is to migrate to PyTorch tensors, roughly as in the sketch after this list. It is not an easy task, and it requires modifying the code in blocks, very carefully. Maybe there are other factors that affect performance, but I do not know yet.
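
As a toy example of the kind of migration I mean in item 2 (not actual project code), the same arithmetic can move from NumPy on the CPU to PyTorch tensors on the GPU:

import numpy as np
import torch

def blur_rows_np(depth):
    # NumPy version: always runs on the CPU.
    return (depth[:-1, :] + depth[1:, :]) / 2.0

def blur_rows_torch(depth):
    # PyTorch version: same arithmetic, but the tensor can live on the GPU.
    return (depth[:-1, :] + depth[1:, :]) / 2.0

depth_np = np.random.rand(480, 640).astype(np.float32)
device = "cuda" if torch.cuda.is_available() else "cpu"
depth_t = torch.from_numpy(depth_np).to(device)
result = blur_rows_torch(depth_t).cpu().numpy()
assert np.allclose(blur_rows_np(depth_np), result, atol=1e-6)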

Do you know the critical points when generating a 3D mesh? I mean the ones that especially slow down the process and could use the GPU to speed it up.

If you had to do it, where would you start?

Thank you!

David Martin Rius

@semjon00 (Collaborator) commented Dec 9, 2023

> If you had to do it, where would you start?

Honestly, I am not even exactly sure what magic happens over there. @thygate is the person who added this to the script - I think he borrowed this code from somewhere else - so he knows how it works better than me. The proper way to change the code would be to find the upstream and contribute there (granted, hopefully that repository is still maintained), and then tweak this repository to use the newest upstream version.

@davidmartinrius (Contributor, Author) commented Dec 10, 2023

I think that code may come from https://github.com/vt-vl-lab/3d-photo-inpainting and also facebookresearch. Unfortunately, the last updates there are from 3 years ago. So this is the most up-to-date repository, although it is almost a copy-paste of that project, if I am not wrong.

Maybe @thygate has an idea on how to do it.

Thank you!

@semjon00 (Collaborator)

Right, I remember now. Indeed, then the most reasonable thing to do is to just patch it here. Afterwards, I/we might try to find other "forks" and MR the changes there.
