Fix Callback Tensor Inputs of the SD Controlnet Pipelines are missing some elements. #10907

Merged

CyberVy
Contributor

@CyberVy CyberVy commented Feb 25, 2025

The _callback_tensor_inputs properties of StableDiffusionControlNetImg2ImgPipeline and StableDiffusionControlNetInpaintPipeline are missing an important element, control_image, which makes it impossible to retrieve the control image from the callback_kwargs of callback_on_step_end in these pipelines.

StableDiffusionControlNetPipeline is a bit different: it is missing image, which is the name of the control image in that pipeline.

This PR is to fix the bug above.
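
To make the failure mode concrete, here is a minimal sketch of the mechanism, not the diffusers implementation. `ToyPipelineBefore`, `ToyPipelineAfter`, `log_control_image`, and the `logged` list are hypothetical stand-ins, and the control image entry is assumed to be named `control_image`. The sketch assumes the pipeline rejects any requested name that is not listed in `_callback_tensor_inputs` and otherwise passes the matching values to the callback.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

# Hypothetical callback type: (pipeline, step index, timestep, callback_kwargs) -> dict
StepCallback = Callable[[Any, int, int, Dict[str, Any]], Dict[str, Any]]


class ToyPipelineBefore:
    """Hypothetical stand-in for a ControlNet pipeline before the fix (not a diffusers class)."""

    _callback_tensor_inputs = ["latents", "prompt_embeds", "negative_prompt_embeds"]

    def __call__(
        self,
        control_image: Any,
        num_inference_steps: int = 2,
        callback_on_step_end: Optional[StepCallback] = None,
        callback_on_step_end_tensor_inputs: Sequence[str] = ("latents",),
    ) -> float:
        # Reject any requested name that the class has not allow-listed.
        unknown = [k for k in callback_on_step_end_tensor_inputs if k not in self._callback_tensor_inputs]
        if unknown:
            raise ValueError(f"{unknown} not found in {self._callback_tensor_inputs}")

        latents = 0.0
        for step in range(num_inference_steps):
            latents += 1.0  # placeholder for one denoising step
            if callback_on_step_end is not None:
                # Values the loop could expose; only the requested ones are passed on.
                available = {
                    "latents": latents,
                    "prompt_embeds": None,
                    "negative_prompt_embeds": None,
                    "control_image": control_image,
                }
                callback_kwargs = {k: available[k] for k in callback_on_step_end_tensor_inputs}
                outputs = callback_on_step_end(self, step, step, callback_kwargs)
                latents = outputs.pop("latents", latents)
        return latents


class ToyPipelineAfter(ToyPipelineBefore):
    """Same toy pipeline with the control image added to the allow-list."""

    _callback_tensor_inputs = ToyPipelineBefore._callback_tensor_inputs + ["control_image"]


logged: List[Any] = []


def log_control_image(pipe: Any, step: int, timestep: int, callback_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Record the control image seen at this step, then hand the kwargs back unchanged.
    logged.append(callback_kwargs["control_image"])
    return callback_kwargs
```

With the toy classes above, requesting `["latents", "control_image"]` raises `ValueError` on `ToyPipelineBefore` and reaches the callback on `ToyPipelineAfter`, which is the behaviour this PR is after for the real pipelines.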

@sanggusti sanggusti left a comment

I've had a similar problem: I'm unable to get the ControlNet image in a callback, which I need in my logging for alignment purposes. This fix resolves it.

@CyberVy
Contributor Author

CyberVy commented Feb 26, 2025

image or control_image

I've also noticed that the name of the control image differs across text-to-image ControlNet pipelines.

For older models such as SD and SDXL, the control image is called image.
For SD3 and Flux, the control image is called control_image.

AFAIK the name of the control image entry in ._callback_tensor_inputs follows its parameter name in the __call__ API.

Here are the parameters of StableDiffusionControlNetPipeline.__call__; the control image is called image.

@torch.no_grad()
@replace_example_docstring(EXAMPLE_DOC_STRING)
def __call__(
    self,
    prompt: Union[str, List[str]] = None,
    image: PipelineImageInput = None,  # Note: the control image parameter is named `image` here.
    height: Optional[int] = None,
    width: Optional[int] = None,
    num_inference_steps: int = 50,
    timesteps: List[int] = None,
    sigmas: List[float] = None,
    guidance_scale: float = 7.5,
    negative_prompt: Optional[Union[str, List[str]]] = None,
    num_images_per_prompt: Optional[int] = 1,
    eta: float = 0.0,
    generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
    latents: Optional[torch.Tensor] = None,
    prompt_embeds: Optional[torch.Tensor] = None,
    negative_prompt_embeds: Optional[torch.Tensor] = None,
    ip_adapter_image: Optional[PipelineImageInput] = None,
    ip_adapter_image_embeds: Optional[List[torch.Tensor]] = None,
    output_type: Optional[str] = "pil",
    return_dict: bool = True,
    cross_attention_kwargs: Optional[Dict[str, Any]] = None,
    controlnet_conditioning_scale: Union[float, List[float]] = 1.0,
    guess_mode: bool = False,
    control_guidance_start: Union[float, List[float]] = 0.0,
    control_guidance_end: Union[float, List[float]] = 1.0,
    clip_skip: Optional[int] = None,
    callback_on_step_end: Optional[
        Union[Callable[[int, int, Dict], None], PipelineCallback, MultiPipelineCallbacks]
    ] = None,
    callback_on_step_end_tensor_inputs: List[str] = ["latents"],
    **kwargs,
):...

Here are the parameters of StableDiffusion3ControlNetPipeline.__call__; the control image is called control_image.

@torch.no_grad()
@replace_example_docstring(EXAMPLE_DOC_STRING)
def __call__(
    self,
    prompt: Union[str, List[str]] = None,
    prompt_2: Optional[Union[str, List[str]]] = None,
    prompt_3: Optional[Union[str, List[str]]] = None,
    height: Optional[int] = None,
    width: Optional[int] = None,
    num_inference_steps: int = 28,
    sigmas: Optional[List[float]] = None,
    guidance_scale: float = 7.0,
    control_guidance_start: Union[float, List[float]] = 0.0,
    control_guidance_end: Union[float, List[float]] = 1.0,

    control_image: PipelineImageInput = None,  # Note: the control image parameter is named `control_image` here.
    
    controlnet_conditioning_scale: Union[float, List[float]] = 1.0,
    controlnet_pooled_projections: Optional[torch.FloatTensor] = None,
    negative_prompt: Optional[Union[str, List[str]]] = None,
    negative_prompt_2: Optional[Union[str, List[str]]] = None,
    negative_prompt_3: Optional[Union[str, List[str]]] = None,
    num_images_per_prompt: Optional[int] = 1,
    generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
    latents: Optional[torch.FloatTensor] = None,
    prompt_embeds: Optional[torch.FloatTensor] = None,
    negative_prompt_embeds: Optional[torch.FloatTensor] = None,
    pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
    negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
    output_type: Optional[str] = "pil",
    return_dict: bool = True,
    joint_attention_kwargs: Optional[Dict[str, Any]] = None,
    clip_skip: Optional[int] = None,
    callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
    callback_on_step_end_tensor_inputs: List[str] = ["latents"],
    max_sequence_length: int = 256,
):...

I think different API standards can cause some confusion. However, changing them is not easy because some people are already using them, and the changes may cause issues for them.

Whether the name is image or control_image also has some impact on this PR.
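
Until the names are unified, calling code that has to work across pipelines can look the name up instead of hard-coding it. The following is a rough sketch using the standard library's `inspect.signature`. `control_image_param_name` and the three `*_style_call` functions are hypothetical stand-ins with trimmed-down signatures, not diffusers code. `control_image` is checked first on the assumption that a pipeline accepting both names uses `image` for the input picture rather than the control image.

```python
import inspect
from typing import Any, Callable, Optional


def control_image_param_name(call: Callable[..., Any]) -> Optional[str]:
    """Return the parameter name a pipeline-like callable uses for its control image.

    Returns None when neither candidate name is present.
    """
    params = inspect.signature(call).parameters
    for name in ("control_image", "image"):
        if name in params:
            return name
    return None


# Hypothetical, heavily trimmed signatures that only mimic the naming pattern.
def sd_style_call(prompt=None, image=None, callback_on_step_end_tensor_inputs=("latents",)): ...


def sd3_style_call(prompt=None, control_image=None, callback_on_step_end_tensor_inputs=("latents",)): ...


def img2img_style_call(prompt=None, image=None, control_image=None): ...


def plain_call(prompt=None): ...


for fn in (sd_style_call, sd3_style_call, img2img_style_call, plain_call):
    print(fn.__name__, "->", control_image_param_name(fn))
```

The returned name could then be appended to `callback_on_step_end_tensor_inputs`, provided the pipeline class also lists that name in `_callback_tensor_inputs`.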

@CyberVy
Contributor Author

CyberVy commented Feb 26, 2025

BTW SD3 and Flux have the same issue that this PR is trying to fix.
I'll fix them in other PRs.

@asomoza
Member

asomoza commented Feb 26, 2025

Thanks again. We're aware of the different names in the pipelines; that's why the new ones use a more uniform name instead of one that changes between the ControlNet and non-ControlNet pipelines. This is something we're going to fix eventually, but it's a breaking change, so we have to plan for it, and we will probably wait for modular diffusers first.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@asomoza
Member

asomoza commented Feb 26, 2025

The failing test is not related to this PR.

@asomoza asomoza merged commit 9a8e8db into huggingface:main Feb 26, 2025
11 of 12 checks passed
@CyberVy CyberVy deleted the sd_controlnet_callback_tensor_inputs branch February 27, 2025 21:17
@CyberVy
Contributor Author

CyberVy commented Feb 27, 2025

> Thanks again. We're aware of the different names in the pipelines; that's why the new ones use a more uniform name instead of one that changes between the ControlNet and non-ControlNet pipelines. This is something we're going to fix eventually, but it's a breaking change, so we have to plan for it, and we will probably wait for modular diffusers first.

Thank you! @asomoza
I've recently been wanting to fix the classes in diffusers.fallback, which requires addressing the issue in this PR first.
However, if the names are not uniform, it will be impossible to fix them elegantly.
So I think I'll do that sometime later.

4 participants