Fix Callback Tensor Inputs of the SD Controlnet Pipelines are missing some elements. #10907

CyberVy · 2025-02-25T18:17:54Z

The `_callback_tensor_inputs` properties of `StableDiffusionControlNetImg2ImgPipeline` and `StableDiffusionControlNetInpaintPipeline` are missing an important element, `controlnet_image`. This makes it impossible to retrieve the ControlNet image from the `callback_kwargs` of `callback_on_step_end` in these pipelines.

`StableDiffusionControlNetPipeline` is slightly different: it is missing `image`, which is the name of the control image in this pipeline.

This PR fixes the bug described above.
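For context, here is a simplified, hypothetical model of the mechanism. It is not the diffusers source. In diffusers, the pipeline roughly checks `callback_on_step_end_tensor_inputs` against `_callback_tensor_inputs`. It then fills `callback_kwargs` by looking up each requested name among the denoising loop's local variables. The class names, the `run` method and the `control_image` parameter below are illustrative stand-ins.

```python
from typing import Any, Callable, Dict, List


class ToyControlNetPipeline:
    """Hypothetical stand-in for an SD ControlNet pipeline (not diffusers code)."""

    # Before the fix: the control image is not exposed to callbacks.
    _callback_tensor_inputs: List[str] = ["latents", "prompt_embeds", "negative_prompt_embeds"]

    def check_inputs(self, callback_on_step_end_tensor_inputs: List[str]) -> None:
        # Reject any requested name that the pipeline does not expose.
        unsupported = [k for k in callback_on_step_end_tensor_inputs if k not in self._callback_tensor_inputs]
        if unsupported:
            raise ValueError(f"Unsupported callback tensor inputs: {unsupported}")

    def run(
        self,
        control_image: Any,
        num_steps: int,
        callback_on_step_end: Callable[..., Dict[str, Any]],
        callback_on_step_end_tensor_inputs: List[str],
    ) -> float:
        self.check_inputs(callback_on_step_end_tensor_inputs)
        latents = 0.0
        prompt_embeds = negative_prompt_embeds = None  # placeholders, reachable via locals()
        for i in range(num_steps):
            latents = latents + 1.0  # placeholder for one denoising step
            # Look up each requested name among the loop's local variables.
            callback_kwargs = {}
            for k in callback_on_step_end_tensor_inputs:
                callback_kwargs[k] = locals()[k]
            outputs = callback_on_step_end(self, i, i, callback_kwargs)
            # The callback may hand back modified latents.
            latents = outputs.pop("latents", latents)
        return latents


class FixedToyControlNetPipeline(ToyControlNetPipeline):
    # The fix: also list the control image under its `__call__` parameter name.
    _callback_tensor_inputs = ToyControlNetPipeline._callback_tensor_inputs + ["control_image"]
```

With `ToyControlNetPipeline`, requesting `control_image` raises `ValueError` before any step runs. With `FixedToyControlNetPipeline`, the callback receives it at every step.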

sanggusti

I had a similar problem: I was unable to get the ControlNet image through the callback in my logging for alignment purposes. This fix resolves it.
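A logging callback of this kind might look like the sketch below. It follows the diffusers convention: a `callback_on_step_end` function receives the pipeline, the step index, the timestep and `callback_kwargs`, and returns a dict. The names `log_control_image` and `logged` are hypothetical. For `StableDiffusionControlNetPipeline` the key to request is `image`, and it must also appear in `callback_on_step_end_tensor_inputs`. The value is presumably the pipeline's preprocessed control tensor, not the raw input you passed in.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory log; swap in your own logging backend.
logged: List[Tuple[int, Any]] = []


def log_control_image(pipe: Any, step: int, timestep: Any, callback_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Record the control image at each denoising step.

    Assumes "image" was requested via callback_on_step_end_tensor_inputs
    (the control image name in StableDiffusionControlNetPipeline).
    """
    control = callback_kwargs.get("image")
    if control is not None:
        logged.append((step, control))
    # Return the kwargs unchanged so the pipeline keeps its own tensors.
    return callback_kwargs


# Usage (sketch, needs a loaded pipeline):
# pipe(prompt, image=control_image,
#      callback_on_step_end=log_control_image,
#      callback_on_step_end_tensor_inputs=["latents", "image"])
```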

CyberVy · 2025-02-26T04:08:44Z

`image` or `control_image`

I've also noticed that the name of the control image differs across the text-to-image ControlNet pipelines.

For older models like SD and SDXL, the control image is called `image`.
For SD3 and Flux, the control image is called `control_image`.

AFAIK, the name of the control image in `_callback_tensor_inputs` depends on its parameter name in the `__call__` API.

Here are the parameter names of `StableDiffusionControlNetPipeline.__call__`; the control image is called `image`:

```python
@torch.no_grad()
@replace_example_docstring(EXAMPLE_DOC_STRING)
def __call__(
    self,
    prompt: Union[str, List[str]] = None,
    image: PipelineImageInput = None,  # <- the control image parameter is named `image`
    height: Optional[int] = None,
    width: Optional[int] = None,
    num_inference_steps: int = 50,
    timesteps: List[int] = None,
    sigmas: List[float] = None,
    guidance_scale: float = 7.5,
    negative_prompt: Optional[Union[str, List[str]]] = None,
    num_images_per_prompt: Optional[int] = 1,
    eta: float = 0.0,
    generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
    latents: Optional[torch.Tensor] = None,
    prompt_embeds: Optional[torch.Tensor] = None,
    negative_prompt_embeds: Optional[torch.Tensor] = None,
    ip_adapter_image: Optional[PipelineImageInput] = None,
    ip_adapter_image_embeds: Optional[List[torch.Tensor]] = None,
    output_type: Optional[str] = "pil",
    return_dict: bool = True,
    cross_attention_kwargs: Optional[Dict[str, Any]] = None,
    controlnet_conditioning_scale: Union[float, List[float]] = 1.0,
    guess_mode: bool = False,
    control_guidance_start: Union[float, List[float]] = 0.0,
    control_guidance_end: Union[float, List[float]] = 1.0,
    clip_skip: Optional[int] = None,
    callback_on_step_end: Optional[
        Union[Callable[[int, int, Dict], None], PipelineCallback, MultiPipelineCallbacks]
    ] = None,
    callback_on_step_end_tensor_inputs: List[str] = ["latents"],
    **kwargs,
): ...
```

Here are the parameter names of `StableDiffusion3ControlNetPipeline.__call__`; the control image is called `control_image`:

```python
@torch.no_grad()
@replace_example_docstring(EXAMPLE_DOC_STRING)
def __call__(
    self,
    prompt: Union[str, List[str]] = None,
    prompt_2: Optional[Union[str, List[str]]] = None,
    prompt_3: Optional[Union[str, List[str]]] = None,
    height: Optional[int] = None,
    width: Optional[int] = None,
    num_inference_steps: int = 28,
    sigmas: Optional[List[float]] = None,
    guidance_scale: float = 7.0,
    control_guidance_start: Union[float, List[float]] = 0.0,
    control_guidance_end: Union[float, List[float]] = 1.0,
    control_image: PipelineImageInput = None,  # <- the control image parameter is named `control_image`
    controlnet_conditioning_scale: Union[float, List[float]] = 1.0,
    controlnet_pooled_projections: Optional[torch.FloatTensor] = None,
    negative_prompt: Optional[Union[str, List[str]]] = None,
    negative_prompt_2: Optional[Union[str, List[str]]] = None,
    negative_prompt_3: Optional[Union[str, List[str]]] = None,
    num_images_per_prompt: Optional[int] = 1,
    generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
    latents: Optional[torch.FloatTensor] = None,
    prompt_embeds: Optional[torch.FloatTensor] = None,
    negative_prompt_embeds: Optional[torch.FloatTensor] = None,
    pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
    negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
    output_type: Optional[str] = "pil",
    return_dict: bool = True,
    joint_attention_kwargs: Optional[Dict[str, Any]] = None,
    clip_skip: Optional[int] = None,
    callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
    callback_on_step_end_tensor_inputs: List[str] = ["latents"],
    max_sequence_length: int = 256,
): ...
```

I think these inconsistent API conventions can cause some confusion. However, changing them is not easy: people already rely on the current names, and renaming them could break their code.

Whether the name is `image` or `control_image` also affects this PR.

CyberVy · 2025-02-26T04:27:00Z

BTW, SD3 and Flux have the same issue that this PR fixes.
I'll fix them in separate PRs.

asomoza · 2025-02-26T18:03:27Z

Thanks again! We're aware of the different names across the pipelines; that's why the newer ones use a more uniform name instead of one that changes between the ControlNet and non-ControlNet pipelines. We're going to fix this eventually, but it's a breaking change, so we have to plan for it, and we'll probably wait for modular diffusers first.

HuggingFaceDocBuilderDev · 2025-02-26T18:09:53Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

asomoza · 2025-02-26T18:36:26Z

The failing test is not related to this PR.

CyberVy · 2025-02-27T21:26:57Z

Thank you, @asomoza!
I recently wanted to fix the classes in `diffusers.fallback`, which requires addressing the issue in this PR first.
However, if the names are not uniform, there is no elegant way to fix them.
So I think I'll do that later.

CyberVy added 4 commits February 26, 2025 00:51

Update pipeline_controlnet_img2img.py

50c9009

Update pipeline_controlnet_inpaint.py

989a8e7

Update pipeline_controlnet.py

52b36df

Merge branch 'main' into sd_controlnet_callback_tensor_inputs

6a46926

sanggusti approved these changes Feb 26, 2025

Merge branch 'huggingface:main' into sd_controlnet_callback_tensor_in…

5811782

…puts

asomoza approved these changes Feb 26, 2025

Merge branch 'main' into sd_controlnet_callback_tensor_inputs

0958ee2

asomoza merged commit 9a8e8db into huggingface:main Feb 26, 2025
11 of 12 checks passed

CyberVy deleted the sd_controlnet_callback_tensor_inputs branch February 27, 2025 21:17