Lower VRAM usage in CPU offload for Flux ControlNet Pipeline #10790

Open
@NielsPichon

Description

Is your feature request related to a problem? Please describe.

I have a GPU with 24 GB of VRAM. When running a diffusion model like Flux1, I can barely fit the model in memory during inference with batch size 1. Enabling CPU offload does not help because no offload occurs between the controlnet forward pass and the transformer forward pass (which makes sense performance-wise).

It would be great to enable offloading between the controlnet call and the transformer denoising step (and for any other auxiliary model that is not currently offloaded in the middle of the denoising process) to further reduce VRAM requirements.
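
To make the motivation concrete, here is a rough back-of-the-envelope sketch that counts weight memory only (activations, the text encoders, and the VAE are ignored). `peak_weight_bytes` is a hypothetical helper, not part of diffusers. The 12-billion figure is roughly the publicly stated size of the Flux.1 transformer, and the ControlNet count is a placeholder assumption, not a measurement. The point is only that keeping both models resident costs the sum of their sizes, while offloading between the two calls costs the largest one.

```python
# Illustrative only: the parameter counts below are assumed, not measured.
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each parameter in 2 bytes
GIB = 1024 ** 3


def peak_weight_bytes(param_counts, bytes_per_param, offload_between_calls):
    """Peak bytes of model weights resident on the GPU during one denoising step."""
    sizes = [count * bytes_per_param for count in param_counts.values()]
    # With offloading between calls only one model is resident at a time,
    # so the peak is the largest model instead of the sum of all models.
    return max(sizes) if offload_between_calls else sum(sizes)


assumed_params = {
    "transformer": 12_000_000_000,  # approximate, publicly stated size
    "controlnet": 1_000_000_000,    # placeholder assumption
}

both_resident = peak_weight_bytes(assumed_params, BYTES_PER_PARAM_BF16, False)
one_at_a_time = peak_weight_bytes(assumed_params, BYTES_PER_PARAM_BF16, True)

print(f"both resident: {both_resident / GIB:.1f} GiB")
print(f"one at a time: {one_at_a_time / GIB:.1f} GiB")
```

Under these assumptions the saving equals the size of the smaller model, which matters most when the larger model alone already comes close to filling the card.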

Describe the solution you'd like.

What I would suggest is an opt-in "slow" offload mode in which these models do get offloaded to CPU between calls, even though this is really slow.

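def enable_sequential_cpu_offload(self, gpu_id: Optional[int] = None, device: Union[torch.device, str] = "cuda", enable_slow_mode: bool = False):
    ...
    # Assumed: store the flag so the denoising loop can check it.
    self._enable_slow_cpu_offload = enable_slow_mode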

For instance, in the image-to-image pipeline on line 927:

                controlnet_block_samples, controlnet_single_block_samples = self.controlnet(
                    ...
                )

                if self._enable_slow_cpu_offload:
                    self.maybe_free_model_hooks()

                ...

                noise_pred = self.transformer(
                    ...
                )[0]

                if self._enable_slow_cpu_offload:
                    self.maybe_free_model_hooks()
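
The same idea can be sketched outside the pipeline. The snippet below is a minimal, hedged illustration rather than the diffusers implementation: `denoise_step_with_offload`, `FakeModel`, and the device strings are all hypothetical. It only relies on each model being callable and exposing `.to(device)`, which `torch.nn.Module` also does, so the stand-in class is there purely to make the device moves visible without a GPU. The real transformer call takes more inputs than shown here.

```python
class FakeModel:
    """Stand-in for a torch.nn.Module: callable and movable with .to()."""

    def __init__(self, name, log):
        self.name = name
        self.device = "cpu"
        self.log = log  # shared list recording every device move

    def to(self, device):
        self.device = device
        self.log.append((self.name, device))
        return self  # mirrors nn.Module.to, which returns the module

    def __call__(self, x):
        # Hypothetical check: the model must be on the GPU when it runs.
        assert self.device == "cuda", f"{self.name} called while on {self.device}"
        return x


def denoise_step_with_offload(controlnet, transformer, latents,
                              device="cuda", offload_device="cpu"):
    """Run one step so that at most one model sits on `device` at a time."""
    controlnet.to(device)
    control_samples = controlnet(latents)
    controlnet.to(offload_device)  # free VRAM before loading the transformer

    transformer.to(device)
    noise_pred = transformer(control_samples)  # simplified: real call takes more inputs
    transformer.to(offload_device)  # free VRAM for the next ControlNet call
    return noise_pred


moves = []
controlnet = FakeModel("controlnet", moves)
transformer = FakeModel("transformer", moves)
result = denoise_step_with_offload(controlnet, transformer, latents=[0.0])
print(moves)
```

With real modules, every denoising step pays for moving both sets of weights to the GPU and back, which is the slowness the proposal accepts in exchange for a lower peak.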

Describe alternatives you've considered.

I am not sure there are alternatives if these models are to be run at the desired floating-point precision (in my case bfloat16).

Labels: stale (issues that haven't received updates)
