[modular diffusers] introducing ModularLoader #11462

Merged

Conversation

yiyixuxu (Collaborator) commented Apr 30, 2025

ComponentSpec

ComponentSpec was created to enable the "lazy loading" behavior needed for the Modular Diffusers system: you can define PipelineBlocks with their associated models, but the models are only created later, when they're actually needed, after the blocks are assembled and ready to be executed.

To create a ComponentSpec:

from diffusers.pipelines.modular_pipeline_utils import ComponentSpec
from diffusers import UNet2DConditionModel
unet_spec = ComponentSpec(
    name="unet",
    type_hint=UNet2DConditionModel,
    repo="stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
)

We use it to define expected_components for pipeline blocks:

from typing import List

class MyCustomStep(PipelineBlock):

    @property
    def inputs(self):
        return [...]

    @property
    def expected_components(self) -> List[ComponentSpec]:
        return [
            ComponentSpec("my_model", ModelXX),
        ]
    ...

It is just a dataclass, but it can load itself when needed; e.g., this would create an SDXL unet:

unet = unet_spec.create()

Additionally, each component created by a ComponentSpec is tagged with a unique _diffusers_load_id attribute that encodes its creation parameters.

# this prints "stabilityai/stable-diffusion-xl-base-1.0|unet|null|null"
unet._diffusers_load_id

load_id is pretty useful for model management; e.g., our component manager system uses this info to prevent duplicated loading. It also allows us to very easily build, package, and port entire pipelines with guaranteed component identity.

# A spec can be recreated from an existing component (via its load_id)
unet_spec_recreated = ComponentSpec.from_component(unet)
unet_recreated = unet_spec_recreated.create()
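
As a rough illustration of the dedup point above, here is a minimal sketch (assumed, not the actual ComponentsManager internals) of using a load_id-style key to avoid creating duplicate model instances; the key's field order/encoding is inferred from the printed example above:

_loaded = {}

def get_or_create(spec: ComponentSpec):
    # build a "repo|subfolder|variant|revision" key like the printed load_id above
    # (the exact field order and encoding are assumptions)
    load_id = "|".join(str(v) for v in (spec.repo, spec.subfolder, spec.variant, spec.revision))
    if load_id not in _loaded:
        _loaded[load_id] = spec.create()
    return _loaded[load_id]

unet_a = get_or_create(unet_spec)
unet_b = get_or_create(unet_spec)
assert unet_a is unet_b  # same instance, no duplicate load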

ModularLoader

Overview

ModularLoader is designed to work with a new modular_model_index.json. Unlike the traditional model_index.json, which only loads models from subfolders within a single repo, ModularLoader lets you reference components across different repositories for maximum flexibility.

A modular_model_index.json looks like this (see an example at https://huggingface.co/YiYiXu/modular-loader-t2i/blob/main/modular_model_index.json).
Each component entry contains 3 key elements: the library, the class, and a loading specs map {}:

"text_encoder": [
  null, # library (same as model_index.json)
  null, # class (same as model_index.json)
  {  # loading specs map, this is new in modular_model_index.json
    "repo": "stabilityai/stable-diffusion-xl-base-1.0", # cam be a different repo
    "revision": null,
    "subfolder": "text_encoder",
    "type_hint": [ # (library, class) for the expected class 
      "transformers",  
      "CLIPTextModel"
    ],
    "variant": null
  }
],

You can load it like this:

# Simple usage
from diffusers import ModularLoader 
# Only loads specs by default (spec_only=True)
loader = ModularLoader.from_pretrained("YiYiXu/modular-loader-t2i", spec_only=True)
# Components are actually loaded only when you call `load()`
loader.load()
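
Loading can also be partial: as used later in this PR for the refiner, load() accepts component_names so only a subset of the components gets loaded while the rest stay as specs:

import torch

# load only the unet; the other components remain unloaded specs
loader.load(component_names="unet", torch_dtype=torch.float16)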

Use with DiffusionPipeline

The components will be registered on the loader along with its config, just like in DiffusionPipeline - in fact, they are compatible: you can use from_pipe to transfer all the contents to a regular DiffusionPipeline.

# Compatible with standard pipelines
import torch
from diffusers import StableDiffusionXLPipeline

dtype = torch.float16
device = "cuda"

pipe = StableDiffusionXLPipeline.from_pipe(loader, torch_dtype=dtype)
pipe.to(device)

# Use as normal
prompt = "A crystal orb resting on a wooden table with a yellow rubber duck..."
image = pipe(prompt=prompt, num_inference_steps=25).images[0]
image.save("yiyi_test_6.png")

related to #10413

Using ModularLoader (Basics)

Create a simple modular pipeline:

from diffusers.pipelines.modular_pipeline import SequentialPipelineBlocks
from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl_modular import TEXT2IMAGE_BLOCKS

t2i_blocks = TEXT2IMAGE_BLOCKS.copy()
class Text2ImageBlocks(SequentialPipelineBlocks):
    block_classes = list(t2i_blocks.values())
    block_names = list(t2i_blocks.keys())

t2i = Text2ImageBlocks()

This is just the SDXL text-to-image pipeline, but built with modular blocks. Since we defined the models associated with each block with ComponentSpecs, when we put these blocks together into the final pipeline, we also assembled their model requirements. Check out its expected_components attribute:

for c in t2i.expected_components:
    print(c)
    print(" -----")
(Click to see the outputs)
ComponentSpec(name='text_encoder', type_hint=<class 'transformers.models.clip.modeling_clip.CLIPTextModel'>, description=None, config=None, repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_pretrained')
 -----
ComponentSpec(name='text_encoder_2', type_hint=<class 'transformers.models.clip.modeling_clip.CLIPTextModelWithProjection'>, description=None, config=None, repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_pretrained')
 -----
ComponentSpec(name='tokenizer', type_hint=<class 'transformers.models.clip.tokenization_clip.CLIPTokenizer'>, description=None, config=None, repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_pretrained')
 -----
ComponentSpec(name='tokenizer_2', type_hint=<class 'transformers.models.clip.tokenization_clip.CLIPTokenizer'>, description=None, config=None, repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_pretrained')
 -----
ComponentSpec(name='guider', type_hint=<class 'diffusers.guiders.classifier_free_guidance.ClassifierFreeGuidance'>, description=None, config=FrozenDict([('guidance_scale', 7.5)]), repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_config')
 -----
ComponentSpec(name='image_encoder', type_hint=<class 'transformers.models.clip.modeling_clip.CLIPVisionModelWithProjection'>, description=None, config=None, repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_pretrained')
 -----
ComponentSpec(name='feature_extractor', type_hint=<class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'>, description=None, config=FrozenDict([('size', 224), ('crop_size', 224)]), repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_config')
 -----
ComponentSpec(name='unet', type_hint=<class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'>, description=None, config=None, repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_pretrained')
 -----
ComponentSpec(name='scheduler', type_hint=<class 'diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler'>, description=None, config=None, repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_pretrained')
 -----
ComponentSpec(name='vae', type_hint=<class 'diffusers.models.autoencoders.autoencoder_kl.AutoencoderKL'>, description=None, config=None, repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_pretrained')
 -----
ComponentSpec(name='image_processor', type_hint=<class 'diffusers.image_processor.VaeImageProcessor'>, description=None, config=FrozenDict([('vae_scale_factor', 8)]), repo=None, subfolder=None, variant=None, revision=None, default_creation_method='from_config')
 -----
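
Note the two default_creation_method values above: from_pretrained specs load weights from a repo, while from_config specs (guider, feature_extractor, image_processor) build the object directly from a config when you call create(). A minimal sketch of the latter, assuming a plain dict is accepted for config (the fields mirror the guider spec printed above):

from diffusers.guiders.classifier_free_guidance import ClassifierFreeGuidance
from diffusers.pipelines.modular_pipeline_utils import ComponentSpec

guider_spec = ComponentSpec(
    name="guider",
    type_hint=ClassifierFreeGuidance,
    config={"guidance_scale": 7.5},
    default_creation_method="from_config",
)
guider = guider_spec.create()  # no download needed, built from the config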

Set up the built-in modular loader

Modular pipeline blocks have a setup_loader method that creates a built-in ModularLoader based on the requirements of the pipeline; you can access the loader via the loader attribute.

t2i.setup_loader()
t2i.loader

It is pretty empty now, because we did not specify repo info when we defined the pipeline blocks, but you can see that it already has the name and type for each component, matching what's in its expected_components.

(Click to see the outputs)
StableDiffusionXLModularLoader {
  "_class_name": "StableDiffusionXLModularLoader",
  "_diffusers_version": "0.33.0.dev0",
  "force_zeros_for_empty_prompt": true,
  "image_encoder": [
    null,
    null,
    {
      "repo": null,
      "revision": null,
      "subfolder": null,
      "type_hint": [
        "transformers",
        "CLIPVisionModelWithProjection"
      ],
      "variant": null
    }
  ],
  "scheduler": [
    null,
    null,
    {
      "repo": null,
      "revision": null,
      "subfolder": null,
      "type_hint": [
        "diffusers",
        "EulerDiscreteScheduler"
      ],
      "variant": null
    }
  ],
  "text_encoder": [
    null,
    null,
    {
      "repo": null,
      "revision": null,
      "subfolder": null,
      "type_hint": [
        "transformers",
        "CLIPTextModel"
      ],
      "variant": null
    }
  ],
  "text_encoder_2": [
    null,
    null,
    {
      "repo": null,
      "revision": null,
      "subfolder": null,
      "type_hint": [
        "transformers",
        "CLIPTextModelWithProjection"
      ],
      "variant": null
    }
  ],
  "tokenizer": [
    null,
    null,
    {
      "repo": null,
      "revision": null,
      "subfolder": null,
      "type_hint": [
        "transformers",
        "CLIPTokenizer"
      ],
      "variant": null
    }
  ],
  "tokenizer_2": [
    null,
    null,
    {
      "repo": null,
      "revision": null,
      "subfolder": null,
      "type_hint": [
        "transformers",
        "CLIPTokenizer"
      ],
      "variant": null
    }
  ],
  "unet": [
    null,
    null,
    {
      "repo": null,
      "revision": null,
      "subfolder": null,
      "type_hint": [
        "diffusers",
        "UNet2DConditionModel"
      ],
      "variant": null
    }
  ],
  "vae": [
    null,
    null,
    {
      "repo": null,
      "revision": null,
      "subfolder": null,
      "type_hint": [
        "diffusers",
        "AutoencoderKL"
      ],
      "variant": null
    }
  ]
}

We can pass a modular repo name (one that contains a modular_model_index.json) to the setup_loader() method; the loader will fetch all the loading-related specs from the repo and will then be able to load the required models when you call load().

We will use this repo:

t2i.setup_loader("YiYiXu/modular-loader-t2i")
t2i.loader.load(torch_dtype=torch.float16)

for n, m in t2i.loader.components.items():
    print(f"{n}[{m.class.name}]")
    if isinstance(m, torch.nn.Module):
        print(f"    {m.dtype},{m.device}")

All the components are loaded into the loader now.

(Click to see the outputs)
text_encoder[CLIPTextModel]
    torch.float16,cpu
text_encoder_2[CLIPTextModelWithProjection]
    torch.float16,cpu
tokenizer[CLIPTokenizer]
tokenizer_2[CLIPTokenizer]
guider[ClassifierFreeGuidance]
image_encoder[CLIPVisionModelWithProjection]
    torch.float16,cpu
feature_extractor[CLIPImageProcessor]
unet[UNet2DConditionModel]
    torch.float16,cpu
scheduler[EulerDiscreteScheduler]
vae[AutoencoderKL]
    torch.float16,cpu
image_processor[VaeImageProcessor]

Make a modular repo from scratch

When you're developing your custom pipeline and want to store all the component creation info conveniently in a modular repo (like the one I made), you can create one from scratch. This is my process: I set up an empty loader, run save_pretrained() to save the modular_model_index.json and push it to the hub, then update it on the hub with the component specs I need (you can also do it locally if you want):

import shutil

t2i.setup_loader()
print(t2i.loader)
t2i.loader.save_pretrained("YiYiXu/modular-loader-t2i", push_to_hub=True)
# remove the local copy; the repo now lives on the hub
shutil.rmtree("YiYiXu/modular-loader-t2i")
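
From there, filling in the loading specs can be done directly on the hub (editing modular_model_index.json in the web UI) or programmatically. A hedged sketch of the latter with huggingface_hub; the [2] index is the loading specs map shown earlier:

import json
from huggingface_hub import hf_hub_download, upload_file

path = hf_hub_download("YiYiXu/modular-loader-t2i", "modular_model_index.json")
with open(path) as f:
    index = json.load(f)

# point the unet entry at the SDXL base repo (same fields as the example above)
index["unet"][2].update({"repo": "stabilityai/stable-diffusion-xl-base-1.0", "subfolder": "unet"})

with open("modular_model_index.json", "w") as f:
    json.dump(index, f, indent=2)

upload_file(
    path_or_fileobj="modular_model_index.json",
    path_in_repo="modular_model_index.json",
    repo_id="YiYiXu/modular-loader-t2i",
)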

Use ModularLoader (with Components Manager)

We ran a basic example of setting up the built-in modular loader with a modular repo for a modular pipeline. Now I want to talk about how to work with multiple workflows and manage your components efficiently.

The Components Manager is very important to the modular system when working with multiple pipelines. With modular model indexes, it's easy to accidentally load duplicated models, since different repos can point to the same subfolders. You won't download the same file twice because the hub uses the same cache, but you could create duplicated model instances without realizing it.

A ComponentsManager helps you:

  • Detect and manage duplicate models (it warns when a model with existing load_id is registered)
  • Easily reuse components across different workflows
  • Apply offloading strategies across all your workflows

To use it, just pass the component manager to setup_loader() with an optional collection name (like "t2i") to tag all models from this loader for easier retrieval later.

from diffusers.pipelines.components_manager import ComponentsManager

components = ComponentsManager()

t2i.setup_loader(modular_repo="YiYiXu/modular-loader-t2i", component_manager=components, collection="t2i")
t2i.loader.load(torch_dtype=dtype)
# set up the offloading strategy on the components manager; this way models are only
# moved to the device when used and offloaded when not
components.enable_auto_cpu_offload(device=device)

This same ComponentsManager can now be used with a refiner pipeline:

from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl_modular import IMAGE2IMAGE_BLOCKS

# Create refiner blocks by reusing the img2img blocks
i2i_blocks = IMAGE2IMAGE_BLOCKS.copy()
i2i_blocks.pop("image_encoder")  # Not needed for refiner

class RefinerBlocks(SequentialPipelineBlocks):
    block_classes = list(i2i_blocks.values())
    block_names = list(i2i_blocks.keys())

# Setup refiner with same component manager
refiner = RefinerBlocks()
refiner.setup_loader(modular_repo="YiYiXu/modular_refiner", component_manager=components, collection="refiner")

# Only load the refiner-specific UNet to avoid duplication 
refiner.loader.load(component_names="unet", torch_dtype=dtype)

# Reuse components from the text2image pipeline: everything in the "t2i"
# collection except unet/text_encoder/tokenizer
reuse_components = components.get("!unet|text_encoder|tokenizer", collection="t2i", as_name_component_tuples=True)
refiner.loader.update(**dict(reuse_components))

# Run two-stage generation
latents = t2i.run(prompt=prompt, num_inference_steps=25, output="latents")
image = refiner.run(image_latents=latents, prompt=prompt, denoising_start=0.8, num_inference_steps=25, output="images")
image.images[0].save("refined_image.png")

Complete Example Script

Below is a complete example script with more usage examples of the ModularLoader, including LoRA and IP-Adapter integration.

Code example (Click to expand)
# ModularLoader PR examples


from diffusers.pipelines.modular_pipeline import SequentialPipelineBlocks
from diffusers.pipelines.components_manager import ComponentsManager
from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl_modular import TEXT2IMAGE_BLOCKS, IMAGE2IMAGE_BLOCKS
from diffusers.utils import load_image
import torch
dtype = torch.float16
device = torch.device("cuda:2")

# create pipeline blocks (here we use diffusers official block presets and just assemble them, but you can create your own)
t2i_blocks = TEXT2IMAGE_BLOCKS.copy()
i2i_blocks = IMAGE2IMAGE_BLOCKS.copy()
i2i_blocks.pop("image_encoder")

class Text2ImageBlocks(SequentialPipelineBlocks):
    block_classes = list(t2i_blocks.values())
    block_names = list(t2i_blocks.keys())


class RefinerBlocks(SequentialPipelineBlocks):
    block_classes = list(i2i_blocks.values())
    block_names = list(i2i_blocks.keys())

# this is your text2image pipeline
t2i = Text2ImageBlocks()

# this is your refiner pipeline
refiner = RefinerBlocks()

# create components manager
components = ComponentsManager()

# setup loader with component manager
t2i.setup_loader(modular_repo="YiYiXu/modular-loader-t2i", component_manager=components, collection="t2i")
t2i.loader.load(torch_dtype=dtype)

# set up the offloading strategy on the components manager; this way models are only
# moved to the device when used and offloaded when not
components.enable_auto_cpu_offload(device=device)



prompt = "A crystal orb resting on a wooden table with a yellow rubber duck, surrounded by aged scrolls and alchemy tools, illuminated by candlelight, detailed texture, high resolution image"

# generate an image; we use the `run` method here so it will:
# 1. run the pipeline blocks in the order/logic defined in SequentialPipelineBlocks
# 2. prepare the inputs for each block.__call__() method: the pipeline_state and the `pipeline`
#    - the `pipeline` input passed to each block, which contains all the models the block needs, is actually just the ModularLoader we set up here!
image = t2i.run(prompt=prompt, num_inference_steps=25, output="images").images[0]
image.save("yiyi_test_7_t2i.png")



# ok now I want to set up the refiner, but reuse the same components because I know only the unet is different
# here I already have a repo made for the refiner-specific configs, I will just use it
refiner.setup_loader(modular_repo="YiYiXu/modular_refiner", component_manager=components, collection="refiner")

# if you ran refiner.loader.load() here it would just work, but you would get complaints from the components manager about duplicated components
# it is easy to remove the duplicates, but let's not do that for now 
# feel free to uncomment and try it out
# refiner.loader.load(torch_dtype=dtype)

# let's only load unet for now
refiner.loader.load(component_names="unet", torch_dtype=dtype)
# uncomment to check the loader: you can verify only unet is loaded 
# print(refiner.loader)
# uncomment this line below to check out component manager:
# you should see that only unet is registered in component manager under the "refiner" collection
# print(components)

# let's reuse components from the t2i pipeline
# this gets you everything under the "t2i" collection that is not unet/text_encoder/tokenizer
# (the refiner only uses the second text encoder/tokenizer, and we already loaded its unet)
# we get them as tuples of (name, component)
reuse_components = list(components.get("!unet|text_encoder|tokenizer", collection="t2i", as_name_component_tuples=True))
for name, component in reuse_components:
    print(f"reuse {name}: {component.__class__.__name__}")


# ok now let's update the refiner loader with the reused components and the new unet
# since these are the exact same objects (same id() and everything), the components manager won't re-register them
refiner.loader.update(**dict(reuse_components))
# uncomment the lines below to check out the loader and the components manager:
# you should see that only the refiner unet is registered under the "refiner" collection
# print(refiner.loader)
# print(components)




# running a refiner example 

latents = t2i.run(prompt=prompt, num_inference_steps=25, output="latents")
image = refiner.run(image_latents=latents, prompt=prompt, denoising_start=0.8, num_inference_steps=25, output="images")
image.images[0].save("yiyi_test_7_example2.png")

# all the loading-related methods are available on the "loader", not the pipeline itself
# for example, use lora

t2i.loader.load_lora_weights("rajkumaralma/dissolve_dust_style", weight_name="ral-dissolve-sdxl.safetensors", adapter_name="ral-dissolve")
image = t2i.run(prompt=prompt, num_inference_steps=25, output="images").images[0]
image.save("yiyi_test_7_example3_lora.png")

# uncomment this line below to check out component manager:
# you should see info about the lora weights loaded in unet/text_encoder
# print(components)


# ip-adapter
t2i.loader.unload_lora_weights()
t2i.loader.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
t2i.loader.set_ip_adapter_scale(0.6)

ip_adapter_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png")
image = t2i.run(prompt=prompt, num_inference_steps=25, ip_adapter_image=ip_adapter_image, output="images").images[0]
image.save("yiyi_test_7_example4_ip-adapter.png")
# uncomment this line below to check out component manager:
# you should see info about the ip-adapter weights loaded in the unet
# print(components)


# YiYi TODO:
# 1. should support saving some components too! currently only modular_model_index.json is saved
# 2. maybe order the json file to make it more readable: configs first, then components
def save_pretrained(self, save_directory: Union[str, os.PathLike], push_to_hub: bool = False, spec_only: bool = True, **kwargs):
yiyixuxu (Collaborator, Author) commented:

TODO:
Currently I only save the config (a.k.a. the model index); we could totally save selected components into a subfolder too, so for example fine-tune trainers could make a modular repo that contains the fine-tuned components, with the modular_model_index pointing to other repos for the frozen components.

The same goes for from_pretrained().
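
To illustrate the idea (purely hypothetical, since this is not implemented yet): such a repo could contain an unet/ subfolder with the fine-tuned weights, while its modular_model_index.json keeps pointing elsewhere for the frozen components, e.g.

"unet": [
  null,
  null,
  {
    "repo": "your-name/my-finetuned-modular-repo",  # hypothetical: the modular repo itself, holding the fine-tuned unet
    "subfolder": "unet",
    ...
  }
],
"text_encoder": [ ... still points to stabilityai/stable-diffusion-xl-base-1.0 ... ]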

yiyixuxu merged commit 6d5beef into modular-refactor on Apr 30, 2025
2 checks passed
yiyixuxu (Collaborator, Author) commented Apr 30, 2025

cc @DN6
I merged this into PR #11235, you can branch off from that PR.

I left more info in this PR's description about the ModularLoader; please read it if you have time.

This refactor is mainly based on your feedback from last time, so I would love for you to help think through the design and implementation more.
