[benchmarks] overhaul benchmarks #11565

sayakpaul · 2025-05-16T08:28:23Z

What does this PR do?

This PR considerably simplifies how we do benchmarks. Instead of running entire pipeline-level benchmarks across different tasks, we will now ONLY benchmark the diffusion network, which is the most compute-intensive part of a standard diffusion workflow.

To make the estimates more realistic, we will make use of pre-trained checkpoints and dummy inputs with reasonable dimensionalities.

I ran benchmarking_flux.py on an 80GB A100 with a batch size of 1 and got the following results:

By default, all benchmarks will use a batch size of 1, which takes classifier-free guidance (CFG) out of the picture.

How to add your benchmark?

Adding benchmarks for a new model class (SanaTransformer2DModel, for example) boils down to the following:

Define the dummy inputs of the model.
Define the benchmarking scenarios we should run the benchmark on.

This is what benchmarking_flux.py does. More modularization can be shipped afterward.
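The two steps above could be sketched roughly as follows. This is a minimal illustration and not the code in benchmarking_flux.py: the `Scenario` dataclass, the `get_dummy_input_shapes` helper, the scenario names, and the tensor dimensions are all hypothetical. Shapes are plain tuples so the sketch runs without PyTorch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Shape = Tuple[int, ...]


def get_dummy_input_shapes(batch_size: int = 1) -> Dict[str, Shape]:
    # Step 1: describe the dummy inputs of the model.
    # Input names and dimensions are placeholders (assumptions), not real specs.
    return {
        "hidden_states": (batch_size, 4096, 64),
        "encoder_hidden_states": (batch_size, 512, 4096),
        "timestep": (batch_size,),
    }


@dataclass
class Scenario:
    # Step 2: one entry per configuration the benchmark should run.
    name: str
    model_cls: str
    get_inputs: Callable[[], Dict[str, Shape]]
    model_kwargs: Dict[str, object] = field(default_factory=dict)
    compile_model: bool = False


# Hypothetical scenario list for a new model class.
scenarios = [
    Scenario(
        name="sana-bf16",
        model_cls="SanaTransformer2DModel",
        get_inputs=get_dummy_input_shapes,
        model_kwargs={"dtype": "bfloat16"},
        compile_model=True,
    ),
    Scenario(
        name="sana-group-offload-leaf",
        model_cls="SanaTransformer2DModel",
        get_inputs=get_dummy_input_shapes,
    ),
]

for scenario in scenarios:
    shapes = scenario.get_inputs()
    print(scenario.name, scenario.model_cls, shapes["hidden_states"])
```

Keeping the inputs behind a callable means each scenario can build fresh inputs when it runs, which matters once the tuples are swapped for real tensors on a device.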

The idea is to merge this PR with pre-configured benchmarks for a few popular models and open up the rest to the community.

TODOs

Utilities:

To fire the execution of the individual model-level benchmarks sequentially.
To combine CSVs from multiple different model classes.
Central dataset update and Slack notification.
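The first two utilities could look something like the sketch below. It uses only the Python standard library; the function names `run_scripts_sequentially` and `combine_csvs` are made up for illustration and are not taken from this PR.

```python
import csv
import subprocess
import sys


def run_scripts_sequentially(scripts):
    # Run each benchmark script in its own Python process, one after another.
    # A failing script is recorded but does not stop the remaining ones.
    failed = []
    for script in scripts:
        result = subprocess.run([sys.executable, str(script)])
        if result.returncode != 0:
            failed.append(str(script))
    return failed


def combine_csvs(csv_paths, output_path):
    # Merge rows from several per-model CSVs into one file.
    # The output header is the union of all columns, in first-seen order;
    # columns missing from a given file are written as empty strings.
    rows, fieldnames = [], []
    for path in csv_paths:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Running each script in a separate process is a deliberate choice in this sketch: GPU memory held by one model is released when its process exits, so one benchmark cannot skew the memory numbers of the next.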

@DN6 could you give the approach a quick look? I can then work on resolving the TODOs.

sayakpaul · 2025-05-16T08:33:25Z

benchmarks/benchmarking_utils.py

+logger = logging.get_logger(__name__)
+
+
+def benchmark_fn(f, *args, **kwargs):


This automatically warms up the model. No need to do it explicitly.
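The body of `benchmark_fn` is not shown in this excerpt, so the following is only a sketch of what "automatic warm-up" means for a timing helper. It is a standard-library stand-in with a hypothetical name (`timed_with_warmup`) and hypothetical `warmup`/`repeats` parameters; it is not the implementation in benchmarking_utils.py.

```python
import statistics
import time


def timed_with_warmup(f, *args, warmup=3, repeats=10, **kwargs):
    # Warm-up calls run but are not timed, so one-off costs
    # (lazy initialization, caches, compilation) stay out of the result.
    for _ in range(warmup):
        f(*args, **kwargs)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        f(*args, **kwargs)
        samples.append(time.perf_counter() - start)
    # Mean wall-clock time per call, in seconds.
    return statistics.mean(samples)


calls = []


def toy_forward(x):
    # Stand-in for a model forward pass.
    calls.append(x)
    return x * 2


mean_s = timed_with_warmup(toy_forward, 21, warmup=2, repeats=5)
print(f"mean of 5 timed calls: {mean_s:.6f}s ({len(calls)} calls in total)")
```

On a GPU, a helper like this would also need to synchronize the device before reading the clock, because kernels are launched asynchronously.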

sayakpaul · 2025-05-16T08:34:30Z

benchmarks/benchmarking_flux.py

+
+
+if __name__ == "__main__":
+    scenarios = [


Covered the following scenarios:

Regular BF16 with compilation

NF4

Layerwise upcasting

Group offloading

sayakpaul · 2025-05-20T07:08:38Z

Added SDXL, Wan (14B), and LTX (13B) on top of Flux:

Results

	scenario	model_cls	num_params_M	flops_M	time_plain_s	mem_plain_GB	time_compile_s	mem_compile_GB	fullgraph	mode
0	Wan-AI/Wan2.1-T2V-14B-Diffusers-bf16	WanTransformer3DModel	14288.5	7.85612e+08	10.797	31.17	8.974	31.77	1	default
1	Wan-AI/Wan2.1-T2V-14B-Diffusers-layerwise-upcasting	WanTransformer3DModel	14288.5	7.85612e+08	10.702	26.78	nan	nan	nan	nan
2	Wan-AI/Wan2.1-T2V-14B-Diffusers-group-offload-leaf	WanTransformer3DModel	14288.5	7.85612e+08	10.83	4.48	nan	nan	nan	nan
3	stabilityai/stable-diffusion-xl-base-1.0-bf16	UNet2DConditionModel	2567.46	5.9791e+06	0.085	5.05	0.058	5.39	1	default
4	stabilityai/stable-diffusion-xl-base-1.0-layerwise-upcasting	UNet2DConditionModel	2567.46	5.9791e+06	0.175	4.89	nan	nan	nan	nan
5	stabilityai/stable-diffusion-xl-base-1.0-group-offload-leaf	UNet2DConditionModel	2567.46	5.9791e+06	0.383	0.2	nan	nan	nan	nan
6	black-forest-labs/FLUX.1-dev-bf16	FluxTransformer2DModel	11901.4	5.95295e+07	0.535	22.61	0.388	22.85	1	default
7	black-forest-labs/FLUX.1-dev-bnb-nf4	FluxTransformer2DModel	5952.25	17263.8	0.574	6.7	nan	nan	nan	nan
8	black-forest-labs/FLUX.1-dev-layerwise-upcasting	FluxTransformer2DModel	11901.4	5.95295e+07	0.621	22.18	nan	nan	nan	nan
9	black-forest-labs/FLUX.1-dev-group-offload-leaf	FluxTransformer2DModel	11901.4	5.95295e+07	1.536	0.53	nan	nan	nan	nan
10	Lightricks/LTX-Video-0.9.7-dev-bf16	LTXVideoTransformer3DModel	13042.6	1.67583e+08	1.446	25.21	1.137	25.63	1	default
11	Lightricks/LTX-Video-0.9.7-dev-layerwise-upcasting	LTXVideoTransformer3DModel	13042.6	1.67583e+08	1.529	24.38	nan	nan	nan	nan
12	Lightricks/LTX-Video-0.9.7-dev-group-offload-leaf	LTXVideoTransformer3DModel	13042.6	1.67583e+08	1.917	1.04	nan	nan	nan	nan
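To read the compile columns at a glance, the speedup from compilation is simply `time_plain_s / time_compile_s`. The snippet below computes it for the four bf16 rows, with timings copied from the table above; the variable names are only for illustration.

```python
# (time_plain_s, time_compile_s) for the bf16 scenario of each model class,
# copied from the results table.
bf16_timings = {
    "WanTransformer3DModel": (10.797, 8.974),
    "UNet2DConditionModel": (0.085, 0.058),
    "FluxTransformer2DModel": (0.535, 0.388),
    "LTXVideoTransformer3DModel": (1.446, 1.137),
}

# Speedup > 1 means the compiled model is faster than the plain one.
speedups = {
    model_cls: round(plain / compiled, 2)
    for model_cls, (plain, compiled) in bf16_timings.items()
}

for model_cls, speedup in speedups.items():
    print(f"{model_cls}: {speedup:.2f}x")
```

The other scenarios report `nan` for the compile columns, so no speedup can be derived for them from this table.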

sayakpaul · 2025-05-20T11:09:24Z

Cc: @a-r-r-o-w if you want to add some caching benchmarks (in a later PR), I think that would be really great!

sayakpaul · 2025-05-20T12:10:38Z

@DN6 this is ready for a review.

This is what the final CSV for this stage looks like:
https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/collated_results.csv

I have confirmed in this run that it works as expected:
https://github.com/huggingface/diffusers/actions/runs/15138495257/job/42570011907

a-r-r-o-w · 2025-05-20T12:40:14Z

Cc: @a-r-r-o-w if you want to add some caching benchmarks (in a later PR), I think that would be really great!

Sounds good, I'll take it up in the near future once this PR is in.

sayakpaul added 8 commits May 15, 2025 18:05

start overhauling the benchmarking suite.

24a46cc

fixes

ab7f381

fixes

cc0a38a

checking.

169f831

checking

ad18983

fixes.

31e34d5

error handling and logging.

36afdea

Merge branch 'main' into benchmarking-overhaul

0d3af90

sayakpaul added 4 commits May 19, 2025 13:17

Merge branch 'main' into benchmarking-overhaul

fd85fbc

Merge branch 'main' into benchmarking-overhaul

a2c03a4

add flops and params.

4d83a47

add more models.

6815cae

sayakpaul added 5 commits May 20, 2025 15:21

utility to fire execution of all benchmarking scripts.

5635bf8

utility to push to the hub.

cfbd21e

push utility improvement

4ccfad0

seems to be working.

dff3144

okay

accd598

sayakpaul marked this pull request as ready for review May 20, 2025 11:08

sayakpaul changed the title from "[WIP][benchmarks] overhaul benchmarks" to "[benchmarks] overhaul benchmarks" May 20, 2025

sayakpaul added 4 commits May 20, 2025 16:41

add torchprofile dep.

41f79a0

remove total gpu memory

befdd9e

fixes

4784b8b

fix

c19dc5b

sayakpaul requested a review from DN6 May 20, 2025 12:09

need a big gpu

2da4fac

sayakpaul added 2 commits May 20, 2025 17:50

better

7367bb1

what's happening.

1cd472f

sayakpaul added 2 commits May 20, 2025 18:42

okay

214795d

Merge branch 'main' into benchmarking-overhaul

7d4f459
