Commit 00b179f

Browse files
sayakpaul and stevhliu authored
[docs] add compilation bits to the bitsandbytes docs. (#11693)
* add compilation bits to the bitsandbytes docs.
* Apply suggestions from code review
* finish

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
1 parent 47ef794 commit 00b179f

File tree

1 file changed

+39
-0
lines changed


docs/source/en/quantization/bitsandbytes.md

Lines changed: 39 additions & 0 deletions
@@ -416,6 +416,45 @@ text_encoder_2_4bit.dequantize()
transformer_4bit.dequantize()
```
## torch.compile

Speed up inference with `torch.compile`. Make sure the latest `bitsandbytes` is installed, and we also recommend installing [PyTorch nightly](https://pytorch.org/get-started/locally/).
<hfoptions id="bnb">
<hfoption id="8-bit">

```py
import torch
from diffusers import AutoModel, BitsAndBytesConfig as DiffusersBitsAndBytesConfig

torch._dynamo.config.capture_dynamic_output_shape_ops = True

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)
transformer_8bit.compile(fullgraph=True)
```

</hfoption>
439+
<hfoption id="4-bit">

```py
import torch
from diffusers import AutoModel, BitsAndBytesConfig as DiffusersBitsAndBytesConfig

quant_config = DiffusersBitsAndBytesConfig(load_in_4bit=True)
transformer_4bit = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)
transformer_4bit.compile(fullgraph=True)
```

</hfoption>
</hfoptions>
On an RTX 4090 with compilation, 4-bit Flux generation completed in 25.809 seconds versus 32.570 seconds without.

Check out the [benchmarking script](https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d) for more details.
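As a usage sketch, the compiled transformer can be plugged into the full pipeline. This snippet assumes a CUDA GPU and access to the FLUX.1-dev weights; the prompt string, step count, and output filename are illustrative choices, not values from the benchmark.

```python
import torch
from diffusers import AutoModel, FluxPipeline, BitsAndBytesConfig as DiffusersBitsAndBytesConfig

# Load and compile the 4-bit transformer as above
quant_config = DiffusersBitsAndBytesConfig(load_in_4bit=True)
transformer_4bit = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)
transformer_4bit.compile(fullgraph=True)

# Reuse the quantized, compiled transformer in the pipeline
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer_4bit,
    torch_dtype=torch.float16,
).to("cuda")

# The first call triggers compilation, so it is slower than subsequent calls
image = pipe("a photo of a corgi wearing sunglasses", num_inference_steps=28).images[0]
image.save("corgi.png")
```

Note that compilation happens lazily on the first forward pass, so benchmark timings should exclude the warmup call.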
## Resources

* [End-to-end notebook showing Flux.1 Dev inference in a free-tier Colab](https://gist.github.com/sayakpaul/c76bd845b48759e11687ac550b99d8b4)
