Commit 2ead9c8

Merge PTQ/QAT tutorial for x86Inductor
1 parent 6857c39 commit 2ead9c8

File tree

3 files changed (+35, -83 lines)


prototype_source/prototype_index.rst

Lines changed: 6 additions & 0 deletions
@@ -89,6 +89,12 @@ Prototype features are not available as part of binary distributions like PyPI o
    :link: ../prototype/pt2e_quant_qat.html
    :tags: Quantization

+.. customcarditem::
+   :header: PyTorch 2 Export Quantization with X86 Backend through Inductor
+   :card_description: Learn how to use PT2 Export Quantization with X86 Backend through Inductor.
+   :image: ../_static/img/thumbnails/cropped/generic-pytorch-logo.png
+   :link: ../prototype/pt2e_quant_x86_inductor.html
+   :tags: Quantization

 .. Sparsity

prototype_source/pt2e_quant_qat_x86_inductor.rst

Lines changed: 0 additions & 77 deletions
This file was deleted.

prototype_source/pt2e_quant_ptq_x86_inductor.rst renamed to prototype_source/pt2e_quant_x86_inductor.rst

Lines changed: 29 additions & 6 deletions
@@ -1,4 +1,4 @@
-PyTorch 2 Export Post Training Quantization with X86 Backend through Inductor
+PyTorch 2 Export Quantization with X86 Backend through Inductor
 ========================================================================================

 **Author**: `Leslie Fang <https://github.com/leslie-fang-intel>`_, `Weiwen Xia <https://github.com/Xia-Weiwen>`_, `Jiong Gong <https://github.com/jgong5>`_, `Jerry Zhang <https://github.com/jerryzh168>`_
@@ -7,6 +7,7 @@ Prerequisites
 ^^^^^^^^^^^^^^^

 - `PyTorch 2 Export Post Training Quantization <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html>`_
+- `PyTorch 2 Export Quantization-Aware Training tutorial <https://pytorch.org/tutorials/prototype/pt2e_quant_qat.html>`_
 - `TorchInductor and torch.compile concepts in PyTorch <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`_
 - `Inductor C++ Wrapper concepts <https://pytorch.org/tutorials/prototype/inductor_cpp_wrapper_tutorial.html>`_

@@ -16,14 +17,15 @@ Introduction
 This tutorial introduces the steps for utilizing the PyTorch 2 Export Quantization flow to generate a quantized model customized
 for the x86 inductor backend and explains how to lower the quantized model into the inductor.

-The new quantization 2 flow uses the PT2 Export to capture the model into a graph and perform quantization transformations on top of the ATen graph. This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
+The new quantization 2 flow uses the PT2 Export to capture the model into a graph and perform quantization transformations on top of the ATen graph.
+This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
 TorchInductor is the new compiler backend that compiles the FX Graphs generated by TorchDynamo into optimized C++/Triton kernels.

 This flow of quantization 2 with Inductor mainly includes three steps:

 - Step 1: Capture the FX Graph from the eager Model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
 - Step 2: Apply the Quantization flow based on the captured FX Graph, including defining the backend-specific quantizer, generating the prepared model with observers,
-  performing the prepared model's calibration, and converting the prepared model into the quantized model.
+  performing the prepared model's calibration or quantization-aware training, and converting the prepared model into the quantized model.
 - Step 3: Lower the quantized model into inductor with the API ``torch.compile``.

 The high-level architecture of this flow could look like this:
@@ -83,6 +85,8 @@ We will start by performing the necessary imports, capturing the FX Graph from t
     model = models.__dict__[model_name](pretrained=True)

     # Set the model to eval mode
+    # Only apply it for post-training static quantization
+    # Skip this step for quantization-aware training
     model = model.eval()

     # Create the data, using the dummy data here as an example
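As the added comments note, the ``eval()`` call applies only to the post-training path. Below is a minimal sketch of how the two flows diverge at this point; the ``is_qat`` flag is a hypothetical illustration and not part of the tutorial:

::

    is_qat = False  # hypothetical flag: True when running quantization-aware training

    if not is_qat:
        # Post-training static quantization calibrates a fixed model, so switch
        # BatchNorm/Dropout to inference behavior before capturing the graph.
        model = model.eval()
    # For quantization-aware training, keep the model in training mode so it can
    # be fine-tuned after preparation.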
@@ -116,26 +120,44 @@ Next, we will have the FX Module to be quantized.
 After we capture the FX Module to be quantized, we will import the Backend Quantizer for X86 CPU and configure how to
 quantize the model.

+For post-training static quantization:
+
 ::

     quantizer = X86InductorQuantizer()
     quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())

+For quantization-aware training:
+
+::
+
+    quantizer = X86InductorQuantizer()
+    quantizer.set_global(xiq.get_default_x86_inductor_quantization_config(is_qat=True))
+
 .. note::

    The default quantization configuration in ``X86InductorQuantizer`` uses 8-bits for both activations and weights.
    When Vector Neural Network Instruction is not available, the oneDNN backend silently chooses kernels that assume
    `multiplications are 7-bit x 8-bit <https://oneapi-src.github.io/oneDNN/dev_guide_int8_computations.html#inputs-of-mixed-type-u8-and-s8>`_. In other words, potential
    numeric saturation and accuracy issues may happen when running on CPU without Vector Neural Network Instruction.

-After we import the backend-specific Quantizer, we will prepare the model for post-training quantization.
-``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.
+After we import the backend-specific Quantizer, we will prepare the model for post-training quantization or quantization-aware training.
+
+For post-training static quantization, ``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.

 ::

     prepared_model = prepare_pt2e(exported_model, quantizer)

-Now, we will calibrate the ``prepared_model`` after the observers are inserted in the model.
+For quantization-aware training:
+
+::
+
+    prepared_model = prepare_qat_pt2e(exported_model, quantizer)
+
+
+Now, we will run calibration for post-training static quantization, or fine-tuning for quantization-aware training. Here is the example code
+for post-training static quantization. The example code omits quantization-aware training for simplicity.

 ::
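Since the tutorial text above leaves out the quantization-aware training example, here is a minimal fine-tuning sketch for that path. It assumes ``prepared_model`` comes from ``prepare_qat_pt2e``, and the optimizer, loss function, and ``train_loader`` are placeholders rather than anything defined in the tutorial:

::

    import torch

    optimizer = torch.optim.SGD(prepared_model.parameters(), lr=0.01)  # placeholder optimizer
    criterion = torch.nn.CrossEntropyLoss()                            # placeholder loss

    for images, targets in train_loader:  # hypothetical DataLoader over the training set
        optimizer.zero_grad()
        loss = criterion(prepared_model(images), targets)
        loss.backward()
        optimizer.step()

After fine-tuning, the prepared model is converted with ``convert_pt2e`` just as in the post-training flow shown next.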

@@ -155,6 +177,7 @@ Finally, we will convert the calibrated Model to a quantized Model. ``convert_pt
 ::

     converted_model = convert_pt2e(prepared_model)
+    torch.ao.quantization.move_exported_model_to_eval(converted_model)

 After these steps, we finished running the quantization flow and we will get the quantized model.
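Putting the pieces shown in this diff together, a minimal end-to-end sketch of the post-training static quantization flow might look like the following. The model choice, input shape, and the ``capture_pre_autograd_graph`` capture call are assumptions based on the PT2 Export quantization tutorials of this period, and the exact capture API may differ across PyTorch releases:

::

    import torch
    import torchvision.models as models
    import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
    from torch._export import capture_pre_autograd_graph
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer

    model = models.resnet18(pretrained=True).eval()   # example model
    example_inputs = (torch.randn(2, 3, 224, 224),)   # dummy calibration data

    # Step 1: capture the FX Graph via the torch export mechanism
    exported_model = capture_pre_autograd_graph(model, example_inputs)

    # Step 2: quantize with the X86 Inductor backend quantizer
    quantizer = X86InductorQuantizer()
    quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
    prepared_model = prepare_pt2e(exported_model, quantizer)
    prepared_model(*example_inputs)                    # calibration pass
    converted_model = convert_pt2e(prepared_model)
    torch.ao.quantization.move_exported_model_to_eval(converted_model)

    # Step 3: lower the quantized model into Inductor
    with torch.no_grad():
        optimized_model = torch.compile(converted_model)
        optimized_model(*example_inputs)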
