
Commit 8b46ad5

change the words
1 parent aa12743 commit 8b46ad5


prototype_source/pt2e_quant_x86_inductor.rst

Lines changed: 5 additions & 26 deletions
@@ -7,7 +7,7 @@ Prerequisites
 ^^^^^^^^^^^^^^^
 
 - `PyTorch 2 Export Post Training Quantization <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html>`_
-- `PyTorch 2 Export Quantization-Aware Training tutorial <https://pytorch.org/tutorials/prototype/pt2e_quant_qat.html>`_
+- `PyTorch 2 Export Quantization-Aware Training <https://pytorch.org/tutorials/prototype/pt2e_quant_qat.html>`_
 - `TorchInductor and torch.compile concepts in PyTorch <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`_
 
 Introduction

@@ -16,7 +16,7 @@ Introduction
 This tutorial introduces the steps for utilizing the PyTorch 2 Export Quantization flow to generate a quantized model customized
 for the x86 inductor backend and explains how to lower the quantized model into the inductor.
 
-The new quantization 2 flow uses the PT2 Export to capture the model into a graph and perform quantization transformations on top of the ATen graph.
+The pytorch 2 export quantization flow uses the torch.export to capture the model into a graph and perform quantization transformations on top of the ATen graph.
 This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
 TorchInductor is the new compiler backend that compiles the FX Graphs generated by TorchDynamo into optimized C++/Triton kernels.
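For orientation, the flow this paragraph describes (capture, quantize on the ATen graph, lower into TorchInductor) can be sketched roughly as follows. This is an illustrative sketch only, not part of the tutorial or of this diff: the toy model and input shape are placeholders, and ``capture_pre_autograd_graph`` is the prototype capture entry point used around the time of this tutorial (newer PyTorch releases expose a different export API).

::

    import torch
    import torch.nn as nn
    from torch._export import capture_pre_autograd_graph
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
    from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer

    # A toy model and input stand in for the tutorial's torchvision model
    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
    example_inputs = (torch.randn(1, 3, 32, 32),)

    # 1. Capture the model into an ATen FX graph
    exported_model = capture_pre_autograd_graph(model, example_inputs)

    # 2. Quantize on top of the ATen graph (post-training static quantization)
    quantizer = X86InductorQuantizer()
    quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
    prepared_model = prepare_pt2e(exported_model, quantizer)
    prepared_model(*example_inputs)                 # calibration pass
    converted_model = convert_pt2e(prepared_model)

    # 3. Lower into TorchInductor
    optimized_model = torch.compile(converted_model)
    optimized_model(*example_inputs)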

@@ -84,8 +84,6 @@ We will start by performing the necessary imports, capturing the FX Graph from t
 model = models.__dict__[model_name](pretrained=True)
 
 # Set the model to eval mode
-# Only apply it for post-training static quantization
-# Skip this step for quantization-aware training
 model = model.eval()
 
 # Create the data, using the dummy data here as an example

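For context, the capture step this hunk belongs to looks roughly like the sketch below; the hunk itself cuts off before the capture call. The model name, the dummy input shape, and the use of ``capture_pre_autograd_graph`` are assumptions based on the tutorial's era and may differ in your PyTorch version.

::

    import torch
    import torchvision.models as models
    from torch._export import capture_pre_autograd_graph

    # A torchvision classification model, e.g. resnet18 as in the tutorial's example
    model_name = "resnet18"
    model = models.__dict__[model_name](pretrained=True)

    # Set the model to eval mode for post-training static quantization
    model = model.eval()

    # Dummy data standing in for real inputs
    example_inputs = (torch.randn(2, 3, 224, 224),)

    # Capture the FX Graph (ATen IR) that the quantizer will annotate
    with torch.no_grad():
        exported_model = capture_pre_autograd_graph(model, example_inputs)
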
@@ -119,44 +117,26 @@ Next, we will have the FX Module to be quantized.
 After we capture the FX Module to be quantized, we will import the Backend Quantizer for X86 CPU and configure how to
 quantize the model.
 
-For post-training static quantization:
-
 ::
 
 quantizer = X86InductorQuantizer()
 quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
 
-For quantization-aware training:
-
-::
-
-quantizer = X86InductorQuantizer()
-quantizer.set_global(xiq.get_default_x86_inductor_quantization_config(is_qat=True))
-
 .. note::
 
 The default quantization configuration in ``X86InductorQuantizer`` uses 8-bits for both activations and weights.
 When Vector Neural Network Instruction is not available, the oneDNN backend silently chooses kernels that assume
 `multiplications are 7-bit x 8-bit <https://oneapi-src.github.io/oneDNN/dev_guide_int8_computations.html#inputs-of-mixed-type-u8-and-s8>`_. In other words, potential
 numeric saturation and accuracy issue may happen when running on CPU without Vector Neural Network Instruction.
 
-After we import the backend-specific Quantizer, we will prepare the model for post-training quantization or quantization-aware training.
-
-For post-training static quantization, ``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.
+After we import the backend-specific Quantizer, we will prepare the model for post-training quantization.
+``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.
 
 ::
 
 prepared_model = prepare_pt2e(exported_model, quantizer)
 
-For quantization-aware training:
-
-::
-
-prepared_model = prepare_qat_pt2e(exported_model, quantizer)
-
-
-Now, we will do calibration for post-training static quantization or quantization-aware training. Here is the example code
-for post-training static quantization. The example code omits quantization-aware training for simplicity.
+Now, we will calibrate the ``prepared_model`` after the observers are inserted in the model.
 
 ::
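Putting the post-change, PTQ-only path of this hunk together, a minimal sketch of configuring the quantizer, preparing the model, and calibrating might look like the following. ``exported_model`` and ``example_inputs`` are assumed to come from the capture step above, and the loop over repeated dummy batches is illustrative; real calibration data should be used in practice.

::

    import torch
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e
    import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
    from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer

    # Configure the X86 CPU backend quantizer with its default 8-bit config
    quantizer = X86InductorQuantizer()
    quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())

    # Fold BatchNorm into preceding Conv2d and insert observers
    prepared_model = prepare_pt2e(exported_model, quantizer)

    # Calibration: run representative batches so the observers record
    # activation statistics
    with torch.no_grad():
        for _ in range(10):
            prepared_model(*example_inputs)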

@@ -176,7 +156,6 @@ Finally, we will convert the calibrated Model to a quantized Model. ``convert_pt
 ::
 
 converted_model = convert_pt2e(prepared_model)
-torch.ao.quantization.move_exported_model_to_eval(converted_model)
 
 After these steps, we finished running the quantization flow and we will get the quantized model.

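The tutorial then lowers the quantized model into Inductor, which is not shown in this diff; a rough sketch of that last step follows. ``converted_model`` and ``example_inputs`` come from the steps above, and enabling ``torch._inductor.config.freezing`` is an assumption about the Inductor setting used for the quantized path rather than something this commit touches.

::

    import torch

    # Freezing lets Inductor fold constant quantization parameters into the kernels
    import torch._inductor.config as inductor_config
    inductor_config.freezing = True

    with torch.no_grad():
        # Compile the quantized model into optimized C++/Triton kernels
        optimized_model = torch.compile(converted_model)
        # The first call triggers compilation; later calls run the optimized kernels
        optimized_model(*example_inputs)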
