- `PyTorch 2 Export Post Training Quantization <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html>`_
- `TorchInductor and torch.compile concepts in PyTorch <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`_
- `Inductor C++ Wrapper concepts <https://pytorch.org/tutorials/prototype/inductor_cpp_wrapper_tutorial.html>`_
Introduction
------------
This tutorial introduces the steps for using the PyTorch 2 Export Quantization flow to generate a quantized model customized
for the x86 inductor backend and explains how to lower the quantized model into the inductor.
Now, we will walk you through a step-by-step tutorial on how to use it with the `torchvision resnet18 model <https://download.pytorch.org/models/resnet18-f37072fd.pth>`_.
Capture FX Graph
----------------
We will start by performing the necessary imports and capturing the FX Graph from the eager module.
Next, we have the FX Module ready to be quantized.
Apply Quantization
------------------
After we capture the FX Module to be quantized, we will import the backend quantizer for X86 CPU and configure how to
quantize the model.
Finally, we will convert the calibrated model to a quantized model with ``convert_pt2e``.
After these steps, we have finished running the quantization flow and obtained the quantized model.
Lower into Inductor
-------------------
After we get the quantized model, we will further lower it to the inductor backend. The default Inductor wrapper
generates Python code to invoke both generated kernels and external kernels. Additionally, Inductor supports
a C++ wrapper that generates pure C++ code instead.
With the PyTorch 2.1 release, all CNN models from the TorchBench test suite have been measured. Please refer
to `this document <https://dev-discuss.pytorch.org/t/torchinductor-update-6-cpu-backend-performance-update-and-new-features-in-pytorch-2-1/1514#int8-inference-with-post-training-static-quantization-3>`_
for detailed benchmark numbers.
Conclusion
----------
With this tutorial, we introduced how to use Inductor with an X86 CPU in PyTorch 2 Quantization. Users can learn
how to use ``X86InductorQuantizer`` to quantize a model and lower it into the inductor with X86 CPU devices.