change the words

leslie-fang-intel · leslie-fang-intel · commit 2ca1927504f2 · 2024-01-12T16:35:15.000+08:00
diff --git a/prototype_source/pt2e_quant_x86_inductor.rst b/prototype_source/pt2e_quant_x86_inductor.rst
@@ -7,7 +7,7 @@ Prerequisites
 ^^^^^^^^^^^^^^^
 
 -  `PyTorch 2 Export Post Training Quantization <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html>`_
--  `PyTorch 2 Export Quantization-Aware Training tutorial <https://pytorch.org/tutorials/prototype/pt2e_quant_qat.html>`_
+-  `PyTorch 2 Export Quantization-Aware Training <https://pytorch.org/tutorials/prototype/pt2e_quant_qat.html>`_
 -  `TorchInductor and torch.compile concepts in PyTorch <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`_
 -  `Inductor C++ Wrapper concepts <https://pytorch.org/tutorials/prototype/inductor_cpp_wrapper_tutorial.html>`_
 
@@ -17,7 +17,7 @@ Introduction
 This tutorial introduces the steps for utilizing the PyTorch 2 Export Quantization flow to generate a quantized model customized
 for the x86 inductor backend and explains how to lower the quantized model into the inductor.
 
-The new quantization 2 flow uses the PT2 Export to capture the model into a graph and perform quantization transformations on top of the ATen graph.
+The PyTorch 2 Export Quantization flow uses ``torch.export`` to capture the model into a graph and perform quantization transformations on top of the ATen graph.
 This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
 TorchInductor is the new compiler backend that compiles the FX Graphs generated by TorchDynamo into optimized C++/Triton kernels.
 
@@ -85,8 +85,6 @@ We will start by performing the necessary imports, capturing the FX Graph from t
     model = models.__dict__[model_name](pretrained=True)
 
     # Set the model to eval mode
-    # Only apply it for post-training static quantization
-    # Skip this step for quantization-aware training
     model = model.eval()
 
     # Create the data, using the dummy data here as an example
@@ -120,44 +118,26 @@ Next, we will have the FX Module to be quantized.
 After we capture the FX Module to be quantized, we will import the Backend Quantizer for X86 CPU and configure how to
 quantize the model.
 
-For post-training static quantization:
-
 ::
 
     quantizer = X86InductorQuantizer()
     quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
 
-For quantization-aware training:
-
-::
-    
-    quantizer = X86InductorQuantizer()
-    quantizer.set_global(xiq.get_default_x86_inductor_quantization_config(is_qat=True))
-
 .. note::
 
    The default quantization configuration in ``X86InductorQuantizer`` uses 8-bits for both activations and weights.
   When Vector Neural Network Instruction is not available, the oneDNN backend silently chooses kernels that assume
   `multiplications are 7-bit x 8-bit <https://oneapi-src.github.io/oneDNN/dev_guide_int8_computations.html#inputs-of-mixed-type-u8-and-s8>`_. In other words, potential
   numeric saturation and accuracy issue may happen when running on CPU without Vector Neural Network Instruction.
 
-After we import the backend-specific Quantizer, we will prepare the model for post-training quantization or quantization-aware training.
-
-For post-training static quantization, ``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.
+After we import the backend-specific Quantizer, we will prepare the model for post-training quantization.
+``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.
 
 ::
 
     prepared_model = prepare_pt2e(exported_model, quantizer)
 
-For quantization-aware training:
-
-::
-
-    prepared_model = prepare_qat_pt2e(exported_model, quantizer)    
-
-
-Now, we will do calibration for post-training static quantization or quantization-aware training. Here is the example code
-for post-training static quantization. The example code omits quantization-aware training for simplicity.
+Now, we will calibrate the ``prepared_model`` after the observers are inserted in the model.
 
 ::
 
@@ -177,7 +157,6 @@ Finally, we will convert the calibrated Model to a quantized Model. ``convert_pt
 ::
 
     converted_model = convert_pt2e(prepared_model)
-    torch.ao.quantization.move_exported_model_to_eval(converted_model)
 
 After these steps, we finished running the quantization flow and we will get the quantized model.