Commit 93aedaa

jerryzh168 and Svetlana Karslioglu authored
Apply suggestions from code review
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
1 parent a28d763 commit 93aedaa

2 files changed: +25 −19 lines changed

prototype_source/pt2e_quant_ptq_static.rst

Lines changed: 4 additions & 3 deletions
@@ -437,6 +437,7 @@ Convert the Calibrated Model to a Quantized Model
 Previous documentation for `representations <https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md>`_ all quantized operators are represented as ``dequantize -> fp32_op -> quantize``.
 
 .. code-block:: python
+
   def quantized_linear(x_int8, x_scale, x_zero_point, weight_int8, weight_scale, weight_zero_point, bias_fp32, output_scale, output_zero_point):
       x_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
           x_i8, x_scale, x_zero_point, x_quant_min, x_quant_max, torch.int8)
@@ -448,9 +449,9 @@ Convert the Calibrated Model to a Quantized Model
           out_fp32, out_scale, out_zero_point, out_quant_min, out_quant_max, torch.int8)
       return out_i8
 
-* Reference Quantized Model Representation (WIP, expected to be ready at end of August): we have special representation for selected ops (e.g. quantized linear), other ops are represented as (dq -> float32_op -> q), and q/dq are decomposed into more primitive operators.
+* Reference Quantized Model Representation (WIP, expected to be ready at end of August): we have special representation for selected ops (for example, quantized linear), other ops are represented as (dq -> float32_op -> q), and q/dq are decomposed into more primitive operators.
 
-  You can get this representation by: convert_pt2e(..., use_reference_representation=True)
+  You can get this representation by: ``convert_pt2e(..., use_reference_representation=True)``
 
 .. code-block:: python
   # Reference Quantized Pattern for quantized linear
@@ -465,7 +466,7 @@ Convert the Calibrated Model to a Quantized Model
       return out_int8
 
 
-Please see `<here https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/pt2e/representation/rewrite.py>`_ for the most up to date reference representations.
+See `here <https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/pt2e/representation/rewrite.py>`_ for the most up-to-date reference representations.
 
 
 Checking Model Size and Accuracy Evaluation
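
For context, the hunks above show only fragments of the ``dequantize -> fp32_op -> quantize`` pattern. Below is a minimal sketch of the complete pattern for linear, assuming an int8 quant range of [-128, 127] in place of the tutorial's ``x_quant_min``/``x_quant_max`` placeholders; it is an illustration, not the tutorial's verbatim code:

    import torch

    def quantized_linear(x_int8, x_scale, x_zero_point,
                         weight_int8, weight_scale, weight_zero_point,
                         bias_fp32, output_scale, output_zero_point):
        # Dequantize the int8 activation and weight back to fp32.
        x_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
            x_int8, x_scale, x_zero_point, -128, 127, torch.int8)
        weight_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
            weight_int8, weight_scale, weight_zero_point, -128, 127, torch.int8)
        # Run the op itself in fp32.
        out_fp32 = torch.ops.aten.linear.default(x_fp32, weight_fp32, bias_fp32)
        # Quantize the fp32 result back to int8.
        return torch.ops.quantized_decomposed.quantize_per_tensor(
            out_fp32, output_scale, output_zero_point, -128, 127, torch.int8)

As the edited bullet notes, the reference representation is requested with ``convert_pt2e(prepared_model, use_reference_representation=True)`` (from ``torch.ao.quantization.quantize_pt2e``) instead of the default convert.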

prototype_source/pt2e_quantizer.rst

Lines changed: 21 additions & 16 deletions
@@ -146,29 +146,34 @@ parameters are shared with other tensors. Input of ``SharedQuantizationSpec`` is
 can be an input edge or an output value.
 
 .. note::
-   * Sharing is Transitive
-     Some Tensors might be effectively be using shared quantization spec due to (1) two nodes/edges are
+   * Sharing is transitive
+
+     Some Tensors might be effectively using shared quantization spec due to (1) two nodes/edges are
      configured to use SharedQuantizationSpec (2) there is existing sharing of some of the nodes
 
-     For example, let's say we have two conv nodes conv1 and conv2, and both of them are fed into a cat
-     node. `cat([conv1_out, conv2_out], ...)` Let's say output of conv1, conv2 and first input of cat are configured
-     with the same configurations of QuantizationSpec, second input of cat is configured to use SharedQuantizationSpec
+     For example, let's say we have two ``conv`` nodes ``conv1`` and ``conv2``, and both of them are fed into a ``cat``
+     node. `cat([conv1_out, conv2_out], ...)` Let's say output of ``conv1``, ``conv2`` and the first input of ``cat`` are configured
+     with the same configurations of ``QuantizationSpec``, second input of ``cat`` is configured to use ``SharedQuantizationSpec``
      with the first input.
-     conv1_out: qspec1(dtype=torch.int8, ...)
-     conv2_out: qspec1(dtype=torch.int8, ...)
-     cat_input0: qspec1(dtype=torch.int8, ...)
-     cat_input1: SharedQuantizationSpec((conv1, cat)) # conv1 node is the first input of cat
 
-     First of all, the output of conv1 are implicitly sharing quantization parameter (and observer object)
-     with first input of cat, and same for output of conv2 and second input of cat.
-     So since user configures the two input of cat to share quantization parameters, by transitivity,
-     conv2_out and conv1_out will also be sharing quantization parameters. In the observed graph, you
+     .. code-block::
+
+        conv1_out: qspec1(dtype=torch.int8, ...)
+        conv2_out: qspec1(dtype=torch.int8, ...)
+        cat_input0: qspec1(dtype=torch.int8, ...)
+        cat_input1: SharedQuantizationSpec((conv1, cat)) # conv1 node is the first input of cat
+
+     First of all, the output of ``conv1`` is implicitly sharing quantization parameter (and observer object)
+     with the first input of ``cat``, and same for output of ``conv2`` and the second input of ``cat``.
+     So since user configures the two inputs of ``cat`` to share quantization parameters, by transitivity,
+     ``conv2_out`` and ``conv1_out`` will also be sharing quantization parameters. In the observed graph, you
      will see:
-     ```
+     .. code-block::
+
        conv1 -> obs -> cat
       conv2 -> obs /
-     ```
-     and both `obs` will be the same observer instance
+
+     and both ``obs`` will be the same observer instance.
 
 
 - Input edge is the connection between input node and the node consuming the input,
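
For context on how the sharing described in this note is expressed in a quantizer, here is a hedged sketch of an annotation helper. The helper name ``annotate_cat`` and the ``input_qspec`` plumbing are illustrative assumptions; ``SharedQuantizationSpec`` and ``QuantizationAnnotation`` come from ``torch.ao.quantization.quantizer``:

    import torch
    from torch.ao.quantization.quantizer import (
        QuantizationAnnotation,
        SharedQuantizationSpec,
    )

    # Hypothetical helper: annotate a cat node fed by two conv outputs so that
    # its second input (and its output) share quantization parameters with the
    # (conv1_out, cat_node) input edge. By transitivity, conv1_out and
    # conv2_out then end up observed by the same observer instance.
    def annotate_cat(cat_node: torch.fx.Node, input_qspec) -> None:
        conv1_out, conv2_out = cat_node.args[0]  # cat([conv1_out, conv2_out], ...)
        shared = SharedQuantizationSpec((conv1_out, cat_node))
        cat_node.meta["quantization_annotation"] = QuantizationAnnotation(
            input_qspec_map={conv1_out: input_qspec, conv2_out: shared},
            output_qspec=shared,
            _annotated=True,
        )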
