
Commit ae209cc

jerryzh168 and Svetlana Karslioglu authored
Apply suggestions from code review
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
1 parent 7e504c1 commit ae209cc

File tree

2 files changed: +23 -16 lines changed

prototype_source/pt2e_quant_ptq_static.rst

Lines changed: 8 additions & 5 deletions
@@ -436,11 +436,13 @@ Convert the Calibrated Model to a Quantized Model
     print(quantized_model)
 
 .. note::
-   At this step, we currently have two representations that you can choose from, but what exact representation
-   we offer in the long term might change based on feedbacks from users.
+   At this step, we currently have two representations that you can choose from, but exact representation
+   we offer in the long term might change based on feedback from PyTorch users.
 
 * Q/DQ Representation (default)
-  Previous documentation for `representations <https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md>`_ all quantized operators are represented as ``dequantize -> fp32_op -> qauntize``.
+
+  Previous documentation for `representations <https://github.com/pytorch/rfcs/blob/master/RFC-0019-
+  Extending-PyTorch-Quantization-to-Custom-Backends.md>`_ all quantized operators are represented as ``dequantize -> fp32_op -> qauntize``.
 
 .. code-block:: python
 
@@ -455,11 +457,12 @@ Convert the Calibrated Model to a Quantized Model
         out_fp32, out_scale, out_zero_point, out_quant_min, out_quant_max, torch.int8)
     return out_i8
 
-* Reference Quantized Model Representation (WIP, expected to be ready at end of August): we have special representation for selected ops (for example, quantized linear), other ops are represented as (dq -> float32_op -> q), and q/dq are decomposed into more primitive operators.
+* Reference Quantized Model Representation (WIP, expected to be ready at end of August): we have special representation for selected ops (for example, quantized linear), other ops are represented as (``dq -> float32_op -> q``), and ``q/dq`` are decomposed into more primitive operators.
 
-  You can get this representation by: ``convert_pt2e(..., use_reference_representation=True)``
+  You can get this representation by using ``convert_pt2e(..., use_reference_representation=True)``.
 
 .. code-block:: python
+
     # Reference Quantized Pattern for quantized linear
     def quantized_linear(x_int8, x_scale, x_zero_point, weight_int8, weight_scale, weight_zero_point, bias_fp32, output_scale, output_zero_point):
         x_int16 = x_int8.to(torch.int16)
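
The second hunk above shows only the last two lines of the default Q/DQ pattern for linear. As a minimal sketch of the full ``dequantize -> fp32_op -> quantize`` shape that pattern takes, assuming the decomposed ``quantized_decomposed`` ops PT2E lowered to at the time and standard int8 ranges (the tutorial's exact signature may differ):

.. code-block:: python

    # Hedged sketch: the Q/DQ-representation pattern whose tail appears in
    # the hunk above; op names are the decomposed quantized_decomposed ops
    # PT2E used at the time, and the -128..127 int8 ranges are assumptions.
    import torch

    def quantized_linear_qdq(x_int8, x_scale, x_zero_point,
                             weight_int8, weight_scale, weight_zero_point,
                             bias_fp32, out_scale, out_zero_point):
        # dequantize int8 activations and weights back to fp32
        x_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
            x_int8, x_scale, x_zero_point, -128, 127, torch.int8)
        weight_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
            weight_int8, weight_scale, weight_zero_point, -128, 127, torch.int8)
        # run the op itself in fp32 ...
        out_fp32 = torch.ops.aten.linear.default(x_fp32, weight_fp32, bias_fp32)
        # ... then quantize the result back to int8
        out_i8 = torch.ops.quantized_decomposed.quantize_per_tensor(
            out_fp32, out_scale, out_zero_point, -128, 127, torch.int8)
        return out_i8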

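To show where the ``use_reference_representation=True`` flag mentioned in the diff slots into the flow this file documents, here is a minimal sketch assuming the ``XNNPACKQuantizer`` setup from the surrounding tutorial; ``capture_pre_autograd_graph`` and the module paths were prototype APIs at the time of this commit and may have since moved:

.. code-block:: python

    # Hedged sketch of the end-to-end PT2E PTQ flow; module paths follow
    # the prototype API current at this commit and are assumptions.
    import torch
    from torch._export import capture_pre_autograd_graph
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
        XNNPACKQuantizer,
        get_symmetric_quantization_config,
    )

    model = torch.nn.Sequential(torch.nn.Linear(16, 8)).eval()
    example_inputs = (torch.randn(1, 16),)

    # Export, annotate with a quantizer, and calibrate.
    exported = capture_pre_autograd_graph(model, example_inputs)
    quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
    prepared = prepare_pt2e(exported, quantizer)
    prepared(*example_inputs)  # run representative data through the observers

    # Omitting the flag yields the default Q/DQ representation; the flag
    # opts in to the reference representation described in the note above.
    quantized_model = convert_pt2e(prepared, use_reference_representation=True)
    print(quantized_model)
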
prototype_source/pt2e_quantizer.rst

Lines changed: 15 additions & 11 deletions
@@ -146,14 +146,17 @@ parameters are shared with other tensors. Input of ``SharedQuantizationSpec`` is
 can be an input edge or an output value.
 
 .. note::
-  * Sharing is transitive
 
-  Some Tensors might be effectively using shared quantization spec due to (1) two nodes/edges are
-  configured to use SharedQuantizationSpec (2) there is existing sharing of some of the nodes
+  * Sharing is transitive
 
+  Some tensors might be effectively using shared quantization spec due to:
+
+  * Two nodes/edges are configured to use ``SharedQuantizationSpec``.
+  * There is existing sharing of some nodes.
+
 For example, let's say we have two ``conv`` nodes ``conv1`` and ``conv2``, and both of them are fed into a ``cat``
-node. `cat([conv1_out, conv2_out], ...)` Let's say output of ``conv1``, ``conv2`` and the first input of ``cat`` are configured
-with the same configurations of ``QuantizationSpec``, second input of ``cat`` is configured to use ``SharedQuantizationSpec``
+node: ``cat([conv1_out, conv2_out], ...)``. Let's say the output of ``conv1``, ``conv2``, and the first input of ``cat`` are configured
+with the same configurations of ``QuantizationSpec``. The second input of ``cat`` is configured to use ``SharedQuantizationSpec``
 with the first input.
 
 .. code-block::
@@ -163,15 +166,16 @@ can be an input edge or an output value.
     cat_input0: qspec1(dtype=torch.int8, ...)
     cat_input1: SharedQuantizationSpec((conv1, cat)) # conv1 node is the first input of cat
 
-First of all, the output of ``conv1`` is implicitly sharing quantization parameter (and observer object)
-with the first input of ``cat``, and same for output of ``conv2`` and the second input of ``cat``.
-So since user configures the two inputs of ``cat`` to share quantization parameters, by transitivity,
+First of all, the output of ``conv1`` is implicitly sharing quantization parameters (and observer object)
+with the first input of ``cat``, and the same is true for the output of ``conv2`` and the second input of ``cat``.
+Therefore, since the user configures the two inputs of ``cat`` to share quantization parameters, by transitivity,
 ``conv2_out`` and ``conv1_out`` will also be sharing quantization parameters. In the observed graph, you
-will see:
+will see the following:
+
 .. code-block::
 
-conv1 -> obs -> cat
-conv2 -> obs /
+  conv1 -> obs -> cat
+  conv2 -> obs /
 
 and both ``obs`` will be the same observer instance.
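
For the ``cat`` example this hunk rewrites, here is a minimal sketch of the annotation a ``Quantizer`` would emit, assuming the prototype ``torch.ao.quantization.quantizer`` API as of this commit (the observer choice and exact field values are assumptions):

.. code-block:: python

    # Hedged sketch: annotating cat so its two inputs share quantization
    # parameters, per the note above; module paths and observer choice
    # follow the prototype API at this commit and are assumptions.
    import torch
    from torch.ao.quantization.observer import HistogramObserver
    from torch.ao.quantization.quantizer import (
        QuantizationAnnotation,
        QuantizationSpec,
        SharedQuantizationSpec,
    )

    int8_qspec = QuantizationSpec(
        dtype=torch.int8,
        quant_min=-128,
        quant_max=127,
        qscheme=torch.per_tensor_affine,
        observer_or_fake_quant_ctr=HistogramObserver,
    )

    def annotate_cat(cat_node):
        # cat([conv1_out, conv2_out], ...): args[0] is the list of inputs
        conv1_out, conv2_out = cat_node.args[0]
        shared = SharedQuantizationSpec((conv1_out, cat_node))
        cat_node.meta["quantization_annotation"] = QuantizationAnnotation(
            input_qspec_map={
                conv1_out: int8_qspec,  # first input: ordinary qspec
                conv2_out: shared,      # second input: share with (conv1, cat)
            },
            output_qspec=shared,
            _annotated=True,
        )

Because the outputs of ``conv1`` and ``conv2`` implicitly share with the ``cat`` input edges they feed, this single ``SharedQuantizationSpec`` is enough to make ``conv1_out`` and ``conv2_out`` end up with one observer instance, which is the transitivity the diff spells out.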
