Commit 93aedaa

jerryzh168 and Svetlana Karslioglu authored
Apply suggestions from code review
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
1 parent a28d763 commit 93aedaa

2 files changed: +25 −19 lines changed

prototype_source/pt2e_quant_ptq_static.rst

Lines changed: 4 additions & 3 deletions
@@ -437,6 +437,7 @@ Convert the Calibrated Model to a Quantized Model
 Previous documentation for `representations <https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md>`_ all quantized operators are represented as ``dequantize -> fp32_op -> quantize``.
 
 .. code-block:: python
+
   def quantized_linear(x_int8, x_scale, x_zero_point, weight_int8, weight_scale, weight_zero_point, bias_fp32, output_scale, output_zero_point):
       x_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
           x_i8, x_scale, x_zero_point, x_quant_min, x_quant_max, torch.int8)
@@ -448,9 +449,9 @@ Convert the Calibrated Model to a Quantized Model
           out_fp32, out_scale, out_zero_point, out_quant_min, out_quant_max, torch.int8)
       return out_i8
 
-* Reference Quantized Model Representation (WIP, expected to be ready at end of August): we have special representation for selected ops (e.g. quantized linear), other ops are represented as (dq -> float32_op -> q), and q/dq are decomposed into more primitive operators.
+* Reference Quantized Model Representation (WIP, expected to be ready at end of August): we have special representation for selected ops (for example, quantized linear), other ops are represented as (dq -> float32_op -> q), and q/dq are decomposed into more primitive operators.
 
-  You can get this representation by: convert_pt2e(..., use_reference_representation=True)
+  You can get this representation by: ``convert_pt2e(..., use_reference_representation=True)``
 
 .. code-block:: python
   # Reference Quantized Pattern for quantized linear
@@ -465,7 +466,7 @@ Convert the Calibrated Model to a Quantized Model
       return out_int8
 
 
-Please see `<here https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/pt2e/representation/rewrite.py>`_ for the most up to date reference representations.
+See `here <https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/pt2e/representation/rewrite.py>`_ for the most up-to-date reference representations.
 
 
 Checking Model Size and Accuracy Evaluation
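
For context, the hunks above show only fragments of the ``dequantize -> fp32_op -> quantize`` pattern. Below is a minimal sketch of the complete pattern for linear, assuming an int8 quant range of [-128, 127] in place of the tutorial's ``x_quant_min``/``x_quant_max`` placeholders; it is an illustration, not the tutorial's verbatim code:

    import torch

    def quantized_linear(x_int8, x_scale, x_zero_point,
                         weight_int8, weight_scale, weight_zero_point,
                         bias_fp32, output_scale, output_zero_point):
        # Dequantize the int8 activation and weight back to fp32.
        x_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
            x_int8, x_scale, x_zero_point, -128, 127, torch.int8)
        weight_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
            weight_int8, weight_scale, weight_zero_point, -128, 127, torch.int8)
        # Run the op itself in fp32.
        out_fp32 = torch.ops.aten.linear.default(x_fp32, weight_fp32, bias_fp32)
        # Quantize the fp32 result back to int8.
        return torch.ops.quantized_decomposed.quantize_per_tensor(
            out_fp32, output_scale, output_zero_point, -128, 127, torch.int8)

As the edited bullet notes, the reference representation is requested with ``convert_pt2e(prepared_model, use_reference_representation=True)`` (from ``torch.ao.quantization.quantize_pt2e``) instead of the default convert.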

prototype_source/pt2e_quantizer.rst

Lines changed: 21 additions & 16 deletions
@@ -146,29 +146,34 @@ parameters are shared with other tensors. Input of ``SharedQuantizationSpec`` is
 can be an input edge or an output value.
 
 .. note::
-   * Sharing is Transitive
-     Some Tensors might be effectively be using shared quantization spec due to (1) two nodes/edges are
+   * Sharing is transitive
+
+     Some Tensors might be effectively using shared quantization spec due to (1) two nodes/edges are
      configured to use SharedQuantizationSpec (2) there is existing sharing of some of the nodes
 
-     For example, let's say we have two conv nodes conv1 and conv2, and both of them are fed into a cat
-     node. `cat([conv1_out, conv2_out], ...)` Let's say output of conv1, conv2 and first input of cat are configured
-     with the same configurations of QuantizationSpec, second input of cat is configured to use SharedQuantizationSpec
+     For example, let's say we have two ``conv`` nodes ``conv1`` and ``conv2``, and both of them are fed into a ``cat``
+     node. `cat([conv1_out, conv2_out], ...)` Let's say output of ``conv1``, ``conv2`` and the first input of ``cat`` are configured
+     with the same configurations of ``QuantizationSpec``, second input of ``cat`` is configured to use ``SharedQuantizationSpec``
      with the first input.
-     conv1_out: qspec1(dtype=torch.int8, ...)
-     conv2_out: qspec1(dtype=torch.int8, ...)
-     cat_input0: qspec1(dtype=torch.int8, ...)
-     cat_input1: SharedQuantizationSpec((conv1, cat)) # conv1 node is the first input of cat
 
-     First of all, the output of conv1 are implicitly sharing quantization parameter (and observer object)
-     with first input of cat, and same for output of conv2 and second input of cat.
-     So since user configures the two input of cat to share quantization parameters, by transitivity,
-     conv2_out and conv1_out will also be sharing quantization parameters. In the observed graph, you
+     .. code-block::
+
+        conv1_out: qspec1(dtype=torch.int8, ...)
+        conv2_out: qspec1(dtype=torch.int8, ...)
+        cat_input0: qspec1(dtype=torch.int8, ...)
+        cat_input1: SharedQuantizationSpec((conv1, cat)) # conv1 node is the first input of cat
+
+     First of all, the output of ``conv1`` is implicitly sharing quantization parameter (and observer object)
+     with the first input of ``cat``, and same for output of ``conv2`` and the second input of ``cat``.
+     So since user configures the two inputs of ``cat`` to share quantization parameters, by transitivity,
+     ``conv2_out`` and ``conv1_out`` will also be sharing quantization parameters. In the observed graph, you
      will see:
-     ```
+     .. code-block::
+
        conv1 -> obs -> cat
       conv2 -> obs /
-     ```
-     and both `obs` will be the same observer instance
+
+     and both ``obs`` will be the same observer instance.
 
 
 - Input edge is the connection between input node and the node consuming the input,
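
For context on how the sharing described in this note is expressed in a quantizer, here is a hedged sketch of an annotation helper. The helper name ``annotate_cat`` and the ``input_qspec`` plumbing are illustrative assumptions; ``SharedQuantizationSpec`` and ``QuantizationAnnotation`` come from ``torch.ao.quantization.quantizer``:

    import torch
    from torch.ao.quantization.quantizer import (
        QuantizationAnnotation,
        SharedQuantizationSpec,
    )

    # Hypothetical helper: annotate a cat node fed by two conv outputs so that
    # its second input (and its output) share quantization parameters with the
    # (conv1_out, cat_node) input edge. By transitivity, conv1_out and
    # conv2_out then end up observed by the same observer instance.
    def annotate_cat(cat_node: torch.fx.Node, input_qspec) -> None:
        conv1_out, conv2_out = cat_node.args[0]  # cat([conv1_out, conv2_out], ...)
        shared = SharedQuantizationSpec((conv1_out, cat_node))
        cat_node.meta["quantization_annotation"] = QuantizationAnnotation(
            input_qspec_map={conv1_out: input_qspec, conv2_out: shared},
            output_qspec=shared,
            _annotated=True,
        )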
