Commit d9d3245

adjust description
1 parent 5a43584 commit d9d3245


prototype_source/quantization_in_pytorch_2_0_export_tutorial.rst

Lines changed: 20 additions & 20 deletions
@@ -27,22 +27,22 @@ and ``BackendConfig`` to specify the supported ways of quantization in their backend
 This API covers most use cases relatively well, but the main problem is that this API is not fully extensible
 without involvement of the quantization team:

-- Limitation around expressing quantization intentions for complicated operator patterns such as in the discussion of
-  `issue-96288 <https://github.com/pytorch/pytorch/issues/96288>`__ to support ``conv add`` fusion with oneDNN library.
-  It also requires some changes to current already complicated pattern matching code such as in the
-  `PR-97122 <https://github.com/pytorch/pytorch/pull/97122>`__ to support ``conv add`` fusion.
-- Limitation around supporting user's advanced quantization intention to quantize their model. For example, if backend
+- Current API has limitation around expressing quantization intentions for complicated operator patterns such as in the discussion of
+  `Issue-96288 <https://github.com/pytorch/pytorch/issues/96288>`__ to support ``conv add`` fusion.
+  Supporting ``conv add`` fusion also requires some changes to current already complicated pattern matching code such as in the
+  `PR-97122 <https://github.com/pytorch/pytorch/pull/97122>`__.
+- Current API also has limitation around supporting user's advanced quantization intention to quantize their model. For example, if backend
   developer only wants to quantize inputs and outputs when the ``linear`` has a third input, it requires co-work from quantization
   team and backend developer.
-- Currently we use ``QConfigMapping`` and ``BackendConfig`` as separate object. ``QConfigMapping`` describes user's
+- Current API uses ``QConfigMapping`` and ``BackendConfig`` as separate object. ``QConfigMapping`` describes user's
   intention of how they want their model to be quantized. ``BackendConfig`` describes what kind of quantization a backend support.
-  Currently ``BackendConfig`` is backend specific, but ``QConfigMapping`` is not. And user can provide a ``QConfigMapping``
-  that is incompatible with a specific BackendConfig. This is not a great UX. Ideally we can structure this better
+  Currently, ``BackendConfig`` is backend specific, but ``QConfigMapping`` is not. And user can provide a ``QConfigMapping``
+  that is incompatible with a specific ``BackendConfig``. This is not a great UX. Ideally, we can structure this better
   by making both configuration (``QConfigMapping``) and quantization capability (``BackendConfig``) backend
-  specific, so there will be less confusion about incompatibilities.
-- Currently in ``QConfig`` we are exposing observer/fake_quant classes as an object for user to configure quantization.
-  This increases the things that user may need to care about, e.g. not only the ``dtype`` but also how the observation should
-  happen. These could potentially be hidden from user so that the user interface is simpler.
+  specific. So there will be less confusion about incompatibilities.
+- Currently, in ``QConfig`` we are exposing observer/fake_quant classes as an object for user to configure quantization.
+  This increases the things that user needs to care about, e.g. not only the ``dtype`` but also how the observation should
+  happen. These could potentially be hidden from user to make user interface simpler.

 To address these scalability issues,
 `Quantizer <https://github.com/pytorch/pytorch/blob/3e988316b5976df560c51c998303f56a234a6a1f/torch/ao/quantization/_pt2e/quantizer/quantizer.py#L160>`__
@@ -137,18 +137,18 @@ Taking QNNPackQuantizer as an example, the overall Quantization 2.0 flow could be
     # Step 4: Lower Reference Quantized Model into the backend

 Quantizer uses annotation API to convey quantization intent for different operators/patterns.
-Annotation API uses ``QuantizationSpec`` (
-`definition is here <https://github.com/pytorch/pytorch/blob/1ca2e993af6fa6934fca35da6970308ce227ddc7/torch/ao/quantization/_pt2e/quantizer/quantizer.py#L38>`__
-) to convey intent of how a tensor will be quantized,
+Annotation API uses
+`QuantizationSpec <https://github.com/pytorch/pytorch/blob/1ca2e993af6fa6934fca35da6970308ce227ddc7/torch/ao/quantization/_pt2e/quantizer/quantizer.py#L38>`__
+to convey intent of how a tensor will be quantized,
 e.g. dtype, bitwidth, min, max values, symmetric vs. asymmetric etc.
 Furthermore, annotation API also allows quantizer to specify how a
 tensor value should be observed, e.g. ``MinMaxObserver``, or ``HistogramObserver``
 , or some customized observer.

-``QuantizationSpec`` is used to annotate nodes' output tensor or input tensors. Annotating
+``QuantizationSpec`` is used to annotate nodes' input tensors or output tensor. Annotating
 input tensors is equivalent of annotating edge of the graph, while annotating output tensor is
-equivalent of annotating node. Thus annotation API requires quantizer to annotate nodes (output tensor)
-or edges (input tensors) of the graph.
+equivalent of annotating node. Thus annotation API requires quantizer to annotate
+edges (input tensors) or nodes (output tensor) of the graph.

 Now, we will have a step-by-step tutorial for how to use the annotation API with different types of
 ``QuantizationSpec``.
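
For context on the hunk above, here is a minimal sketch of constructing a ``QuantizationSpec`` for an 8-bit, statically quantized activation. The field names and the prototype import path are assumptions taken from the linked ``quantizer.py`` source; the prototype module moved between releases, so treat the exact path as illustrative::

    import torch
    from torch.ao.quantization.observer import HistogramObserver
    # Assumed prototype-era import path (see the quantizer.py link above).
    from torch.ao.quantization._pt2e.quantizer.quantizer import QuantizationSpec

    # Conveys how one tensor should be quantized: dtype, bitwidth via
    # quant_min/quant_max, symmetric vs. asymmetric scheme, and which
    # observer collects statistics during calibration.
    act_qspec = QuantizationSpec(
        dtype=torch.int8,
        quant_min=-128,
        quant_max=127,
        qscheme=torch.per_tensor_affine,  # asymmetric; per_tensor_symmetric for symmetric
        is_dynamic=False,
        observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12),
    )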
@@ -162,7 +162,7 @@ inputs, output of the pattern. Following is an example flow (take ``add`` operator
 of how this intent is conveyed in the quantization workflow with annotation API.

 - Step 1: Identify the original floating point pattern in the FX graph. There are
-  several ways to identify this pattern: Quantizer may use a pattern matcher (e.g. SubgraphMatcher)
+  several ways to identify this pattern: Quantizer may use a pattern matcher
   to match the operator pattern; Quantizer may go through the nodes from start to the end and compare
   the node's target type to match the operator pattern. In this example, we can use the
   `get_source_partitions <https://github.com/pytorch/pytorch/blob/07104ca99c9d297975270fb58fda786e60b49b38/torch/fx/passes/utils/source_matcher_utils.py#L51>`__
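
As a hedged illustration of the ``get_source_partitions`` approach mentioned in Step 1 above, assuming ``gm`` is a GraphModule produced by the 2.0 export (the variable name is hypothetical)::

    import operator
    import torch
    from torch.fx.passes.utils.source_matcher_utils import get_source_partitions

    # Group nodes of the exported graph by the source op they were traced
    # from; each returned SourcePartition bundles the input nodes, output
    # nodes, and internal nodes of one matched `add` pattern.
    add_partitions = get_source_partitions(gm.graph, [operator.add, torch.add])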
@@ -200,7 +200,7 @@ of how this intent is conveyed in the quantization workflow with annotation API.
 ``output_qspec`` field expresses the ``QuantizationSpec`` used to
 annotate the output node; ``_annotated`` field indicates if this node has already been annotated by quantizer.
 In this example, we will create the ``QuantizationAnnotation`` object with the ``QuantizationSpec`` objects
-created in above step 2 for two inputs and one output of ``add`` node.
+created in above step 2 for two inputs and one output of the ``add`` node.

 ::

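The code block that follows the ``::`` marker is elided in this diff. As a sketch only (not the tutorial's verbatim block), annotating the two inputs and one output of an ``add`` node might look like the following, assuming a hypothetical ``add_node`` found via the partition match above, reusing the ``act_qspec`` from the earlier sketch, and taking the inputs as the node's first two arguments::

    # Assumed prototype-era import path (see the quantizer.py link above).
    from torch.ao.quantization._pt2e.quantizer.quantizer import QuantizationAnnotation

    input_act0, input_act1 = add_node.args[0], add_node.args[1]
    add_node.meta["quantization_annotation"] = QuantizationAnnotation(
        # Map each input node (a graph edge) to the spec quantizing it.
        input_qspec_map={input_act0: act_qspec, input_act1: act_qspec},
        # Spec for the node's output tensor.
        output_qspec=act_qspec,
        # Mark the node so later passes skip re-annotating it.
        _annotated=True,
    )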