@@ -27,22 +27,22 @@ and ``BackendConfig`` to specify the supported ways of quantization in their bac
This API covers most use cases relatively well, but the main problem is that it is not fully extensible
without involvement of the quantization team:
- - Limitation around expressing quantization intentions for complicated operator patterns such as in the discussion of
- `issue-96288 <https://github.com/pytorch/pytorch/issues/96288>`__ to support ``conv add`` fusion with oneDNN library.
- It also requires some changes to current already complicated pattern matching code such as in the
- `PR-97122 <https://github.com/pytorch/pytorch/pull/97122>`__ to support ``conv add`` fusion.
- - Limitation around supporting user's advanced quantization intention to quantize their model. For example, if backend
+ - Current API has limitations around expressing quantization intentions for complicated operator patterns, such as in the discussion of
+ `Issue-96288 <https://github.com/pytorch/pytorch/issues/96288>`__ to support ``conv add`` fusion.
+ Supporting ``conv add`` fusion also requires some changes to the current, already complicated pattern matching code, such as in
+ `PR-97122 <https://github.com/pytorch/pytorch/pull/97122>`__.
+ - Current API also has limitations around supporting user's advanced quantization intention to quantize their model. For example, if a backend
developer only wants to quantize inputs and outputs when the ``linear`` has a third input, it requires co-work from the quantization
team and the backend developer.
- - Currently we use ``QConfigMapping`` and ``BackendConfig`` as separate object. ``QConfigMapping`` describes user's
+ - Current API uses ``QConfigMapping`` and ``BackendConfig`` as separate objects. ``QConfigMapping`` describes the user's
intention of how they want their model to be quantized. ``BackendConfig`` describes what kind of quantization a backend supports.
- Currently ``BackendConfig`` is backend specific, but ``QConfigMapping`` is not. And user can provide a ``QConfigMapping``
- that is incompatible with a specific BackendConfig. This is not a great UX. Ideally we can structure this better
+ Currently, ``BackendConfig`` is backend specific, but ``QConfigMapping`` is not, and a user can provide a ``QConfigMapping``
+ that is incompatible with a specific ``BackendConfig``. This is not a great UX. Ideally, we can structure this better
by making both configuration (``QConfigMapping``) and quantization capability (``BackendConfig``) backend
- specific, so there will be less confusion about incompatibilities.
- - Currently in ``QConfig`` we are exposing observer/fake_quant classes as an object for user to configure quantization.
- This increases the things that user may need to care about, e.g. not only the ``dtype`` but also how the observation should
- happen. These could potentially be hidden from user so that the user interface is simpler.
42
+ specific. So there will be less confusion about incompatibilities.
43
+ - Currently, in ``QConfig `` we are exposing observer/fake_quant classes as an object for user to configure quantization.
44
+ This increases the things that user needs to care about, e.g. not only the ``dtype `` but also how the observation should
45
+ happen. These could potentially be hidden from user to make user interface simpler.
To address these scalability issues,
`Quantizer <https://github.com/pytorch/pytorch/blob/3e988316b5976df560c51c998303f56a234a6a1f/torch/ao/quantization/_pt2e/quantizer/quantizer.py#L160>`__
@@ -137,18 +137,18 @@ Taking QNNPackQuantizer as an example, the overall Quantization 2.0 flow could b
# Step 4: Lower Reference Quantized Model into the backend
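
For orientation, since most of this code block falls outside the hunk, here is a minimal sketch of the four steps, assuming ``model`` and ``example_inputs`` are the floating point model and its sample inputs. The capture entry point and the import paths below have moved between releases, so treat every name here as an assumption to verify against your version, not as the tutorial's verbatim code::

    import torch
    # Assumption: prototype-era import paths; they differ in later releases.
    from torch.ao.quantization._pt2e.quantizer.qnnpack_quantizer import (
        QNNPackQuantizer,
        get_symmetric_quantization_config,
    )
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e

    # Step 1: Capture the program into an ATen FX graph
    exported_model, guards = torch._dynamo.export(model, *example_inputs, aten_graph=True)

    # Step 2: Configure a quantizer with the backend's quantization capability
    quantizer = QNNPackQuantizer()
    quantizer.set_global(get_symmetric_quantization_config())

    # Step 3: Insert observers, calibrate, then convert to a reference quantized model
    prepared_model = prepare_pt2e(exported_model, quantizer)
    prepared_model(*example_inputs)  # run calibration data through the observers
    quantized_model = convert_pt2e(prepared_model)

    # Step 4: Lower Reference Quantized Model into the backend (backend specific)
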
Quantizer uses annotation API to convey quantization intent for different operators/patterns.
- Annotation API uses ``QuantizationSpec`` (
- `definition is here <https://github.com/pytorch/pytorch/blob/1ca2e993af6fa6934fca35da6970308ce227ddc7/torch/ao/quantization/_pt2e/quantizer/quantizer.py#L38>`__
- ) to convey intent of how a tensor will be quantized,
+ Annotation API uses
+ `QuantizationSpec <https://github.com/pytorch/pytorch/blob/1ca2e993af6fa6934fca35da6970308ce227ddc7/torch/ao/quantization/_pt2e/quantizer/quantizer.py#L38>`__
+ to convey the intent of how a tensor will be quantized,
e.g. dtype, bitwidth, min, max values, symmetric vs. asymmetric, etc.
Furthermore, annotation API also allows quantizer to specify how a
tensor value should be observed, e.g. ``MinMaxObserver``, ``HistogramObserver``,
or some customized observer.
- ``QuantizationSpec`` is used to annotate nodes' output tensor or input tensors. Annotating
+ ``QuantizationSpec`` is used to annotate nodes' input tensors or output tensor. Annotating
input tensors is equivalent to annotating an edge of the graph, while annotating the output tensor is
- equivalent of annotating node. Thus annotation API requires quantizer to annotate nodes (output tensor)
- or edges (input tensors) of the graph.
+ equivalent to annotating a node. Thus, annotation API requires quantizer to annotate
+ edges (input tensors) or nodes (output tensor) of the graph.
Now, we will have a step-by-step tutorial for how to use the annotation API with different types of
``QuantizationSpec``.
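
Before diving into the steps, here is a hedged sketch of what constructing a ``QuantizationSpec`` might look like, using the fields from the definition linked above. The import path reflects the prototype module layout and may differ in your release::

    import torch
    from torch.ao.quantization.observer import HistogramObserver
    # Assumption: prototype-era import path; adjust for your release.
    from torch.ao.quantization._pt2e.quantizer.quantizer import QuantizationSpec

    # An 8-bit, asymmetric, per-tensor, statically quantized activation
    act_qspec = QuantizationSpec(
        dtype=torch.int8,                     # target dtype
        quant_min=-128,                       # bitwidth expressed via min/max values
        quant_max=127,
        qscheme=torch.per_tensor_affine,      # asymmetric, per-tensor
        is_dynamic=False,                     # static quantization
        observer_or_fake_quant_ctr=HistogramObserver,  # how the tensor is observed
    )
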
@@ -162,7 +162,7 @@ inputs, output of the pattern. Following is an example flow (take ``add`` operat
of how this intent is conveyed in the quantization workflow with annotation API.
- Step 1: Identify the original floating point pattern in the FX graph. There are
- several ways to identify this pattern: Quantizer may use a pattern matcher (e.g. SubgraphMatcher)
+ several ways to identify this pattern: Quantizer may use a pattern matcher
to match the operator pattern; Quantizer may go through the nodes from start to end and compare
the node's target type to match the operator pattern. In this example, we can use the
`get_source_partitions <https://github.com/pytorch/pytorch/blob/07104ca99c9d297975270fb58fda786e60b49b38/torch/fx/passes/utils/source_matcher_utils.py#L51>`__
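
As a hedged sketch of this step, assuming ``gm`` is the FX ``GraphModule`` captured from the floating point model, the ``get_source_partitions`` utility linked above groups nodes by the source callable they were traced from (the names ``gm`` and ``add_node`` are assumptions for illustration)::

    import operator
    import torch
    from torch.fx.passes.utils.source_matcher_utils import get_source_partitions

    # Returns a dict mapping each source callable to its matched partitions.
    add_partitions = get_source_partitions(gm.graph, [operator.add, torch.add])

    for partitions in add_partitions.values():
        for partition in partitions:
            add_node = partition.output_nodes[0]  # the `add` node to annotate
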
@@ -200,7 +200,7 @@ of how this intent is conveyed in the quantization workflow with annotation API.
``output_qspec`` field expresses the ``QuantizationSpec`` used to
annotate the output node; ``_annotated`` field indicates if this node has already been annotated by the quantizer.
In this example, we will create the ``QuantizationAnnotation`` object with the ``QuantizationSpec`` objects
- created in above step 2 for two inputs and one output of ``add`` node.
+ created in Step 2 above for the two inputs and one output of the ``add`` node.
::
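
    # Editor's note: the original example is cut off at this hunk boundary.
    # What follows is a hedged sketch, not the tutorial's verbatim code, built
    # from the ``QuantizationAnnotation`` fields described above; `add_node` and
    # `act_qspec` are assumed to come from the earlier steps, and the import
    # path reflects the prototype module layout.
    from torch.ao.quantization._pt2e.quantizer.quantizer import QuantizationAnnotation

    input_act0, input_act1 = add_node.args[0], add_node.args[1]
    add_node.meta["quantization_annotation"] = QuantizationAnnotation(
        input_qspec_map={input_act0: act_qspec, input_act1: act_qspec},
        output_qspec=act_qspec,
        _annotated=True,
    )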