- `Understanding of the quantization concepts in PyTorch <https://pytorch.org/docs/master/quantization.html#quantization-api-summary>`__
- `Understanding of FX Graph Mode post training static quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html>`__
- `Understanding of BackendConfig in PyTorch Quantization FX Graph Mode <https://pytorch.org/tutorials/prototype/backend_config_tutorial.html?highlight=backend>`__
- `Understanding of QConfig and QConfigMapping in PyTorch Quantization FX Graph Mode <https://pytorch.org/tutorials/prototype/backend_config_tutorial.html#set-up-qconfigmapping-that-satisfies-the-backend-constraints>`__
Previously in ``FX Graph Mode Quantization`` we were using ``QConfigMapping`` for users to specify how they want their model to be quantized
and ``BackendConfig`` to specify the supported ways of quantization in their backend.
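As a refresher, below is a minimal sketch of that previous flow. It assumes the stock ``fbgemm`` qconfig and the
native backend config purely for illustration; defaults and details vary across PyTorch versions::

    import torch
    from torch.ao.quantization import QConfigMapping, get_default_qconfig
    from torch.ao.quantization.backend_config import get_native_backend_config
    from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

    model = torch.nn.Sequential(torch.nn.Linear(16, 16)).eval()
    example_inputs = (torch.randn(1, 16),)

    # QConfigMapping: how the user wants the model to be quantized
    qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("fbgemm"))

    # BackendConfig: what kind of quantization the backend supports
    backend_config = get_native_backend_config()

    prepared = prepare_fx(model, qconfig_mapping, example_inputs,
                          backend_config=backend_config)
    prepared(*example_inputs)  # calibration
    quantized = convert_fx(prepared, backend_config=backend_config)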
This API covers most use cases relatively well, but the main problem is that it is not fully extensible
without involvement of the quantization team:
- Limitation around expressing quantization intentions for complicated operator patterns, such as in the discussion of
  `issue-96288 <https://github.com/pytorch/pytorch/issues/96288>`__ on supporting ``conv add`` fusion with the oneDNN library.
- Limitation around supporting users' advanced quantization intentions for their models. For example, if a backend
  developer only wants to quantize inputs and outputs when the ``linear`` has a third input, it requires co-work from the quantization
  team and the backend developer.
- Currently we use ``QConfigMapping`` and ``BackendConfig`` as separate objects. ``QConfigMapping`` describes the user's
  intention of how they want their model to be quantized. ``BackendConfig`` describes what kind of quantization a backend supports.
  Currently ``BackendConfig`` is backend specific, but ``QConfigMapping`` is not, and the user can provide a ``QConfigMapping``
  that is incompatible with a specific ``BackendConfig``. This is not a great UX. Ideally we can structure this better
  by making both the configuration (``QConfigMapping``) and the quantization capability (``BackendConfig``) backend
  specific, so there will be less confusion about incompatibilities.
- Currently in ``QConfig`` we are exposing observer/fake_quant classes as objects for the user to configure quantization.
  This increases what the user needs to care about: not only the ``dtype`` but also how the observation should
  happen. These details could potentially be hidden from the user so that the user interface is simpler (see the
  sketch after this list).
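As an illustration of the last point, constructing a ``QConfig`` today means passing observer classes and their
arguments directly; the specific observers below are illustrative choices, not mandated ones::

    import torch
    from torch.ao.quantization import QConfig
    from torch.ao.quantization.observer import (
        HistogramObserver,
        PerChannelMinMaxObserver,
    )

    # The user configures not just dtypes but also *how* tensors are observed:
    # observer classes and their constructor arguments are part of the surface API.
    qconfig = QConfig(
        activation=HistogramObserver.with_args(
            dtype=torch.quint8, reduce_range=True
        ),
        weight=PerChannelMinMaxObserver.with_args(
            dtype=torch.qint8, qscheme=torch.per_channel_symmetric
        ),
    )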
Taking ``QNNPackQuantizer`` as an example, the overall Quantization 2.0 flow could be:
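The sketch below reconstructs those steps around the original ``# Step 4`` comment. It is written against the
later stabilized names (``XNNPACKQuantizer`` superseded the prototype ``QNNPackQuantizer``, and the capture and
import paths have moved between releases), so treat the exact paths as assumptions::

    import torch
    from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
        XNNPACKQuantizer,
        get_symmetric_quantization_config,
    )

    model = torch.nn.Sequential(torch.nn.Linear(16, 16)).eval()
    example_inputs = (torch.randn(1, 16),)

    # Step 1: Capture the program as an FX graph (the capture API varies by release)
    exported_model = torch.export.export_for_training(model, example_inputs).module()

    # Step 2: Annotate the graph with a Quantizer and insert observers
    quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
    prepared_model = prepare_pt2e(exported_model, quantizer)

    # Step 3: Calibrate, then convert to a Reference Quantized Model
    prepared_model(*example_inputs)
    quantized_model = convert_pt2e(prepared_model)

    # Step 4: Lower Reference Quantized Model into the backend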
The Quantizer uses the annotation API to convey quantization intent for different operators/patterns.
The annotation API uses ``QuantizationSpec``
(`definition is here <https://github.com/pytorch/pytorch/blob/1ca2e993af6fa6934fca35da6970308ce227ddc7/torch/ao/quantization/_pt2e/quantizer/quantizer.py#L38>`__)
to convey the intent of how a tensor will be quantized,
e.g. dtype, bitwidth, min/max values, symmetric vs. asymmetric, etc.
Furthermore, the annotation API also allows the quantizer to specify how a
tensor value should be observed, e.g. with ``MinMaxObserver``, ``HistogramObserver``,
or some customized observer.
``QuantizationSpec`` is used to annotate a node's output tensor or input tensors. Annotating
input tensors is equivalent to annotating the edges of the graph, while annotating the output tensor is
equivalent to annotating the node. Thus the annotation API requires the quantizer to annotate nodes (output tensors)
or edges (input tensors) of the graph, as in the sketch below.
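To make this concrete, here is a minimal sketch of annotating one input edge and the output of a node from inside a
quantizer's ``annotate`` method. The imports use the stabilized ``torch.ao.quantization.quantizer`` layout rather
than the prototype ``_pt2e`` module linked above, and ``annotate_linear`` is a hypothetical helper::

    import torch
    from torch.ao.quantization.observer import HistogramObserver
    from torch.ao.quantization.quantizer import (
        QuantizationAnnotation,
        QuantizationSpec,
    )

    # How a tensor should be quantized (dtype, range, scheme) and observed
    act_qspec = QuantizationSpec(
        dtype=torch.int8,
        quant_min=-128,
        quant_max=127,
        qscheme=torch.per_tensor_affine,
        observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12),
    )

    def annotate_linear(node: torch.fx.Node) -> None:
        """Hypothetical helper annotating a linear node's first input and its output."""
        input_node = node.args[0]  # assumes the first argument is the input activation
        node.meta["quantization_annotation"] = QuantizationAnnotation(
            # edge (input_node -> node): how this input tensor should be quantized
            input_qspec_map={input_node: act_qspec},
            # node: how the output tensor should be quantized
            output_qspec=act_qspec,
            _annotated=True,
        )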
Now, we will have a step-by-step tutorial for how to use the annotation API with different types of