Commit 994bd83

Authored by leslie-fang-intel and Svetlana Karslioglu
modify quantization in pytorch.2.0 export tutorial (#2456)
Update prototype_source/quantization_in_pytorch_2_0_export_tutorial.rst --------- Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
1 parent 6b087cf commit 994bd83

File tree

1 file changed: +30 −35 lines

prototype_source/quantization_in_pytorch_2_0_export_tutorial.rst

@@ -14,54 +14,46 @@ have significantly higher model coverage, better programmability, and
 a simplified UX.
 
 Prerequisites:
------------------------
+^^^^^^^^^^^^^^^^
 
-- `Understanding of torchdynamo concepts in PyTorch <https://pytorch.org/docs/stable/dynamo/index.html>`__
-- `Understanding of the quantization concepts in PyTorch <https://pytorch.org/docs/master/quantization.html#quantization-api-summary>`__
-- `Understanding of FX Graph Mode post training static quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html>`__
-- `Understanding of BackendConfig in PyTorch Quantization FX Graph Mode <https://pytorch.org/tutorials/prototype/backend_config_tutorial.html?highlight=backend>`__
-- `Understanding of QConfig and QConfigMapping in PyTorch Quantization FX Graph Mode <https://pytorch.org/tutorials/prototype/backend_config_tutorial.html#set-up-qconfigmapping-that-satisfies-the-backend-constraints>`__
+- `Torchdynamo concepts in PyTorch <https://pytorch.org/docs/stable/dynamo/index.html>`__
+- `Quantization concepts in PyTorch <https://pytorch.org/docs/master/quantization.html#quantization-api-summary>`__
+- `FX Graph Mode post training static quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html>`__
+- `BackendConfig in PyTorch Quantization FX Graph Mode <https://pytorch.org/tutorials/prototype/backend_config_tutorial.html?highlight=backend>`__
+- `QConfig and QConfigMapping in PyTorch Quantization FX Graph Mode <https://pytorch.org/tutorials/prototype/backend_config_tutorial.html#set-up-qconfigmapping-that-satisfies-the-backend-constraints>`__
+
+Introduction:
+^^^^^^^^^^^^^^^^
 
 Previously in ``FX Graph Mode Quantization`` we were using ``QConfigMapping`` for users to specify how the model to be quantized
 and ``BackendConfig`` to specify the supported ways of quantization in their backend.
 This API covers most use cases relatively well, but the main problem is that this API is not fully extensible
 without involvement of the quantization team:
 
-- This API has limitation around expressing quantization intentions for complicated operator patterns such as in the discussion of
-  `Issue-96288 <https://github.com/pytorch/pytorch/issues/96288>`__ to support ``conv add`` fusion.
-  Supporting ``conv add`` fusion also requires some changes to current already complicated pattern matching code such as in the
-  `PR-97122 <https://github.com/pytorch/pytorch/pull/97122>`__.
-- This API also has limitation around supporting user's advanced quantization intention to quantize their model. For example, if backend
-  developer only wants to quantize inputs and outputs when the ``linear`` has a third input, it requires co-work from quantization
-  team and backend developer.
-- This API uses ``QConfigMapping`` and ``BackendConfig`` as separate object. ``QConfigMapping`` describes user's
-  intention of how they want their model to be quantized. ``BackendConfig`` describes what kind of quantization a backend support.
-  ``BackendConfig`` is backend specific, but ``QConfigMapping`` is not. And user can provide a ``QConfigMapping``
-  that is incompatible with a specific ``BackendConfig``. This is not a great UX. Ideally, we can structure this better
-  by making both configuration (``QConfigMapping``) and quantization capability (``BackendConfig``) backend
-  specific. So there will be less confusion about incompatibilities.
-- In ``QConfig``, we are exposing observer/fake_quant classes as an object for user to configure quantization.
-  This increases the things that user needs to care about, e.g. not only the ``dtype`` but also how the
-  observation should happen. These could potentially be hidden from user to make user interface simpler.
-
-To address these scalability issues,
+- This API has limitations in supporting advanced quantization intentions and complicated quantization operator patterns,
+  such as the ``conv add`` fusion discussed in `Issue-96288 <https://github.com/pytorch/pytorch/issues/96288>`__.
+- This API uses ``QConfigMapping`` and ``BackendConfig`` as separate objects in the quantization configuration,
+  which may cause confusion about incompatibilities between the two objects. These configurations also require
+  users to know too many quantization details, which could be hidden from the user interface to make it simpler.
+
+To address these issues,
 `Quantizer <https://github.com/pytorch/pytorch/blob/3e988316b5976df560c51c998303f56a234a6a1f/torch/ao/quantization/_pt2e/quantizer/quantizer.py#L160>`__
 is introduced for quantization in PyTorch 2.0 export. ``Quantizer`` is a class that users can use to
 programmatically set the quantization specifications for input and output of each node in the model graph. It adds flexibility
 to the quantization API and allows modeling users and backend developers to configure quantization programmatically.
 This will allow users to express how they want an operator pattern to be observed in a more explicit
-way by annotating the appropriate nodes. A backend specific quantizer inherited from base quantizer,
-some methods that need to be implemented:
-
-- `annotate method <https://github.com/pytorch/pytorch/blob/3e988316b5976df560c51c998303f56a234a6a1f/torch/ao/quantization/_pt2e/quantizer/qnnpack_quantizer.py#L269>`__
-  is used to annotate nodes in the graph with
-  `QuantizationAnnotation <https://github.com/pytorch/pytorch/blob/07104ca99c9d297975270fb58fda786e60b49b38/torch/ao/quantization/_pt2e/quantizer/quantizer.py#L144>`__
-  objects to convey the desired way of quantization.
+way by annotating the appropriate nodes.
 
 Imagine a backend developer who wishes to integrate a third-party backend
 with PyTorch's quantization 2.0 flow. To accomplish this, they would only need
-to define the backend specific quantizer. The high level architecture of
-quantization 2.0 with quantizer could look like this:
+to define the backend specific quantizer. A backend specific quantizer inherits from the base quantizer.
+The main method that needs to be implemented for the backend specific quantizer is the
+`annotate method <https://github.com/pytorch/pytorch/blob/3e988316b5976df560c51c998303f56a234a6a1f/torch/ao/quantization/_pt2e/quantizer/qnnpack_quantizer.py#L269>`__
+which is used to annotate nodes in the graph with
+`QuantizationAnnotation <https://github.com/pytorch/pytorch/blob/07104ca99c9d297975270fb58fda786e60b49b38/torch/ao/quantization/_pt2e/quantizer/quantizer.py#L144>`__
+objects to convey the desired way of quantization.
+
+The high level architecture of quantization 2.0 with quantizer could look like this:
 
 ::
 
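The annotate-based quantizer pattern that this hunk describes can be sketched with plain-Python stand-ins. All class and attribute names below are illustrative only, not the actual ``torch.ao.quantization._pt2e`` API; the real ``annotate`` method operates on an exported FX graph and stores annotations in ``node.meta``:

```python
from dataclasses import dataclass, field

# Stand-in for QuantizationAnnotation: records how a node's inputs and
# output should be observed (a dtype string stands in for a full spec).
@dataclass
class QuantizationAnnotation:
    input_dtypes: dict = field(default_factory=dict)
    output_dtype: str = "fp32"

# Minimal graph node: an op name plus a metadata dict, mirroring how
# annotations are attached to node.meta in the exported graph.
@dataclass
class Node:
    op: str
    meta: dict = field(default_factory=dict)

class BackendQuantizer:
    """Hypothetical backend-specific quantizer: only annotate() is needed."""

    def annotate(self, graph):
        for node in graph:
            # This toy backend quantizes conv2d inputs/outputs to int8
            # and leaves every other op unannotated (kept in fp32).
            if node.op == "conv2d":
                node.meta["quantization_annotation"] = QuantizationAnnotation(
                    input_dtypes={"input": "int8", "weight": "int8"},
                    output_dtype="int8",
                )
        return graph

graph = [Node("conv2d"), Node("relu")]
annotated = BackendQuantizer().annotate(graph)
```

The point of the design is visible even in this sketch: the backend owns the annotation logic, so expressing a new pattern means editing only the backend quantizer, not core quantization code.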
@@ -136,6 +128,9 @@ Taking QNNPackQuantizer as an example, the overall Quantization 2.0 flow could b
 
     # Step 4: Lower Reference Quantized Model into the backend
 
+Annotation API:
+^^^^^^^^^^^^^^^^^^^
+
 ``Quantizer`` uses annotation API to convey quantization intent for different operators/patterns.
 Annotation API mainly consists of
 `QuantizationSpec <https://github.com/pytorch/pytorch/blob/1ca2e993af6fa6934fca35da6970308ce227ddc7/torch/ao/quantization/_pt2e/quantizer/quantizer.py#L38>`__
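The relationship between a ``QuantizationSpec`` and an annotation can be illustrated with a small sketch. The field names here are simplified placeholders chosen for the example, not the exact prototype API:

```python
from dataclasses import dataclass

# Stand-in for QuantizationSpec: the observation recipe for one tensor.
@dataclass(frozen=True)
class QuantizationSpec:
    dtype: str      # e.g. "int8"
    quant_min: int
    quant_max: int
    qscheme: str    # e.g. "per_tensor_affine"

# Typical split: per-tensor spec for activations, per-channel for weights.
act_spec = QuantizationSpec("int8", -128, 127, "per_tensor_affine")
weight_spec = QuantizationSpec("int8", -127, 127, "per_channel_symmetric")

# An annotation ties specs to a node's input edges and to its output,
# which is how the quantizer conveys intent for one operator pattern.
annotation = {
    "input_qspec_map": {"x": act_spec, "weight": weight_spec},
    "output_qspec": act_spec,
}
```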
@@ -366,8 +361,8 @@ functions that are used in the example:
 `get_bias_qspec <https://github.com/pytorch/pytorch/blob/47cfcf566ab76573452787335f10c9ca185752dc/torch/ao/quantization/_pt2e/quantizer/utils.py#L53>`__
 can be used to get the ``QuantizationSpec`` from ``QuantizationConfig`` for a specific pattern.
 
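A helper in the spirit of ``get_bias_qspec`` can be sketched as a lookup from a grouped config to the spec for one tensor role. The function name, the dict-based config shape, and the role keys below are all hypothetical, used only to show the idea of factoring per-role specs out of one ``QuantizationConfig``:

```python
# Hypothetical helper: pull the spec for one tensor role
# (activation / weight / bias) out of a grouped quantization config.
def get_qspec(config: dict, role: str) -> dict:
    spec = config.get(role)
    if spec is None:
        raise KeyError(f"quantization config has no spec for role {role!r}")
    return spec

# Illustrative config: bias commonly stays in fp32 while activations
# and weights are quantized to int8.
quantization_config = {
    "activation": {"dtype": "int8", "qscheme": "per_tensor_affine"},
    "weight": {"dtype": "int8", "qscheme": "per_channel_symmetric"},
    "bias": {"dtype": "fp32", "qscheme": None},
}

bias_spec = get_qspec(quantization_config, "bias")
```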
-6. Conclusion
----------------------
+Conclusion
+^^^^^^^^^^^^^^^^^^^
 
 With this tutorial, we introduce the new quantization path in PyTorch 2.0. Users can learn about
 how to define a ``BackendQuantizer`` with the ``QuantizationAnnotation API`` and integrate it into the quantization 2.0 flow.
