Description
🚀 Describe the improvement or the new tutorial
PyTorch 2.0 has a new quantization path built on top of the graph captured by torchdynamo.export. An example flow is here: https://github.com/pytorch/pytorch/blob/main/test/quantization/pt2e/test_quantize_pt2e.py#L907. This path requires backend developers to write a quantizer. An existing quantizer object for QNNPack/XNNPack is defined here: https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/_pt2e/quantizer/qnnpack_quantizer.py#L176.
The API that a quantizer interfaces with is called the Annotation API. We are finishing the design and implementation of this API (WIP as of 05/22, but it should be done this week), and we would like a tutorial that walks through how to annotate nodes using it.
Design Doc for the Annotation API: https://docs.google.com/document/d/1tjIsL7-uVgm_1bv_kUK7iovP6G1D5zcbzwEcmYEG2Js/edit# (please ping @jerryzh168 for access).
General Design Doc for the quantization path in PyTorch 2.0: https://docs.google.com/document/d/1_jjXrdaPbkmy7Fzmo35-r1GnNKL7anYoAnqozjyY-XI/edit#
What should the tutorial contain:
- an overall introduction to the PyTorch 2.0 export flow, the quantizer, and the Annotation API
- how to annotate common operator patterns (https://docs.google.com/document/d/1tjIsL7-uVgm_1bv_kUK7iovP6G1D5zcbzwEcmYEG2Js/edit#heading=h.it9h4gjr7m9g); it may be better to use add as the example, since bias is not properly handled in the example in the doc
- how to annotate operators that share quantization parameters (qparams), e.g. cat, or add with two inputs sharing quantization parameters
- how to annotate operators with fixed qparams, e.g. sigmoid (https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/backend_config/_common_operator_config_utils.py#L74)
- how to annotate the bias for linear (DerivedQuantizationSpec)
- putting everything together: play around with a toy model and check the output quantized model (after convert_pt2e)
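
To make the three special cases in the list above concrete (shared, fixed, and derived qparams), here is a minimal numeric sketch in plain Python. It does not use the Annotation API or any PyTorch call: `QParams`, `quantize`, `dequantize`, and `observed_qparams` are hypothetical helpers written only for illustration. The sigmoid values (scale 1/256, zero point 0 for an unsigned 8-bit output) are the ones I believe the linked backend config uses, and the bias rule (bias scale = activation scale × weight scale, zero point 0, int32) is the usual convention that a derived spec is meant to express; treat both as assumptions to check against the design doc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QParams:
    """Affine quantization parameters (hypothetical helper, not a PyTorch class)."""
    scale: float
    zero_point: int
    quant_min: int
    quant_max: int


def quantize(x: float, qp: QParams) -> int:
    """q = clamp(round(x / scale) + zero_point, quant_min, quant_max)."""
    q = round(x / qp.scale) + qp.zero_point
    return max(qp.quant_min, min(qp.quant_max, q))


def dequantize(q: int, qp: QParams) -> float:
    """x ~= (q - zero_point) * scale."""
    return (q - qp.zero_point) * qp.scale


def observed_qparams(values, quant_min=0, quant_max=255) -> QParams:
    """Min/max affine qparams; the range is widened to include 0.0."""
    lo = min(0.0, min(values))
    hi = max(0.0, max(values))
    scale = (hi - lo) / (quant_max - quant_min) or 1.0  # avoid a zero scale
    zero_point = quant_min - round(lo / scale)
    return QParams(scale, zero_point, quant_min, quant_max)


# 1. Shared qparams (e.g. cat): both inputs and the output use ONE QParams,
#    here computed from the values seen on all of the sharing tensors.
input_a = [-1.0, 0.5]
input_b = [0.25, 3.0]
shared = observed_qparams(input_a + input_b)
cat_output_q = [quantize(v, shared) for v in input_a + input_b]

# 2. Fixed qparams (e.g. sigmoid): the output range is known to be [0, 1],
#    so the parameters are constants instead of being observed.
sigmoid_out = QParams(scale=1.0 / 256.0, zero_point=0, quant_min=0, quant_max=255)

# 3. Derived qparams (e.g. linear bias): computed from other tensors' qparams.
act_scale = 1.0 / 32.0     # assumed activation scale for this example
weight_scale = 1.0 / 64.0  # assumed weight scale for this example
bias_qp = QParams(
    scale=act_scale * weight_scale,
    zero_point=0,
    quant_min=-(2 ** 31),
    quant_max=2 ** 31 - 1,
)
bias_q = quantize(0.25, bias_qp)
```

In the tutorial itself, each of these three cases would be expressed as an annotation on the relevant nodes rather than computed by hand; the sketch only shows what the resulting parameters are expected to mean.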
Existing tutorials on this topic
The most relevant tutorial that we have written (by @andrewor14 ) is this:
Additional context
No response
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @ZailiWang @ZhaoqiongZ @leslie-fang-intel @Xia-Weiwen @sekahler2 @CaoE @zhuhaozhe @Valentine233