
💡 [REQUEST] - Write a Tutorial for PyTorch 2.0 Export Quantization Frontend (Quantizer and Annotation API) #2336

Closed
@jerryzh168

Description


🚀 Describe the improvement or the new tutorial

In PyTorch 2.0, we have a new quantization path built on top of the graph captured by torchdynamo.export; see an example flow here: https://github.com/pytorch/pytorch/blob/main/test/quantization/pt2e/test_quantize_pt2e.py#L907. This path requires backend developers to write a quantizer; we have an existing quantizer object defined for QNNPACK/XNNPACK here: https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/_pt2e/quantizer/qnnpack_quantizer.py#L176.
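As a rough sketch of the overall shape of this flow (a quantizer annotates the captured graph, observers are inserted at annotated nodes, then observers are replaced by quantize/dequantize ops), here is a plain-Python mock. All classes and functions below are simplified illustrative stand-ins, not the real torch.ao.quantization API:

```python
# Toy mock of the PT2E quantization flow: a "quantizer" annotates nodes in a
# captured graph, prepare() inserts observers at annotated nodes, and
# convert() replaces observers with quantize/dequantize ops.
# Everything here is an illustrative stand-in, NOT the actual torch API.

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    op: str
    annotation: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: list


class ToyQuantizer:
    """Annotates every 'conv' node for 8-bit quantization."""

    def annotate(self, graph: Graph) -> None:
        for node in graph.nodes:
            if node.op == "conv":
                node.annotation = {"dtype": "int8", "observer": "minmax"}


def prepare(graph: Graph, quantizer: ToyQuantizer) -> Graph:
    """Annotate, then insert an observer after each annotated node."""
    quantizer.annotate(graph)
    prepared = []
    for node in graph.nodes:
        prepared.append(node)
        if node.annotation:
            prepared.append(Node(f"{node.name}_observer", "observer"))
    return Graph(prepared)


def convert(graph: Graph) -> Graph:
    """Replace each observer with a quantize/dequantize pair."""
    converted = []
    for node in graph.nodes:
        if node.op == "observer":
            base = node.name.removesuffix("_observer")
            converted.append(Node(f"{base}_quant", "quantize_per_tensor"))
            converted.append(Node(f"{base}_dequant", "dequantize"))
        else:
            converted.append(node)
    return Graph(converted)


g = Graph([Node("x", "placeholder"), Node("conv1", "conv"), Node("out", "output")])
quantized = convert(prepare(g, ToyQuantizer()))
print([n.op for n in quantized.nodes])
# → ['placeholder', 'conv', 'quantize_per_tensor', 'dequantize', 'output']
```

The key design point the tutorial should bring out is that the quantizer only *annotates*; the prepare/convert passes do the actual graph rewriting based on those annotations.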

The API that the quantizer interfaces with is called the Annotation API. We have just finished the design and implementation of this API (WIP as of 05/22, but it should be done this week), and we would like a tutorial that walks through how to annotate nodes using it.
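To give a flavor of what "annotating a node" means, the sketch below attaches a quantization spec to a node's input edges and output via a metadata dict. The spec fields and the metadata layout are patterned after the design doc's general shape, but the names here are hypothetical, not the real torch definitions:

```python
# Hypothetical sketch of node annotation: the quantizer attaches a
# quantization spec to each input edge and to the output of a node.
# QuantizationSpec and the metadata layout mimic the general shape of the
# Annotation API but are NOT the actual torch definitions.

from dataclasses import dataclass


@dataclass(frozen=True)
class QuantizationSpec:
    dtype: str
    quant_min: int
    quant_max: int
    qscheme: str = "per_tensor_affine"


def annotate_conv(node_meta: dict) -> dict:
    """Mark a conv node's activations and weight for int8 quantization."""
    act_spec = QuantizationSpec("int8", quant_min=-128, quant_max=127)
    weight_spec = QuantizationSpec("int8", -127, 127, qscheme="per_channel_symmetric")
    node_meta["quantization_annotation"] = {
        "input_qspec_map": {"input": act_spec, "weight": weight_spec},
        "output_qspec": act_spec,
    }
    return node_meta


meta = annotate_conv({})
print(meta["quantization_annotation"]["output_qspec"].dtype)  # → int8
```

A prepare pass would later read `quantization_annotation` off each node to decide where observers go; the quantizer itself never mutates the graph.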

Design Doc for the Annotation API: https://docs.google.com/document/d/1tjIsL7-uVgm_1bv_kUK7iovP6G1D5zcbzwEcmYEG2Js/edit# (please ping @jerryzh168 for access).

General design doc for the quantization path in PyTorch 2.0: https://docs.google.com/document/d/1_jjXrdaPbkmy7Fzmo35-r1GnNKL7anYoAnqozjyY-XI/edit#

What the tutorial should contain:

  1. overall introduction to the PyTorch 2.0 export flow, the quantizer, and the Annotation API
  2. how to annotate common operator patterns (https://docs.google.com/document/d/1tjIsL7-uVgm_1bv_kUK7iovP6G1D5zcbzwEcmYEG2Js/edit#heading=h.it9h4gjr7m9g); consider using add as the example instead, since bias is not handled properly in the doc's example
  3. how to annotate operators that share quantization parameters, e.g. cat, or add with two inputs sharing quantization parameters
  4. how to annotate fixed-qparams operators, e.g. sigmoid (https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/backend_config/_common_operator_config_utils.py#L74)
  5. how to annotate the bias for linear (DerivedQuantizationSpec)
  6. put everything together, play around with a toy model, and check the output quantized model (after convert_pt2e)
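Items 3–5 above revolve around three flavors of quantization spec: a shared spec that reuses the qparams of another edge, a fixed spec with hard-coded scale/zero-point (e.g. sigmoid's known [0, 1) output range), and a derived spec whose qparams are computed from other specs (e.g. linear bias scale = input_scale × weight_scale). The sketch below models the three flavors with plain dataclasses; the class names are hypothetical stand-ins patterned after the design doc, not the real torch classes:

```python
# Illustrative mocks of the three spec flavors the tutorial should cover:
# fixed, shared, and derived quantization parameters.
# These are simplified stand-ins, NOT the real torch.ao.quantization classes.

from dataclasses import dataclass
from typing import Callable


@dataclass
class QParams:
    scale: float
    zero_point: int


@dataclass
class FixedQParamsSpec:
    # Hard-coded qparams, e.g. sigmoid output: [0, 1) maps to scale 1/256.
    scale: float
    zero_point: int

    def resolve(self, observed: dict) -> QParams:
        return QParams(self.scale, self.zero_point)


@dataclass
class SharedSpec:
    # Reuse the qparams already chosen for another edge
    # (e.g. both inputs of an add sharing quantization parameters).
    partner: str

    def resolve(self, observed: dict) -> QParams:
        return observed[self.partner]


@dataclass
class DerivedSpec:
    # Compute qparams from other edges, e.g. bias_scale = in_scale * w_scale.
    sources: list
    derive: Callable[[list], QParams]

    def resolve(self, observed: dict) -> QParams:
        return self.derive([observed[s] for s in self.sources])


# Pretend these qparams came from calibration observers.
observed = {
    "linear_input": QParams(scale=0.02, zero_point=0),
    "linear_weight": QParams(scale=0.005, zero_point=0),
}

sigmoid_out = FixedQParamsSpec(scale=1 / 256, zero_point=0).resolve(observed)
add_rhs = SharedSpec(partner="linear_input").resolve(observed)
bias = DerivedSpec(
    sources=["linear_input", "linear_weight"],
    derive=lambda qs: QParams(qs[0].scale * qs[1].scale, 0),
).resolve(observed)

print(sigmoid_out.scale, add_rhs.scale, bias.scale)
```

The tutorial's item 6 would then wire specs like these into a quantizer, run a toy model through prepare and convert, and inspect the resulting quantized graph.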

Existing tutorials on this topic

The most relevant tutorial that we have written (by @andrewor14) is this:

Additional context

No response

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @ZailiWang @ZhaoqiongZ @leslie-fang-intel @Xia-Weiwen @sekahler2 @CaoE @zhuhaozhe @Valentine233
