
💡 [REQUEST] - Write a Tutorial for PyTorch 2.0 Export Quantization Frontend (Quantizer and Annotation API) #2336

Closed
@jerryzh168

Description


🚀 Describe the improvement or the new tutorial

In PyTorch 2.0, we have a new quantization path built on top of the graph captured by torchdynamo.export; see an example flow here: https://github.com/pytorch/pytorch/blob/main/test/quantization/pt2e/test_quantize_pt2e.py#L907. This path requires backend developers to write a quantizer; we have an existing quantizer object defined for QNNPACK/XNNPACK here: https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/_pt2e/quantizer/qnnpack_quantizer.py#L176.
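As a rough sketch of the overall shape of this flow (a quantizer annotates the captured graph, observers are inserted at annotated nodes, then observers are replaced by quantize/dequantize ops), here is a plain-Python mock. All classes and functions below are simplified illustrative stand-ins, not the real torch.ao.quantization API:

```python
# Toy mock of the PT2E quantization flow: a "quantizer" annotates nodes in a
# captured graph, prepare() inserts observers at annotated nodes, and
# convert() replaces observers with quantize/dequantize ops.
# Everything here is an illustrative stand-in, NOT the actual torch API.

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    op: str
    annotation: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: list


class ToyQuantizer:
    """Annotates every 'conv' node for 8-bit quantization."""

    def annotate(self, graph: Graph) -> None:
        for node in graph.nodes:
            if node.op == "conv":
                node.annotation = {"dtype": "int8", "observer": "minmax"}


def prepare(graph: Graph, quantizer: ToyQuantizer) -> Graph:
    """Annotate, then insert an observer after each annotated node."""
    quantizer.annotate(graph)
    prepared = []
    for node in graph.nodes:
        prepared.append(node)
        if node.annotation:
            prepared.append(Node(f"{node.name}_observer", "observer"))
    return Graph(prepared)


def convert(graph: Graph) -> Graph:
    """Replace each observer with a quantize/dequantize pair."""
    converted = []
    for node in graph.nodes:
        if node.op == "observer":
            base = node.name.removesuffix("_observer")
            converted.append(Node(f"{base}_quant", "quantize_per_tensor"))
            converted.append(Node(f"{base}_dequant", "dequantize"))
        else:
            converted.append(node)
    return Graph(converted)


g = Graph([Node("x", "placeholder"), Node("conv1", "conv"), Node("out", "output")])
quantized = convert(prepare(g, ToyQuantizer()))
print([n.op for n in quantized.nodes])
# → ['placeholder', 'conv', 'quantize_per_tensor', 'dequantize', 'output']
```

The key design point the tutorial should bring out is that the quantizer only *annotates*; the prepare/convert passes do the actual graph rewriting based on those annotations.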

The API that the quantizer interfaces with is called the Annotation API. We have just finished the design and implementation of this API (WIP as of 05/22, but it should be done this week), and we would like a tutorial that walks through how to annotate nodes using it.
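To give a flavor of what "annotating a node" means, the sketch below attaches a quantization spec to a node's input edges and output via a metadata dict. The spec fields and the metadata layout are patterned after the design doc's general shape, but the names here are hypothetical, not the real torch definitions:

```python
# Hypothetical sketch of node annotation: the quantizer attaches a
# quantization spec to each input edge and to the output of a node.
# QuantizationSpec and the metadata layout mimic the general shape of the
# Annotation API but are NOT the actual torch definitions.

from dataclasses import dataclass


@dataclass(frozen=True)
class QuantizationSpec:
    dtype: str
    quant_min: int
    quant_max: int
    qscheme: str = "per_tensor_affine"


def annotate_conv(node_meta: dict) -> dict:
    """Mark a conv node's activations and weight for int8 quantization."""
    act_spec = QuantizationSpec("int8", quant_min=-128, quant_max=127)
    weight_spec = QuantizationSpec("int8", -127, 127, qscheme="per_channel_symmetric")
    node_meta["quantization_annotation"] = {
        "input_qspec_map": {"input": act_spec, "weight": weight_spec},
        "output_qspec": act_spec,
    }
    return node_meta


meta = annotate_conv({})
print(meta["quantization_annotation"]["output_qspec"].dtype)  # → int8
```

A prepare pass would later read `quantization_annotation` off each node to decide where observers go; the quantizer itself never mutates the graph.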

Design Doc for the Annotation API: https://docs.google.com/document/d/1tjIsL7-uVgm_1bv_kUK7iovP6G1D5zcbzwEcmYEG2Js/edit# (please ping @jerryzh168 for access).

General design doc for the quantization path in PyTorch 2.0: https://docs.google.com/document/d/1_jjXrdaPbkmy7Fzmo35-r1GnNKL7anYoAnqozjyY-XI/edit#

What the tutorial should contain:

  1. overall introduction to the PyTorch 2.0 export flow, the quantizer, and the Annotation API
  2. how to annotate common operator patterns (https://docs.google.com/document/d/1tjIsL7-uVgm_1bv_kUK7iovP6G1D5zcbzwEcmYEG2Js/edit#heading=h.it9h4gjr7m9g); consider using add as the example instead, since bias is not handled properly in the doc's example
  3. how to annotate operators that share quantization parameters, e.g. cat, or add with two inputs sharing quantization parameters
  4. how to annotate fixed-qparams operators, e.g. sigmoid (https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/backend_config/_common_operator_config_utils.py#L74)
  5. how to annotate the bias for linear (DerivedQuantizationSpec)
  6. put everything together, play around with a toy model, and check the output quantized model (after convert_pt2e)
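Items 3–5 above revolve around three flavors of quantization spec: a shared spec that reuses the qparams of another edge, a fixed spec with hard-coded scale/zero-point (e.g. sigmoid's known [0, 1) output range), and a derived spec whose qparams are computed from other specs (e.g. linear bias scale = input_scale × weight_scale). The sketch below models the three flavors with plain dataclasses; the class names are hypothetical stand-ins patterned after the design doc, not the real torch classes:

```python
# Illustrative mocks of the three spec flavors the tutorial should cover:
# fixed, shared, and derived quantization parameters.
# These are simplified stand-ins, NOT the real torch.ao.quantization classes.

from dataclasses import dataclass
from typing import Callable


@dataclass
class QParams:
    scale: float
    zero_point: int


@dataclass
class FixedQParamsSpec:
    # Hard-coded qparams, e.g. sigmoid output: [0, 1) maps to scale 1/256.
    scale: float
    zero_point: int

    def resolve(self, observed: dict) -> QParams:
        return QParams(self.scale, self.zero_point)


@dataclass
class SharedSpec:
    # Reuse the qparams already chosen for another edge
    # (e.g. both inputs of an add sharing quantization parameters).
    partner: str

    def resolve(self, observed: dict) -> QParams:
        return observed[self.partner]


@dataclass
class DerivedSpec:
    # Compute qparams from other edges, e.g. bias_scale = in_scale * w_scale.
    sources: list
    derive: Callable[[list], QParams]

    def resolve(self, observed: dict) -> QParams:
        return self.derive([observed[s] for s in self.sources])


# Pretend these qparams came from calibration observers.
observed = {
    "linear_input": QParams(scale=0.02, zero_point=0),
    "linear_weight": QParams(scale=0.005, zero_point=0),
}

sigmoid_out = FixedQParamsSpec(scale=1 / 256, zero_point=0).resolve(observed)
add_rhs = SharedSpec(partner="linear_input").resolve(observed)
bias = DerivedSpec(
    sources=["linear_input", "linear_weight"],
    derive=lambda qs: QParams(qs[0].scale * qs[1].scale, 0),
).resolve(observed)

print(sigmoid_out.scale, add_rhs.scale, bias.scale)
```

The tutorial's item 6 would then wire specs like these into a quantizer, run a toy model through prepare and convert, and inspect the resulting quantized graph.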

Existing tutorials on this topic

The most relevant tutorial that we have written (by @andrewor14) is this:

Additional context

No response

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @ZailiWang @ZhaoqiongZ @leslie-fang-intel @Xia-Weiwen @sekahler2 @CaoE @zhuhaozhe @Valentine233
