``capture_pre_autograd_graph`` is a short-term API; it will be updated to use the official ``torch.export`` API when that is ready.
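
A minimal sketch of how a model is captured with this API, assuming the function is importable from ``torch._export`` (the import location may change as the API migrates to ``torch.export``) and using a hypothetical float model and example inputs:

.. code-block:: python

    import torch
    from torch._export import capture_pre_autograd_graph

    # Hypothetical eager-mode float model and example inputs, for illustration only.
    model_fp32 = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU()).eval()
    example_inputs = (torch.randn(1, 8),)

    # Capture the model into the graph form that the PT2E quantization flow consumes.
    exported_model = capture_pre_autograd_graph(model_fp32, example_inputs)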

Import the Backend Specific Quantizer and Configure how to Quantize the Model
--------------------------------------------------------------------------------
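
As a sketch of this step, assuming the XNNPACK quantizer as the example backend (module path and helper names reflect ``torch.ao.quantization.quantizer.xnnpack_quantizer`` at the time of writing):

.. code-block:: python

    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
        XNNPACKQuantizer,
        get_symmetric_quantization_config,
    )

    # Create the backend-specific quantizer and apply a symmetric quantization
    # config globally to the model.
    quantizer = XNNPACKQuantizer()
    quantizer.set_global(get_symmetric_quantization_config())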

Convert the Calibrated Model to a Quantized Model
---------------------------------------------------

.. code-block:: python

    quantized_model = convert_pt2e(prepared_model)
    print(quantized_model)

.. note::

   At this step, we currently have two representations that you can choose from, but the exact
   representation we offer in the long term might change based on feedback from PyTorch users.

   * Q/DQ Representation (default)

     As described in the previous documentation for `representations <https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md>`_, all quantized operators are represented as ``dequantize -> fp32_op -> quantize``.
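
     A rough illustration of this pattern, written as plain Python for exposition (``qdq_linear`` is a hypothetical helper; the actual output of ``convert_pt2e`` is an FX graph built from quantize/dequantize ops):

     .. code-block:: python

        import torch

        # dq -> fp32_op -> q: dequantize the int8 inputs, run the op in fp32,
        # then re-quantize the fp32 output back to int8.
        def qdq_linear(x_int8, x_scale, x_zero_point,
                       weight_int8, weight_scale, weight_zero_point,
                       bias_fp32, output_scale, output_zero_point):
            x_fp32 = (x_int8.to(torch.float32) - x_zero_point) * x_scale
            weight_fp32 = (weight_int8.to(torch.float32) - weight_zero_point) * weight_scale
            out_fp32 = torch.nn.functional.linear(x_fp32, weight_fp32, bias_fp32)
            out_int8 = torch.clamp(
                torch.round(out_fp32 / output_scale) + output_zero_point, -128, 127
            ).to(torch.int8)
            return out_int8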

   * Reference Quantized Model Representation (WIP, expected to be ready by the end of August): we have a special
     representation for selected ops (for example, quantized linear) that uses integer computation, so it is closer
     to the computation that happens in hardware; other ops are represented as ``dq -> float32_op -> q``, and
     ``q/dq`` are decomposed into more primitive operators.

     You can get this representation by using ``convert_pt2e(..., use_reference_representation=True)``.

     .. code-block:: python

        # Reference Quantized Pattern for quantized linear
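        # NOTE: illustrative sketch only -- the exact rewrite patterns live in
        # torch/ao/quantization/pt2e/representation/rewrite.py (linked below).
        # The key idea is to keep the matmul in integer arithmetic so the graph
        # is closer to what quantized hardware actually executes.
        import torch

        def reference_quantized_linear(x_int8, x_scale, x_zero_point,
                                       weight_int8, weight_scale, weight_zero_point,
                                       bias_fp32, output_scale, output_zero_point):
            # Subtract zero points in a wider integer type to avoid overflow.
            x_int32 = x_int8.to(torch.int32) - x_zero_point
            weight_int32 = weight_int8.to(torch.int32) - weight_zero_point
            # Integer matmul with wide accumulation (int64 here for simplicity).
            acc_int64 = torch.mm(x_int32.to(torch.int64), weight_int32.to(torch.int64).t())
            # Rescale the integer accumulator, add the fp32 bias, then re-quantize.
            acc_fp32 = acc_int64.to(torch.float32) * (x_scale * weight_scale) + bias_fp32
            out_int8 = torch.clamp(
                torch.round(acc_fp32 / output_scale) + output_zero_point, -128, 127
            ).to(torch.int8)
            return out_int8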

     For more details on the design, see `Quantized Model Representation <https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit>`_.

   See `here <https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/pt2e/representation/rewrite.py>`_ for the most up-to-date reference representations.

Checking Model Size and Accuracy Evaluation
---------------------------------------------
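
A minimal sketch of one way to check on-disk model size (``print_model_size`` is a hypothetical helper, not an API from this tutorial):

.. code-block:: python

    import os
    import torch

    def print_model_size(model, label=""):
        # Serialize the weights to a temporary file and report the file size.
        torch.save(model.state_dict(), "temp_weights.pt")
        size_mb = os.path.getsize("temp_weights.pt") / 1e6
        os.remove("temp_weights.pt")
        print(f"{label} size: {size_mb:.2f} MB")

    # Example usage, assuming `quantized_model` from the step above.
    print_model_size(quantized_model, label="quantized")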

Next, we'll show how to save and load the quantized model.