Commit 27e5b97

[quant][pt2e] Update save/load model for pt2 export quant tutorial
Summary: att
Test Plan: visual inspection of generated pages
Reviewers:
Subscribers:
Tasks:
Tags:
1 parent acee6c7 commit 27e5b97

1 file changed: +9 -3 lines changed

prototype_source/pt2e_quant_ptq_static.rst

Lines changed: 9 additions & 3 deletions
@@ -523,7 +523,9 @@ Save and Load Quantized Model

 We'll show how to save and load the quantized model.

+
 .. code-block:: python
+
 # 0. Store reference output for example inputs and check evaluation accuracy
 example_inputs = (next(iter(data_loader))[0],)
 ref = quantized_model(*example_inputs)
@@ -551,7 +553,10 @@ We'll show how to save and load the quantized model.


 Output:
+
+
 .. code-block:: python
+
 [before serialization] Evaluation accuracy on test dataset: 79.82, 94.55
 diff: tensor([[0., 0., 0., ..., 0., 0., 0.],
 [0., 0., 0., ..., 0., 0., 0.],
@@ -576,9 +581,10 @@ Lowering and Performance Evaluation

 The model produced at this point is not the final model that runs on the device,
 it is a reference quantized model that captures the intended quantized computation
-from the user, expressed as ATen operators, to get a model that runs on real
-devices, we'll need to lower the model. For example for the models that run on
-edge devices, we can lower to executorch.
+from the user, expressed as ATen operators and some additional quantize/dequantize operators,
+to get a model that runs on real devices, we'll need to lower the model.
+For example for the models that run on edge devices, we can lower with delegation and executorch runtime
+operators..

 Conclusion
 --------------
