@@ -523,7 +523,9 @@ Save and Load Quantized Model

We'll show how to save and load the quantized model.

+
.. code-block:: python
+
    # 0. Store reference output for example inputs and check evaluation accuracy
    example_inputs = (next(iter(data_loader))[0],)
    ref = quantized_model(*example_inputs)
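The hunk above shows only the first lines of the snippet. As a rough sketch of the remaining save/load steps (not part of this diff), one way to serialize the quantized model is through ``torch.export``; the file name below is illustrative.

.. code-block:: python

    import torch

    # Export the quantized model to an ExportedProgram and save it to disk.
    # "resnet18_quantized.pt2" is an illustrative file name.
    quantized_ep = torch.export.export(quantized_model, example_inputs)
    torch.export.save(quantized_ep, "resnet18_quantized.pt2")

    # Load the ExportedProgram back and recover a callable GraphModule.
    loaded_ep = torch.export.load("resnet18_quantized.pt2")
    loaded_model = loaded_ep.module()

    # Round-trip check: the loaded model should reproduce the reference output.
    res = loaded_model(*example_inputs)
    print("diff:", ref - res)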
@@ -551,7 +553,10 @@ We'll show how to save and load the quantized model.


Output:
+
+
.. code-block:: python
+
    [before serialization] Evaluation accuracy on test dataset: 79.82, 94.55
    diff: tensor([[0., 0., 0.,  ..., 0., 0., 0.],
                  [0., 0., 0.,  ..., 0., 0., 0.],
@@ -576,9 +581,10 @@ Lowering and Performance Evaluation

The model produced at this point is not the final model that runs on the device,
it is a reference quantized model that captures the intended quantized computation
- from the user, expressed as ATen operators, to get a model that runs on real
- devices, we'll need to lower the model. For example for the models that run on
- edge devices, we can lower to executorch.
+ from the user, expressed as ATen operators and some additional quantize/dequantize operators.
+ To get a model that runs on real devices, we'll need to lower the model.
+ For example, for models that run on edge devices, we can lower the model with delegation and the executorch runtime.
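To make the lowering step concrete, here is a minimal sketch (not part of this diff) of lowering the quantized model with executorch, assuming the ``executorch`` package and its ``to_edge`` API are available; backend delegation is omitted for brevity, and the file name is illustrative.

.. code-block:: python

    import torch
    from executorch.exir import to_edge

    # Assumes `quantized_model` and `example_inputs` from the steps above.
    exported = torch.export.export(quantized_model, example_inputs)

    # Convert the ExportedProgram to the edge dialect, then lower it
    # to an executorch program.
    edge_program = to_edge(exported)
    et_program = edge_program.to_executorch()

    # Save the flatbuffer that the executorch runtime loads on-device.
    # "model.pte" is an illustrative file name.
    with open("model.pte", "wb") as f:
        f.write(et_program.buffer)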

Conclusion
--------------