@@ -526,7 +526,7 @@ We'll show how to save and load the quantized model.

.. code-block:: python

-    # 0. Store reference output for example inputs and check evaluation accuracy
+    # 0. Store the reference output for the example inputs and check evaluation accuracy:
    example_inputs = (next(iter(data_loader))[0],)
    ref = quantized_model(*example_inputs)
    top1, top5 = evaluate(quantized_model, criterion, data_loader_test)
@@ -544,7 +544,7 @@ We'll show how to save and load the quantized model.
    loaded_quantized_ep = torch.export.load(pt2e_quantized_model_file_path)
    loaded_quantized_model = loaded_quantized_ep.module()

-    # 3. Check results for example inputs and checke evaluation accuracy again
+    # 3. Check results for the example inputs and check evaluation accuracy again:
    res = loaded_quantized_model(*example_inputs)
    print("diff:", ref - res)

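(The load call above assumes the quantized ``ExportedProgram`` was serialized in the elided steps 1 and 2. Below is a minimal sketch of that save/load round trip, assuming ``quantized_model`` and ``example_inputs`` from the surrounding tutorial code; the file name and the export/save calls are our illustration, not the tutorial's elided lines.)

.. code-block:: python

    import torch

    # Export the quantized model to an ExportedProgram
    # (quantized_model and example_inputs come from the tutorial code above).
    quantized_ep = torch.export.export(quantized_model, example_inputs)

    # Serialize the exported program to disk; the path is an assumed example.
    pt2e_quantized_model_file_path = "quantized_model.pt2"
    torch.export.save(quantized_ep, pt2e_quantized_model_file_path)

    # Load it back and recover a callable module, as in the hunk above.
    loaded_quantized_ep = torch.export.load(pt2e_quantized_model_file_path)
    loaded_quantized_model = loaded_quantized_ep.module()
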
@@ -583,8 +583,8 @@ The model produced at this point is not the final model that runs on the device,
it is a reference quantized model that captures the intended quantized computation
from the user, expressed as ATen operators and some additional quantize/dequantize operators,
to get a model that runs on real devices, we'll need to lower the model.
-For example for the models that run on edge devices, we can lower with delegation and executorch runtime
-operators..
+For example, for the models that run on edge devices, we can lower with delegation and ExecuTorch runtime
+operators.

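(As a rough sketch of what that lowering step can look like with ExecuTorch; the ``executorch.exir.to_edge`` API and the ``.pte`` file name reflect our assumptions about current ExecuTorch releases and are not part of this tutorial.)

.. code-block:: python

    import torch
    from executorch.exir import to_edge  # assumes the executorch package is installed

    # Re-export the reference quantized model, then lower it to an
    # ExecuTorch program for edge devices.
    exported_program = torch.export.export(quantized_model, example_inputs)
    edge_program = to_edge(exported_program)
    executorch_program = edge_program.to_executorch()

    # Write out the .pte file consumed by the ExecuTorch runtime.
    with open("quantized_model.pte", "wb") as f:
        f.write(executorch_program.buffer)
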
Conclusion
--------------