
Commit 6eff598

Author: Svetlana Karslioglu
Merge branch 'main' into tb-profiler-tutorial-docs-update
2 parents 169319e + e28eace, commit 6eff598

8 files changed: +919 -3038 lines changed
Binary file (828 KB)
Binary file (-16.8 KB) not shown.

_static/torchvision_finetuning_instance_segmentation.ipynb

Lines changed: 0 additions & 2605 deletions
This file was deleted.

_static/tv-training-code.py

Lines changed: 443 additions & 73 deletions
Large diffs are not rendered by default.

en-wordlist.txt

Lines changed: 1 addition & 0 deletions
@@ -189,6 +189,7 @@ TensorBoards
TensorDict
TensorFloat
TextVQA
+TODO
Tokenization
TorchDynamo
TorchInductor

intermediate_source/scaled_dot_product_attention_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -317,7 +317,7 @@ def generate_rand_batch(
# on the same set of functions for both modules.
# The reason for this here is that ``torch.compile`` is very good at removing the
# framework overhead associated with PyTorch. If your model is launching
-# large, efficient CUDA kernels, which in this case ``CausaulSelfAttention``
+# large, efficient CUDA kernels, which in this case ``CausalSelfAttention``
# is, then the overhead of PyTorch can be hidden.
#
# In reality, your module does not normally consist of a singular
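The comment being fixed here describes wrapping a module that launches large, efficient CUDA kernels with ``torch.compile``. Below is a minimal sketch of that pattern; it is not part of the commit, and the simplified CausalSelfAttention stand-in (dimensions, layer names) is made up for illustration:

# Minimal sketch (not from the tutorial diff): a toy causal self-attention block
# compiled with torch.compile, illustrating the "large, efficient CUDA kernels" point.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):  # simplified stand-in for the tutorial's module
    def __init__(self, embed_dim=256, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (B, num_heads, T, head_dim) for scaled_dot_product_attention
        q, k, v = (t.view(B, T, self.num_heads, C // self.num_heads).transpose(1, 2)
                   for t in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).reshape(B, T, C)
        return self.proj(y)

model = CausalSelfAttention()
compiled_model = torch.compile(model)  # removes framework overhead around the attention kernels
out = compiled_model(torch.randn(4, 128, 256))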

intermediate_source/torchvision_tutorial.rst

Lines changed: 431 additions & 329 deletions
Large diffs are not rendered by default.

prototype_source/pt2e_quant_ptq_static.rst

Lines changed: 43 additions & 30 deletions
@@ -508,6 +508,10 @@ Now we can compare the size and model accuracy with baseline model.
target device, it's just a representation of quantized computation in ATen
operators.

+.. note::
+   The weights are still in fp32 right now, we may do constant propagation for quantize op to
+   get integer weights in the future.
+
If you want to get better accuracy or performance, try configuring
``quantizer`` in different ways, and each ``quantizer`` will have its own way
of configuration, so please consult the documentation for the
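The added note says the converted model's weights are still stored in fp32. A quick way to check this yourself is to inspect the dtypes in the model's state_dict; this sketch is not part of the commit and assumes the ``quantized_model`` variable from the tutorial is in scope:

# Not from the diff: count parameter/buffer dtypes of the converted model to see
# that weights are still stored as fp32 (quantize/dequantize ops wrap them at runtime).
from collections import Counter

dtype_counts = Counter(t.dtype for t in quantized_model.state_dict().values())
print(dtype_counts)  # expected to be dominated by torch.float32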
@@ -519,46 +523,54 @@ Save and Load Quantized Model

We'll show how to save and load the quantized model.

-.. code-block:: python

-    # 1. Save state_dict
-    pt2e_quantized_model_file_path = saved_model_dir + "resnet18_pt2e_quantized.pth"
-    torch.save(quantized_model.state_dict(), pt2e_quantized_model_file_path)
+.. code-block:: python

-    # Get a reference output
+    # 0. Store reference output, for example, inputs, and check evaluation accuracy:
    example_inputs = (next(iter(data_loader))[0],)
    ref = quantized_model(*example_inputs)
+    top1, top5 = evaluate(quantized_model, criterion, data_loader_test)
+    print("[before serialization] Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))

-    # 2. Initialize the quantized model and Load state_dict
-    # Rerun all steps to get a quantized model
-    model_to_quantize = load_model(saved_model_dir + float_model_file).to("cpu")
-    model_to_quantize.eval()
-    from torch._export import capture_pre_autograd_graph
-
-    exported_model = capture_pre_autograd_graph(model_to_quantize, example_inputs)
-    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
-        XNNPACKQuantizer,
-        get_symmetric_quantization_config,
-    )
+    # 1. Export the model and Save ExportedProgram
+    pt2e_quantized_model_file_path = saved_model_dir + "resnet18_pt2e_quantized.pth"
+    # capture the model to get an ExportedProgram
+    quantized_ep = torch.export.export(quantized_model, example_inputs)
+    # use torch.export.save to save an ExportedProgram
+    torch.export.save(quantized_ep, pt2e_quantized_model_file_path)

-    quantizer = XNNPACKQuantizer()
-    quantizer.set_global(get_symmetric_quantization_config())
-    prepared_model = prepare_pt2e(exported_model, quantizer)
-    prepared_model(*example_inputs)
-    loaded_quantized_model = convert_pt2e(prepared_model)

-    # load the state_dict from saved file to intialized model
-    loaded_quantized_model.load_state_dict(torch.load(pt2e_quantized_model_file_path))
+    # 2. Load the saved ExportedProgram
+    loaded_quantized_ep = torch.export.load(pt2e_quantized_model_file_path)
+    loaded_quantized_model = loaded_quantized_ep.module()

-    # Sanity check with sample data
+    # 3. Check results for example inputs and check evaluation accuracy again:
    res = loaded_quantized_model(*example_inputs)
-
-    # 3. Evaluate the loaded quantized model
+    print("diff:", ref - res)
+
    top1, top5 = evaluate(loaded_quantized_model, criterion, data_loader_test)
    print("[after serialization/deserialization] Evaluation accuracy on test dataset: %2.2f, %2.2f"%(top1.avg, top5.avg))

+
+Output:
+
+
+.. code-block:: python
+
+
+    [before serialization] Evaluation accuracy on test dataset: 79.82, 94.55
+    diff: tensor([[0., 0., 0., ..., 0., 0., 0.],
+        [0., 0., 0., ..., 0., 0., 0.],
+        [0., 0., 0., ..., 0., 0., 0.],
+        ...,
+        [0., 0., 0., ..., 0., 0., 0.],
+        [0., 0., 0., ..., 0., 0., 0.],
+        [0., 0., 0., ..., 0., 0., 0.]])
+
+    [after serialization/deserialization] Evaluation accuracy on test dataset: 79.82, 94.55
+
+
Debugging the Quantized Model
------------------------------
+------------------------------

You can use `Numeric Suite <https://pytorch.org/docs/stable/quantization-accuracy-debugging.html#numerical-debugging-tooling-prototype>`_
that can help with debugging in eager mode and FX graph mode. The new version of
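The new save/load code in the hunk above replaces the state_dict round trip with torch.export. As a self-contained sanity check that does not need the tutorial's ResNet, data loaders, or evaluate helper, the same export/save/load/module() calls can be exercised on a toy module; the module and file name below are made up for illustration and are not part of this commit:

# Standalone sketch of the torch.export save/load round trip used above.
import torch

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1.0

example_inputs = (torch.randn(2, 3),)
ep = torch.export.export(TinyModel(), example_inputs)   # capture an ExportedProgram
torch.export.save(ep, "tiny_model.pt2")                  # serialize it to disk
loaded_ep = torch.export.load("tiny_model.pt2")          # deserialize
loaded_model = loaded_ep.module()                         # get a callable module back
assert torch.allclose(TinyModel()(*example_inputs), loaded_model(*example_inputs))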
@@ -569,9 +581,10 @@ Lowering and Performance Evaluation

The model produced at this point is not the final model that runs on the device,
it is a reference quantized model that captures the intended quantized computation
-from the user, expressed as ATen operators, to get a model that runs on real
-devices, we'll need to lower the model. For example for the models that run on
-edge devices, we can lower to executorch.
+from the user, expressed as ATen operators and some additional quantize/dequantize operators,
+to get a model that runs on real devices, we'll need to lower the model.
+For example, for the models that run on edge devices, we can lower with delegation and ExecuTorch runtime
+operators.

Conclusion
--------------
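For readers wondering what the lowering step mentioned in the updated text can look like, here is a rough sketch of an ExecuTorch path. It is not part of this commit, and the ``executorch.exir`` API names and the ``.pte`` file format are assumptions about the ExecuTorch package that may differ across versions:

# Hedged sketch of lowering an exported program with ExecuTorch (assumed API).
import torch
from executorch.exir import to_edge  # assumes the executorch package is installed

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

ep = torch.export.export(TinyModel(), (torch.randn(1, 8),))
edge = to_edge(ep)                 # convert the ATen-dialect program to edge dialect
et_program = edge.to_executorch()  # lower to the ExecuTorch runtime representation
with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)     # serialized program consumed by the on-device runtime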
