
Commit 885d83d

HDCharles, Svetlana Karslioglu, and Triapitsyn authored
[ao] fixing tutorials (#2140)
* [ao] fixing tutorials

  Summary: Fix the tutorials to use example_inputs and QConfigMapping.

  Test Plan: For fx_graph_mode_ptq_static.rst and fx_graph_mode_ptq_dynamic.py I pasted and ran the script to verify that it ran. The latter could be run exactly as written; the former required a slight modification, since I had a different version of the imagenet dataset, but only the dataloaders had to be modified. For the quant_guide no explicit testing was done.

  ghstack-source-id: d142b2f
  Pull Request resolved: #2137

* Fix typo "tesors" (#2138)

Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
Co-authored-by: Alex Triapitsyn <atryapa@gmail.com>
1 parent 4c112d9 commit 885d83d
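
The substance of the change, condensed: ``prepare_fx`` now takes a ``QConfigMapping`` instead of the old ``qconfig_dict``, plus a new ``example_inputs`` argument used to trace the model. A runnable sketch of the new call pattern (the toy model here is ours, not the commit's):

    # Runnable sketch of the new call pattern (toy model is ours, not the commit's)
    import torch
    import torch.nn as nn
    from torch.ao.quantization import default_dynamic_qconfig, QConfigMapping
    from torch.quantization.quantize_fx import prepare_fx, convert_fx

    float_model = nn.Sequential(nn.Linear(8, 8), nn.ReLU()).eval()
    example_inputs = (torch.randn(2, 8),)   # tuple of example positional inputs: the new argument
    # old API, removed by this commit: prepared = prepare_fx(float_model, {"": default_dynamic_qconfig})
    qconfig_mapping = QConfigMapping().set_global(default_dynamic_qconfig)
    prepared = prepare_fx(float_model, qconfig_mapping, example_inputs)
    quantized = convert_fx(prepared)        # nn.Linear becomes a dynamically quantized Linear
    print(quantized(torch.randn(2, 8)).shape)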

3 files changed (+373, -398 lines)

prototype_source/fx_graph_mode_ptq_dynamic.py

Lines changed: 31 additions & 32 deletions
@@ -4,7 +4,7 @@
 
 **Author**: `Jerry Zhang <https://github.com/jerryzh168>`_
 
-This tutorial introduces the steps to do post training dynamic quantization in graph mode based on ``torch.fx``. 
+This tutorial introduces the steps to do post training dynamic quantization in graph mode based on ``torch.fx``.
 We have a separate tutorial for `FX Graph Mode Post Training Static Quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html>`_,
 comparison between FX Graph Mode Quantization and Eager Mode Quantization can be found in the `quantization docs <https://pytorch.org/docs/master/quantization.html#quantization-api-summary>`_
 
@@ -13,20 +13,20 @@
 .. code:: python
 
     import torch
-    from torch.quantization import default_dynamic_qconfig
-    # Note that this is temporary, we'll expose these functions to torch.quantization after official release
+    from torch.ao.quantization import default_dynamic_qconfig, QConfigMapping
+    # Note that this is temporary, we'll expose these functions to torch.ao.quantization after official release
     from torch.quantization.quantize_fx import prepare_fx, convert_fx
 
     float_model.eval()
     qconfig = get_default_qconfig("fbgemm")
-    qconfig_dict = {"": qconfig}
-    prepared_model = prepare_fx(float_model, qconfig_dict) # fuse modules and insert observers
+    qconfig_mapping = QConfigMapping().set_global(qconfig)
+    prepared_model = prepare_fx(float_model, qconfig_mapping, example_inputs) # fuse modules and insert observers
     # no calibration is required for dynamic quantization
     quantized_model = convert_fx(prepared_model) # convert the model to a dynamically quantized model
 
-In this tutorial, we’ll apply dynamic quantization to an LSTM-based next word-prediction model,
-closely following the word language model from the PyTorch examples.
-We will copy the code from `Dynamic Quantization on an LSTM Word Language Model <https://pytorch.org/tutorials/advanced/dynamic_quantization_tutorial.html>`_
+In this tutorial, we’ll apply dynamic quantization to an LSTM-based next word-prediction model,
+closely following the word language model from the PyTorch examples.
+We will copy the code from `Dynamic Quantization on an LSTM Word Language Model <https://pytorch.org/tutorials/advanced/dynamic_quantization_tutorial.html>`_
 and omit the descriptions.
 
 """
@@ -36,20 +36,20 @@
 # 1. Define the Model, Download Data and Model
 # --------------------------------------------
 #
-# Download the `data <https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip>`_ 
+# Download the `data <https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip>`_
 # and unzip to data folder
-# 
+#
 # .. code::
-# 
+#
 #    mkdir data
 #    cd data
 #    wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
 #    unzip wikitext-2-v1.zip
 #
 # Download model to the data folder:
-# 
+#
 # .. code::
-# 
+#
 #    wget https://s3.amazonaws.com/pytorch-tutorial-assets/word_language_model_quantize.pth
 #
 # Define the model:
@@ -105,7 +105,7 @@ def init_hidden(lstm_model, bsz):
     nhid = lstm_model.rnn.hidden_size
     return (torch.zeros(nlayers, bsz, nhid, device=device),
             torch.zeros(nlayers, bsz, nhid, device=device))
-
+
 
 # Load Text Data
 class Dictionary(object):
@@ -191,6 +191,7 @@ def batchify(data, bsz):
     return data.view(bsz, -1).t().contiguous()
 
 test_data = batchify(corpus.test, eval_batch_size)
+example_inputs = (next(iter(test_data))[0])
 
 # Evaluation functions
 def get_batch(source, i):
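
Note that ``prepare_fx`` consumes ``example_inputs`` only for tracing, and expects a tuple of the model's positional inputs. A hedged sketch of the general pattern (the values here are illustrative, not the tutorial's):

    # Illustrative only: pack one representative batch into a tuple.
    # Mind the trailing comma: (x) is just x, while (x,) is a 1-tuple.
    first_batch = torch.randint(0, 10000, (35, 1))   # e.g. (seq_len, batch) of token ids
    example_inputs = (first_batch,)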
@@ -224,25 +225,23 @@ def evaluate(model_, data_source):
 ######################################################################
 # 2. Post Training Dynamic Quantization
 # -------------------------------------
-# Now we can dynamically quantize the model. 
+# Now we can dynamically quantize the model.
 # We can use the same function as post training static quantization but with a dynamic qconfig.
 
 from torch.quantization.quantize_fx import prepare_fx, convert_fx
-from torch.quantization import default_dynamic_qconfig, float_qparams_weight_only_qconfig
-
-# Full docs for supported qconfig for floating point modules/ops can be found in docs for quantization (TODO: link)
-# Full docs for qconfig_dict can be found in the documents of prepare_fx (TODO: link)
-qconfig_dict = {
-    "object_type": [
-        (nn.Embedding, float_qparams_weight_only_qconfig),
-        (nn.LSTM, default_dynamic_qconfig),
-        (nn.Linear, default_dynamic_qconfig)
-    ]
-}
+from torch.ao.quantization import default_dynamic_qconfig, float_qparams_weight_only_qconfig, QConfigMapping
+
+# Full docs for supported qconfig for floating point modules/ops can be found in `quantization docs <https://pytorch.org/docs/stable/quantization.html#module-torch.quantization>`_
+# Full docs for `QConfigMapping <https://pytorch.org/docs/stable/generated/torch.ao.quantization.qconfig_mapping.QConfigMapping.html#torch.ao.quantization.qconfig_mapping.QConfigMapping>`_
+qconfig_mapping = (QConfigMapping()
+    .set_object_type(nn.Embedding, float_qparams_weight_only_qconfig)
+    .set_object_type(nn.LSTM, default_dynamic_qconfig)
+    .set_object_type(nn.Linear, default_dynamic_qconfig)
+)
 # Deepcopying the original model because quantization api changes the model inplace and we want
 # to keep the original model for future comparison
 model_to_quantize = copy.deepcopy(model)
-prepared_model = prepare_fx(model_to_quantize, qconfig_dict)
+prepared_model = prepare_fx(model_to_quantize, qconfig_mapping, example_inputs)
 print("prepared model:", prepared_model)
 quantized_model = convert_fx(prepared_model)
 print("quantized model", quantized_model)
@@ -252,11 +251,11 @@ def evaluate(model_, data_source):
 # For dynamically quantized objects, we didn't do anything in ``prepare_fx`` for modules,
 # but will insert observers for weight for dynamically quantizable functionals and torch ops.
 # We also fuse the modules like Conv + Bn, Linear + ReLU.
-# 
-# In convert we'll convert the float modules to dynamically quantized modules and 
+#
+# In convert we'll convert the float modules to dynamically quantized modules and
 # convert float ops to dynamically quantized ops. We can see in the example model,
 # ``nn.Embedding``, ``nn.Linear`` and ``nn.LSTM`` are dynamically quantized.
-# 
+#
 # Now we can compare the size and runtime of the quantized model.
 
 def print_size_of_model(model):
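
The size and timing helpers used in the next hunk are defined earlier in the tutorial and sit outside this diff; a sketch of what they typically look like (the bodies below are our reconstruction, not part of the commit):

    import os
    import time
    import torch

    def print_size_of_model(model):
        # serialize the state dict and report its on-disk size
        torch.save(model.state_dict(), "temp.p")
        print("Size (MB):", os.path.getsize("temp.p") / 1e6)
        os.remove("temp.p")

    def time_model_evaluation(model, test_data):
        # wall-clock one full evaluation pass; evaluate() is the tutorial's own function
        start = time.time()
        loss = evaluate(model, test_data)
        print("loss: {:.3f}, elapsed: {:.1f}s".format(loss, time.time() - start))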
@@ -283,10 +282,10 @@ def time_model_evaluation(model, test_data):
 time_model_evaluation(quantized_model, test_data)
 
 #####################################################################
-# There is a roughly 2x speedup for this model. Also note that the speedup 
+# There is a roughly 2x speedup for this model. Also note that the speedup
 # may vary depending on model, device, build, input batch sizes, threading etc.
 #
 # 3. Conclusion
 # -------------
-# This tutorial introduces the api for post training dynamic quantization in FX Graph Mode, 
+# This tutorial introduces the api for post training dynamic quantization in FX Graph Mode,
 # which dynamically quantizes the same modules as Eager Mode Quantization.
