
Commit a33064b

Author: Jessica Lin
Merge pull request #755 from jianyuh/jlin27-quant-tutorials
Update Dynamic Quant BERT Tutorial 3
2 parents 2740556 + 3d35f57 commit a33064b


intermediate_source/dynamic_quantization_bert_tutorial.py

Lines changed: 31 additions & 34 deletions
@@ -44,7 +44,7 @@
 # <https://gluebenchmark.com/>`_. The MRPC (Dolan and Brockett, 2005) is
 # a corpus of sentence pairs automatically extracted from online news
 # sources, with human annotations of whether the sentences in the pair
-# are semantically equivalent. Because the classes are imbalanced (68%
+# are semantically equivalent. As the classes are imbalanced (68%
 # positive, 32% negative), we follow the common practice and report
 # `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_.
 # MRPC is a common NLP task for language pair classification, as shown
@@ -55,10 +55,10 @@
 
 ######################################################################
 # 1. Setup
-# -------
+# --------
 #
-# Install PyTorch and HuggingFace Transformers
-# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+# 1.1 Install PyTorch and HuggingFace Transformers
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
 # To start this tutorial, let’s first follow the installation instructions
 # in PyTorch `here <https://github.com/pytorch/pytorch/#installation>`_ and HuggingFace Github Repo `here <https://github.com/huggingface/transformers#installation>`_.
@@ -87,8 +87,8 @@
 
 
 ######################################################################
-# 2. Import the necessary modules
-# ----------------------------
+# 1.2 Import the necessary modules
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
 # In this step we import the necessary Python modules for the tutorial.
 #
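As a quick reference while reviewing this hunk, the import step it renumbers brings in roughly the following modules. This is a sketch rather than the tutorial's exact import list; the tutorial file itself is authoritative.

    # Sketch of the imports the renumbered step refers to (not the exact list).
    import logging
    import os
    import random
    import time

    import numpy as np
    import torch

    # HuggingFace Transformers pieces used later for tokenization, feature
    # conversion, and the fine-tuned sequence classifier.
    from transformers import (BertConfig, BertForSequenceClassification,
                              BertTokenizer,
                              glue_convert_examples_to_features)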
@@ -130,13 +130,13 @@
 
 
 ######################################################################
-# 3. Download the dataset
-# --------------------
+# 1.3 Download the dataset
+# ^^^^^^^^^^^^^^^^^^^^^^^^
 #
 # Before running MRPC tasks we download the `GLUE data
 # <https://gluebenchmark.com/tasks>`_ by running `this script
 # <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_
-# and unpack it to a directory `glue_data`.
+# and unpack it to a directory ``glue_data``.
 #
 #
 # .. code:: shell
@@ -146,8 +146,8 @@
 
 
 ######################################################################
-# 4. Helper functions
-# ----------------
+# 1.4 Learn about helper functions
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
 # The helper functions are built-in in transformers library. We mainly use
 # the following helper functions: one for converting the text examples
@@ -157,25 +157,25 @@
 # The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function converts the texts into input features:
 #
 # - Tokenize the input sequences;
-# - Insert [CLS] at the beginning;
+# - Insert [CLS] in the beginning;
 # - Insert [SEP] between the first sentence and the second sentence, and
-# at the end;
+# in the end;
 # - Generate token type ids to indicate whether a token belongs to the
 # first sequence or the second sequence.
 #
 # The `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
 # can be interpreted as a weighted average of the precision and recall,
 # where an F1 score reaches its best value at 1 and worst score at 0. The
 # relative contribution of precision and recall to the F1 score are equal.
-# The equation for the F1 score is:
 #
-# - F1 = 2 \* (precision \* recall) / (precision + recall)
+# - The equation for the F1 score is:
+# .. math:: F1 = 2 * (\text{precision} * \text{recall}) / (\text{precision} + \text{recall})
 #
 
 
 ######################################################################
-# 5. Fine-tune the BERT model
-# --------------------------
+# 2. Fine-tune the BERT model
+# ---------------------------
 #
 
 
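The F1 definition reworked into a ``.. math::`` directive in the hunk above can be sanity-checked against the scikit-learn ``f1_score`` the tutorial links to; a small self-contained example with toy labels:

    from sklearn.metrics import f1_score, precision_score, recall_score

    y_true = [1, 1, 0, 1, 0, 1]   # toy MRPC-style binary labels
    y_pred = [1, 0, 0, 1, 1, 1]

    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    f1_manual = 2 * (precision * recall) / (precision + recall)

    # Same definition as sklearn's implementation.
    assert abs(f1_manual - f1_score(y_true, y_pred)) < 1e-12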
@@ -216,8 +216,8 @@
 # To save time, you can download the model file (~400 MB) directly into your local folder ``$OUT_DIR``.
 
 ######################################################################
-# 6. Set global configurations
-# -------------------------
+# 2.1 Set global configurations
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
 
 
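For orientation, the global-configuration step renamed above builds a small ``configs`` object and a seeding helper (the next hunk header shows the tutorial's ``set_seed``). A sketch with illustrative values; the fields below are assumptions, not the tutorial's exact settings:

    import random
    from argparse import Namespace

    import numpy as np
    import torch

    def set_seed(seed):
        # Seed the Python, NumPy, and PyTorch RNGs for reproducible runs.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)

    configs = Namespace()
    configs.output_dir = "./MRPC/"           # illustrative path
    configs.data_dir = "./glue_data/MRPC"    # illustrative path
    configs.task_name = "MRPC"
    configs.max_seq_length = 128
    configs.device = "cpu"                   # dynamic quantization targets CPU inference
    set_seed(42)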
@@ -264,12 +264,9 @@ def set_seed(seed):
 
 
 ######################################################################
-# 7. Load the fine-tuned BERT model
-# ------------------------------
+# 2.2 Load the fine-tuned BERT model
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
-
-
-######################################################################
 # We load the tokenizer and fine-tuned BERT sequence classifier model
 # (FP32) from the ``configs.output_dir``.
 #
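A minimal sketch of the loading step this hunk describes, assuming the fine-tuned checkpoint from section 2 was written to ``configs.output_dir``:

    from transformers import BertForSequenceClassification, BertTokenizer

    # Load the tokenizer and the fine-tuned FP32 sequence classifier from the
    # directory populated by the fine-tuning step.
    tokenizer = BertTokenizer.from_pretrained(configs.output_dir)
    model = BertForSequenceClassification.from_pretrained(configs.output_dir)
    model.to(configs.device)
    model.eval()  # inference mode for the evaluation steps below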
@@ -282,8 +279,8 @@ def set_seed(seed):
 
 
 ######################################################################
-# 8. Define the tokenize and evaluation function
-# -------------------------------------------
+# 2.3 Define the tokenize and evaluation function
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
 # We reuse the tokenize and evaluation function from `Huggingface <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
 #
@@ -426,7 +423,7 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):
 
 
 ######################################################################
-# 9. Apply the dynamic quantization
+# 3. Apply the dynamic quantization
 # -------------------------------
 #
 # We call ``torch.quantization.quantize_dynamic`` on the model to apply
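The call this section introduces, sketched with the standard ``torch.quantization.quantize_dynamic`` signature (assuming ``model`` is the FP32 classifier loaded in section 2.2):

    import torch

    # Replace the torch.nn.Linear modules with dynamically quantized versions:
    # weights are stored as INT8 and activations are quantized on the fly.
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    print(quantized_model)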
@@ -445,8 +442,8 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):
 
 
 ######################################################################
-# 10. Check the model size
-# --------------------
+# 3.1 Check the model size
+# ^^^^^^^^^^^^^^^^^^^^^^^^
 #
 # Let’s first check the model size. We can observe a significant reduction
 # in model size (FP32 total size: 438 MB; INT8 total size: 181 MB):
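The next hunk header shows the tutorial defines a ``print_size_of_model`` helper; one common way to implement such a check is to serialize the state dict and measure the file on disk. A sketch, not necessarily the tutorial's exact code:

    import os
    import torch

    def print_size_of_model(model):
        # Save the parameters to a temporary file and report its size.
        torch.save(model.state_dict(), "temp.p")
        print("Size (MB):", os.path.getsize("temp.p") / 1e6)
        os.remove("temp.p")

    print_size_of_model(model)            # FP32 model (~438 MB per the text above)
    print_size_of_model(quantized_model)  # INT8 quantized model (~181 MB)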
@@ -472,8 +469,8 @@ def print_size_of_model(model):
 
 
 ######################################################################
-# 11. Evaluate the inference accuracy and time
-# ----------------------------------------
+# 3.2 Evaluate the inference accuracy and time
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
 # Next, let’s compare the inference time as well as the evaluation
 # accuracy between the original FP32 model and the INT8 model after the
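Later hunk headers show a ``time_model_evaluation(model, configs, tokenizer)`` helper; a sketch of such a timing wrapper, assuming ``evaluate`` is the GLUE evaluation function reused from ``run_glue.py`` in section 2.3:

    import time

    def time_model_evaluation(model, configs, tokenizer):
        # Wall-clock the full MRPC evaluation pass and report metrics plus time.
        eval_start_time = time.time()
        result = evaluate(configs, model, tokenizer, prefix="")
        eval_duration = time.time() - eval_start_time
        print(result)
        print("Evaluate total time (seconds): {0:.1f}".format(eval_duration))

    # Compare the FP32 baseline against the INT8 dynamically quantized model.
    time_model_evaluation(model, configs, tokenizer)
    time_model_evaluation(quantized_model, configs, tokenizer)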
@@ -513,7 +510,7 @@ def time_model_evaluation(model, configs, tokenizer):
 # comparison, in a `recent paper <https://arxiv.org/pdf/1910.06188.pdf>`_ (Table 1),
 # it achieved 0.8788 by
 # applying the post-training dynamic quantization and 0.8956 by applying
-# the quantization-aware training. The main reason is that we support the
+# the quantization-aware training. The main difference is that we support the
 # asymmetric quantization in PyTorch while that paper supports the
 # symmetric quantization only.
 #
@@ -533,8 +530,8 @@ def time_model_evaluation(model, configs, tokenizer):
 
 
 ######################################################################
-# 12. Serialize the quantized model
-# -----------------------------
+# 3.3 Serialize the quantized model
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
 # We can serialize and save the quantized model for the future use.
 #
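A minimal sketch of the serialization step described above; ``quantized_output_dir`` is a hypothetical path, and only the state dict is saved:

    import os
    import torch

    quantized_output_dir = "./MRPC/quantized/"  # hypothetical output location
    os.makedirs(quantized_output_dir, exist_ok=True)

    # Persist the INT8 model's parameters for later reuse.
    torch.save(quantized_model.state_dict(),
               os.path.join(quantized_output_dir, "pytorch_model.bin"))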
