Commit 2740556

Authored by Jessica Lin
Merge pull request #753 from jianyuh/jlin27-quant-tutorials
Update Dynamic Quant BERT Tutorial 2
2 parents c320480 + 91ceb93 commit 2740556

intermediate_source/dynamic_quantization_bert_tutorial.py

Lines changed: 53 additions & 81 deletions
@@ -16,8 +16,8 @@
 #
 #
 # In this tutorial, we will apply the dynamic quantization on a BERT
-# model, closely following the BERT model from the HuggingFace
-# Transformers examples (https://github.com/huggingface/transformers).
+# model, closely following the BERT model from `the HuggingFace
+# Transformers examples <https://github.com/huggingface/transformers>`_.
 # With this step-by-step journey, we would like to demonstrate how to
 # convert a well-known state-of-the-art model like BERT into dynamic
 # quantized model.
@@ -27,18 +27,16 @@
 # achieves the state-of-the-art accuracy results on many popular
 # Natural Language Processing (NLP) tasks, such as question answering,
 # text classification, and others. The original paper can be found
-# here: https://arxiv.org/pdf/1810.04805.pdf.
+# `here <https://arxiv.org/pdf/1810.04805.pdf>`_.
 #
 # - Dynamic quantization support in PyTorch converts a float model to a
 # quantized model with static int8 or float16 data types for the
 # weights and dynamic quantization for the activations. The activations
 # are quantized dynamically (per batch) to int8 when the weights are
-# quantized to int8.
-#
-# In PyTorch, we have `torch.quantization.quantize_dynamic API
-# <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_
-# ,which replaces specified modules with dynamic weight-only quantized
-# versions and output the quantized model.
+# quantized to int8. In PyTorch, we have `torch.quantization.quantize_dynamic API
+# <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_,
+# which replaces specified modules with dynamic weight-only quantized
+# versions and output the quantized model.
 #
 # - We demonstrate the accuracy and inference performance results on the
 # `Microsoft Research Paraphrase Corpus (MRPC) task <https://www.microsoft.com/en-us/download/details.aspx?id=52398>`_
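
A minimal sketch of what the ``torch.quantization.quantize_dynamic`` API described in the hunk above does, shown on a toy model rather than BERT (the layer sizes and the choice of ``{torch.nn.Linear}`` with ``torch.qint8`` are illustrative assumptions, not lines from the tutorial):

    import torch

    # A toy float model; dynamic quantization targets the nn.Linear modules.
    model_fp32 = torch.nn.Sequential(
        torch.nn.Linear(128, 64),
        torch.nn.ReLU(),
        torch.nn.Linear(64, 2),
    )

    # Replace the listed module types with dynamic weight-only quantized
    # versions: weights stored as int8, activations quantized per batch.
    model_int8 = torch.quantization.quantize_dynamic(
        model_fp32, {torch.nn.Linear}, dtype=torch.qint8
    )

    print(model_int8)  # the Linear layers now appear as dynamic quantized Linear modules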
@@ -47,29 +45,24 @@
 # a corpus of sentence pairs automatically extracted from online news
 # sources, with human annotations of whether the sentences in the pair
 # are semantically equivalent. Because the classes are imbalanced (68%
-# positive, 32% negative), we follow common practice and report both
-# accuracy and `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
+# positive, 32% negative), we follow the common practice and report
+# `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_.
 # MRPC is a common NLP task for language pair classification, as shown
 # below.
 #
-# .. figure:: /_static/img/bert_mrpc.png
+# .. figure:: /_static/img/bert.png


 ######################################################################
-# Setup
+# 1. Setup
 # -------
 #
 # Install PyTorch and HuggingFace Transformers
 # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
 # To start this tutorial, let’s first follow the installation instructions
-# in PyTorch and HuggingFace Github Repo: -
-#
-# * https://github.com/pytorch/pytorch/#installation -
-#
-# * https://github.com/huggingface/transformers#installation
-#
-# In addition, we also install ``sklearn`` package, as we will reuse its
+# in PyTorch `here <https://github.com/pytorch/pytorch/#installation>`_ and HuggingFace Github Repo `here <https://github.com/huggingface/transformers#installation>`_.
+# In addition, we also install `scikit-learn <https://github.com/scikit-learn/scikit-learn>`_ package, as we will reuse its
 # built-in F1 score calculation helper function.
 #
 # .. code:: shell
@@ -94,7 +87,7 @@


 ######################################################################
-# Import the necessary modules
+# 2. Import the necessary modules
 # ----------------------------
 #
 # In this step we import the necessary Python modules for the tutorial.
@@ -137,61 +130,51 @@


 ######################################################################
-# Download the dataset
+# 3. Download the dataset
 # --------------------
 #
 # Before running MRPC tasks we download the `GLUE data
-# <https://gluebenchmark.com/tasks>`_ by running this `script
-# <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_ followed by
-# `download_glue_data <https://github.com/nyu-mll/GLUE-baselines/blob/master/download_glue_data.py>`_.
-# and unpack it to some directory “glue_data/MRPC”.
+# <https://gluebenchmark.com/tasks>`_ by running `this script
+# <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_
+# and unpack it to a directory `glue_data`.
 #
 #
 # .. code:: shell
 #
-# wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
 # python download_glue_data.py --data_dir='glue_data' --tasks='MRPC'
-# ls glue_data/MRPC
 #


 ######################################################################
-# Helper functions
+# 4. Helper functions
 # ----------------
 #
 # The helper functions are built-in in transformers library. We mainly use
 # the following helper functions: one for converting the text examples
 # into the feature vectors; The other one for measuring the F1 score of
 # the predicted result.
 #
-# Convert the texts into features
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-#
-# `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_.
-# load a data file into a list of ``InputFeatures``.
+# The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function converts the texts into input features:
 #
 # - Tokenize the input sequences;
 # - Insert [CLS] at the beginning;
 # - Insert [SEP] between the first sentence and the second sentence, and
 # at the end;
 # - Generate token type ids to indicate whether a token belongs to the
-# first sequence or the second sequence;
-#
-# F1 metric
-# ~~~~~~~~~
+# first sequence or the second sequence.
 #
 # The `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
 # can be interpreted as a weighted average of the precision and recall,
 # where an F1 score reaches its best value at 1 and worst score at 0. The
 # relative contribution of precision and recall to the F1 score are equal.
-# The formula for the F1 score is:
+# The equation for the F1 score is:
 #
-# F1 = 2 \* (precision \* recall) / (precision + recall)
+# - F1 = 2 \* (precision \* recall) / (precision + recall)
 #


 ######################################################################
-# Fine-tune the BERT model
+# 5. Fine-tune the BERT model
 # --------------------------
 #
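
The F1 formula quoted in the hunk above can be sanity-checked with the scikit-learn helper the tutorial reuses; the labels below are made up purely for illustration:

    from sklearn.metrics import f1_score

    # Hypothetical gold labels and predictions.
    y_true = [1, 1, 0, 1, 0, 1, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

    # f1_score computes 2 * (precision * recall) / (precision + recall).
    print(f1_score(y_true, y_pred))  # 0.8 here, since precision = recall = 4/5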

@@ -204,15 +187,15 @@
 # with the pre-trained BERT model to classify semantically equivalent
 # sentence pairs on MRPC task.
 #
-# To fine-tune the pre-trained BERT model (bert-base-uncased model in
+# To fine-tune the pre-trained BERT model (``bert-base-uncased`` model in
 # HuggingFace transformers) for the MRPC task, you can follow the command
-# in `examples<https://github.com/huggingface/transformers/tree/master/examples>`_"
+# in `examples <https://github.com/huggingface/transformers/tree/master/examples#mrpc>`_:
 #
 # ::
 #
 # export GLUE_DIR=./glue_data
 # export TASK_NAME=MRPC
-# export OUT_DIR=/mnt/homedir/jianyuhuang/public/bert/$TASK_NAME/
+# export OUT_DIR=./$TASK_NAME/
 # python ./run_glue.py \
 # --model_type bert \
 # --model_name_or_path bert-base-uncased \
@@ -229,24 +212,11 @@
 # --save_steps 100000 \
 # --output_dir $OUT_DIR
 #
-# We provide the fined-tuned BERT model for MRPC task here (We did the
-# fine-tuning on CPUs with a total train batch size of 8):
-#
-# https://drive.google.com/drive/folders/1mGBx0t-YJAWXHbgab2f_IimaMiVHlKh-
-#
-# To save time, you can manually copy the fined-tuned BERT model for MRPC
-# task in your Google Drive (Create the same “BERT_Quant_Tutorial/MRPC”
-# folder in the Google Drive directory), and then mount your Google Drive
-# on your runtime using an authorization code, so that we can directly
-# read and write the models into Google Drive in the following steps.
-#
-
-from google.colab import drive
-drive.mount('/content/drive')
-
+# We provide the fined-tuned BERT model for MRPC task `here <https://download.pytorch.org/tutorial/MRPC.zip>`_.
+# To save time, you can download the model file (~400 MB) directly into your local folder ``$OUT_DIR``.

 ######################################################################
-# Set global configurations
+# 6. Set global configurations
 # -------------------------
 #

@@ -258,11 +228,11 @@

 configs = Namespace()

-# The output directory for the fine-tuned model.
-configs.output_dir = "/content/drive/My Drive/BERT_Quant_Tutorial/MRPC/"
+# The output directory for the fine-tuned model, $OUT_DIR.
+configs.output_dir = "./MRPC/"

-# The data directory for the MRPC task in the GLUE benchmark.
-configs.data_dir = "/content/glue_data/MRPC"
+# The data directory for the MRPC task in the GLUE benchmark, $GLUE_DIR/$TASK_NAME.
+configs.data_dir = "./glue_data/MRPC"

 # The model name or path for the pre-trained model.
 configs.model_name_or_path = "bert-base-uncased"
@@ -294,7 +264,7 @@ def set_seed(seed):


 ######################################################################
-# Load the fine-tuned BERT model
+# 7. Load the fine-tuned BERT model
 # ------------------------------
 #

@@ -312,11 +282,12 @@ def set_seed(seed):


 ######################################################################
-# Define the tokenize and evaluation function
+# 8. Define the tokenize and evaluation function
 # -------------------------------------------
 #
-# We reuse the tokenize and evaluation function from `huggingface <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
+# We reuse the tokenize and evaluation function from `Huggingface <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
 #
+
 # coding=utf-8
 # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
 # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
@@ -455,7 +426,7 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):


 ######################################################################
-# Apply the dynamic quantization
+# 9. Apply the dynamic quantization
 # -------------------------------
 #
 # We call ``torch.quantization.quantize_dynamic`` on the model to apply
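
The quantization call itself falls outside this hunk; a sketch of the typical invocation on the fine-tuned HuggingFace model is shown below. The output directory path and the choice of ``{torch.nn.Linear}`` with ``torch.qint8`` are assumptions for illustration, not lines from the diff:

    import torch
    from transformers import BertForSequenceClassification

    # Load the fine-tuned FP32 model from the output directory configured above.
    model = BertForSequenceClassification.from_pretrained("./MRPC/")
    model.to("cpu")

    # Quantize the weights of all nn.Linear modules to int8.
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )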
@@ -474,11 +445,11 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):


 ######################################################################
-# Check the model size
+# 10. Check the model size
 # --------------------
 #
 # Let’s first check the model size. We can observe a significant reduction
-# in model size:
+# in model size (FP32 total size: 438 MB; INT8 total size: 181 MB):
 #

 def print_size_of_model(model):
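
The body of ``print_size_of_model`` is cut off by the hunk boundary; a minimal sketch of such a helper, assuming it measures the serialized ``state_dict`` on disk (the tutorial's exact implementation may differ), would be:

    import os
    import torch

    def print_size_of_model(model):
        # Serialize the weights to a temporary file and report its size.
        torch.save(model.state_dict(), "temp.p")
        print("Size (MB):", os.path.getsize("temp.p") / 1e6)
        os.remove("temp.p")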
@@ -491,7 +462,7 @@ def print_size_of_model(model):


 ######################################################################
-# The BERT model used in this tutorial (bert-base-uncased) has a
+# The BERT model used in this tutorial (``bert-base-uncased``) has a
 # vocabulary size V of 30522. With the embedding size of 768, the total
 # size of the word embedding table is ~ 4 (Bytes/FP32) \* 30522 \* 768 =
 # 90 MB. So with the help of quantization, the model size of the
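
A quick check of the embedding-table arithmetic quoted above:

    # 30522 vocabulary entries x 768-dim embeddings x 4 bytes per FP32 value
    embedding_table_bytes = 30522 * 768 * 4
    print(embedding_table_bytes / 1024 ** 2)  # ~89.4, i.e. roughly the 90 MB cited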
@@ -501,15 +472,14 @@ def print_size_of_model(model):


 ######################################################################
-# Evaluate the inference accuracy and time
+# 11. Evaluate the inference accuracy and time
 # ----------------------------------------
 #
 # Next, let’s compare the inference time as well as the evaluation
 # accuracy between the original FP32 model and the INT8 model after the
 # dynamic quantization.
 #

-# Evaluate the original FP32 BERT model
 def time_model_evaluation(model, configs, tokenizer):
     eval_start_time = time.time()
     result = evaluate(configs, model, tokenizer, prefix="")
@@ -518,6 +488,7 @@ def time_model_evaluation(model, configs, tokenizer):
     print(result)
     print("Evaluate total time (seconds): {0:.1f}".format(eval_duration_time))

+# Evaluate the original FP32 BERT model
 time_model_evaluation(model, configs, tokenizer)

 # Evaluate the INT8 BERT model after the dynamic quantization
@@ -539,7 +510,8 @@ def time_model_evaluation(model, configs, tokenizer):
 #
 # We have 0.6% F1 score accuracy after applying the post-training dynamic
 # quantization on the fine-tuned BERT model on the MRPC task. As a
-# comparison, in the recent paper [3] (Table 1), it achieved 0.8788 by
+# comparison, in a `recent paper <https://arxiv.org/pdf/1910.06188.pdf>`_ (Table 1),
+# it achieved 0.8788 by
 # applying the post-training dynamic quantization and 0.8956 by applying
 # the quantization-aware training. The main reason is that we support the
 # asymmetric quantization in PyTorch while that paper supports the
@@ -561,7 +533,7 @@ def time_model_evaluation(model, configs, tokenizer):


 ######################################################################
-# Serialize the quantized model
+# 12. Serialize the quantized model
 # -----------------------------
 #
 # We can serialize and save the quantized model for the future use.
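
The serialization code itself lies outside this hunk; one common approach, shown here only as an assumption about how it could be done, is to save and reload the quantized model's ``state_dict`` (the file path is hypothetical, and ``quantized_model`` refers to the object produced by ``quantize_dynamic`` in step 9):

    import torch

    # Save the dynamically quantized model's weights for later use.
    torch.save(quantized_model.state_dict(), "./MRPC/quantized_model.pt")

    # Later: re-create the quantized model (load the FP32 model, then apply
    # quantize_dynamic as above) and restore the saved state dict.
    quantized_model.load_state_dict(torch.load("./MRPC/quantized_model.pt"))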
@@ -583,7 +555,7 @@ def time_model_evaluation(model, configs, tokenizer):
 # having a limited implication on accuracy.
 #
 # Thanks for reading! As always, we welcome any feedback, so please create
-# an issue here (https://github.com/pytorch/pytorch/issues) if you have
+# an issue `here <https://github.com/pytorch/pytorch/issues>`_ if you have
 # any.
 #

@@ -592,14 +564,14 @@ def time_model_evaluation(model, configs, tokenizer):
 # References
 # -----------
 #
-# [1] J.Devlin, M. Chang, K. Lee and K. Toutanova, BERT: Pre-training of
+# [1] J.Devlin, M. Chang, K. Lee and K. Toutanova, `BERT: Pre-training of
 # Deep Bidirectional Transformers for Language Understanding (2018)
+# <https://arxiv.org/pdf/1810.04805.pdf>`_.
 #
-# [2] HuggingFace Transformers.
-# https://github.com/huggingface/transformers
+# [2] `HuggingFace Transformers <https://github.com/huggingface/transformers>`_.
 #
-# [3] O. Zafrir, G. Boudoukh, P. Izsak, & M. Wasserblat (2019). Q8BERT:
-# Quantized 8bit BERT. arXiv preprint arXiv:1910.06188.
+# [3] O. Zafrir, G. Boudoukh, P. Izsak, and M. Wasserblat (2019). `Q8BERT:
+# Quantized 8bit BERT <https://arxiv.org/pdf/1910.06188.pdf>`_.
 #

