#
#
# In this tutorial, we will apply dynamic quantization to a BERT
- # model, closely following the BERT model from the HuggingFace
- # Transformers examples (https://github.com/huggingface/transformers).
+ # model, closely following the BERT model from `the HuggingFace
+ # Transformers examples <https://github.com/huggingface/transformers>`_.
# With this step-by-step journey, we would like to demonstrate how to
# convert a well-known state-of-the-art model like BERT into a dynamically
# quantized model.
# achieves state-of-the-art accuracy results on many popular
# Natural Language Processing (NLP) tasks, such as question answering,
# text classification, and others. The original paper can be found
- # here: https://arxiv.org/pdf/1810.04805.pdf.
+ # `here <https://arxiv.org/pdf/1810.04805.pdf>`_.
#
# - Dynamic quantization support in PyTorch converts a float model to a
#   quantized model with static int8 or float16 data types for the
#   weights and dynamic quantization for the activations. The activations
#   are quantized dynamically (per batch) to int8 when the weights are
- # quantized to int8.
- #
- # In PyTorch, we have `torch.quantization.quantize_dynamic API
- # <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_
- # ,which replaces specified modules with dynamic weight-only quantized
- # versions and output the quantized model.
+ #   quantized to int8. In PyTorch, we have the `torch.quantization.quantize_dynamic API
+ #   <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_,
+ #   which replaces specified modules with dynamic weight-only quantized
+ #   versions and outputs the quantized model.
#
# - We demonstrate the accuracy and inference performance results on the
#   `Microsoft Research Paraphrase Corpus (MRPC) task <https://www.microsoft.com/en-us/download/details.aspx?id=52398>`_
#   a corpus of sentence pairs automatically extracted from online news
#   sources, with human annotations of whether the sentences in the pair
#   are semantically equivalent. Because the classes are imbalanced (68%
- #   positive, 32% negative), we follow common practice and report both
- #   accuracy and `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
+ #   positive, 32% negative), we follow common practice and report the
+ #   `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_.
#   MRPC is a common NLP task for language pair classification, as shown
#   below.
#
- # .. figure:: /_static/img/bert_mrpc.png
+ # .. figure:: /_static/img/bert.png


######################################################################
- # Setup
+ # 1. Setup
# -------
#
# Install PyTorch and HuggingFace Transformers
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# To start this tutorial, let’s first follow the installation instructions
- # in PyTorch and HuggingFace Github Repo: -
- #
- # * https://github.com/pytorch/pytorch/#installation -
- #
- # * https://github.com/huggingface/transformers#installation
- #
- # In addition, we also install ``sklearn`` package, as we will reuse its
+ # for PyTorch `here <https://github.com/pytorch/pytorch/#installation>`_ and
+ # for HuggingFace Transformers `here <https://github.com/huggingface/transformers#installation>`_.
+ # In addition, we also install the `scikit-learn <https://github.com/scikit-learn/scikit-learn>`_ package, as we will reuse its
# built-in F1 score calculation helper function.
#
# .. code:: shell


######################################################################
- # Import the necessary modules
+ # 2. Import the necessary modules
# ----------------------------
#
# In this step, we import the necessary Python modules for the tutorial.
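#
# The import block itself is collapsed in this diff. As a sketch, the later
# steps rely on imports along these lines (the exact list in the tutorial
# may differ):
#
# .. code:: python
#
#    import os
#    import time
#    from argparse import Namespace
#
#    import torch
#    from transformers import (
#        BertConfig,
#        BertForSequenceClassification,
#        BertTokenizer,
#        glue_convert_examples_to_features as convert_examples_to_features,
#    )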


######################################################################
- # Download the dataset
+ # 3. Download the dataset
# --------------------
#
# Before running MRPC tasks, we download the `GLUE data
- # <https://gluebenchmark.com/tasks>`_ by running this `script
- # <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_ followed by
- # `download_glue_data <https://github.com/nyu-mll/GLUE-baselines/blob/master/download_glue_data.py>`_.
- # and unpack it to some directory “glue_data/MRPC”.
+ # <https://gluebenchmark.com/tasks>`_ by running `this script
+ # <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_
+ # and unpack it to a directory ``glue_data``.
#
#
# .. code:: shell
#
- #    wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
#    python download_glue_data.py --data_dir='glue_data' --tasks='MRPC'
- #    ls glue_data/MRPC
#


######################################################################
- # Helper functions
+ # 4. Helper functions
# ----------------
#
# The helper functions are built into the transformers library. We mainly
# use the following helper functions: one for converting the text examples
# into the feature vectors; the other for measuring the F1 score of
# the predicted result.
#
- # Convert the texts into features
- # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- #
- # `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_.
- # load a data file into a list of ``InputFeatures``.
+ # The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function converts the texts into input features:
#
# - Tokenize the input sequences;
# - Insert [CLS] at the beginning;
# - Insert [SEP] between the first sentence and the second sentence, and
#   at the end;
# - Generate token type ids to indicate whether a token belongs to the
- # first sequence or the second sequence;
- #
- # F1 metric
- # ~~~~~~~~~
+ #   first sequence or the second sequence.
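#
# A usage sketch of the converter; here ``examples``, ``label_list``, and
# the maximum length are placeholders, not names from this tutorial:
#
# .. code:: python
#
#    features = convert_examples_to_features(
#        examples,                        # a list of InputExample objects
#        tokenizer,                       # the BERT tokenizer from step 7
#        label_list=label_list,           # e.g. ["0", "1"] for MRPC
#        max_length=128,                  # pad/truncate to this length
#        output_mode="classification",
#    )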
#
# The `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
# can be interpreted as a weighted average of the precision and recall,
# where an F1 score reaches its best value at 1 and worst score at 0. The
# relative contributions of precision and recall to the F1 score are equal.
- # The formula for the F1 score is:
+ # The equation for the F1 score is:
#
- # F1 = 2 \* (precision \* recall) / (precision + recall)
+ # - F1 = 2 \* (precision \* recall) / (precision + recall)
#
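# As a quick sanity check of the formula with scikit-learn's helper (the
# labels below are made up for illustration):
#
# .. code:: python
#
#    from sklearn.metrics import f1_score
#
#    y_true = [0, 1, 1, 0, 1, 1]
#    y_pred = [0, 1, 0, 0, 1, 1]
#    # precision = 3/3 = 1.0, recall = 3/4 = 0.75
#    # F1 = 2 * (1.0 * 0.75) / (1.0 + 0.75) ≈ 0.857
#    print(f1_score(y_true, y_pred))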


######################################################################
- # Fine-tune the BERT model
+ # 5. Fine-tune the BERT model
# --------------------------
#
# with the pre-trained BERT model to classify semantically equivalent
# sentence pairs on the MRPC task.
#
- # To fine-tune the pre-trained BERT model (“bert-base-uncased” model in
+ # To fine-tune the pre-trained BERT model (``bert-base-uncased`` model in
# HuggingFace transformers) for the MRPC task, you can follow the command
- # in `examples<https://github.com/huggingface/transformers/tree/master/examples>`_"
+ # in `examples <https://github.com/huggingface/transformers/tree/master/examples#mrpc>`_:
#
# ::
#
#    export GLUE_DIR=./glue_data
#    export TASK_NAME=MRPC
- #    export OUT_DIR=/mnt/homedir/jianyuhuang/public/bert/$TASK_NAME/
+ #    export OUT_DIR=./$TASK_NAME/
#    python ./run_glue.py \
#        --model_type bert \
#        --model_name_or_path bert-base-uncased \
#        --save_steps 100000 \
#        --output_dir $OUT_DIR
#
- # We provide the fined-tuned BERT model for MRPC task here (We did the
- # fine-tuning on CPUs with a total train batch size of 8):
- #
- # https://drive.google.com/drive/folders/1mGBx0t-YJAWXHbgab2f_IimaMiVHlKh-
- #
- # To save time, you can manually copy the fined-tuned BERT model for MRPC
- # task in your Google Drive (Create the same “BERT_Quant_Tutorial/MRPC”
- # folder in the Google Drive directory), and then mount your Google Drive
- # on your runtime using an authorization code, so that we can directly
- # read and write the models into Google Drive in the following steps.
- #
-
- from google.colab import drive
- drive.mount('/content/drive')
-
+ # We provide the fine-tuned BERT model for the MRPC task `here <https://download.pytorch.org/tutorial/MRPC.zip>`_.
+ # To save time, you can download the model file (~400 MB) directly into your local folder ``$OUT_DIR``.
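#
# For instance, a sketch of the download and unpack done from Python with
# the standard library (the paths are illustrative, not from the tutorial):
#
# .. code:: python
#
#    import urllib.request
#    import zipfile
#
#    urllib.request.urlretrieve(
#        "https://download.pytorch.org/tutorial/MRPC.zip", "MRPC.zip")
#    with zipfile.ZipFile("MRPC.zip") as archive:
#        archive.extractall(".")  # unpacks into the current directory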


######################################################################
- # Set global configurations
+ # 6. Set global configurations
# -------------------------
#
configs = Namespace()

- # The output directory for the fine-tuned model.
- configs.output_dir = "/content/drive/My Drive/BERT_Quant_Tutorial/MRPC/"
+ # The output directory for the fine-tuned model, $OUT_DIR.
+ configs.output_dir = "./MRPC/"

- # The data directory for the MRPC task in the GLUE benchmark.
- configs.data_dir = "/content/glue_data/MRPC"
+ # The data directory for the MRPC task in the GLUE benchmark, $GLUE_DIR/$TASK_NAME.
+ configs.data_dir = "./glue_data/MRPC"

# The model name or path for the pre-trained model.
configs.model_name_or_path = "bert-base-uncased"
@@ -294,7 +264,7 @@ def set_seed(seed):


######################################################################
- # Load the fine-tuned BERT model
+ # 7. Load the fine-tuned BERT model
# ------------------------------
#
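# The loading code is collapsed in this diff. A minimal sketch, assuming the
# fine-tuned checkpoint from step 5 sits in ``configs.output_dir``:
#
# .. code:: python
#
#    tokenizer = BertTokenizer.from_pretrained(configs.output_dir)
#    model = BertForSequenceClassification.from_pretrained(configs.output_dir)
#    model.to("cpu")  # dynamic quantization targets CPU inference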
@@ -312,11 +282,12 @@ def set_seed(seed):


######################################################################
- # Define the tokenize and evaluation function
+ # 8. Define the tokenize and evaluation function
# -------------------------------------------
#
- # We reuse the tokenize and evaluation function from `huggingface <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
+ # We reuse the tokenize and evaluation function from `HuggingFace <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
#
+
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
@@ -455,7 +426,7 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):


######################################################################
- # Apply the dynamic quantization
+ # 9. Apply the dynamic quantization
# -------------------------------
#
# We call ``torch.quantization.quantize_dynamic`` on the model to apply
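# the dynamic quantization. The code itself is collapsed in this diff; a
# minimal sketch of the call, assuming ``model`` is the fine-tuned FP32
# model loaded in step 7:
#
# .. code:: python
#
#    quantized_model = torch.quantization.quantize_dynamic(
#        model,               # the fine-tuned FP32 BERT model
#        {torch.nn.Linear},   # quantize only the torch.nn.Linear modules
#        dtype=torch.qint8,   # store the weights as int8
#    )
#    print(quantized_model)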
@@ -474,11 +445,11 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):


######################################################################
- # Check the model size
+ # 10. Check the model size
# --------------------
#
# Let’s first check the model size. We can observe a significant reduction
- # in model size:
+ # in model size (FP32 total size: 438 MB; INT8 total size: 181 MB):
#

def print_size_of_model(model):
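    # The body is collapsed in this diff. A typical implementation (an
    # assumption, not necessarily the exact tutorial code) saves the state
    # dict to a temporary file and reports the file size:
    torch.save(model.state_dict(), "temp.p")
    print("Size (MB):", os.path.getsize("temp.p") / 1e6)
    os.remove("temp.p")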
@@ -491,7 +462,7 @@ def print_size_of_model(model):


######################################################################
- # The BERT model used in this tutorial (bert-base-uncased) has a
+ # The BERT model used in this tutorial (``bert-base-uncased``) has a
# vocabulary size V of 30522. With the embedding size of 768, the total
# size of the word embedding table is ~ 4 (Bytes/FP32) \* 30522 \* 768 =
# 90 MB. So with the help of quantization, the model size of the
@@ -501,15 +472,14 @@ def print_size_of_model(model):


######################################################################
- # Evaluate the inference accuracy and time
+ # 11. Evaluate the inference accuracy and time
# ----------------------------------------
#
# Next, let’s compare the inference time as well as the evaluation
# accuracy between the original FP32 model and the INT8 model after the
# dynamic quantization.
#

- # Evaluate the original FP32 BERT model
def time_model_evaluation(model, configs, tokenizer):
    eval_start_time = time.time()
    result = evaluate(configs, model, tokenizer, prefix="")
@@ -518,6 +488,7 @@ def time_model_evaluation(model, configs, tokenizer):
    print(result)
    print("Evaluate total time (seconds): {0:.1f}".format(eval_duration_time))

+ # Evaluate the original FP32 BERT model
time_model_evaluation(model, configs, tokenizer)

# Evaluate the INT8 BERT model after the dynamic quantization
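# The call itself is collapsed in this diff; presumably it mirrors the FP32
# call above, using the ``quantized_model`` produced in step 9:
#
# .. code:: python
#
#    time_model_evaluation(quantized_model, configs, tokenizer)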
@@ -539,7 +510,8 @@ def time_model_evaluation(model, configs, tokenizer):
#
# We have a 0.6% lower F1 score after applying the post-training dynamic
# quantization on the fine-tuned BERT model on the MRPC task. As a
- # comparison, in the recent paper [3] (Table 1), it achieved 0.8788 by
+ # comparison, in a `recent paper <https://arxiv.org/pdf/1910.06188.pdf>`_ (Table 1),
+ # it achieved 0.8788 by
# applying the post-training dynamic quantization and 0.8956 by applying
# the quantization-aware training. The main reason is that we support the
# asymmetric quantization in PyTorch while that paper supports the
@@ -561,7 +533,7 @@ def time_model_evaluation(model, configs, tokenizer):


######################################################################
- # Serialize the quantized model
+ # 12. Serialize the quantized model
# -----------------------------
#
# We can serialize and save the quantized model for future use.
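#
# The serialization code is collapsed in this diff; a minimal sketch with
# the standard PyTorch API (the file name is an arbitrary example):
#
# .. code:: python
#
#    quantized_output_path = configs.output_dir + "quantized_model.pt"
#    torch.save(quantized_model.state_dict(), quantized_output_path)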
@@ -583,7 +555,7 @@ def time_model_evaluation(model, configs, tokenizer):
# having a limited impact on accuracy.
#
# Thanks for reading! As always, we welcome any feedback, so please create
- # an issue here (https://github.com/pytorch/pytorch/issues) if you have
+ # an issue `here <https://github.com/pytorch/pytorch/issues>`_ if you have
# any.
#

@@ -592,14 +564,14 @@ def time_model_evaluation(model, configs, tokenizer):
# References
# -----------
#
- # [1] J.Devlin, M. Chang, K. Lee and K. Toutanova, BERT: Pre-training of
+ # [1] J. Devlin, M. Chang, K. Lee and K. Toutanova, `BERT: Pre-training of
# Deep Bidirectional Transformers for Language Understanding (2018)
+ # <https://arxiv.org/pdf/1810.04805.pdf>`_.
#
- # [2] HuggingFace Transformers.
- # https://github.com/huggingface/transformers
+ # [2] `HuggingFace Transformers <https://github.com/huggingface/transformers>`_.
#
- # [3] O. Zafrir, G. Boudoukh, P. Izsak, & M. Wasserblat (2019). Q8BERT:
- # Quantized 8bit BERT. arXiv preprint arXiv:1910.06188.
+ # [3] O. Zafrir, G. Boudoukh, P. Izsak, and M. Wasserblat (2019). `Q8BERT:
+ # Quantized 8bit BERT <https://arxiv.org/pdf/1910.06188.pdf>`_.
#