#
#
# In this tutorial, we will apply dynamic quantization to a BERT
- # model, closely following the BERT model from the HuggingFace
- # Transformers examples (https://github.com/huggingface/transformers).
+ # model, closely following the BERT model from `the HuggingFace
+ # Transformers examples <https://github.com/huggingface/transformers>`_.
# With this step-by-step journey, we would like to demonstrate how to
# convert a well-known state-of-the-art model like BERT into a
# dynamically quantized model.
# achieves state-of-the-art accuracy results on many popular
# Natural Language Processing (NLP) tasks, such as question answering,
# text classification, and others. The original paper can be found
- # here: https://arxiv.org/pdf/1810.04805.pdf.
+ # `here <https://arxiv.org/pdf/1810.04805.pdf>`_.
#
# - Dynamic quantization support in PyTorch converts a float model to a
# quantized model with static int8 or float16 data types for the
# quantized to int8.
#
# In PyTorch, we have `torch.quantization.quantize_dynamic API
- # <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_
- # , which replaces specified modules with dynamic weight-only quantized
+ # <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_,
+ # which replaces specified modules with dynamic weight-only quantized
# versions and outputs the quantized model (a minimal usage sketch
# follows this overview).
#
# - We demonstrate the accuracy and inference performance results on the
# a corpus of sentence pairs automatically extracted from online news
# sources, with human annotations of whether the sentences in the pair
# are semantically equivalent. Because the classes are imbalanced (68%
- # positive, 32% negative), we follow common practice and report both
- # accuracy and `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
+ # positive, 32% negative), we follow the common practice and report
+ # `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_.
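
######################################################################
# As a quick illustration of the ``torch.quantization.quantize_dynamic``
# API mentioned above, here is a minimal sketch on a hypothetical toy
# module (the module and tensor shapes below are illustrative, not part
# of the MRPC pipeline):

import torch

class ToyModel(torch.nn.Module):
    # A stand-in model with a single Linear layer, the module type that
    # dynamic quantization replaces with a weight-only quantized version.
    def __init__(self):
        super(ToyModel, self).__init__()
        self.fc = torch.nn.Linear(64, 64)

    def forward(self, x):
        return self.fc(x)

float_model = ToyModel()
# Quantize the weights of all torch.nn.Linear modules to int8; the
# activations are quantized dynamically at inference time.
quantized_toy = torch.quantization.quantize_dynamic(
    float_model, {torch.nn.Linear}, dtype=torch.qint8
)
print(quantized_toy)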
# MRPC is a common NLP task for language pair classification, as shown
# below.
#
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# To start this tutorial, let’s first follow the installation instructions
- # in PyTorch and HuggingFace Github Repo: -
- #
- # * https://github.com/pytorch/pytorch/#installation -
- #
- # * https://github.com/huggingface/transformers#installation
- #
- # In addition, we also install ``sklearn`` package, as we will reuse its
+ # in PyTorch `here <https://github.com/pytorch/pytorch/#installation>`_ and the HuggingFace Github repo `here <https://github.com/huggingface/transformers#installation>`_.
+ # In addition, we install the `scikit-learn <https://github.com/scikit-learn/scikit-learn>`_ package, as we will reuse its
# built-in F1 score calculation helper function.
#
# .. code:: shell
# --------------------
#
# Before running the MRPC tasks, we download the `GLUE data
- # <https://gluebenchmark.com/tasks>`_ by running this `script
- # <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_ followed by
- # `download_glue_data <https://github.com/nyu-mll/GLUE-baselines/blob/master/download_glue_data.py>`_.
- # and unpack it to some directory “glue_data/MRPC”.
+ # <https://gluebenchmark.com/tasks>`_ by running `this script
+ # <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_
+ # and unpack it to a directory ``glue_data``.
#
#
# .. code:: shell
#
- # wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
# python download_glue_data.py --data_dir='glue_data' --tasks='MRPC'
- # ls glue_data/MRPC
#

# into the feature vectors; the other one for measuring the F1 score of
# the predicted result.
#
- # Convert the texts into features
- # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- #
- # `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_.
- # load a data file into a list of ``InputFeatures``.
+ # The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function converts the texts into input features (a short sketch follows this list):
#
# - Tokenize the input sequences;
# - Insert [CLS] at the beginning;
# - Insert [SEP] between the first sentence and the second sentence, and
# at the end;
# - Generate token type ids to indicate whether a token belongs to the
- # first sequence or the second sequence;
- #
- # F1 metric
- # ~~~~~~~~~
+ # first sequence or the second sequence.
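
######################################################################
# For intuition, here is a minimal sketch of those steps using the
# ``transformers`` tokenizer API directly (the example sentence pair is
# made up for illustration):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# encode_plus tokenizes both sentences, inserts [CLS] and [SEP], and
# generates the token type ids distinguishing the two segments.
encoded = tokenizer.encode_plus(
    "The company bought the startup.",
    "The startup was acquired by the company.",
)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
print(encoded["token_type_ids"])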
#
# The `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
# can be interpreted as a weighted average of the precision and recall,
# where an F1 score reaches its best value at 1 and its worst score at 0.
# The relative contribution of precision and recall to the F1 score is
# equal.
- # The formula for the F1 score is:
+ # The equation for the F1 score is:
#
- # F1 = 2 \* (precision \* recall) / (precision + recall)
+ # - F1 = 2 \* (precision \* recall) / (precision + recall)
#

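######################################################################
# As a quick, made-up numerical check of this equation against the
# scikit-learn helper we reuse later (the labels below are illustrative):

from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

precision = precision_score(y_true, y_pred)  # 3 correct / 4 predicted positives
recall = recall_score(y_true, y_pred)        # 3 found / 4 actual positives
manual_f1 = 2 * (precision * recall) / (precision + recall)
# The manual computation matches sklearn's built-in F1 score.
assert abs(manual_f1 - f1_score(y_true, y_pred)) < 1e-12
print(manual_f1)  # 0.75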
# with the pre-trained BERT model to classify semantically equivalent
# sentence pairs on the MRPC task.
#
- # To fine-tune the pre-trained BERT model (“bert-base-uncased” model in
+ # To fine-tune the pre-trained BERT model (``bert-base-uncased`` model in
# HuggingFace transformers) for the MRPC task, you can follow the command
- # in `examples<https://github.com/huggingface/transformers/tree/master/examples>`_"
+ # in `examples <https://github.com/huggingface/transformers/tree/master/examples#mrpc>`_:
#
# ::
#
# export GLUE_DIR=./glue_data
# export TASK_NAME=MRPC
- # export OUT_DIR=/mnt/homedir/jianyuhuang/public/bert/$TASK_NAME/
+ # export OUT_DIR=./$TASK_NAME/
# python ./run_glue.py \
# --model_type bert \
# --model_name_or_path bert-base-uncased \
# --save_steps 100000 \
# --output_dir $OUT_DIR
#
- # We provide the fined-tuned BERT model for MRPC task here (We did the
- # fine-tuning on CPUs with a total train batch size of 8):
- #
- # https://drive.google.com/drive/folders/1mGBx0t-YJAWXHbgab2f_IimaMiVHlKh-
- #
- # To save time, you can manually copy the fined-tuned BERT model for MRPC
- # task in your Google Drive (Create the same “BERT_Quant_Tutorial/MRPC”
- # folder in the Google Drive directory), and then mount your Google Drive
- # on your runtime using an authorization code, so that we can directly
- # read and write the models into Google Drive in the following steps.
- #
-
- from google.colab import drive
- drive.mount('/content/drive')
-
+ # We provide the fine-tuned BERT model for the MRPC task `here <https://download.pytorch.org/tutorial/MRPC.zip>`_.
+ # To save time, you can download the model file (~400 MB) directly into your local folder ``$OUT_DIR``.
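
######################################################################
# For example, a minimal Python sketch of that download step (the paths
# are illustrative; ``$OUT_DIR`` is the current directory in the commands
# above):

import os
import urllib.request
import zipfile

model_url = "https://download.pytorch.org/tutorial/MRPC.zip"
out_dir = "."  # $OUT_DIR
zip_path = os.path.join(out_dir, "MRPC.zip")
# Fetch the ~400 MB archive and unpack it, producing ./MRPC/.
urllib.request.urlretrieve(model_url, zip_path)
with zipfile.ZipFile(zip_path) as archive:
    archive.extractall(out_dir)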

######################################################################
# Set global configurations

configs = Namespace()

- # The output directory for the fine-tuned model.
- configs.output_dir = "/content/drive/My Drive/BERT_Quant_Tutorial/MRPC/"
+ # The output directory for the fine-tuned model, $OUT_DIR.
+ configs.output_dir = "./MRPC/"

- # The data directory for the MRPC task in the GLUE benchmark.
- configs.data_dir = "/content/glue_data/MRPC"
+ # The data directory for the MRPC task in the GLUE benchmark, $GLUE_DIR/$TASK_NAME.
+ configs.data_dir = "./glue_data/MRPC"

# The model name or path for the pre-trained model.
configs.model_name_or_path = "bert-base-uncased"
@@ -315,8 +287,9 @@ def set_seed(seed):
# Define the tokenize and evaluation function
# -------------------------------------------
#
- # We reuse the tokenize and evaluation function from `huggingface <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
+ # We reuse the tokenize and evaluation function from `Huggingface <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
#
+
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
@@ -478,7 +451,7 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):
# --------------------
#
# Let’s first check the model size. We can observe a significant reduction
- # in model size:
+ # in model size (FP32 total size: 438 MB; INT8 total size: 181 MB):
#

def print_size_of_model(model):
@@ -491,7 +464,7 @@ def print_size_of_model(model):

######################################################################
- # The BERT model used in this tutorial (bert-base-uncased) has a
+ # The BERT model used in this tutorial (``bert-base-uncased``) has a
# vocabulary size V of 30522. With the embedding size of 768, the total
# size of the word embedding table is ~ 4 (Bytes/FP32) \* 30522 \* 768 =
# 90 MB. So with the help of quantization, the model size of the
@@ -509,7 +482,6 @@ def print_size_of_model(model):
# dynamic quantization.
#
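
######################################################################
# For reference, here is a minimal sketch of the quantization step itself,
# assuming ``model`` is the fine-tuned FP32 BERT model loaded earlier;
# only the ``torch.nn.Linear`` modules are replaced with dynamic
# weight-only quantized versions:

import torch  # already imported earlier in the full tutorial

quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)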

- # Evaluate the original FP32 BERT model
def time_model_evaluation(model, configs, tokenizer):
    eval_start_time = time.time()
    result = evaluate(configs, model, tokenizer, prefix="")
@@ -518,6 +490,7 @@ def time_model_evaluation(model, configs, tokenizer):
    print(result)
    print("Evaluate total time (seconds): {0:.1f}".format(eval_duration_time))

+ # Evaluate the original FP32 BERT model
time_model_evaluation(model, configs, tokenizer)

# Evaluate the INT8 BERT model after the dynamic quantization
@@ -539,7 +512,8 @@ def time_model_evaluation(model, configs, tokenizer):
#
# We have a 0.6% lower F1 score after applying the post-training dynamic
# quantization on the fine-tuned BERT model on the MRPC task. As a
- # comparison, in the recent paper [3] (Table 1), it achieved 0.8788 by
+ # comparison, in a `recent paper <https://arxiv.org/pdf/1910.06188.pdf>`_ (Table 1),
+ # it achieved 0.8788 by
# applying the post-training dynamic quantization and 0.8956 by applying
# the quantization-aware training. The main reason is that we support the
# asymmetric quantization in PyTorch while that paper supports the
@@ -583,7 +557,7 @@ def time_model_evaluation(model, configs, tokenizer):
# having a limited impact on accuracy.
#
# Thanks for reading! As always, we welcome any feedback, so please create
- # an issue here (https://github.com/pytorch/pytorch/issues) if you have
+ # an issue `here <https://github.com/pytorch/pytorch/issues>`_ if you have
# any.
#
@@ -592,14 +566,14 @@ def time_model_evaluation(model, configs, tokenizer):
# References
# -----------
#
- # [1] J.Devlin, M. Chang, K. Lee and K. Toutanova, BERT: Pre-training of
+ # [1] J. Devlin, M. Chang, K. Lee and K. Toutanova, `BERT: Pre-training of
# Deep Bidirectional Transformers for Language Understanding (2018)
+ # <https://arxiv.org/pdf/1810.04805.pdf>`_.
#
- # [2] HuggingFace Transformers.
- # https://github.com/huggingface/transformers
+ # [2] `HuggingFace Transformers <https://github.com/huggingface/transformers>`_.
#
- # [3] O. Zafrir, G. Boudoukh, P. Izsak, & M. Wasserblat (2019). Q8BERT:
- # Quantized 8bit BERT. arXiv preprint arXiv: 1910.06188.
+ # [3] O. Zafrir, G. Boudoukh, P. Izsak, and M. Wasserblat (2019). `Q8BERT:
+ # Quantized 8bit BERT <https://arxiv.org/pdf/1910.06188.pdf>`_.
#