
Commit b0bfaf8

Update Dynamic Quant BERT Tutorial 2
1 parent c320480 commit b0bfaf8

File tree

1 file changed: +38 additions, −64 deletions

intermediate_source/dynamic_quantization_bert_tutorial.py

Lines changed: 38 additions & 64 deletions
@@ -16,8 +16,8 @@
 #
 #
 # In this tutorial, we will apply the dynamic quantization on a BERT
-# model, closely following the BERT model from the HuggingFace
-# Transformers examples (https://github.com/huggingface/transformers).
+# model, closely following the BERT model from `the HuggingFace
+# Transformers examples <https://github.com/huggingface/transformers>`_.
 # With this step-by-step journey, we would like to demonstrate how to
 # convert a well-known state-of-the-art model like BERT into dynamic
 # quantized model.
@@ -27,7 +27,7 @@
 # achieves the state-of-the-art accuracy results on many popular
 # Natural Language Processing (NLP) tasks, such as question answering,
 # text classification, and others. The original paper can be found
-# here: https://arxiv.org/pdf/1810.04805.pdf.
+# `here <https://arxiv.org/pdf/1810.04805.pdf>`_.
 #
 # - Dynamic quantization support in PyTorch converts a float model to a
 # quantized model with static int8 or float16 data types for the
@@ -36,8 +36,8 @@
 # quantized to int8.
 #
 # In PyTorch, we have `torch.quantization.quantize_dynamic API
-# <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_
-# ,which replaces specified modules with dynamic weight-only quantized
+# <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_,
+# which replaces specified modules with dynamic weight-only quantized
 # versions and output the quantized model.
 #
 # - We demonstrate the accuracy and inference performance results on the
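
For readers skimming the diff, a minimal sketch (not part of this change) of how the `torch.quantization.quantize_dynamic` API mentioned above is typically invoked; the toy model here is an illustrative stand-in, not the tutorial's BERT model:

    import torch

    # A small float model standing in for BERT's Linear-heavy layers.
    float_model = torch.nn.Sequential(
        torch.nn.Linear(768, 768),
        torch.nn.ReLU(),
        torch.nn.Linear(768, 2),
    )

    # Replace the specified module types (here nn.Linear) with dynamic
    # weight-only quantized versions; weights are stored as int8 while
    # activations are quantized on the fly at inference time.
    quantized_model = torch.quantization.quantize_dynamic(
        float_model, {torch.nn.Linear}, dtype=torch.qint8
    )
    print(quantized_model)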
@@ -47,8 +47,8 @@
 # a corpus of sentence pairs automatically extracted from online news
 # sources, with human annotations of whether the sentences in the pair
 # are semantically equivalent. Because the classes are imbalanced (68%
-# positive, 32% negative), we follow common practice and report both
-# accuracy and `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
+# positive, 32% negative), we follow the common practice and report
+# `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_.
 # MRPC is a common NLP task for language pair classification, as shown
 # below.
 #
@@ -63,13 +63,8 @@
 # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 #
 # To start this tutorial, let’s first follow the installation instructions
-# in PyTorch and HuggingFace Github Repo: -
-#
-# * https://github.com/pytorch/pytorch/#installation -
-#
-# * https://github.com/huggingface/transformers#installation
-#
-# In addition, we also install ``sklearn`` package, as we will reuse its
+# in PyTorch `here <https://github.com/pytorch/pytorch/#installation>`_ and HuggingFace Github Repo `here <https://github.com/huggingface/transformers#installation>`_.
+# In addition, we also install `scikit-learn <https://github.com/scikit-learn/scikit-learn>`_ package, as we will reuse its
 # built-in F1 score calculation helper function.
 #
 # .. code:: shell
@@ -141,17 +136,14 @@
 # --------------------
 #
 # Before running MRPC tasks we download the `GLUE data
-# <https://gluebenchmark.com/tasks>`_ by running this `script
-# <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_ followed by
-# `download_glue_data <https://github.com/nyu-mll/GLUE-baselines/blob/master/download_glue_data.py>`_.
-# and unpack it to some directory “glue_data/MRPC”.
+# <https://gluebenchmark.com/tasks>`_ by running `this script
+# <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_
+# and unpack it to a directory `glue_data`.
 #
 #
 # .. code:: shell
 #
-# wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
 # python download_glue_data.py --data_dir='glue_data' --tasks='MRPC'
-# ls glue_data/MRPC
 #

@@ -164,29 +156,22 @@
 # into the feature vectors; The other one for measuring the F1 score of
 # the predicted result.
 #
-# Convert the texts into features
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-#
-# `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_.
-# load a data file into a list of ``InputFeatures``.
+# The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function converts the texts into input features:
 #
 # - Tokenize the input sequences;
 # - Insert [CLS] at the beginning;
 # - Insert [SEP] between the first sentence and the second sentence, and
 # at the end;
 # - Generate token type ids to indicate whether a token belongs to the
-# first sequence or the second sequence;
-#
-# F1 metric
-# ~~~~~~~~~
+# first sequence or the second sequence.
 #
 # The `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
 # can be interpreted as a weighted average of the precision and recall,
 # where an F1 score reaches its best value at 1 and worst score at 0. The
 # relative contribution of precision and recall to the F1 score are equal.
-# The formula for the F1 score is:
+# The equation for the F1 score is:
 #
-# F1 = 2 \* (precision \* recall) / (precision + recall)
+# - F1 = 2 \* (precision \* recall) / (precision + recall)
 #

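As a quick illustration (not part of this change), the reported metric matches scikit-learn's binary F1 helper named above; a minimal sketch with made-up labels:

    from sklearn.metrics import f1_score

    y_true = [1, 1, 0, 1, 0, 1]   # hypothetical gold labels (1 = paraphrase)
    y_pred = [1, 0, 0, 1, 0, 1]   # hypothetical model predictions
    # f1_score computes 2 * (precision * recall) / (precision + recall)
    print(f1_score(y_true, y_pred))  # ~0.857 for these toy labels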

@@ -204,15 +189,15 @@
 # with the pre-trained BERT model to classify semantically equivalent
 # sentence pairs on MRPC task.
 #
-# To fine-tune the pre-trained BERT model (bert-base-uncased model in
+# To fine-tune the pre-trained BERT model (``bert-base-uncased`` model in
 # HuggingFace transformers) for the MRPC task, you can follow the command
-# in `examples<https://github.com/huggingface/transformers/tree/master/examples>`_"
+# in `examples <https://github.com/huggingface/transformers/tree/master/examples#mrpc>`_:
 #
 # ::
 #
 # export GLUE_DIR=./glue_data
 # export TASK_NAME=MRPC
-# export OUT_DIR=/mnt/homedir/jianyuhuang/public/bert/$TASK_NAME/
+# export OUT_DIR=./$TASK_NAME/
 # python ./run_glue.py \
 # --model_type bert \
 # --model_name_or_path bert-base-uncased \
@@ -229,21 +214,8 @@
 # --save_steps 100000 \
 # --output_dir $OUT_DIR
 #
-# We provide the fined-tuned BERT model for MRPC task here (We did the
-# fine-tuning on CPUs with a total train batch size of 8):
-#
-# https://drive.google.com/drive/folders/1mGBx0t-YJAWXHbgab2f_IimaMiVHlKh-
-#
-# To save time, you can manually copy the fined-tuned BERT model for MRPC
-# task in your Google Drive (Create the same “BERT_Quant_Tutorial/MRPC”
-# folder in the Google Drive directory), and then mount your Google Drive
-# on your runtime using an authorization code, so that we can directly
-# read and write the models into Google Drive in the following steps.
-#
-
-from google.colab import drive
-drive.mount('/content/drive')
-
+# We provide the fined-tuned BERT model for MRPC task `here <https://download.pytorch.org/tutorial/MRPC.zip>`_.
+# To save time, you can download the model file (~400 MB) directly into your local folder ``$OUT_DIR``.

 ######################################################################
 # Set global configurations
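
A minimal sketch (not part of this change) of one way to fetch and unpack that archive from Python; the download URL comes from the diff above, while the assumption that the archive expands to the ``MRPC/`` folder used as ``$OUT_DIR`` below is illustrative:

    import urllib.request
    import zipfile

    url = "https://download.pytorch.org/tutorial/MRPC.zip"
    archive = "MRPC.zip"

    # Download the fine-tuned MRPC checkpoint (~400 MB) and unpack it in the
    # current directory (assumed to yield ./MRPC/, i.e. $OUT_DIR).
    urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(".")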
@@ -258,11 +230,11 @@

 configs = Namespace()

-# The output directory for the fine-tuned model.
-configs.output_dir = "/content/drive/My Drive/BERT_Quant_Tutorial/MRPC/"
+# The output directory for the fine-tuned model, $OUT_DIR.
+configs.output_dir = "./MRPC/"

-# The data directory for the MRPC task in the GLUE benchmark.
-configs.data_dir = "/content/glue_data/MRPC"
+# The data directory for the MRPC task in the GLUE benchmark, $GLUE_DIR/$TASK_NAME.
+configs.data_dir = "./glue_data/MRPC"

 # The model name or path for the pre-trained model.
 configs.model_name_or_path = "bert-base-uncased"
@@ -315,8 +287,9 @@ def set_seed(seed):
 # Define the tokenize and evaluation function
 # -------------------------------------------
 #
-# We reuse the tokenize and evaluation function from `huggingface <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
+# We reuse the tokenize and evaluation function from `Huggingface <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
 #
+
 # coding=utf-8
 # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
 # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
@@ -478,7 +451,7 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):
 # --------------------
 #
 # Let’s first check the model size. We can observe a significant reduction
-# in model size:
+# in model size (FP32 total size: 438 MB; INT8 total size: 181 MB):
 #

 def print_size_of_model(model):
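
The body of that helper falls outside this hunk; a minimal sketch (an assumption, not the diff's own code) of one common way to implement such a size check by serializing the state dict to disk:

    import os
    import torch

    def print_size_of_model(model):
        # Serialize the weights to a temporary file and report its size on disk.
        torch.save(model.state_dict(), "temp.p")
        print("Size (MB):", os.path.getsize("temp.p") / 1e6)
        os.remove("temp.p")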
@@ -491,7 +464,7 @@ def print_size_of_model(model):


 ######################################################################
-# The BERT model used in this tutorial (bert-base-uncased) has a
+# The BERT model used in this tutorial (``bert-base-uncased``) has a
 # vocabulary size V of 30522. With the embedding size of 768, the total
 # size of the word embedding table is ~ 4 (Bytes/FP32) \* 30522 \* 768 =
 # 90 MB. So with the help of quantization, the model size of the
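
A one-line check (illustrative only) of the embedding-table arithmetic quoted above:

    # 4 bytes per FP32 value * vocabulary size * embedding size
    print(4 * 30522 * 768 / 2**20)  # ~89.4 MiB, i.e. roughly the quoted 90 MB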
@@ -509,7 +482,6 @@ def print_size_of_model(model):
 # dynamic quantization.
 #

-# Evaluate the original FP32 BERT model
 def time_model_evaluation(model, configs, tokenizer):
     eval_start_time = time.time()
     result = evaluate(configs, model, tokenizer, prefix="")
@@ -518,6 +490,7 @@ def time_model_evaluation(model, configs, tokenizer):
     print(result)
     print("Evaluate total time (seconds): {0:.1f}".format(eval_duration_time))

+# Evaluate the original FP32 BERT model
 time_model_evaluation(model, configs, tokenizer)

 # Evaluate the INT8 BERT model after the dynamic quantization
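
The quantization step and the INT8 evaluation call themselves fall outside this hunk; a minimal sketch (an assumption about the surrounding tutorial code, not part of this diff) of how that comparison would typically be invoked:

    # Apply dynamic quantization to the fine-tuned model's Linear layers,
    # then time the evaluation of the resulting INT8 model.
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    time_model_evaluation(quantized_model, configs, tokenizer)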
@@ -539,7 +512,8 @@ def time_model_evaluation(model, configs, tokenizer):
 #
 # We have 0.6% F1 score accuracy after applying the post-training dynamic
 # quantization on the fine-tuned BERT model on the MRPC task. As a
-# comparison, in the recent paper [3] (Table 1), it achieved 0.8788 by
+# comparison, in a `recent paper <https://arxiv.org/pdf/1910.06188.pdf>`_ (Table 1),
+# it achieved 0.8788 by
 # applying the post-training dynamic quantization and 0.8956 by applying
 # the quantization-aware training. The main reason is that we support the
 # asymmetric quantization in PyTorch while that paper supports the
@@ -583,7 +557,7 @@ def time_model_evaluation(model, configs, tokenizer):
 # having a limited implication on accuracy.
 #
 # Thanks for reading! As always, we welcome any feedback, so please create
-# an issue here (https://github.com/pytorch/pytorch/issues) if you have
+# an issue `here <https://github.com/pytorch/pytorch/issues>`_ if you have
 # any.
 #

@@ -592,14 +566,14 @@ def time_model_evaluation(model, configs, tokenizer):
 # References
 # -----------
 #
-# [1] J.Devlin, M. Chang, K. Lee and K. Toutanova, BERT: Pre-training of
+# [1] J.Devlin, M. Chang, K. Lee and K. Toutanova, `BERT: Pre-training of
 # Deep Bidirectional Transformers for Language Understanding (2018)
+# <https://arxiv.org/pdf/1810.04805.pdf>`_.
 #
-# [2] HuggingFace Transformers.
-# https://github.com/huggingface/transformers
+# [2] `HuggingFace Transformers <https://github.com/huggingface/transformers>`_.
 #
-# [3] O. Zafrir, G. Boudoukh, P. Izsak, & M. Wasserblat (2019). Q8BERT:
-# Quantized 8bit BERT. arXiv preprint arXiv:1910.06188.
+# [3] O. Zafrir, G. Boudoukh, P. Izsak, and M. Wasserblat (2019). `Q8BERT:
+# Quantized 8bit BERT <https://arxiv.org/pdf/1910.06188.pdf>`_.
 #

0 commit comments
