Update Dynamic Quant BERT Tutorial 2 #753
Merged: jlin27 merged 1 commit into pytorch:jlin27-quant-tutorials from jianyuh:jlin27-quant-tutorials on Dec 6, 2019.
@@ -16,8 +16,8 @@
 #
 #
 # In this tutorial, we will apply the dynamic quantization on a BERT
-# model, closely following the BERT model from the HuggingFace
-# Transformers examples (https://github.com/huggingface/transformers).
+# model, closely following the BERT model from `the HuggingFace
+# Transformers examples <https://github.com/huggingface/transformers>`_.
 # With this step-by-step journey, we would like to demonstrate how to
 # convert a well-known state-of-the-art model like BERT into dynamic
 # quantized model.
@@ -27,18 +27,16 @@
 # achieves the state-of-the-art accuracy results on many popular
 # Natural Language Processing (NLP) tasks, such as question answering,
 # text classification, and others. The original paper can be found
-# here: https://arxiv.org/pdf/1810.04805.pdf.
+# `here <https://arxiv.org/pdf/1810.04805.pdf>`_.
 #
 # - Dynamic quantization support in PyTorch converts a float model to a
 #   quantized model with static int8 or float16 data types for the
 #   weights and dynamic quantization for the activations. The activations
 #   are quantized dynamically (per batch) to int8 when the weights are
-#   quantized to int8.
-#
-# In PyTorch, we have `torch.quantization.quantize_dynamic API
-# <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_
-# ,which replaces specified modules with dynamic weight-only quantized
-# versions and output the quantized model.
+#   quantized to int8. In PyTorch, we have `torch.quantization.quantize_dynamic API
+#   <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_,
+#   which replaces specified modules with dynamic weight-only quantized
+#   versions and output the quantized model.
 #
 # - We demonstrate the accuracy and inference performance results on the
 #   `Microsoft Research Paraphrase Corpus (MRPC) task <https://www.microsoft.com/en-us/download/details.aspx?id=52398>`_
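The ``quantize_dynamic`` API described in this hunk is easy to try on a toy module. The sketch below is ours, not part of the tutorial (the small ``nn.Sequential`` model is an assumption for illustration), but the call signature follows the PyTorch docs linked above:

```python
import torch
import torch.nn as nn

# A toy float model: dynamic quantization targets the nn.Linear modules.
float_model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))

# quantize_dynamic replaces the listed module types with weight-only
# quantized versions: weights are stored as int8, activations are
# quantized on the fly (per batch) at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)
print(quantized_model)  # the Linear layers now print as dynamic quantized Linear
```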
@@ -47,29 +45,24 @@
 # a corpus of sentence pairs automatically extracted from online news
 # sources, with human annotations of whether the sentences in the pair
 # are semantically equivalent. Because the classes are imbalanced (68%
-# positive, 32% negative), we follow common practice and report both
-# accuracy and `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
+# positive, 32% negative), we follow the common practice and report
+# `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_.
 # MRPC is a common NLP task for language pair classification, as shown
 # below.
 #
-# .. figure:: /_static/img/bert_mrpc.png
+# .. figure:: /_static/img/bert.png


 ######################################################################
-# Setup
+# 1. Setup
 # -------
 #
-# Install PyTorch and HuggingFace Transformers
-# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-#
 # To start this tutorial, let’s first follow the installation instructions
-# in PyTorch and HuggingFace Github Repo: -
-#
-# * https://github.com/pytorch/pytorch/#installation -
-#
-# * https://github.com/huggingface/transformers#installation
-#
-# In addition, we also install ``sklearn`` package, as we will reuse its
+# in PyTorch `here <https://github.com/pytorch/pytorch/#installation>`_ and HuggingFace Github Repo `here <https://github.com/huggingface/transformers#installation>`_.
+# In addition, we also install `scikit-learn <https://github.com/scikit-learn/scikit-learn>`_ package, as we will reuse its
 # built-in F1 score calculation helper function.
 #
 # .. code:: shell
@@ -94,7 +87,7 @@


 ######################################################################
-# Import the necessary modules
+# 2. Import the necessary modules
 # ----------------------------
 #
 # In this step we import the necessary Python modules for the tutorial.
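The import block itself lies outside the changed lines. For orientation, a typical set for this tutorial would look roughly like the following; this is a sketch and the exact list in the file may differ:

```python
import logging
import os
import random
import time

import numpy as np
import torch

from argparse import Namespace
from sklearn.metrics import f1_score
from transformers import BertConfig, BertForSequenceClassification, BertTokenizer
```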
@@ -137,61 +130,51 @@


 ######################################################################
-# Download the dataset
+# 3. Download the dataset
 # --------------------
 #
 # Before running MRPC tasks we download the `GLUE data
-# <https://gluebenchmark.com/tasks>`_ by running this `script
-# <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_ followed by
-# `download_glue_data <https://github.com/nyu-mll/GLUE-baselines/blob/master/download_glue_data.py>`_.
-# and unpack it to some directory “glue_data/MRPC”.
+# <https://gluebenchmark.com/tasks>`_ by running `this script
+# <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_
+# and unpack it to a directory `glue_data`.
 #
 #
 # .. code:: shell
 #
 #    wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
 #    python download_glue_data.py --data_dir='glue_data' --tasks='MRPC'
 #    ls glue_data/MRPC
 #


 ######################################################################
-# Helper functions
+# 4. Helper functions
 # ----------------
 #
 # The helper functions are built-in in transformers library. We mainly use
 # the following helper functions: one for converting the text examples
 # into the feature vectors; The other one for measuring the F1 score of
 # the predicted result.
 #
-# Convert the texts into features
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-#
-# `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_.
-# load a data file into a list of ``InputFeatures``.
+# The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function converts the texts into input features:
 #
 # - Tokenize the input sequences;
 # - Insert [CLS] at the beginning;
 # - Insert [SEP] between the first sentence and the second sentence, and
 #   at the end;
 # - Generate token type ids to indicate whether a token belongs to the
-#   first sequence or the second sequence;
-#
-# F1 metric
-# ~~~~~~~~~
+#   first sequence or the second sequence.
 #
 # The `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
 # can be interpreted as a weighted average of the precision and recall,
 # where an F1 score reaches its best value at 1 and worst score at 0. The
 # relative contribution of precision and recall to the F1 score are equal.
-# The formula for the F1 score is:
+# The equation for the F1 score is:
 #
-# F1 = 2 \* (precision \* recall) / (precision + recall)
+# - F1 = 2 \* (precision \* recall) / (precision + recall)
 #


 ######################################################################
-# Fine-tune the BERT model
+# 5. Fine-tune the BERT model
 # --------------------------
 #
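The F1 equation above maps directly onto the scikit-learn helper the tutorial reuses. As a quick sanity check (the toy labels here are our own, not from the tutorial):

```python
from sklearn.metrics import f1_score

y_true = [1, 1, 0, 1, 0, 1]  # gold labels (toy values)
y_pred = [1, 0, 0, 1, 0, 1]  # model predictions (toy values)

# f1_score computes 2 * (precision * recall) / (precision + recall).
# Here precision = 3/3 and recall = 3/4, so F1 = 6/7 ≈ 0.857.
print(f1_score(y_true, y_pred))
```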
@@ -204,15 +187,15 @@
 # with the pre-trained BERT model to classify semantically equivalent
 # sentence pairs on MRPC task.
 #
-# To fine-tune the pre-trained BERT model (“bert-base-uncased” model in
+# To fine-tune the pre-trained BERT model (``bert-base-uncased`` model in
 # HuggingFace transformers) for the MRPC task, you can follow the command
-# in `examples<https://github.com/huggingface/transformers/tree/master/examples>`_"
+# in `examples <https://github.com/huggingface/transformers/tree/master/examples#mrpc>`_:
 #
 # ::
 #
 #    export GLUE_DIR=./glue_data
 #    export TASK_NAME=MRPC
-#    export OUT_DIR=/mnt/homedir/jianyuhuang/public/bert/$TASK_NAME/
+#    export OUT_DIR=./$TASK_NAME/
 #    python ./run_glue.py \
 #        --model_type bert \
 #        --model_name_or_path bert-base-uncased \
@@ -229,24 +212,11 @@
 #        --save_steps 100000 \
 #        --output_dir $OUT_DIR
 #
-# We provide the fined-tuned BERT model for MRPC task here (We did the
-# fine-tuning on CPUs with a total train batch size of 8):
-#
-# https://drive.google.com/drive/folders/1mGBx0t-YJAWXHbgab2f_IimaMiVHlKh-
-#
-# To save time, you can manually copy the fined-tuned BERT model for MRPC
-# task in your Google Drive (Create the same “BERT_Quant_Tutorial/MRPC”
-# folder in the Google Drive directory), and then mount your Google Drive
-# on your runtime using an authorization code, so that we can directly
-# read and write the models into Google Drive in the following steps.
-#
-
-from google.colab import drive
-drive.mount('/content/drive')
-
+# We provide the fined-tuned BERT model for MRPC task `here <https://download.pytorch.org/tutorial/MRPC.zip>`_.
+# To save time, you can download the model file (~400 MB) directly into your local folder ``$OUT_DIR``.

 ######################################################################
-# Set global configurations
+# 6. Set global configurations
 # -------------------------
 #
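The new download link can also be fetched without a browser. A minimal sketch, assuming the archive unpacks into an ``MRPC/`` directory next to the script:

```python
import os
import urllib.request
import zipfile

# Fetch the fine-tuned MRPC model (~400 MB) and unpack it locally.
url = "https://download.pytorch.org/tutorial/MRPC.zip"
if not os.path.exists("MRPC"):
    urllib.request.urlretrieve(url, "MRPC.zip")
    with zipfile.ZipFile("MRPC.zip") as archive:
        archive.extractall(".")
```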
@@ -258,11 +228,11 @@

 configs = Namespace()

-# The output directory for the fine-tuned model.
-configs.output_dir = "/content/drive/My Drive/BERT_Quant_Tutorial/MRPC/"
+# The output directory for the fine-tuned model, $OUT_DIR.
+configs.output_dir = "./MRPC/"

-# The data directory for the MRPC task in the GLUE benchmark.
-configs.data_dir = "/content/glue_data/MRPC"
+# The data directory for the MRPC task in the GLUE benchmark, $GLUE_DIR/$TASK_NAME.
+configs.data_dir = "./glue_data/MRPC"

 # The model name or path for the pre-trained model.
 configs.model_name_or_path = "bert-base-uncased"
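Stitching the changed lines together, and adding the ``Namespace`` import the snippet assumes, the configuration block after this hunk reads roughly as follows; fields outside the hunk are omitted:

```python
from argparse import Namespace

configs = Namespace()

# The output directory for the fine-tuned model, $OUT_DIR.
configs.output_dir = "./MRPC/"

# The data directory for the MRPC task in the GLUE benchmark, $GLUE_DIR/$TASK_NAME.
configs.data_dir = "./glue_data/MRPC"

# The model name or path for the pre-trained model.
configs.model_name_or_path = "bert-base-uncased"
```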
@@ -294,7 +264,7 @@ def set_seed(seed):


 ######################################################################
-# Load the fine-tuned BERT model
+# 7. Load the fine-tuned BERT model
 # ------------------------------
 #
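The loading code itself sits outside the hunk. With the HuggingFace API of that era it would be along these lines; this is a sketch, not the file's exact code, and ``do_lower_case=True`` is an assumption matching ``bert-base-uncased``:

```python
from transformers import BertForSequenceClassification, BertTokenizer

# Both the tokenizer files and the fine-tuned weights were written by
# run_glue.py into configs.output_dir ("./MRPC/").
tokenizer = BertTokenizer.from_pretrained(configs.output_dir, do_lower_case=True)
model = BertForSequenceClassification.from_pretrained(configs.output_dir)
model.eval()  # inference mode for the evaluation steps below
```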
@@ -312,11 +282,12 @@ def set_seed(seed):


 ######################################################################
-# Define the tokenize and evaluation function
+# 8. Define the tokenize and evaluation function
 # -------------------------------------------
 #
-# We reuse the tokenize and evaluation function from `huggingface <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
+# We reuse the tokenize and evaluation function from `Huggingface <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
+#

 # coding=utf-8
 # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
 # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
@@ -455,7 +426,7 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):


 ######################################################################
-# Apply the dynamic quantization
+# 9. Apply the dynamic quantization
 # -------------------------------
 #
 # We call ``torch.quantization.quantize_dynamic`` on the model to apply
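The actual call sits just past the hunk boundary. Applied to the loaded BERT model it looks roughly like this, consistent with the ``quantize_dynamic`` docs linked earlier in the diff:

```python
import torch

# Quantize the weights of every nn.Linear module in BERT to int8;
# activations are quantized dynamically at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```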
@@ -474,11 +445,11 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):


 ######################################################################
-# Check the model size
+# 10. Check the model size
 # --------------------
 #
 # Let’s first check the model size. We can observe a significant reduction
-# in model size:
+# in model size (FP32 total size: 438 MB; INT8 total size: 181 MB):
 #

 def print_size_of_model(model):
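The body of ``print_size_of_model`` is cut off at the hunk boundary. The conventional implementation of this helper in PyTorch quantization tutorials is a sketch like the following (the temp-file name is illustrative):

```python
import os
import torch

def print_size_of_model(model):
    # Serialize the state dict to a temporary file and report its on-disk size.
    torch.save(model.state_dict(), "temp.p")
    print("Size (MB):", os.path.getsize("temp.p") / 1e6)
    os.remove("temp.p")
```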
@@ -491,7 +462,7 @@ def print_size_of_model(model):


 ######################################################################
-# The BERT model used in this tutorial (bert-base-uncased) has a
+# The BERT model used in this tutorial (``bert-base-uncased``) has a
 # vocabulary size V of 30522. With the embedding size of 768, the total
 # size of the word embedding table is ~ 4 (Bytes/FP32) \* 30522 \* 768 =
 # 90 MB. So with the help of quantization, the model size of the
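The ~90 MB figure in this paragraph is straightforward to verify:

```python
V, H = 30522, 768           # vocabulary size and embedding size
bytes_fp32 = 4 * V * H      # 4 bytes per FP32 weight
print(bytes_fp32 / 2**20)   # ≈ 89.4 MiB, i.e. the "~90 MB" quoted above
```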
@@ -501,15 +472,14 @@ def print_size_of_model(model):


 ######################################################################
-# Evaluate the inference accuracy and time
+# 11. Evaluate the inference accuracy and time
 # ----------------------------------------
 #
 # Next, let’s compare the inference time as well as the evaluation
 # accuracy between the original FP32 model and the INT8 model after the
 # dynamic quantization.
 #

-# Evaluate the original FP32 BERT model
 def time_model_evaluation(model, configs, tokenizer):
     eval_start_time = time.time()
     result = evaluate(configs, model, tokenizer, prefix="")

@@ -518,6 +488,7 @@ def time_model_evaluation(model, configs, tokenizer):
     print(result)
     print("Evaluate total time (seconds): {0:.1f}".format(eval_duration_time))

+# Evaluate the original FP32 BERT model
 time_model_evaluation(model, configs, tokenizer)

 # Evaluate the INT8 BERT model after the dynamic quantization
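When timing FP32 against INT8 this way, results are easier to compare run-to-run if the intra-op thread count is pinned first. This is our suggestion, not part of the diff:

```python
import torch

# Single-threaded runs make the FP32 vs INT8 latency comparison reproducible.
torch.set_num_threads(1)
print(torch.__config__.parallel_info())
```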
@@ -539,7 +510,8 @@ def time_model_evaluation(model, configs, tokenizer):
 #
 # We have 0.6% F1 score accuracy after applying the post-training dynamic
 # quantization on the fine-tuned BERT model on the MRPC task. As a
-# comparison, in the recent paper [3] (Table 1), it achieved 0.8788 by
+# comparison, in a `recent paper <https://arxiv.org/pdf/1910.06188.pdf>`_ (Table 1),
+# it achieved 0.8788 by
 # applying the post-training dynamic quantization and 0.8956 by applying
 # the quantization-aware training. The main reason is that we support the
 # asymmetric quantization in PyTorch while that paper supports the
@@ -561,7 +533,7 @@ def time_model_evaluation(model, configs, tokenizer):


 ######################################################################
-# Serialize the quantized model
+# 12. Serialize the quantized model
 # -----------------------------
 #
 # We can serialize and save the quantized model for the future use.
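The serialization step that follows this heading lies outside the hunk. It can be as simple as a ``state_dict`` round trip; a hedged sketch, with the file path being illustrative:

```python
import torch

# Save the dynamically quantized model's parameters for later use.
torch.save(quantized_model.state_dict(), "./MRPC/quantized_bert.pt")

# To reload: re-create the quantized module structure first, then restore.
# (Assumes `model` is a freshly loaded FP32 BertForSequenceClassification.)
reloaded = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
reloaded.load_state_dict(torch.load("./MRPC/quantized_bert.pt"))
```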
@@ -583,7 +555,7 @@ def time_model_evaluation(model, configs, tokenizer):
 # having a limited implication on accuracy.
 #
 # Thanks for reading! As always, we welcome any feedback, so please create
-# an issue here (https://github.com/pytorch/pytorch/issues) if you have
+# an issue `here <https://github.com/pytorch/pytorch/issues>`_ if you have
 # any.
 #
@@ -592,14 +564,14 @@ def time_model_evaluation(model, configs, tokenizer):
 # References
 # -----------
 #
-# [1] J.Devlin, M. Chang, K. Lee and K. Toutanova, BERT: Pre-training of
+# [1] J.Devlin, M. Chang, K. Lee and K. Toutanova, `BERT: Pre-training of
 # Deep Bidirectional Transformers for Language Understanding (2018)
+# <https://arxiv.org/pdf/1810.04805.pdf>`_.
 #
-# [2] HuggingFace Transformers.
-# https://github.com/huggingface/transformers
+# [2] `HuggingFace Transformers <https://github.com/huggingface/transformers>`_.
 #
-# [3] O. Zafrir, G. Boudoukh, P. Izsak, & M. Wasserblat (2019). Q8BERT:
-# Quantized 8bit BERT. arXiv preprint arXiv:1910.06188.
+# [3] O. Zafrir, G. Boudoukh, P. Izsak, and M. Wasserblat (2019). `Q8BERT:
+# Quantized 8bit BERT <https://arxiv.org/pdf/1910.06188.pdf>`_.
 #
Review comment: The length of the header underlining must be equal to the length of the header.