# <https://gluebenchmark.com/>`_. The MRPC (Dolan and Brockett, 2005) is
# a corpus of sentence pairs automatically extracted from online news
# sources, with human annotations of whether the sentences in the pair
- # are semantically equivalent. Because the classes are imbalanced (68%
+ # are semantically equivalent. As the classes are imbalanced (68%
# positive, 32% negative), we follow the common practice and report
# `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_.
# MRPC is a common NLP task for language pair classification, as shown
######################################################################
# 1. Setup
- # -------
+ # --------
#
- # Install PyTorch and HuggingFace Transformers
- # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ # 1.1 Install PyTorch and HuggingFace Transformers
+ # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# To start this tutorial, let’s first follow the installation instructions
# for PyTorch `here <https://github.com/pytorch/pytorch/#installation>`_ and for the HuggingFace Transformers repo `here <https://github.com/huggingface/transformers#installation>`_.
######################################################################
- # 2. Import the necessary modules
- # ----------------------------
+ # 1.2 Import the necessary modules
+ # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# In this step we import the necessary Python modules for the tutorial.
#
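The import block itself is collapsed in this view. As a rough, hedged sketch (inferred from the helpers the rest of the tutorial calls, not copied from the commit), the imports likely resemble:

.. code:: python

    import logging
    import os
    import random
    import time

    import numpy as np
    import torch
    from torch.utils.data import TensorDataset

    from transformers import (BertConfig, BertForSequenceClassification,
                              BertTokenizer)
    # aliases below are an assumption for readability
    from transformers import glue_compute_metrics as compute_metrics
    from transformers import glue_convert_examples_to_features as convert_examples_to_features
    from transformers import glue_output_modes as output_modes
    from transformers import glue_processors as processors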
######################################################################
- # 3. Download the dataset
- # --------------------
+ # 1.3 Download the dataset
+ # ^^^^^^^^^^^^^^^^^^^^^^^^
#
# Before running MRPC tasks we download the `GLUE data
# <https://gluebenchmark.com/tasks>`_ by running `this script
# <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_
- # and unpack it to a directory `glue_data`.
+ # and unpack it to a directory ``glue_data``.
#
#
# .. code:: shell
######################################################################
- # 4. Helper functions
- # ----------------
+ # 1.4 Learn about helper functions
+ # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# The helper functions are built into the transformers library. We mainly use
# the following helper functions: one for converting the text examples

# The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function converts the texts into input features:
#
# - Tokenize the input sequences;
# - Insert [CLS] at the beginning;
# - Insert [SEP] between the first sentence and the second sentence, and
#   at the end;
# - Generate token type ids to indicate whether a token belongs to the
#   first sequence or the second sequence.
#
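As an illustration of the feature layout described in the list above, here is a minimal sketch using ``BertTokenizer`` directly (the tutorial itself relies on ``glue_convert_examples_to_features`` to perform these steps; the example sentences are made up):

.. code:: python

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    encoded = tokenizer.encode_plus(
        "The cat sat.", "A cat was sitting.", add_special_tokens=True)

    # roughly: ['[CLS]', 'the', 'cat', 'sat', '.', '[SEP]',
    #           'a', 'cat', 'was', 'sitting', '.', '[SEP]']
    print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))

    # 0s for tokens of the first sentence, 1s for the second
    print(encoded["token_type_ids"])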
# The `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
# can be interpreted as a weighted average of the precision and recall,
# where an F1 score reaches its best value at 1 and worst score at 0. The
# relative contribution of precision and recall to the F1 score is equal.
- # The equation for the F1 score is:
#
- # - F1 = 2 \* (precision \* recall) / (precision + recall)
+ # The equation for the F1 score is:
+ #
+ # .. math:: F_1 = 2 \times (\text{precision} \times \text{recall}) / (\text{precision} + \text{recall})
#
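A quick sanity check of the formula, assuming scikit-learn is installed; the toy labels are made up:

.. code:: python

    from sklearn.metrics import f1_score

    y_true = [1, 1, 0, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 1, 1]
    precision = 3 / 4  # 3 true positives out of 4 predicted positives
    recall = 3 / 4     # 3 true positives out of 4 actual positives

    # both print 0.75
    print(2 * (precision * recall) / (precision + recall))
    print(f1_score(y_true, y_pred))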
######################################################################
- # 5 . Fine-tune the BERT model
- # --------------------------
+ # 2. Fine-tune the BERT model
+ # ---------------------------
#

# To save time, you can download the model file (~400 MB) directly into your local folder ``$OUT_DIR``.
######################################################################
- # 6. Set global configurations
- # -------------------------
+ # 2.1 Set global configurations
+ # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
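The body of this section is collapsed here. As a hedged sketch, a typical ``set_seed`` helper (the function named in the hunk header below) seeds every RNG the tutorial touches:

.. code:: python

    import random

    import numpy as np
    import torch

    def set_seed(seed):
        # make runs reproducible across Python, NumPy, and PyTorch
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)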
@@ -264,12 +264,9 @@ def set_seed(seed):
######################################################################
- # 7. Load the fine-tuned BERT model
- # ------------------------------
+ # 2.2 Load the fine-tuned BERT model
+ # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
-
-
- ######################################################################
# We load the tokenizer and fine-tuned BERT sequence classifier model
# (FP32) from the ``configs.output_dir``.
#
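A minimal sketch of that loading step, assuming the ``configs`` namespace defined in the collapsed setup code exposes ``output_dir``, ``do_lower_case``, and ``device``:

.. code:: python

    # load vocabulary and weights from the fine-tuning output directory
    tokenizer = BertTokenizer.from_pretrained(
        configs.output_dir, do_lower_case=configs.do_lower_case)
    model = BertForSequenceClassification.from_pretrained(configs.output_dir)
    model.to(configs.device)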
@@ -282,8 +279,8 @@ def set_seed(seed):
######################################################################
- # 8. Define the tokenize and evaluation function
- # -------------------------------------------
+ # 2.3 Define the tokenize and evaluation function
+ # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# We reuse the tokenize and evaluation function from `HuggingFace <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
#
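The function body is collapsed; below is a condensed, hedged sketch of the ``load_and_cache_examples`` helper named in the hunk headers, following the ``run_glue.py`` pattern (caching and distributed-training details omitted):

.. code:: python

    from torch.utils.data import TensorDataset

    def load_and_cache_examples(args, task, tokenizer, evaluate=False):
        # read the MRPC dev (or train) split and turn it into tensors
        processor = processors[task]()
        examples = (processor.get_dev_examples(args.data_dir) if evaluate
                    else processor.get_train_examples(args.data_dir))
        features = convert_examples_to_features(
            examples,
            tokenizer,
            max_length=args.max_seq_length,
            label_list=processor.get_labels(),
            output_mode=output_modes[task],
        )
        all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)
        all_attention_mask = torch.tensor([f.attention_mask for f in features], dtype=torch.long)
        all_token_type_ids = torch.tensor([f.token_type_ids for f in features], dtype=torch.long)
        all_labels = torch.tensor([f.label for f in features], dtype=torch.long)
        return TensorDataset(all_input_ids, all_attention_mask,
                             all_token_type_ids, all_labels)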
@@ -426,7 +423,7 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):
######################################################################
- # 9 . Apply the dynamic quantization
+ # 3. Apply the dynamic quantization
# ---------------------------------
#
# We call ``torch.quantization.quantize_dynamic`` on the model to apply
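The rest of the sentence and the call itself are collapsed in this view; the canonical form of that call, quantizing only the ``torch.nn.Linear`` modules to INT8 weights, is:

.. code:: python

    import torch

    # dynamically quantize the fine-tuned FP32 model's Linear layers
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )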
@@ -445,8 +442,8 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):
######################################################################
- # 10. Check the model size
- # --------------------
+ # 3.1 Check the model size
+ # ^^^^^^^^^^^^^^^^^^^^^^^^
#
# Let’s first check the model size. We can observe a significant reduction
# in model size (FP32 total size: 438 MB; INT8 total size: 181 MB):
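A sketch of the ``print_size_of_model`` helper named in the next hunk header, assuming the usual save-to-temp-file approach:

.. code:: python

    import os

    import torch

    def print_size_of_model(model):
        # serialize the state dict to a temp file and report its size on disk
        torch.save(model.state_dict(), "temp.p")
        print("Size (MB):", os.path.getsize("temp.p") / 1e6)
        os.remove("temp.p")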
@@ -472,8 +469,8 @@ def print_size_of_model(model):
######################################################################
- # 11. Evaluate the inference accuracy and time
- # ----------------------------------------
+ # 3.2 Evaluate the inference accuracy and time
+ # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# Next, let’s compare the inference time as well as the evaluation
# accuracy between the original FP32 model and the INT8 model after the
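A hedged sketch of the ``time_model_evaluation`` helper named in the hunk headers, assuming the ``evaluate`` function defined in the collapsed section 2.3:

.. code:: python

    import time

    def time_model_evaluation(model, configs, tokenizer):
        # run the full MRPC evaluation and report wall-clock time
        eval_start_time = time.time()
        result = evaluate(configs, model, tokenizer, prefix="")
        eval_end_time = time.time()
        print("Evaluate total time (seconds): {0:.1f}".format(
            eval_end_time - eval_start_time))
        print(result)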
@@ -513,7 +510,7 @@ def time_model_evaluation(model, configs, tokenizer):
# comparison, in a `recent paper <https://arxiv.org/pdf/1910.06188.pdf>`_ (Table 1),
# it achieved 0.8788 by
# applying the post-training dynamic quantization and 0.8956 by applying
- # the quantization-aware training. The main reason is that we support the
+ # the quantization-aware training. The main difference is that we support the
# asymmetric quantization in PyTorch while that paper supports the
# symmetric quantization only.
#
@@ -533,8 +530,8 @@ def time_model_evaluation(model, configs, tokenizer):
######################################################################
- # 12. Serialize the quantized model
- # -----------------------------
+ # 3.3 Serialize the quantized model
+ # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# We can serialize and save the quantized model for future use.
#
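One minimal way to do that (a sketch, not the commit's exact code; the output path is an assumption):

.. code:: python

    # persist the quantized weights next to the FP32 checkpoint
    quantized_output_dir = os.path.join(configs.output_dir, "quantized")
    os.makedirs(quantized_output_dir, exist_ok=True)
    torch.save(quantized_model.state_dict(),
               os.path.join(quantized_output_dir, "quantized_model.pt"))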