
Commit be9d2e6

kwonmha and holly1238 authored
update dynamic quantization bert tutorial (#1129)
Clarify the meaning of the comparison between the original FP32 model and the quantized model; remove duplicated phrases.

Co-authored-by: holly1238 <77758406+holly1238@users.noreply.github.com>
1 parent a67438b

File tree: 1 file changed, +2 −2 lines changed


intermediate_source/dynamic_quantization_bert_tutorial.rst

Lines changed: 2 additions & 2 deletions
@@ -494,7 +494,7 @@ follows:
 | FP32 | 0.9019 | 438 MB | 160 sec | 85 sec |
 | INT8 | 0.902 | 181 MB | 90 sec | 46 sec |
 
-We have 0.6% F1 score accuracy after applying the post-training dynamic
+We have 0.6% lower F1 score accuracy after applying the post-training dynamic
 quantization on the fine-tuned BERT model on the MRPC task. As a
 comparison, in a `recent paper <https://arxiv.org/pdf/1910.06188.pdf>`_ (Table 1),
 it achieved 0.8788 by
@@ -541,7 +541,7 @@ To load the quantized model, we can use `torch.jit.load`
 Conclusion
 ----------
 
-In this tutorial, we demonstrated how to demonstrate how to convert a
+In this tutorial, we demonstrated how to convert a
 well-known state-of-the-art NLP model like BERT into dynamic quantized
 model. Dynamic quantization can reduce the size of the model while only
 having a limited implication on accuracy.
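
For context, here is a minimal sketch of the workflow these changed tutorial lines refer to: post-training dynamic quantization of a model's `nn.Linear` layers, followed by TorchScript serialization and reloading via `torch.jit.load`. The small `nn.Sequential` stands in for the fine-tuned BERT model used in the tutorial; its dimensions and the file name are illustrative assumptions, not taken from the source.

```python
import torch
import torch.nn as nn

# Toy stand-in for the fine-tuned FP32 BERT model from the tutorial;
# the layer sizes and file name below are hypothetical.
model = nn.Sequential(
    nn.Linear(768, 768),
    nn.ReLU(),
    nn.Linear(768, 2),
).eval()

# Post-training dynamic quantization: nn.Linear weights are converted
# to INT8 ahead of time; activations are quantized on the fly at runtime.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Serialize with TorchScript, then reload with torch.jit.load, as the
# changed section of the tutorial describes.
example_input = torch.randn(1, 768)
traced = torch.jit.trace(quantized_model, example_input)
torch.jit.save(traced, "quantized_model.pt")
loaded = torch.jit.load("quantized_model.pt")
print(loaded(example_input).shape)  # torch.Size([1, 2])
```

Because dynamic quantization stores only the weights in INT8 and quantizes activations at runtime, it needs no calibration data, which is why the tutorial can apply it directly to the fine-tuned model.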
