@@ -492,7 +492,7 @@ follows:
| Prec | F1 score | Model Size | 1 thread | 4 threads |
| FP32 | 0.9019 | 438 MB | 160 sec | 85 sec |
- | INT8 | 0.8953 | 181 MB | 90 sec | 46 sec |
+ | INT8 | 0.902 | 181 MB | 90 sec | 46 sec |
- We have 0.6% F1 score accuracy after applying the post-training dynamic
+ We have a comparable F1 score (0.902 vs. 0.9019) after applying the post-training dynamic
quantization on the fine-tuned BERT model on the MRPC task. As a
@@ -520,15 +520,23 @@ processing the evaluation of MRPC dataset.
3.3 Serialize the quantized model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- We can serialize and save the quantized model for the future use.
+ We can serialize and save the quantized model for future use using
+ `torch.jit.save` after tracing the model.
.. code:: python
-    quantized_output_dir = configs.output_dir + "quantized/"
-    if not os.path.exists(quantized_output_dir):
-        os.makedirs(quantized_output_dir)
-    quantized_model.save_pretrained(quantized_output_dir)
+    input_ids = ids_tensor([8, 128], 2)
+    token_type_ids = ids_tensor([8, 128], 2)
+    attention_mask = ids_tensor([8, 128], vocab_size=2)
+    dummy_input = (input_ids, attention_mask, token_type_ids)
+    traced_model = torch.jit.trace(quantized_model, dummy_input)
+    torch.jit.save(traced_model, "bert_traced_eager_quant.pt")
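+
+ Here `ids_tensor` is a small helper that builds random token-id tensors of
+ the given shape; its definition is not shown in this hunk. A minimal sketch
+ of such a helper, assuming a plain `torch.randint`-based implementation
+ rather than the tutorial's exact definition, could be:
+
+ .. code:: python
+
+    import torch
+
+    def ids_tensor(shape, vocab_size):
+        # Random tensor of token ids drawn uniformly from [0, vocab_size);
+        # tracing only needs the right shapes and dtypes, not real tokens.
+        return torch.randint(0, vocab_size, shape, dtype=torch.long, device="cpu")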
+ To load the quantized model, we can use `torch.jit.load`:
+
+ .. code:: python
+
+    loaded_quantized_model = torch.jit.load("bert_traced_eager_quant.pt")
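+
+ Because `torch.jit.trace` records the operations executed on the example
+ inputs, the dummy inputs above only need the right shapes and dtypes, not
+ meaningful token values. As a quick sanity check, the reloaded TorchScript
+ model can be run directly on those inputs (the call below is illustrative
+ and reuses the dummy tensors from the tracing step):
+
+ .. code:: python
+
+    # Run the reloaded TorchScript model; it returns whatever the traced
+    # forward returned (e.g., a tuple containing the classification logits).
+    outputs = loaded_quantized_model(input_ids, attention_mask, token_type_ids)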
Conclusion
----------