From 14550b1b125d85878dbc3c145398ba5d62649653 Mon Sep 17 00:00:00 2001
From: Andrea Tupini
Date: Fri, 27 Jan 2023 09:36:25 -0600
Subject: [PATCH] Fix typo and formatting in bettertransformer_tutorial.rst

---
 beginner_source/bettertransformer_tutorial.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/beginner_source/bettertransformer_tutorial.rst b/beginner_source/bettertransformer_tutorial.rst
index 10302331b36..96249d88651 100644
--- a/beginner_source/bettertransformer_tutorial.rst
+++ b/beginner_source/bettertransformer_tutorial.rst
@@ -18,7 +18,7 @@ been updated to use the core library modules to benefit from fastpath accelerati
 
 Better Transformer offers two types of acceleration:
 
-* Native multihead attention implementation for CPU and GPU to improvee overall execution efficiency.
+* Native multihead attention (MHA) implementation for CPU and GPU to improve overall execution efficiency.
 * Exploiting sparsity in NLP inference. Because of variable input lengths, input
   tokens may contain a large number of padding tokens for which processing may be
   skipped, delivering significant speedups.
@@ -124,6 +124,7 @@ Finally, we set the benchmark iteration count:
 2.1 Run and benchmark inference on CPU with and without BT fastpath (native MHA only)
 
 We run the model on CPU, and collect profile information:
+
 * The first run uses traditional ("slow path") execution.
 * The second run enables BT fastpath execution by putting the model in inference mode
   using `model.eval()` and disables gradient collection with `torch.no_grad()`.
@@ -167,6 +168,7 @@ We disable the BT sparsity:
 
 We run the model on DEVICE, and collect profile information for native MHA execution
 on DEVICE:
+
 * The first run uses traditional ("slow path") execution.
 * The second run enables BT fastpath execution by putting the model in inference mode
   using `model.eval()` and disables gradient collection with `torch.no_grad()`.
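
Both profiling steps touched by this patch follow the same pattern: one run on the traditional ("slow path"), then one run with the model in inference mode via `model.eval()` and gradients disabled via `torch.no_grad()`, which is what makes the native MHA fastpath eligible. The following is a minimal sketch of that pattern, not part of the patch or of the tutorial's actual code: it uses a plain `torch.nn.TransformerEncoder` stand-in, and the layer sizes, padding mask, and iteration count are assumptions made for illustration only::

    import time

    import torch
    import torch.nn as nn

    # Hypothetical stand-in for the tutorial's model: sizes, the padding mask,
    # and ITERATIONS are assumptions made for this sketch only.
    encoder_layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
    model = nn.TransformerEncoder(encoder_layer, num_layers=2)

    x = torch.rand(32, 64, 256)                          # (batch, sequence, embedding)
    padding_mask = torch.zeros(32, 64, dtype=torch.bool)
    padding_mask[:, 48:] = True                          # pretend the last 16 tokens are padding

    ITERATIONS = 100                                     # assumed benchmark iteration count

    def bench(label):
        start = time.perf_counter()
        for _ in range(ITERATIONS):
            model(x, src_key_padding_mask=padding_mask)
        print(f"{label}: {time.perf_counter() - start:.3f} s")

    # First run: traditional ("slow path") execution.
    bench("slow path")

    # Second run: the native MHA fastpath becomes eligible once the model is in
    # inference mode (model.eval()) and gradient collection is disabled
    # (torch.no_grad()); the padding mask lets the sparsity optimization skip
    # padded positions.
    model.eval()
    with torch.no_grad():
        bench("BT fastpath")

This mirrors the two kinds of acceleration named in the first hunk: the native MHA kernel speeds up execution on CPU and GPU, while the padding mask allows the sparsity path to skip padded tokens when input lengths vary.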