From c347228ef9708fd4f1c86d3c09b38a420ccb46ec Mon Sep 17 00:00:00 2001 From: Jeff Tang Date: Mon, 19 Oct 2020 18:02:12 -0700 Subject: [PATCH 1/5] pt mobile script and optimize recipe --- recipes_source/script_optimized.rst | 191 ++++++++++++++++++++++++++++ 1 file changed, 191 insertions(+) create mode 100644 recipes_source/script_optimized.rst diff --git a/recipes_source/script_optimized.rst b/recipes_source/script_optimized.rst new file mode 100644 index 00000000000..6a56ee2bd95 --- /dev/null +++ b/recipes_source/script_optimized.rst @@ -0,0 +1,191 @@ +Script and Optimize for Mobile Recipe +===================================== + +This recipe demonstrates how to convert a PyTorch model to TorchScript which can run in a high-performance C++ environment such as iOS and Android, and how to optimize the converted TorchScript model for mobile deployment. + +Introduction +------------ + +After a PyTorch model is trained and optionally but preferably quantized (see the Quantization for Mobile Apps Recipe), one essential step before the model can be used in iOS and Android apps is to convert the Python-dependent model to TorchScript, which can then further be optimized for mobile apps. Conversion to TorchScript can be as simple as a single call, or as complicated as changing the original model in many different places. + +Pre-requisites +-------------- + +PyTorch 1.6.0 or 1.7.0 + +Conversion to TorchScript +------------------------- + +There are two basic ways to convert a PyTorch model to TorchScript, using `trace` and `script`. Mixing `trace` and `script` may also be needed in some cases - see `here `_ for more information. + +Use the `trace` Method +^^^^^^^^^^^^^^^^^^^^^^ + +To use the `trace` method on a model, an example or dummy input for the model needs to be specified, the actual input size needs to be the same as the example input size, and the model definition cannot have control flow such as `if` or `for`. The reason for these constraints is that running `trace` on a model with an example input simply calls the model's `forward` method with the input and all operations executed in the model layers are recorded, creating the trace of the model. + +:: + + import torch + + dummy_input = torch.rand(1, 3, 224, 224) + torchscript_model = torch.jit.trace(model_quantized, dummy_input) + + +Use the `script` Method +^^^^^^^^^^^^^^^^^^^^^^^ + +For the example above, calling `script` below makes no difference: + +:: + + torchscript_model = torch.jit.script(model_quantized) + +But if a model has some flow control, then `trace` won't correctly record all the possible traces. Take some code snippet of an example model definition from `here `_ for example: + +:: + + class MyDecisionGate(torch.nn.Module): + def forward(self, x): + if x.sum() > 0: + return x + else: + return -x + + x = torch.rand(3, 4) + traced_cell = torch.jit.trace(MyDecisionGate(), x) + print(traced_cell.code) + +The code above will output: +TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! 

::

    if x.sum() > 0:

    def forward(self,
        x: Tensor) -> Tensor:
      return x


It is important to notice that "the trace might not generalize to other inputs" warning above means that if the model has any kind of data-dependent control flow, `trace` is not the right answer. But if we replace the last two lines of the Python code above with:

::

    scripted_cell = torch.jit.script(MyDecisionGate())
    print(scripted_cell.code)

The output will cover all possible inputs, thus generalizing to other inputs:

::

    def forward(self,
        x: Tensor) -> Tensor:
      _0 = bool(torch.gt(torch.sum(x, dtype=None), 0))
      if _0:
        _1 = x
      else:
        _1 = torch.neg(x)
      return _1


This is another example of using `trace` and `script` - it converts the model trained in the PyTorch [neural machine translation tutorial]():

::

    encoder = EncoderRNN(input_lang.n_words, hidden_size)
    decoder = AttnDecoderRNN(hidden_size, output_lang.n_words)

    # method 1: using trace with example inputs

    encoder_input=torch.tensor([1])
    encoder_hidden=torch.zeros(1, 1, hidden_size)

    decoder_input1=torch.tensor([[0]])
    decoder_input2=torch.zeros(1, 1, hidden_size)
    decoder_input3=torch.zeros(MAX_LENGTH, hidden_size)

    traced_encoder = torch.jit.trace(encoder, (encoder_input, encoder_hidden))
    traced_decoder = torch.jit.trace(decoder, (decoder_input1, decoder_input2, decoder_input3))

    # method 2: using script

    scripted_encoder = torch.jit.script(encoder)
    scripted_decoder = torch.jit.script(decoder)

So is it true that one can simply always use the `script` call and the model is converted to TorchScript? The answer is no, because TorchScript is actually a subset of Python and to make `script` work, the PyTorch model definition must only use the language features of that TorchScript subset of Python. `TorchScript Language Reference `_ covers all the details of what is supported in TorchScript.


Fix Errors with the `script` Method
---------------------------------------------

If you apply the `script` method to a non-trivial model, chances are you may encounter several types of errors. Check out `this tutorial `_ for a complete example of converting a chatbot model to TorchScript. But follow the steps below to fix common errors when you run the `script` method:

1. For RuntimeError `attribute lookup is not defined on python value of type` (of a model), pass the value of the model as a parameter in the constructor. This is because when calling `script` on a model that accepts another model as a parameter, the model passed is actually of type `TracedModule` or `ScriptModule`, not of type `Module`, making the model attribute not defined when scripting.

For example, the `LuongAttnDecoderRNN` module in the tutorial above has an attribute `n_layers`, and the `GreedySearchDecoder` module refers to the `n_layers` attribute of a `decoder` instance of the `LuongAttnDecoderRNN` module, so in order to make `script` work, the `GreedySearchDecoder` module's constructor needs to be changed from:

::

    def __init__(self, encoder, decoder):

to:

::

    def __init__(self, encoder, decoder, decoder_n_layers):
      ...
      self._decoder_n_layers = decoder_n_layers


and the `GreedySearchDecoder`'s `forward` method needs to refer to `self._decoder_n_layers` instead of `decoder.n_layers`.
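
Below is a minimal, self-contained sketch of this pattern (the `Inner` and `Wrapper` names are made up for illustration and are not from the chatbot tutorial). Reading `self.inner.n_layers` inside `forward` would hit the attribute lookup error once `inner` has been traced or scripted, so the attribute's value is passed to the wrapper's constructor instead:

::

    import torch

    class Inner(torch.nn.Module):
        def __init__(self):
            super(Inner, self).__init__()
            self.n_layers = 2
            self.linear = torch.nn.Linear(4, 4)

        def forward(self, x):
            return self.linear(x)

    class Wrapper(torch.nn.Module):
        def __init__(self, inner, inner_n_layers):
            super(Wrapper, self).__init__()
            self.inner = inner
            # store the attribute's value instead of reading inner.n_layers in forward
            self._inner_n_layers = inner_n_layers

        def forward(self, x):
            for _ in range(self._inner_n_layers):
                x = self.inner(x)
            return x

    inner = torch.jit.script(Inner())
    scripted = torch.jit.script(Wrapper(inner, inner.n_layers))
    print(scripted(torch.rand(1, 4)))

2. 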
For RuntimeError `python value of type '...' cannot be used as a value. Perhaps it is a closed over global variable? If so, please consider passing it in as an argument or use a local variable instead.`, store global variables' values as attributes in the model constructor (there's no need to add them to a special list called `__constants__`). The reason is that global values can be used conveniently in normal model training and inference, but the global values are not accessible during the scripting. + +For example, `device` and `SOS_token` are global variables, and to make `script` work, they need to be added to the `GreedySearchDecoder`'s constructor: + +:: + + self._device = device + self._SOS_token = SOS_token + +and referred to as `self._device` and `self._SOS_token` instead of `device` and `SOS_token` in the `GreedySearchDecoder`'s `forward` method. + +3. For RuntimeError `RuntimeError: all inputs of range must be '...', found Tensor (inferred) in argument`, add type definitions for each of the module's forward method arguments. Because all parameters to a TorchScript function are of the `torch.Tensor` type by default, you need to specifically declare the type for each parameter that is not of type 'Tensor'. For a complete list of TorchScript-supported types, see `here `_. + +For example, the `GreedySearchDecoder`'s `forward` method signature needs to be changed from: + +:: + + def forward(self, input_seq, input_length, max_length): + +to: + +:: + + def forward(self, input_seq, input_length, max_length : int): + +After using the `trace` or `script` method above, and fixing possible errors, you should have a TorchScript model ready to be optimized for mobile. + + + +Optimize a TorchScript Model +-------------------------------------- + +Simply run the following code snippet to optimize a TorchScript model generated with the `trace` and/or `script` method: + +:: + + from torch.utils.mobile_optimizer import optimize_for_mobile + optimized_torchscript_model = optimize_for_mobile(torchscript_model) + +The optimized model can then be saved and deployed in mobile apps: + +:: + + optimized_torchscript_model.save("optimized_torchscript_model.pth") + +For more details on what `optimize_for_mobile` does behind the scene, see `here `_. From be34ce57f119cb1ed2c62f7c3beacf5055a470fc Mon Sep 17 00:00:00 2001 From: Jeff Tang Date: Tue, 20 Oct 2020 14:50:50 -0700 Subject: [PATCH 2/5] 1 pt mobile new recipes summary and 5 recipes --- recipes_source/fuse.rst | 157 +++++++++++++++++++ recipes_source/model_preparation_android.rst | 82 ++++++++++ recipes_source/model_preparation_ios.rst | 91 +++++++++++ recipes_source/ptmobile_recipes_summary.rst | 40 +++++ recipes_source/quantization.rst | 114 ++++++++++++++ recipes_source/script_optimized.rst | 31 ++-- 6 files changed, 504 insertions(+), 11 deletions(-) create mode 100644 recipes_source/fuse.rst create mode 100644 recipes_source/model_preparation_android.rst create mode 100644 recipes_source/model_preparation_ios.rst create mode 100644 recipes_source/ptmobile_recipes_summary.rst create mode 100644 recipes_source/quantization.rst diff --git a/recipes_source/fuse.rst b/recipes_source/fuse.rst new file mode 100644 index 00000000000..97ae0df736f --- /dev/null +++ b/recipes_source/fuse.rst @@ -0,0 +1,157 @@ +Fuse Modules Recipe +===================================== + +This recipe demonstrates how to fuse a list of PyTorch modules into a single module and how to do the performance test to compare the fused model with its non-fused version. 

Introduction
------------

Before quantization is applied to a model to reduce its size and memory footprint (see `Quantization Recipe `_ for details on quantization), sequences of modules in the model may first be fused into a single module. Fusion is optional, but it may save on memory access, make the model run faster, and improve its accuracy.


Pre-requisites
--------------

PyTorch 1.6.0 or 1.7.0

Steps
--------------

Follow the steps below to fuse an example model, quantize it, script it, optimize it for mobile, save it, and test it with the Android benchmark tool.

1. Define the Example Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use the same example model defined in the `PyTorch Mobile Performance Recipes `_:

::

    import torch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    class AnnotatedConvBnReLUModel(torch.nn.Module):
        def __init__(self):
            super(AnnotatedConvBnReLUModel, self).__init__()
            self.conv = torch.nn.Conv2d(3, 5, 3, bias=False).to(dtype=torch.float)
            self.bn = torch.nn.BatchNorm2d(5).to(dtype=torch.float)
            self.relu = torch.nn.ReLU(inplace=True)
            self.quant = torch.quantization.QuantStub()
            self.dequant = torch.quantization.DeQuantStub()

        def forward(self, x):
            # the result of contiguous must be reassigned, or the call has no effect
            x = x.contiguous(memory_format=torch.channels_last)
            x = self.quant(x)
            x = self.conv(x)
            x = self.bn(x)
            x = self.relu(x)
            x = self.dequant(x)
            return x


2. Generate Two Models with and without `fuse_modules`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Add the following code below the model definition above and run the script:

::

    model = AnnotatedConvBnReLUModel()

    print(model)

    def prepare_save(model, fused):
        model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
        torch.quantization.prepare(model, inplace=True)
        torch.quantization.convert(model, inplace=True)
        torchscript_model = torch.jit.script(model)
        torchscript_model_optimized = optimize_for_mobile(torchscript_model)
        torch.jit.save(torchscript_model_optimized, "model.pt" if not fused else "model_fused.pt")

    prepare_save(model, False)

    model = AnnotatedConvBnReLUModel()
    model_fused = torch.quantization.fuse_modules(model, [['bn', 'relu']], inplace=False)
    print(model_fused)
    prepare_save(model_fused, True)


The outputs of the original model and its fused version will be:

::

    AnnotatedConvBnReLUModel(
      (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(5, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (quant): QuantStub()
      (dequant): DeQuantStub()
    )

    AnnotatedConvBnReLUModel(
      (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
      (bn): BNReLU2d(
        (0): BatchNorm2d(5, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (1): ReLU(inplace=True)
      )
      (relu): Identity()
      (quant): QuantStub()
      (dequant): DeQuantStub()
    )

In the fused model's output, the first item `bn` in the fused list is replaced with the fused module `BNReLU2d`, and the rest of the modules (`relu` in this example) are replaced with `Identity`. In addition, the non-fused and fused versions of the model, `model.pt` and `model_fused.pt`, are generated.
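
Before benchmarking on a device, you can sanity-check that fusion does not change the float model's numerics. This is an illustrative extra step, not part of the original recipe, and it must run on fresh models before `prepare_save`, which quantizes them in place:

::

    m = AnnotatedConvBnReLUModel().eval()
    m_fused = torch.quantization.fuse_modules(m, [['bn', 'relu']], inplace=False).eval()
    x = torch.rand(1, 3, 224, 224)
    # in eval mode, BNReLU2d computes exactly bn followed by relu
    assert torch.allclose(m(x), m_fused(x))

3. Build the Android Benchmark Tool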
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Get the PyTorch source and build the Android benchmark tool as follows:

::

    git clone --recursive https://github.com/pytorch/pytorch
    cd pytorch
    git submodule update --init --recursive
    BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=arm64-v8a ./scripts/build_android.sh -DBUILD_BINARY=ON


This will generate the Android benchmark binary `speed_benchmark_torch` in the `build_android/bin` folder.

4. Test and Compare the Fused and Non-Fused Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Connect your Android device, then copy `speed_benchmark_torch` and the model files to the device and run the benchmark tool on them:

::

    adb push build_android/bin/speed_benchmark_torch /data/local/tmp
    adb push model.pt /data/local/tmp
    adb push model_fused.pt /data/local/tmp
    adb shell "/data/local/tmp/speed_benchmark_torch --model=/data/local/tmp/model.pt" --input_dims="1,3,224,224" --input_type="float"
    adb shell "/data/local/tmp/speed_benchmark_torch --model=/data/local/tmp/model_fused.pt" --input_dims="1,3,224,224" --input_type="float"


The results of the last two commands should look like:

::

    Main run finished. Microseconds per iter: 6189.07. Iters per second: 161.575

and

::

    Main run finished. Microseconds per iter: 6216.65. Iters per second: 160.858

For this example model, there is not much performance difference between the fused and non-fused models. But similar steps can be used to fuse and prepare a real, deeper model and test the performance improvement. Keep in mind that currently `torch.quantization.fuse_modules` only fuses the following sequences of modules:

* conv, bn
* conv, bn, relu
* conv, relu
* linear, relu
* bn, relu

If any other sequence is provided to the `fuse_modules` call, it will simply be ignored.

Learn More
---------------

See `here `_ for the official documentation of `torch.quantization.fuse_modules`.
diff --git a/recipes_source/model_preparation_android.rst b/recipes_source/model_preparation_android.rst
new file mode 100644
index 00000000000..a2a7838ea03
--- /dev/null
+++ b/recipes_source/model_preparation_android.rst
@@ -0,0 +1,82 @@
Model Preparation for Android Recipe
=====================================

This recipe demonstrates how to prepare a PyTorch MobileNet v2 image classification model for Android apps, and how to set up Android projects to use the mobile-ready model file.

Introduction
-----------------

After a PyTorch model is trained or a pre-trained model is made available, it is normally not ready to be used in mobile apps yet. It needs to be quantized (see the `Quantization Recipe `_), converted to TorchScript so Android apps can load it, and optimized for mobile apps. Furthermore, Android apps need to be set up correctly to enable the use of PyTorch Mobile libraries, before they can load and use the model for inference.

Pre-requisites
-----------------

PyTorch 1.6.0 or 1.7.0

torchvision 0.6.0 or 0.7.0

Android Studio 3.5.1 or above with NDK installed

Steps
-----------------

1. Get Pretrained and Quantized MobileNet v2 Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To get the MobileNet v2 quantized model, simply do:

::

    import torchvision

    model_quantized = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
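
As an optional sanity check (an illustrative step, not from the original recipe), you can run a dummy inference in Python to confirm the downloaded quantized model works before converting it:

::

    import torch

    model_quantized.eval()
    with torch.no_grad():
        out = model_quantized(torch.rand(1, 3, 224, 224))
    print(out.shape)  # expect torch.Size([1, 1000])

2. Script and Optimize the Model for Mobile Apps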
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use either the `script` or `trace` method to convert the quantized model to the TorchScript format:

::

    import torch

    dummy_input = torch.rand(1, 3, 224, 224)
    torchscript_model = torch.jit.trace(model_quantized, dummy_input)

or

::

    torchscript_model = torch.jit.script(model_quantized)


Then optimize the TorchScript formatted model for mobile and save it:

::

    from torch.utils.mobile_optimizer import optimize_for_mobile
    torchscript_model_optimized = optimize_for_mobile(torchscript_model)
    torch.jit.save(torchscript_model_optimized, "mobilenetv2_quantized.pt")

With a total of 7 or 8 lines of code in the two steps above (depending on whether the `script` or the `trace` method is used to get the TorchScript format of the model), we have a model ready to be added to mobile apps.

3. Add the Model and PyTorch Library on Android
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* In your current or a new Android Studio project, open the `build.gradle` file, and add the following two lines (the second one is required only if you plan to use a TorchVision model):

::

    implementation 'org.pytorch:pytorch_android:1.6.0'
    implementation 'org.pytorch:pytorch_android_torchvision:1.6.0'

* Drag and drop the model file `mobilenetv2_quantized.pt` to your project's assets folder.

That's it! Now you can build your Android app with the PyTorch library and the model ready to use. To actually write code to use the model, refer to the PyTorch Mobile `Android Quickstart with a HelloWorld Example `_ and `Android Hackathon Example `_.

Learn More
-----------------

1. `PyTorch Mobile site `_

2. `Introduction to TorchScript `_
diff --git a/recipes_source/model_preparation_ios.rst b/recipes_source/model_preparation_ios.rst
new file mode 100644
index 00000000000..759237d1f9a
--- /dev/null
+++ b/recipes_source/model_preparation_ios.rst
@@ -0,0 +1,91 @@
Model Preparation for iOS Recipe
=====================================

This recipe demonstrates how to prepare a PyTorch MobileNet v2 image classification model for iOS apps, and how to set up an iOS project to use the mobile-ready model file.

Introduction
-----------------

After a PyTorch model is trained or a pre-trained model is made available, it is normally not ready to be used in mobile apps yet. It needs to be quantized (see `Quantization Recipe `_ for more details), converted to TorchScript so iOS apps can load it, and optimized for mobile apps (see `Script and Optimize for Mobile Recipe `_). Furthermore, iOS apps need to be set up correctly to enable the use of PyTorch Mobile libraries, before they can load and use the model for inference.

Pre-requisites
-----------------

PyTorch 1.6.0 or 1.7.0

torchvision 0.6.0 or 0.7.0

Xcode 11 or 12

Steps
-----------------

1. Get Pretrained and Quantized MobileNet v2 Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To get the MobileNet v2 quantized model, simply do:

::

    import torchvision

    model_quantized = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
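
As a quick optional check (illustrative only, not part of the original recipe), you can confirm the quantized model's reduced size by saving its `state_dict` and inspecting the file:

::

    import os
    import torch

    torch.save(model_quantized.state_dict(), "tmp.pt")
    print("%.2f MB" % (os.path.getsize("tmp.pt") / 1e6))  # roughly 3.6 MB, vs. about 14 MB unquantized
    os.remove("tmp.pt")

2. Script and Optimize the Model for Mobile Apps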
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use either the `script` or `trace` method to convert the quantized model to the TorchScript format:

::

    import torch

    dummy_input = torch.rand(1, 3, 224, 224)
    torchscript_model = torch.jit.trace(model_quantized, dummy_input)

or

::

    torchscript_model = torch.jit.script(model_quantized)

Then optimize the TorchScript formatted model for mobile and save it:

::

    from torch.utils.mobile_optimizer import optimize_for_mobile
    torchscript_model_optimized = optimize_for_mobile(torchscript_model)
    torch.jit.save(torchscript_model_optimized, "mobilenetv2_quantized.pt")

With a total of 7 or 8 lines of code in the two steps above (depending on whether the `script` or the `trace` method is used to get the TorchScript format of the model), we have a model ready to be added to mobile apps.

3. Add the Model and PyTorch Library on iOS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To use the mobile-ready model `mobilenetv2_quantized.pt` in an iOS app, create a new Xcode project or use an existing one, then follow the steps below:

* Open a Mac Terminal and `cd` to your iOS app's project folder;

* If your iOS app does not use CocoaPods yet, run `pod init` first to generate the `Podfile` file;

* Edit `Podfile`, either from Xcode or any editor, and add the following line under the target:

::

    pod 'LibTorch', '~>1.6.1'

* Run `pod install` from the Terminal and then open your project's xcworkspace file;

* Drag and drop the two files `TorchModule.h` and `TorchModule.mm` to your project. If your project is Swift based, a message box with the title "Would you like to configure an Objective-C bridging header?" will show up; click the "Create Bridging Header" button to create a Swift to Objective-C bridging header file, and add `#import "TorchModule.h"` to the header file `-Bridging-Header.h`;

* Drag and drop the model file `mobilenetv2_quantized.pt` to the project.

After these steps, you can successfully build and run your Xcode project. To actually write code to use the model, refer to the PyTorch Mobile `iOS Code Walkthrough `_ and two complete ready-to-run sample iOS apps `HelloWorld `_ and `iOS Hackathon Example `_.


Learn More
-----------------

1. `PyTorch Mobile site `_

2. `Introduction to TorchScript `_
diff --git a/recipes_source/ptmobile_recipes_summary.rst b/recipes_source/ptmobile_recipes_summary.rst
new file mode 100644
index 00000000000..cddee940f2a
--- /dev/null
+++ b/recipes_source/ptmobile_recipes_summary.rst
@@ -0,0 +1,40 @@
Summary of PyTorch Mobile Recipes
=====================================

This summary provides a top-level overview of recipes for PyTorch Mobile to help developers choose which recipes to follow for their PyTorch-powered mobile app development.

Introduction
----------------

When a PyTorch model is trained, retrained, or available as a pre-trained model for mobile deployment, follow the recipes outlined in this summary so mobile apps can successfully use the model.

Pre-requisites
----------------

PyTorch 1.6.0 or 1.7.0

(Optional) torchvision 0.6.0 or 0.7.0

For iOS development: Xcode 11 or 12

For Android development: Android Studio 3.5.1 or above (with NDK installed); or Android SDK, NDK, Gradle, JDK.

New Recipes for PyTorch Mobile
--------------------------------

* (Recommended) To fuse a list of PyTorch modules into a single module to reduce the model size before quantization, read the `Fuse Modules recipe `_.

* (Recommended) To reduce the model size and make it run faster without losing much on accuracy, read the `Quantization Recipe `_.

* (Must) To convert the model to TorchScript and (optionally) optimize it for mobile apps, read the `Script and Optimize for Mobile Recipe `_.

* (Must for iOS development) To add the model in an iOS project and use the PyTorch pod for iOS, read the `Model preparation for iOS Recipe `_.

* (Must for Android development) To add the model in an Android project and use the PyTorch library for Android, read the `Model preparation for Android Recipe `_.


Learn More
-----------------

1. `PyTorch Mobile site `_
2. `PyTorch Mobile Performance Recipes `_
diff --git a/recipes_source/quantization.rst b/recipes_source/quantization.rst
new file mode 100644
index 00000000000..7ab7a1b8736
--- /dev/null
+++ b/recipes_source/quantization.rst
@@ -0,0 +1,114 @@
Quantization Recipe
=====================================

This recipe demonstrates how to quantize a PyTorch model so it can be used on iOS and Android apps, or in other production . Follow the steps below on how to use four different methods to quantize different models.

Introduction
------------

Quantization is a technique that converts 32-bit floating point numbers in the model parameters to 8-bit integers. With quantization, the model size and memory footprint can be reduced to 1/4 of their original size, and the inference can be made about 2-4 times faster, while the accuracy stays about the same.

There are overall three approaches or workflows to quantize a model: post training dynamic quantization, post training static quantization, and quantization aware training. But if the model you want to use already has a quantized version, you can use it directly without going through any of the three workflows above. For example, the `torchvision` library already includes quantized versions of MobileNet v2, ResNet 18, ResNet 50, Inception v3, and GoogleNet, among others. So we will make the last approach another workflow, albeit a simple one.

Pre-requisites
-----------------

PyTorch 1.6.0 or 1.7.0

torchvision 0.6.0 or 0.7.0

Workflows
------------

Use one of the four workflows below to quantize a model.

1. Use Pretrained Quantized MobileNet v2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To get the MobileNet v2 quantized model, simply do:

::

    import torchvision
    model_quantized = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)


To compare the size difference of a non-quantized MobileNet v2 model with its quantized version:

::

    model = torchvision.models.mobilenet_v2(pretrained=True)

    import os
    import torch

    def print_model_size(mdl):
        torch.save(mdl.state_dict(), "tmp.pt")
        print("%.2f MB" %(os.path.getsize("tmp.pt")/1e6))
        os.remove('tmp.pt')

    print_model_size(model)
    print_model_size(model_quantized)


The outputs will be:

::

    14.27 MB
    3.63 MB
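
To also get a rough sense of the speedup (an illustrative sketch, not part of the original recipe; absolute numbers depend on your machine), you can time both models on the same input:

::

    import time

    x = torch.rand(1, 3, 224, 224)
    for m, name in [(model, "float"), (model_quantized, "quantized")]:
        m.eval()
        with torch.no_grad():
            start = time.time()
            for _ in range(10):
                m(x)
        print(name, "avg seconds per inference:", (time.time() - start) / 10)

2. Post Training Dynamic Quantization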
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To apply dynamic quantization, which converts all the weights in a model from 32-bit floating point numbers to 8-bit integers but doesn't convert the activations to int8 until just before performing the computation on the activations, simply call `torch.quantization.quantize_dynamic`:

::

    model_dynamic_quantized = torch.quantization.quantize_dynamic(
        model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8
    )

where `qconfig_spec` specifies the set of submodule types (or names) in `model` to apply quantization to.

The full documentation of the `quantize_dynamic` API call is `here `_. Three other examples of using the post training dynamic quantization are `the Bert example `_, `an LSTM model example `_, and another `demo LSTM example `_.

3. Post Training Static Quantization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This method converts both the weights and the activations to 8-bit integers beforehand, so there won't be the on-the-fly conversion of activations during inference that dynamic quantization performs, hence improving the performance significantly.

To apply static quantization, run the following code:

::

    backend = "qnnpack"
    model.qconfig = torch.quantization.get_default_qconfig(backend)
    torch.backends.quantized.engine = backend
    model_static_quantized = torch.quantization.prepare(model, inplace=False)
    model_static_quantized = torch.quantization.convert(model_static_quantized, inplace=False)

After this, running `print_model_size(model_static_quantized)` shows the static quantized model is `3.98 MB`.

Notice that to make the model run on mobile devices which normally have arm architecture, you need to use the 'qnnpack' for `backend`; to run the model on computer with x86 architecture, use `fbgemm`.

4. Quantization Aware Training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To apply quantization aware training, which inserts fake quantization to all the weights and activations during the model training process, use the following code snippet:

::

    model.qconfig = torch.quantization.get_default_qat_qconfig(backend)
    model_qat = torch.quantization.prepare_qat(model, inplace=False)
    # quantization aware training goes here
    model_qat = torch.quantization.convert(model_qat.eval(), inplace=False)

After a quantized model is generated using one of the steps above, before the model can be used to run on mobile devices, it needs to be further converted to the `TorchScript` format and then optimized for mobile apps. See the `Script and Optimize for Mobile recipe `_ for details.

For a complete example of the quantization aware training, read this `tutorial `_.

Learn More
-----------------

For more info on the different workflows of quantization, see `here `_ and `here `_.
diff --git a/recipes_source/script_optimized.rst b/recipes_source/script_optimized.rst
index 6a56ee2bd95..7e0f284241f 100644
--- a/recipes_source/script_optimized.rst
+++ b/recipes_source/script_optimized.rst
@@ -6,7 +6,7 @@ This recipe demonstrates how to convert a PyTorch model to TorchScript which can

Introduction
------------

-After a PyTorch model is trained and optionally but preferably quantized (see the Quantization for Mobile Apps Recipe), one essential step before the model can be used in iOS and Android apps is to convert the Python-dependent model to TorchScript, which can then further be optimized for mobile apps. Conversion to TorchScript can be as simple as a single call, or as complicated as changing the original model in many different places.
+After a PyTorch model is trained and optionally but preferably quantized (see `Quantization Recipe `_ for more details), one essential step before the model can be used in iOS and Android apps is to convert the Python-dependent model to TorchScript, which can then further be optimized for mobile apps. Conversion to TorchScript can be as simple as a single call, or as complicated as changing the original model in many different places.

Pre-requisites
--------------
@@ -56,10 +56,11 @@ But if a model has some flow control, then `trace` won't correctly record all th
    print(traced_cell.code)

The code above will output:
-TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

::

+   TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
+
    if x.sum() > 0:

    def forward(self,
        x: Tensor) -> Tensor:
      return x


-It is important to notice that "the trace might not generalize to other inputs" warning above means that if the model has any kind of data-dependent control flow, `trace` is not the right answer. But if we replace the last two lines of the Python code above with:
+Note that "the trace might not generalize to other inputs" warning above means that if the model has any kind of data-dependent control flow, `trace` is not the right answer. But if we replace the last two lines of the Python code snippet above (before the code output) with:
@@ -92,7 +93,7 @@ The output will cover all possible inputs, thus generalizing to other inpu
-This is another example of using `trace` and `script` - it converts the model trained in the PyTorch [neural machine translation tutorial]():
+This is another example of using `trace` and `script` - it converts the model trained in the PyTorch tutorial `NLP FROM SCRATCH: TRANSLATION WITH A SEQUENCE TO SEQUENCE NETWORK AND ATTENTION `_:

::

    encoder = EncoderRNN(input_lang.n_words, hidden_size)
    decoder = AttnDecoderRNN(hidden_size, output_lang.n_words)
@@ -116,15 +117,18 @@ This is another example of using `trace` and `script` - it converts the model tr
    scripted_encoder = torch.jit.script(encoder)
    scripted_decoder = torch.jit.script(decoder)

-So is it true that one can simply always use the `script` call and the model is converted to TorchScript? 
The answer is no, because TorchScript is actually a subset of Python and to make `script` work, the PyTorch model definition must only use the language features of that TorchScript subset of Python. `TorchScript Language Reference `_ covers all the details of what is supported in TorchScript.
+So is it true that one can simply always use the `script` call and the model is converted to TorchScript? The answer is no, because TorchScript is actually a subset of Python and to make `script` work, the PyTorch model definition must only use the language features of that TorchScript subset of Python. `TorchScript Language Reference `_ covers all the details of what is supported in TorchScript. Below we will describe some of the common errors when using the `script` method.

-Fix Errors with the `script` Method
----------------------------------------------
+Fix Common Errors When Using the `script` Method
+----------------------------------------------------

If you apply the `script` method to a non-trivial model, chances are you may encounter several types of errors. Check out `this tutorial `_ for a complete example of converting a chatbot model to TorchScript. But follow the steps below to fix common errors when you run the `script` method:

-1. For RuntimeError `attribute lookup is not defined on python value of type` (of a model), pass the value of the model as a parameter in the constructor. This is because when calling `script` on a model that accepts another model as a parameter, the model passed is actually of type `TracedModule` or `ScriptModule`, not of type `Module`, making the model attribute not defined when scripting.
+1. RuntimeError `attribute lookup is not defined on python value of type`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For this error, pass the value of the model as a parameter in the constructor. This is because when calling `script` on a model that accepts another model as a parameter, the model passed is actually of type `TracedModule` or `ScriptModule`, not of type `Module`, making the model attribute not defined when scripting.

For example, the `LuongAttnDecoderRNN` module in the tutorial above has an attribute `n_layers`, and the `GreedySearchDecoder` module refers to the `n_layers` attribute of a `decoder` instance of the `LuongAttnDecoderRNN` module, so in order to make `script` work, the `GreedySearchDecoder` module's constructor needs to be changed from:
@@ -143,7 +147,10 @@ to:

and the `GreedySearchDecoder`'s `forward` method needs to refer to `self._decoder_n_layers` instead of `decoder.n_layers`.

-2. For RuntimeError `python value of type '...' cannot be used as a value. Perhaps it is a closed over global variable? If so, please consider passing it in as an argument or use a local variable instead.`, store global variables' values as attributes in the model constructor (there's no need to add them to a special list called `__constants__`). The reason is that global values can be used conveniently in normal model training and inference, but the global values are not accessible during the scripting.
+2. RuntimeError `python value of type '...' cannot be used as a value.`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The complete error message for this one continues with `Perhaps it is a closed over global variable? If so, please consider passing it in as an argument or use a local variable instead.` To fix this error, store global variables' values as attributes in the model constructor (there's no need to add them to a special list called `__constants__`). The reason is that global values can be used conveniently in normal model training and inference, but they are not accessible during scripting.

For example, `device` and `SOS_token` are global variables, and to make `script` work, they need to be added to the `GreedySearchDecoder`'s constructor:
@@ -154,7 +161,10 @@ For example, `device` and `SOS_token` are global variables, and to make `script`

and referred to as `self._device` and `self._SOS_token` instead of `device` and `SOS_token` in the `GreedySearchDecoder`'s `forward` method.
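
+For a quick illustration of the fix outside the chatbot model, here is a minimal, runnable sketch (the `Scaler` module and `SCALE` global are made-up names for illustration; whether a particular global actually fails to script depends on its type):
+
+::
+
+    import torch
+
+    SCALE = 2.0  # a module-level global
+
+    class Scaler(torch.nn.Module):
+        def __init__(self):
+            super(Scaler, self).__init__()
+            # store the global's value as an attribute so `script` can use it;
+            # referring to an unsupported global inside forward would raise the
+            # "cannot be used as a value" error
+            self._scale = SCALE
+
+        def forward(self, x):
+            return x * self._scale
+
+    scripted = torch.jit.script(Scaler())
+    print(scripted(torch.ones(2)))  # tensor([2., 2.])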

-3. For RuntimeError `RuntimeError: all inputs of range must be '...', found Tensor (inferred) in argument`, add type definitions for each of the module's forward method arguments. Because all parameters to a TorchScript function are of the `torch.Tensor` type by default, you need to specifically declare the type for each parameter that is not of type 'Tensor'. For a complete list of TorchScript-supported types, see `here `_.
+3. RuntimeError `all inputs of range must be '...', found Tensor (inferred) in argument`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To fix this error, add type definitions for each of the module's `forward` method arguments. Because all parameters to a TorchScript function are of the `torch.Tensor` type by default, you need to specifically declare the type for each parameter that is not of type `Tensor`. For a complete list of TorchScript-supported types, see `here `_.

For example, the `GreedySearchDecoder`'s `forward` method signature needs to be changed from:

::

    def forward(self, input_seq, input_length, max_length):

to:

::

    def forward(self, input_seq, input_length, max_length : int):
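
+A minimal, runnable sketch of this fix (the `Repeater` module is a made-up example): without the `int` annotation, TorchScript infers `max_length` as `Tensor` and `range(max_length)` fails to compile:
+
+::
+
+    import torch
+
+    class Repeater(torch.nn.Module):
+        def forward(self, x, max_length: int):
+            for _ in range(max_length):
+                x = x + 1
+            return x
+
+    scripted = torch.jit.script(Repeater())
+    print(scripted(torch.zeros(2), 3))  # tensor([3., 3.])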
After using the `trace` or `script` method above, and fixing possible errors, you should have a TorchScript model ready to be optimized for mobile.
-
Optimize a TorchScript Model
--------------------------------------

From 1c047cc9cae73212cd38a3f42a037c29b9a17dc9 Mon Sep 17 00:00:00 2001
From: Jeff Tang
Date: Wed, 21 Oct 2020 09:41:08 -0700
Subject: [PATCH 3/5] updated recipes_index.rst

---
 recipes_source/recipes_index.rst | 47 +++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/recipes_source/recipes_index.rst b/recipes_source/recipes_index.rst
index 3a05d729b67..79ef395c7f1 100644
--- a/recipes_source/recipes_index.rst
+++ b/recipes_source/recipes_index.rst
@@ -166,7 +166,42 @@ Recipes are bite-sized, actionable examples of how to use specific PyTorch featu
    :image: ../_static/img/thumbnails/cropped/android.png
    :link: ../recipes/android_native_app_with_custom_op.html
    :tags: Mobile
-
+
+.. customcarditem::
+   :header: Fuse Modules recipe
+   :card_description: Learn how to fuse a list of PyTorch modules into a single module to reduce the model size before quantization.
+   :image: ../_static/img/thumbnails/cropped/mobile.png
+   :link: ../recipes/fuse.html
+   :tags: Mobile
+
+.. customcarditem::
+   :header: Quantization for Mobile Recipe
+   :card_description: Learn how to reduce the model size and make it run faster without losing much on accuracy.
+   :image: ../_static/img/thumbnails/cropped/mobile.png
+   :link: ../recipes/quantization.html
+   :tags: Mobile,Quantization
+
+.. customcarditem::
+   :header: Script and Optimize for Mobile
+   :card_description: Learn how to convert the model to TorchScript and (optionally) optimize it for mobile apps.
+   :image: ../_static/img/thumbnails/cropped/mobile.png
+   :link: ../recipes/script_optimized.html
+   :tags: Mobile
+
+.. customcarditem::
+   :header: Model Preparation for iOS Recipe
+   :card_description: Learn how to add the model in an iOS project and use the PyTorch pod for iOS.
+   :image: ../_static/img/thumbnails/cropped/mobile.png
+   :link: ../recipes/model_preparation_ios.html
+   :tags: Mobile
+
+.. customcarditem::
+   :header: Model Preparation for Android Recipe
+   :card_description: Learn how to add the model in an Android project and use the PyTorch library for Android.
+   :image: ../_static/img/thumbnails/cropped/android.png
+   :link: ../recipes/model_preparation_android.html
+   :tags: Mobile
+
 .. customcarditem::
    :header: Profiling PyTorch RPC-Based Workloads
    :card_description: How to use the PyTorch profiler to profile RPC-based workloads.
@@ -183,6 +218,15 @@ Recipes are bite-sized, actionable examples of how to use specific PyTorch featu
    :link: ../recipes/recipes/amp_recipe.html
    :tags: Model-Optimization

+.. Performance
+
+.. customcarditem::
+   :header: Performance Tuning Guide
+   :card_description: Tips for achieving optimal performance.
+   :image: ../_static/img/thumbnails/cropped/profiler.png
+   :link: ../recipes/recipes/tuning_guide.html
+   :tags: Model-Optimization
+
 .. End of tutorial card section
 .. raw:: html

@@ -216,6 +260,7 @@
    /recipes/recipes/tensorboard_with_pytorch
    /recipes/recipes/dynamic_quantization
    /recipes/recipes/amp_recipe
+   /recipes/recipes/tuning_guide
    /recipes/torchscript_inference
    /recipes/deployment_with_flask
    /recipes/distributed_rpc_profiling

From d4f3261d3adddd51afae3cfe667386ef56b7a11b Mon Sep 17 00:00:00 2001
From: Jeff Tang
Date: Wed, 21 Oct 2020 09:46:05 -0700
Subject: [PATCH 4/5] thumbnail png fix for ios recipe in recipes_index.rst

---
 recipes_source/recipes_index.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/recipes_source/recipes_index.rst b/recipes_source/recipes_index.rst
index 79ef395c7f1..618929d1130 100644
--- a/recipes_source/recipes_index.rst
+++ b/recipes_source/recipes_index.rst
@@ -191,7 +191,7 @@ Recipes are bite-sized, actionable examples of how to use specific PyTorch featu
 .. customcarditem::
    :header: Model Preparation for iOS Recipe
    :card_description: Learn how to add the model in an iOS project and use the PyTorch pod for iOS.
- :image: ../_static/img/thumbnails/cropped/mobile.png + :image: ../_static/img/thumbnails/cropped/ios.png :link: ../recipes/model_preparation_ios.html :tags: Mobile From a6fb9d0b51a2c0a5ddb328eed759861b4e0e56e4 Mon Sep 17 00:00:00 2001 From: Jeff Tang Date: Fri, 23 Oct 2020 18:10:45 -0700 Subject: [PATCH 5/5] edits based on feedback --- recipes_source/fuse.rst | 12 ++++---- recipes_source/model_preparation_android.rst | 3 ++ recipes_source/model_preparation_ios.rst | 4 +++ recipes_source/quantization.rst | 31 ++++++++++++++++---- recipes_source/script_optimized.rst | 16 +++++++++- 5 files changed, 53 insertions(+), 13 deletions(-) diff --git a/recipes_source/fuse.rst b/recipes_source/fuse.rst index 97ae0df736f..b465042628d 100644 --- a/recipes_source/fuse.rst +++ b/recipes_source/fuse.rst @@ -57,8 +57,6 @@ Add the following code below the model definition above and run the script: model = AnnotatedConvBnReLUModel() - print(model) - def prepare_save(model, fused): model.qconfig = torch.quantization.get_default_qconfig('qnnpack') torch.quantization.prepare(model, inplace=True) @@ -68,14 +66,16 @@ Add the following code below the model definition above and run the script: torch.jit.save(torchscript_model_optimized, "model.pt" if not fused else "model_fused.pt") prepare_save(model, False) - - model = AnnotatedConvBnReLUModel() model_fused = torch.quantization.fuse_modules(model, [['bn', 'relu']], inplace=False) - print(model_fused) prepare_save(model_fused, True) + print(model) + print(model_fused) + + + -The outputs of the original model and its fused version will be: +The graphs of the original model and its fused version will be printed as follows: :: diff --git a/recipes_source/model_preparation_android.rst b/recipes_source/model_preparation_android.rst index a2a7838ea03..55ef7d9735c 100644 --- a/recipes_source/model_preparation_android.rst +++ b/recipes_source/model_preparation_android.rst @@ -50,6 +50,9 @@ or torchscript_model = torch.jit.script(model_quantized) +.. warning:: + The `trace` method only scripts the code path executed during the trace, so it will not work properly for models that include decision branches. See the `Script and Optimize for Mobile Recipe `_ for more details. + Then optimize the TorchScript formatted model for mobile and save it: :: diff --git a/recipes_source/model_preparation_ios.rst b/recipes_source/model_preparation_ios.rst index 759237d1f9a..4e01e0d48bd 100644 --- a/recipes_source/model_preparation_ios.rst +++ b/recipes_source/model_preparation_ios.rst @@ -49,6 +49,10 @@ or torchscript_model = torch.jit.script(model_quantized) +.. warning:: + The `trace` method only scripts the code path executed during the trace, so it will not work properly for models that include decision branches. See the `Script and Optimize for Mobile Recipe `_ for more details. + + Then optimize the TorchScript formatted model for mobile and save it: :: diff --git a/recipes_source/quantization.rst b/recipes_source/quantization.rst index 7ab7a1b8736..d3a7b12c85a 100644 --- a/recipes_source/quantization.rst +++ b/recipes_source/quantization.rst @@ -1,7 +1,7 @@ Quantization Recipe ===================================== -This recipe demonstrates how to quantize a PyTorch model so it can be used on iOS and Android apps, or in other production . Follow the steps below on how to use four different methods to quantize different models. 
This recipe demonstrates how to quantize a PyTorch model so it can run with reduced size and faster inference speed with about the same accuracy as the original model. Quantization can be applied to both server and mobile model deployment, but it can be especially important or even critical on mobile, because a non-quantized model's size may exceed the limit that an iOS or Android app allows for, cause the deployment or OTA update to take too much time, and make the inference too slow for a good user experience.

Introduction
------------

Quantization is a technique that converts 32-bit floating point numbers in the model parameters to 8-bit integers. With quantization, the model size and memory footprint can be reduced to 1/4 of their original size, and the inference can be made about 2-4 times faster, while the accuracy stays about the same.

There are overall three approaches or workflows to quantize a model: post training dynamic quantization, post training static quantization, and quantization aware training. But if the model you want to use already has a quantized version, you can use it directly without going through any of the three workflows above. For example, the `torchvision` library already includes quantized versions of MobileNet v2, ResNet 18, ResNet 50, Inception v3, and GoogleNet, among others. So we will make the last approach another workflow, albeit a simple one.

+.. note::
+   The quantization support is available for a limited set of operators. See `this `_ for more information.
+
Pre-requisites
-----------------

PyTorch 1.6.0 or 1.7.0

@@ -78,7 +81,7 @@ The full documentation of the `quantize_dynamic` API call is `here `_.

A dedicated static quantization tutorial is `here `_.
+
+.. note::
+   To make the model run on mobile devices which normally have arm architecture, you need to use `qnnpack` for `backend`; to run the model on a computer with x86 architecture, use `fbgemm`.

4. Quantization Aware Training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-To apply quantization aware training, which inserts fake quantization to all the weights and activations during the model training process, use the following code snippet:
+Quantization aware training inserts fake quantization to all the weights and activations during the model training process and results in higher inference accuracy than the post-training quantization methods. It is typically used in CNN models.
+
+To enable a model for quantization aware training, define in the `__init__` method of the model definition a `QuantStub` and a `DeQuantStub` to convert tensors from floating point to quantized type and vice versa:
+
+::
+
+    self.quant = torch.quantization.QuantStub()
+    self.dequant = torch.quantization.DeQuantStub()
+
+Then in the beginning and the end of the `forward` method of the model definition, call `x = self.quant(x)` and `x = self.dequant(x)`.
+
+To do quantization aware training, use the following code snippet:

::

@@ -104,9 +121,11 @@
    # quantization aware training goes here
    model_qat = torch.quantization.convert(model_qat.eval(), inplace=False)
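
+For concreteness, here is a minimal end-to-end sketch of the workflow above (the toy model, data, and training loop are made up for illustration; real training code would replace them):
+
+::
+
+    import torch
+
+    class TinyModel(torch.nn.Module):
+        def __init__(self):
+            super(TinyModel, self).__init__()
+            self.quant = torch.quantization.QuantStub()
+            self.linear = torch.nn.Linear(4, 1)
+            self.dequant = torch.quantization.DeQuantStub()
+
+        def forward(self, x):
+            x = self.quant(x)
+            x = self.linear(x)
+            x = self.dequant(x)
+            return x
+
+    model = TinyModel().train()
+    # 'qnnpack' targets arm devices; use 'fbgemm' when targeting x86
+    model.qconfig = torch.quantization.get_default_qat_qconfig('qnnpack')
+    model_qat = torch.quantization.prepare_qat(model, inplace=False)
+
+    optimizer = torch.optim.SGD(model_qat.parameters(), lr=0.01)
+    for _ in range(10):  # stand-in for a real training loop
+        x, y = torch.rand(8, 4), torch.rand(8, 1)
+        loss = torch.nn.functional.mse_loss(model_qat(x), y)
+        optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+
+    model_qat = torch.quantization.convert(model_qat.eval(), inplace=False)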

-After a quantized model is generated using one of the steps above, before the model can be used to run on mobile devices, it needs to be further converted to the `TorchScript` format and then optimized for mobile apps. See the `Script and Optimize for Mobile recipe `_ for details.
+For more detailed examples of the quantization aware training, see `here `_ and `here `_.

-For a complete example of the quantization aware training, read this `tutorial `_.
+A pre-trained quantized model can also be used for quantization aware transfer learning, using the same `quant` and `dequant` calls shown above. See `here `_ for a complete example.
+
+After a quantized model is generated using one of the steps above, before the model can be used to run on mobile devices, it needs to be further converted to the `TorchScript` format and then optimized for mobile apps. See the `Script and Optimize for Mobile recipe `_ for details.

Learn More
-----------------

For more info on the different workflows of quantization, see `here `_ and `here `_.
diff --git a/recipes_source/script_optimized.rst b/recipes_source/script_optimized.rst
index 7e0f284241f..6dfb8e18001 100644
--- a/recipes_source/script_optimized.rst
+++ b/recipes_source/script_optimized.rst
@@ -197,4 +197,18 @@ The optimized model can then be saved and deployed in mobile apps:

    optimized_torchscript_model.save("optimized_torchscript_model.pth")

-For more details on what `optimize_for_mobile` does behind the scene, see `here `_.
+By default, `optimize_for_mobile` will perform the following types of optimizations:
+
+* Conv2D and BatchNorm fusion, which folds Conv2d-BatchNorm2d into Conv2d;
+
+* Insert and fold prepacked ops, which rewrites the model graph to replace 2D convolutions and linear ops with their prepacked counterparts;
+
+* ReLU and hardtanh fusion, which rewrites the graph by finding ReLU/hardtanh ops and fuses them together;
+
+* Dropout removal, which removes dropout nodes from the module when training is false.
+
+
+Learn More
+-----------------
+
+1. The official `TorchScript Language Reference `_.
+2. The `torch.utils.mobile_optimizer` `API documentation `_.