
Commit 84bf3e3

brianjo, xta0, jeffxtang, patmellon, vincentqb authored
1.7 release (#1206)
* [iOS][GPU] Add iOS GPU workflow (#1200)
* pt mobile script and optimize recipe (#1193)
* pt mobile script and optimize recipe
* 1 pt mobile new recipes summary and 5 recipes
* updated recipes_index.rst
* thumbnail png fix for ios recipe in recipes_index.rst
* edits based on feedback
* Updating 1.7 branch (#1205)
* Update event tracking (#1188)
* Update beginner_source/audio_preprocessing_tutorial.py (#1199): typo fix, fron > from
* update title.
* fix file access.
Co-authored-by: JuHyuk Park <creduo@gmail.com>
* Update audio_preprocessing_tutorial.py (#1202): adds a comment for running this tutorial in Google Colab.

Co-authored-by: Pat Mellon <16585245+patmellon@users.noreply.github.com>
Co-authored-by: Vincent QB <vincentqb@users.noreply.github.com>
Co-authored-by: JuHyuk Park <creduo@gmail.com>
Co-authored-by: Tao Xu <taox@fb.com>
Co-authored-by: Jeff Tang <jeffxtang@fb.com>
1 parent 41f58c1 commit 84bf3e3

File tree

8 files changed: +869, -1 lines changed


prototype_source/ios_gpu_workflow.rst

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
(Prototype) Use iOS GPU in PyTorch
==================================

**Author**: `Tao Xu <https://github.com/xta0>`_

Introduction
------------

This tutorial introduces the steps to run your models on the iOS GPU. We'll use the mobilenetv2 model as an example. Since the mobile GPU features are currently in the prototype stage, you'll need to build a custom PyTorch binary from source. For the time being, only a limited number of operators are supported, and certain client-side APIs are subject to change in future versions.

Model Preparation
-------------------

Since GPUs consume weights in a different order, the first step is to convert our TorchScript model to a GPU-compatible model. This step is also known as "prepacking". To do that, we'll build a custom PyTorch binary from source that includes the Metal backend. Go ahead and check out the PyTorch source code from GitHub, then run the commands below

.. code:: shell

    cd PYTORCH_ROOT
    USE_PYTORCH_METAL=ON python setup.py install --cmake

The commands above will build a custom PyTorch binary from master. The ``install`` argument simply tells ``setup.py`` to overwrite the existing PyTorch installation on your desktop. Once the build has finished, open another terminal and check the PyTorch version to see if the installation was successful. At the time of writing of this recipe, the version is ``1.8.0a0+41237a4``. You might see a different number depending on when you check out the code from master, but it should be greater than 1.7.0.

.. code:: python

    import torch
    torch.__version__  # 1.8.0a0+41237a4

The next step is to convert the mobilenetv2 TorchScript model to a Metal-compatible model. We'll leverage the ``optimize_for_mobile`` API from the ``torch.utils.mobile_optimizer`` module, as shown below

.. code:: python

    import torch
    import torchvision
    from torch.utils.mobile_optimizer import optimize_for_mobile

    model = torchvision.models.mobilenet_v2(pretrained=True)
    scripted_model = torch.jit.script(model)
    optimized_model = optimize_for_mobile(scripted_model, backend='metal')
    print(torch.jit.export_opnames(optimized_model))
    torch.jit.save(optimized_model, './mobilenetv2_metal.pt')

Note that ``torch.jit.export_opnames(optimized_model)`` dumps all the operators from the ``optimized_model``. If everything works well, you should see the following ops printed out in the console

.. code:: shell

    ['aten::adaptive_avg_pool2d',
     'aten::add.Tensor',
     'aten::addmm',
     'aten::reshape',
     'aten::size.int',
     'metal::copy_to_host',
     'metal_prepack::conv2d_run']
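
If you are scripting this conversion step, an optional sanity check like the sketch below (not part of the original recipe; the expected set simply mirrors the list above) can catch an unexpected or missing operator early:

.. code:: python

    # Hypothetical check: compare the exported op names against the ops we
    # expect the custom Metal build to support.
    expected_ops = {'aten::adaptive_avg_pool2d', 'aten::add.Tensor', 'aten::addmm',
                    'aten::reshape', 'aten::size.int', 'metal::copy_to_host',
                    'metal_prepack::conv2d_run'}
    actual_ops = set(torch.jit.export_opnames(optimized_model))
    assert actual_ops <= expected_ops, f"unexpected ops: {actual_ops - expected_ops}"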

Those are all the ops we need to run the mobilenetv2 model on the iOS GPU. Cool! Now that you have the ``mobilenetv2_metal.pt`` saved on your disk, let's move on to the iOS part.

Use C++ APIs
---------------------

In this section, we'll use the `HelloWorld example <https://github.com/pytorch/ios-demo-app>`_ to demonstrate how to use the C++ APIs. The first thing we need to do is build a custom LibTorch from source. Make sure you have deleted the **build** folder from the previous step in the PyTorch root directory. Then run the command below

.. code:: shell

    IOS_ARCH=arm64 USE_PYTORCH_METAL=1 ./scripts/build_ios.sh

Note that ``IOS_ARCH`` tells the script to build an arm64 version of LibTorch. This is because in PyTorch, Metal is only available for iOS devices with an Apple A9 chip or above. Once the build has finished, follow the `Build PyTorch iOS libraries from source <https://pytorch.org/mobile/ios/#build-pytorch-ios-libraries-from-source>`_ section of the iOS tutorial to set up the Xcode settings properly. Don't forget to copy `./mobilenetv2_metal.pt` to your Xcode project.

Next we need to make some changes in ``TorchModule.mm``

.. code:: objective-c

    - (NSArray<NSNumber*>*)predictImage:(void*)imageBuffer {
        torch::jit::GraphOptimizerEnabledGuard opguard(false);
        at::Tensor tensor = torch::from_blob(imageBuffer, {1, 3, 224, 224}, at::kFloat).metal();
        auto outputTensor = _impl.forward({tensor}).toTensor().cpu();
        ...
        return nil;
    }

As you can see, we simply call ``.metal()`` to move our input tensor from CPU to GPU, and then call ``.cpu()`` to move the result back. Internally, ``.metal()`` copies the input data from the CPU buffer to a GPU buffer with a GPU-compatible memory format. When `.cpu()` is invoked, the GPU command buffer will be flushed and synced. After `forward` finishes, the final result is copied from the GPU buffer back to a CPU buffer.

The last step is to add `Accelerate.framework` and `MetalPerformanceShaders.framework` to your Xcode project.

If everything works fine, you should be able to see the inference results on your phone. The results below were captured from an iPhone 11 device

.. code:: shell

    - timber wolf, grey wolf, gray wolf, Canis lupus
    - malamute, malemute, Alaskan malamute
    - Eskimo dog, husky

You may notice that the results are slightly different from the `results <https://pytorch.org/mobile/ios/#install-libtorch-via-cocoapods>`_ we got from the CPU model as shown in the iOS tutorial. This is because, by default, Metal uses fp16 rather than fp32 for computation, so the precision loss is expected.
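
As a rough, purely illustrative sketch (not part of the original recipe), the Python snippet below shows the magnitude of error a single fp16 round trip introduces on the CPU; the numbers produced by the Metal backend will differ, but the order of magnitude of the drift is comparable.

.. code:: python

    import torch

    x = torch.randn(1, 3, 224, 224)
    # Casting to half precision and back loses a few decimal digits,
    # which is the same kind of drift seen in the GPU results above.
    roundtrip_error = (x - x.half().float()).abs().max()
    print(roundtrip_error)  # small but nonzero, e.g. on the order of 1e-3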

Conclusion
----------

In this tutorial, we demonstrated how to convert a mobilenetv2 model to a GPU-compatible model. We walked through a HelloWorld example to show how to use the C++ APIs to run models on the iOS GPU. Please be aware that the GPU feature is still under development; new operators will continue to be added, and APIs are subject to change in future versions.

Thanks for reading! As always, we welcome any feedback, so please create an issue `here <https://github.com/pytorch/pytorch/issues>`_ if you have any.

Learn More
----------

- The `Mobilenetv2 <https://pytorch.org/hub/pytorch_vision_mobilenet_v2/>`_ from Torchvision
- To learn more about how to use ``optimize_for_mobile``, please refer to the `Mobile Perf Recipe <https://pytorch.org/tutorials/recipes/mobile_perf.html>`_

recipes_source/fuse.rst

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
Fuse Modules Recipe
=====================================

This recipe demonstrates how to fuse a list of PyTorch modules into a single module and how to run a performance test comparing the fused model with its non-fused version.

Introduction
------------

Before quantization is applied to a model to reduce its size and memory footprint (see `Quantization Recipe <quantization.html>`_ for details on quantization), the list of modules in the model may first be fused into a single module. Fusion is optional, but it may save on memory access, make the model run faster, and improve its accuracy.

Pre-requisites
--------------

PyTorch 1.6.0 or 1.7.0

Steps
--------------

Follow the steps below to fuse an example model, quantize it, script it, optimize it for mobile, save it and test it with the Android benchmark tool.

1. Define the Example Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use the same example model defined in the `PyTorch Mobile Performance Recipes <https://pytorch.org/tutorials/recipes/mobile_perf.html>`_:

::

    import torch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    class AnnotatedConvBnReLUModel(torch.nn.Module):
        def __init__(self):
            super(AnnotatedConvBnReLUModel, self).__init__()
            self.conv = torch.nn.Conv2d(3, 5, 3, bias=False).to(dtype=torch.float)
            self.bn = torch.nn.BatchNorm2d(5).to(dtype=torch.float)
            self.relu = torch.nn.ReLU(inplace=True)
            self.quant = torch.quantization.QuantStub()
            self.dequant = torch.quantization.DeQuantStub()

        def forward(self, x):
            x = x.contiguous(memory_format=torch.channels_last)
            x = self.quant(x)
            x = self.conv(x)
            x = self.bn(x)
            x = self.relu(x)
            x = self.dequant(x)
            return x

2. Generate Two Models with and without `fuse_modules`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Add the following code below the model definition above and run the script:

::

    model = AnnotatedConvBnReLUModel()

    def prepare_save(model, fused):
        model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
        torch.quantization.prepare(model, inplace=True)
        torch.quantization.convert(model, inplace=True)
        torchscript_model = torch.jit.script(model)
        torchscript_model_optimized = optimize_for_mobile(torchscript_model)
        torch.jit.save(torchscript_model_optimized, "model.pt" if not fused else "model_fused.pt")

    prepare_save(model, False)
    model_fused = torch.quantization.fuse_modules(model, [['bn', 'relu']], inplace=False)
    prepare_save(model_fused, True)

    print(model)
    print(model_fused)

The graphs of the original model and its fused version will be printed as follows:

::

    AnnotatedConvBnReLUModel(
      (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(5, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (quant): QuantStub()
      (dequant): DeQuantStub()
    )

    AnnotatedConvBnReLUModel(
      (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
      (bn): BNReLU2d(
        (0): BatchNorm2d(5, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (1): ReLU(inplace=True)
      )
      (relu): Identity()
      (quant): QuantStub()
      (dequant): DeQuantStub()
    )

In the output of the second, fused model, the first item in the list, `bn`, is replaced with the fused module, and the rest of the modules (`relu` in this example) are replaced with `Identity`. In addition, the non-fused and fused versions of the model, `model.pt` and `model_fused.pt`, are generated.

3. Build the Android Benchmark Tool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Get the PyTorch source and build the Android benchmark tool as follows:

::

    git clone --recursive https://github.com/pytorch/pytorch
    cd pytorch
    git submodule update --init --recursive
    BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=arm64-v8a ./scripts/build_android.sh -DBUILD_BINARY=ON

This will generate the Android benchmark binary `speed_benchmark_torch` in the `build_android/bin` folder.

4. Test and Compare the Fused and Non-Fused Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Connect your Android device, then copy `speed_benchmark_torch` and the model files to it and run the benchmark tool on them:

::

    adb push build_android/bin/speed_benchmark_torch /data/local/tmp
    adb push model.pt /data/local/tmp
    adb push model_fused.pt /data/local/tmp
    adb shell "/data/local/tmp/speed_benchmark_torch --model=/data/local/tmp/model.pt --input_dims=1,3,224,224 --input_type=float"
    adb shell "/data/local/tmp/speed_benchmark_torch --model=/data/local/tmp/model_fused.pt --input_dims=1,3,224,224 --input_type=float"

The results from the last two commands should look like:

::

    Main run finished. Microseconds per iter: 6189.07. Iters per second: 161.575

and

::

    Main run finished. Microseconds per iter: 6216.65. Iters per second: 160.858

For this example model, there is not much performance difference between the fused and non-fused models. But similar steps can be used to fuse and prepare a real, deeper model and measure the performance improvement. Keep in mind that currently `torch.quantization.fuse_modules` only fuses the following sequences of modules:

* conv, bn
* conv, bn, relu
* conv, relu
* linear, relu
* bn, relu

If any other sequence list is provided to the `fuse_modules` call, it will simply be ignored.
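
For instance, the minimal sketch below (not part of the original recipe) fuses the full conv, bn, relu sequence of the example model from step 1; since conv-bn folding is meant for inference, the model is put in eval mode first. The three modules are folded into a single fused module, and `bn` and `relu` are replaced with `Identity`:

::

    # Assumes AnnotatedConvBnReLUModel from step 1 is defined in scope.
    m = AnnotatedConvBnReLUModel().eval()
    m_fused = torch.quantization.fuse_modules(m, [['conv', 'bn', 'relu']], inplace=False)
    print(m_fused)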

Learn More
---------------

See `here <https://pytorch.org/docs/stable/quantization.html#preparing-model-for-quantization>`_ for the official documentation of `torch.quantization.fuse_modules`.
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
Model Preparation for Android Recipe
=====================================

This recipe demonstrates how to prepare a PyTorch MobileNet v2 image classification model for Android apps, and how to set up Android projects to use the mobile-ready model file.

Introduction
-----------------

After a PyTorch model is trained or a pre-trained model is made available, it is normally not yet ready to be used in mobile apps. It needs to be quantized (see the `Quantization Recipe <quantization.html>`_), converted to TorchScript so Android apps can load it, and optimized for mobile apps. Furthermore, Android apps need to be set up correctly to enable the use of the PyTorch Mobile libraries before they can load and use the model for inference.

Pre-requisites
-----------------

PyTorch 1.6.0 or 1.7.0

torchvision 0.6.0 or 0.7.0

Android Studio 3.5.1 or above with NDK installed

Steps
-----------------

1. Get Pretrained and Quantized MobileNet v2 Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To get the MobileNet v2 quantized model, simply do:

::

    import torchvision

    model_quantized = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)

2. Script and Optimize the Model for Mobile Apps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use either the `script` or `trace` method to convert the quantized model to the TorchScript format:

::

    import torch

    dummy_input = torch.rand(1, 3, 224, 224)
    torchscript_model = torch.jit.trace(model_quantized, dummy_input)

or

::

    torchscript_model = torch.jit.script(model_quantized)

.. warning::
    The `trace` method only scripts the code path executed during the trace, so it will not work properly for models that include decision branches. See the `Script and Optimize for Mobile Recipe <script_optimized.html>`_ for more details.

Then optimize the TorchScript formatted model for mobile and save it:

::

    from torch.utils.mobile_optimizer import optimize_for_mobile
    torchscript_model_optimized = optimize_for_mobile(torchscript_model)
    torch.jit.save(torchscript_model_optimized, "mobilenetv2_quantized.pt")

With a total of 7 or 8 lines of code in the two steps above (depending on whether the `script` or `trace` method is used to get the TorchScript format of the model), we have a model ready to be added to mobile apps.
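
As an optional sanity check (our suggestion, not a required part of this recipe), you can reload the saved file on your desktop and run a dummy input through it before copying it into the app:

::

    # The mobile-optimized TorchScript file is still a regular ScriptModule,
    # so it can be loaded and run on the desktop CPU as a quick smoke test.
    reloaded = torch.jit.load("mobilenetv2_quantized.pt")
    output = reloaded(torch.rand(1, 3, 224, 224))
    print(output.shape)  # expect torch.Size([1, 1000]) for MobileNet v2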

3. Add the Model and PyTorch Library on Android
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* In your current or a new Android Studio project, open the build.gradle file, and add the following two lines (the second one is required only if you plan to use a TorchVision model):

::

    implementation 'org.pytorch:pytorch_android:1.6.0'
    implementation 'org.pytorch:pytorch_android_torchvision:1.6.0'

* Drag and drop the model file `mobilenetv2_quantized.pt` to your project's assets folder.

That's it! Now you can build your Android app with the PyTorch library and the model ready to use. To actually write code to use the model, refer to the PyTorch Mobile `Android Quickstart with a HelloWorld Example <https://pytorch.org/mobile/android/#quickstart-with-a-helloworld-example>`_ and `Android Hackathon Example <https://github.com/pytorch/workshops/tree/master/PTMobileWalkthruAndroid>`_.

Learn More
-----------------

1. `PyTorch Mobile site <https://pytorch.org/mobile>`_
2. `Introduction to TorchScript <https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html>`_
