Add NNAPI tutorial #1229
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,185 @@ | ||
(Prototype) Convert MobileNetV2 to NNAPI | ||
======================================== | ||
|
||
Introduction | ||
------------ | ||
|
||
This tutorial shows how to prepare a computer vision model to use | ||
`Android's Neural Networks API (NNAPI) <https://developer.android.com/ndk/guides/neuralnetworks>`_. | ||
NNAPI provides access to powerful and efficient computational cores | ||
on many modern Android devices. | ||
|
||
PyTorch's NNAPI support is currently in the "prototype" phase and covers only | ||
a limited range of operators, but we expect to solidify the integration | ||
and expand our operator support over time. | ||
|
||
|
||
Environment | ||
----------- | ||
|
||
Install PyTorch and torchvision. | ||
This tutorial is currently incompatible with the latest trunk, | ||
so we recommend running | ||
``pip install --upgrade --pre --find-links https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html torch==1.8.0.dev20201106+cpu torchvision==0.9.0.dev20201107+cpu`` | ||
until this incompatibility is corrected. | ||
|
||
|
||
Model Preparation | ||
----------------- | ||
|
||
First, we must prepare our model to execute with NNAPI. | ||
This step runs on your training server or laptop. | ||
The key conversion function to call is | ||
``torch.backends._nnapi.prepare.convert_model_to_nnapi``, | ||
Review comment: Is there a link to the |
Reply: No. This still needs to be written. |
||
but some extra steps are required to ensure that | ||
the model is properly structured. | ||
Most notably, quantizing the model is required | ||
in order to run the model on certain accelerators. | ||
|
||
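To make the accuracy/performance trade-off of quantization concrete, here is a minimal pure-Python sketch of per-tensor affine quantization to unsigned 8-bit values. The helper names and the ``scale`` and ``zero_point`` values are assumptions chosen for illustration; this is not the code path that PyTorch or NNAPI actually executes.

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers in [qmin, qmax] (hypothetical helper)."""
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Map quantized integers back to approximate floats (hypothetical helper)."""
    return [(q - zero_point) * scale for q in qvalues]


# Assumed quantization parameters, chosen only for this illustration.
scale, zero_point = 0.05, 128

original = [-1.0, -0.26, 0.0, 0.33, 2.0]
quantized = quantize(original, scale, zero_point)
restored = dequantize(quantized, scale, zero_point)

# Values inside the representable range come back to within scale / 2.
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(quantized)
print(restored)
print(max_error)
```

The rounding step is where accuracy is lost: every value is snapped to the nearest multiple of ``scale``, which is the price paid for 8-bit storage and integer arithmetic.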
You can copy/paste this entire Python script and run it, | ||
or make your own modifications. | ||
By default, it will save the models to ``~/mobilenetv2-nnapi/``. | ||
Please create that directory first. | ||
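Before running the conversion script that follows, the output directory can be created from Python as well as from a shell. This is a small sketch; ``ensure_output_dir`` is a hypothetical helper, and it assumes that ``Path.home()`` resolves to the same location as the ``HOME`` environment variable used by the script.

```python
from pathlib import Path


def ensure_output_dir(base=None):
    """Create <base>/mobilenetv2-nnapi if it is missing and return its path.

    `base` defaults to the current user's home directory (an assumption that
    matches the script's use of the HOME environment variable on Unix-like systems).
    """
    base = Path.home() if base is None else Path(base)
    output_dir = base / "mobilenetv2-nnapi"
    # exist_ok=True makes the call safe to repeat.
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir
```

Calling ``ensure_output_dir()`` with no arguments creates ``~/mobilenetv2-nnapi/``; passing a different base directory is useful for experiments.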
|
||
.. code:: python | ||
|
||
#!/usr/bin/env python | ||
import sys | ||
import os | ||
import torch | ||
import torch.utils.bundled_inputs | ||
import torch.utils.mobile_optimizer | ||
import torch.backends._nnapi.prepare | ||
Review comment: Which PyTorch and torchvision versions are required to run this? I built and installed PyTorch from the master branch on Oct 22 and got an error: "ModuleNotFoundError: No module named 'torch.backends._nnapi'". Then I built from the latest pytorch master branch and got a new error when running the script: Exception: Unsupported node kind ('aten::size') in node %13 : int = aten::size(%input.88, %32) # /Users/jeffxtang/opt/anaconda3/lib/python3.8/site-packages/torchvision/models/mobilenet.py:166:0 I then installed torchvision from the latest source but the script still picks the older 0.7... |
Reply: Updated instructions to reference the specific known-working version. |
||
import torchvision.models.quantization.mobilenet | ||
from pathlib import Path | ||
|
||
|
||
# This script supports 3 modes of quantization: | ||
# - "none": Fully floating-point model. | ||
# - "core": Quantize the core of the model, but wrap it in a | ||
#   quantizer/dequantizer pair, so the interface uses floating point. | ||
# - "full": Quantize the model, and use quantized tensors | ||
#   for input and output. | ||
# | ||
# "none" maintains maximum accuracy. | ||
# "core" sacrifices some accuracy for performance, | ||
#   but maintains the same interface. | ||
# "full" maximizes performance (with the same accuracy as "core"), | ||
#   but requires the application to use quantized tensors. | ||
# | ||
# There is a fourth option, not supported by this script, | ||
# where we include the quant/dequant steps as NNAPI operators. | ||
def make_mobilenetv2_nnapi(output_dir_path, quantize_mode): | ||
    quantize_core, quantize_iface = { | ||
        "none": (False, False), | ||
        "core": (True, False), | ||
        "full": (True, True), | ||
    }[quantize_mode] | ||
|
||
    model = torchvision.models.quantization.mobilenet.mobilenet_v2(pretrained=True, quantize=quantize_core) | ||
    model.eval() | ||
|
||
    # Fuse BatchNorm operators in the floating point model. | ||
    # (Quantized models already have this done.) | ||
    # Remove dropout for this inference-only use case. | ||
    if not quantize_core: | ||
        model.fuse_model() | ||
    assert type(model.classifier[0]) == torch.nn.Dropout | ||
    model.classifier[0] = torch.nn.Identity() | ||
|
||
    input_float = torch.zeros(1, 3, 224, 224) | ||
    input_tensor = input_float | ||
|
||
    # If we're doing a quantized model, we need to trace only the quantized core. | ||
    # So capture the quantizer and dequantizer, use them to prepare the input, | ||
    # and replace them with identity modules so we can trace without them. | ||
    if quantize_core: | ||
        quantizer = model.quant | ||
        dequantizer = model.dequant | ||
        model.quant = torch.nn.Identity() | ||
        model.dequant = torch.nn.Identity() | ||
        input_tensor = quantizer(input_float) | ||
|
||
    # Many NNAPI backends prefer NHWC tensors, so convert our input to channels_last, | ||
    # and set the "nnapi_nhwc" attribute for the converter. | ||
    input_tensor = input_tensor.contiguous(memory_format=torch.channels_last) | ||
    input_tensor.nnapi_nhwc = True | ||
|
||
    # Trace the model. NNAPI conversion only works with TorchScript models, | ||
    # and traced models are more likely to convert successfully than scripted. | ||
    with torch.no_grad(): | ||
        traced = torch.jit.trace(model, input_tensor) | ||
    nnapi_model = torch.backends._nnapi.prepare.convert_model_to_nnapi(traced, input_tensor) | ||
|
||
    # If we're not using a quantized interface, wrap a quant/dequant around the core. | ||
    if quantize_core and not quantize_iface: | ||
        nnapi_model = torch.nn.Sequential(quantizer, nnapi_model, dequantizer) | ||
        model.quant = quantizer | ||
        model.dequant = dequantizer | ||
        # Switch back to float input for benchmarking. | ||
        input_tensor = input_float.contiguous(memory_format=torch.channels_last) | ||
|
||
    # Optimize the CPU model to make CPU-vs-NNAPI benchmarks fair. | ||
    model = torch.utils.mobile_optimizer.optimize_for_mobile(torch.jit.script(model)) | ||
|
||
    # Bundle sample inputs with the models for easier benchmarking. | ||
    # This step is optional. | ||
    class BundleWrapper(torch.nn.Module): | ||
        def __init__(self, mod): | ||
            super().__init__() | ||
            self.mod = mod | ||
        def forward(self, arg): | ||
            return self.mod(arg) | ||
    nnapi_model = torch.jit.script(BundleWrapper(nnapi_model)) | ||
    torch.utils.bundled_inputs.augment_model_with_bundled_inputs( | ||
        model, [(torch.utils.bundled_inputs.bundle_large_tensor(input_tensor),)]) | ||
    torch.utils.bundled_inputs.augment_model_with_bundled_inputs( | ||
        nnapi_model, [(torch.utils.bundled_inputs.bundle_large_tensor(input_tensor),)]) | ||
|
||
    # Save both models. | ||
    model.save(output_dir_path / ("mobilenetv2-quant_{}-cpu.pt".format(quantize_mode))) | ||
    nnapi_model.save(output_dir_path / ("mobilenetv2-quant_{}-nnapi.pt".format(quantize_mode))) | ||
|
||
|
||
if __name__ == "__main__": | ||
    for quantize_mode in ["none", "core", "full"]: | ||
        make_mobilenetv2_nnapi(Path(os.environ["HOME"]) / "mobilenetv2-nnapi", quantize_mode) | ||
|
||
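The ``channels_last`` conversion in the script keeps the tensor's logical NCHW shape and changes only the order in which elements are stored in memory. The following pure-Python sketch computes the element strides of both layouts for the script's 1x3x224x224 input. The helper names are hypothetical, and the sketch deliberately avoids calling PyTorch.

```python
def contiguous_strides(shape):
    """Strides (in elements) for a densely packed row-major tensor."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def channels_last_strides(shape):
    """Strides listed in N, C, H, W order when memory is laid out as NHWC."""
    n, c, h, w = shape
    # Row-major strides for the physical (N, H, W, C) ordering.
    nhwc = contiguous_strides((n, h, w, c))
    # Reorder them back to the logical N, C, H, W dimension order.
    return (nhwc[0], nhwc[3], nhwc[1], nhwc[2])


shape = (1, 3, 224, 224)  # N, C, H, W, as in the script
nchw = contiguous_strides(shape)
nhwc = channels_last_strides(shape)
print(nchw)  # default NCHW layout
print(nhwc)  # channels_last (NHWC) layout
```

In the NHWC layout the channel stride is 1, so the three channel values of a pixel sit next to each other in memory, which is the arrangement the script's comments say many NNAPI backends prefer.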
|
||
Running Benchmarks | ||
------------------ | ||
|
||
Now that the models are ready, we can benchmark them on our Android devices. | ||
See `our performance recipe <https://pytorch.org/tutorials/recipes/mobile_perf.html#android-benchmarking-setup>`_ for details. | ||
The best-performing models are likely to be the "fully-quantized" models: | ||
Review comment: It'd be great to show benchmark data here, maybe in a future update. |
Reply: Yeah. I'll add this after the blog post goes live. |
||
``mobilenetv2-quant_full-cpu.pt`` and ``mobilenetv2-quant_full-nnapi.pt``. | ||
|
||
Because these models have bundled inputs, we can run the benchmark as follows: | ||
|
||
.. code:: shell | ||
Review comment: The code below doesn't show in the preview, maybe due to some format issue. code => code-block? |
Reply: Corrected. |
||
|
||
./speed_benchmark_torch --pthreadpool_size=1 --model=mobilenetv2-quant_full-nnapi.pt --use_bundled_input=0 --warmup=5 --iter=200 | ||
|
||
Increasing the thread pool size can reduce latency, | ||
at the cost of increased CPU usage. | ||
Omitting that argument will use one thread per big core. | ||
The CPU models can get improved performance (at the cost of memory usage) | ||
by passing ``--use_caching_allocator=true``. | ||
|
||
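To compare all six saved models, the benchmark command can be generated for each file rather than typed by hand. This sketch reuses only the flags and file-name pattern that appear in this tutorial; the helper names and the default argument values are assumptions for illustration.

```python
import shlex

QUANTIZE_MODES = ["none", "core", "full"]  # modes used by the conversion script
BACKENDS = ["cpu", "nnapi"]                # the script saves one file per backend


def model_filename(quantize_mode, backend):
    """File name pattern used by the conversion script when saving models."""
    return "mobilenetv2-quant_{}-{}.pt".format(quantize_mode, backend)


def benchmark_command(model, pthreadpool_size=1, warmup=5, iterations=200):
    """Build a speed_benchmark_torch command line (hypothetical helper)."""
    args = [
        "./speed_benchmark_torch",
        "--pthreadpool_size={}".format(pthreadpool_size),
        "--model={}".format(model),
        "--use_bundled_input=0",
        "--warmup={}".format(warmup),
        "--iter={}".format(iterations),
    ]
    # shlex.quote leaves these simple arguments unchanged but guards odd names.
    return " ".join(shlex.quote(arg) for arg in args)


commands = [
    benchmark_command(model_filename(mode, backend))
    for mode in QUANTIZE_MODES
    for backend in BACKENDS
]
for command in commands:
    print(command)
```

Each printed line can be run on the device in the directory that contains ``speed_benchmark_torch`` and the model files.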
|
||
Integration | ||
----------- | ||
|
||
The converted models are ordinary TorchScript models. | ||
You can use them in your app just like any other PyTorch model. | ||
See `https://pytorch.org/mobile/android/ <https://pytorch.org/mobile/android/>`_ | ||
for an introduction to using PyTorch on Android. | ||
|
||
|
||
Learn More | ||
---------- | ||
|
||
- Learn more about optimization in our | ||
`Mobile Performance Recipe <https://pytorch.org/tutorials/recipes/mobile_perf.html>`_ | ||
- `MobileNetV2 <https://pytorch.org/hub/pytorch_vision_mobilenet_v2/>`_ from torchvision | ||
- Information about `NNAPI <https://developer.android.com/ndk/guides/neuralnetworks>`_ |
Review comment: Should a developer who wants to try this prototype feature go to the PyTorch master branch?
Reply: Updating to reference the specific version that is known to work.