---
layout: blog_detail
title: 'Running PyTorch Models on Jetson Nano'
author: Jeff Tang, Hamid Shojanazeri, Geeta Chauhan
featured-img: 'assets/images/pytorch-logo.jpg'
---

### Overview
Nvidia [Jetson Nano](https://developer.nvidia.com/embedded/jetson-nano-developer-kit), part of the [Jetson family of products](https://developer.nvidia.com/embedded/jetson-modules) (Jetson modules), is a small yet powerful Linux (Ubuntu) based embedded computer with a built-in GPU and 2GB or 4GB of RAM. With it, you can run many PyTorch models efficiently. This document summarizes our experience of running different deep learning models using 3 different mechanisms on Jetson Nano:

 1. Jetson Inference, the higher-level Nvidia API with built-in support for running most common computer vision models, which can be transfer-learned with PyTorch on the Jetson platform.

 2. TensorRT, a high-performance inference framework from Nvidia that requires converting a PyTorch model to ONNX, and then to a TensorRT engine file that the TensorRT runtime can run.

 3. PyTorch, using the direct PyTorch API `torch.nn` for inference.

### Setting up Jetson Nano
After purchasing a Jetson Nano [here](https://developer.nvidia.com/buy-jetson?product=jetson_nano&location=US), simply follow the clear step-by-step [instructions](https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit) to download and write the Jetson Nano Developer Kit SD Card Image to a microSD card, and complete the setup. After the setup is done and the Nano is booted, you’ll see the standard Linux prompt along with the username and the Nano name used in the setup.

To check the GPU status on Nano, run the following commands:

```
sudo pip3 install jetson-stats
sudo jtop
```

You’ll see information, including:

<div class="text-center">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-setting-up-jetson-nano.png" width="60%">
</div>

You can also see the installed CUDA version:

```
$ ls -lt /usr/local
lrwxrwxrwx 1 root root 22 Aug 2 01:47 cuda -> /etc/alternatives/cuda
lrwxrwxrwx 1 root root 25 Aug 2 01:47 cuda-10 -> /etc/alternatives/cuda-10
drwxr-xr-x 12 root root 4096 Aug 2 01:47 cuda-10.2
```

To use a camera on Jetson Nano, for example, Arducam 8MP IMX219, follow the instructions [here](https://www.arducam.com/docs/camera-for-jetson-nano/mipi-camera-modules-for-jetson-nano/driver-installation/) or run the commands below after [installing a camera module](https://developer.nvidia.com/embedded/learn/jetson-nano-2gb-devkit-user-guide#id-.JetsonNano2GBDeveloperKitUserGuidevbatuu_v1.0-Camera):

```
cd ~
wget https://github.com/ArduCAM/MIPI_Camera/releases/download/v0.0.3/install_full.sh
chmod +x install_full.sh
./install_full.sh -m arducam
```

Another way to do this is to use the original Jetson Nano camera driver:

```
sudo dpkg -r arducam-nvidia-l4t-kernel
sudo shutdown -r now
```

Then, use `ls /dev/video0` to confirm the camera is found:

```
$ ls /dev/video0
/dev/video0
```

And finally, run the following command to see the camera in action:

```
nvgstcapture-1.0 --orientation=2
```
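
If you want to grab frames from the camera in your own Python code instead of `nvgstcapture`, one common approach is to read the CSI camera through a GStreamer pipeline with OpenCV. The sketch below is not from the Nvidia docs; it assumes the OpenCV build on the Nano has GStreamer support (the JetPack-provided one does) and uses example capture and display settings:

```python
import cv2

# GStreamer pipeline for the CSI camera: capture with nvarguscamerasrc,
# then convert the frames to BGR so OpenCV can consume them.
pipeline = (
    "nvarguscamerasrc ! "
    "video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! "
    "nvvidconv ! video/x-raw, format=BGRx ! "
    "videoconvert ! video/x-raw, format=BGR ! appsink"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("CSI camera", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```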

### Using Jetson Inference
The Nvidia [Jetson Inference](https://github.com/dusty-nv/jetson-inference) API offers the easiest way to run image recognition, object detection, semantic segmentation, and pose estimation models on Jetson Nano. Jetson Inference has TensorRT built in, so it’s very fast.

To test run Jetson Inference, first clone the repo and download the models:

```
git clone --recursive https://github.com/dusty-nv/jetson-inference
cd jetson-inference
```

Then use the pre-built [Docker container](https://github.com/dusty-nv/jetson-inference/blob/master/docs/jetpack-setup-2.md) that already has PyTorch installed to test run the models:

```
docker/run.sh --volume ~/jetson_inference:/jetson_inference
```

To run image recognition, object detection, semantic segmentation, and pose estimation models on test images, use the following:

```
cd build/aarch64/bin
./imagenet.py images/jellyfish.jpg /jetson_inference/jellyfish.jpg
./segnet.py images/dog.jpg /jetson_inference/dog.jpeg
./detectnet.py images/peds_0.jpg /jetson_inference/peds_0.jpg
./posenet.py images/humans_0.jpg /jetson_inference/pose_humans_0.jpg
```

Four result images from running the four different models will be generated. Exit the Docker container to see them:

```
$ ls -lt ~/jetson_inference/
-rw-r--r-- 1 root root 68834 Oct 15 21:30 pose_humans_0.jpg
-rw-r--r-- 1 root root 914058 Oct 15 21:30 peds_0.jpg
-rw-r--r-- 1 root root 666239 Oct 15 21:30 dog.jpeg
-rw-r--r-- 1 root root 179760 Oct 15 21:29 jellyfish.jpg
```


<div style="display: flex; justify-content: space-between;">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-using-jetson-interface-1.jpeg" alt="Jetson Inference example 1" width="40%">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-using-jetson-interface-2.jpeg" alt="Jetson Inference example 2" width="60%">
</div>


<div style="display: flex; justify-content: space-between;">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-using-jetson-interface-3.jpeg" alt="Jetson Inference example 3" width="60%">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-using-jetson-interface-4.jpeg" alt="Jetson Inference example 4" width="40%">
</div>

You can also use the docker image to run PyTorch models because the image has PyTorch, torchvision and torchaudio installed:

```
# pip list|grep torch
torch (1.9.0)
torchaudio (0.9.0a0+33b2469)
torchvision (0.10.0a0+300a8a4)
```

Although Jetson Inference includes models already converted to the TensorRT engine file format, you can fine-tune the models by following the steps in Transfer Learning with PyTorch (for Jetson Inference) [here](https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-transfer-learning.md).
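
If you would rather call the library from your own Python script than use the bundled demo scripts, the Python bindings can be used directly. Below is a minimal sketch, not taken verbatim from the Jetson Inference docs, meant to be run inside the container where the bindings are installed; the `googlenet` model name and the test image path are just examples:

```python
import jetson.inference
import jetson.utils

# Load a built-in classification network; Jetson Inference converts it to a
# TensorRT engine on first use and caches the engine for later runs.
net = jetson.inference.imageNet("googlenet")

# Load a test image into GPU memory and classify it.
img = jetson.utils.loadImage("images/jellyfish.jpg")
class_id, confidence = net.Classify(img)

print(net.GetClassDesc(class_id), f"({confidence * 100:.1f}%)")
```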

### Using TensorRT
[TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/) is a high-performance inference framework from Nvidia. Jetson Nano supports TensorRT via the Jetpack SDK, included in the SD Card image used to set up Jetson Nano. To confirm that TensorRT is already installed in Nano, run `dpkg -l|grep -i tensorrt`:


<div class="text-center">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-using-tensorrt.png" width="80%">
</div>

Theoretically, TensorRT can be used to “take a trained PyTorch model and optimize it to run more efficiently during inference on an NVIDIA GPU.” Follow the instructions and code in the [notebook](https://github.com/NVIDIA/TensorRT/blob/master/quickstart/IntroNotebooks/4.%20Using%20PyTorch%20through%20ONNX.ipynb) to see how to use PyTorch with TensorRT through ONNX on a torchvision Resnet50 model (a minimal sketch of the first two steps is shown after the list):

1. How to convert the model from PyTorch to ONNX;

2. How to convert the ONNX model to a TensorRT engine file;

3. How to run the engine file with the TensorRT runtime for performance improvement: inference time improved from the original 31.5ms/19.4ms (FP32/FP16 precision) to 6.28ms (TensorRT).

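Below is a minimal sketch of the first two steps, not the notebook’s exact code: it exports a pretrained torchvision Resnet50 to ONNX and then builds a TensorRT engine. It assumes the TensorRT 8.x Python API (the builder calls differ slightly on older JetPack/TensorRT releases), and the file names are just examples:

```python
import torch
import torchvision
import tensorrt as trt

# Step 1: export a pretrained Resnet50 to ONNX with a fixed 1x3x224x224 input.
resnet50 = torchvision.models.resnet50(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(resnet50, dummy_input, "resnet50_pytorch.onnx", opset_version=11)

# Step 2: parse the ONNX file and build a serialized TensorRT engine.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("resnet50_pytorch.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # FP16 helps on the Nano's Maxwell GPU
engine_bytes = builder.build_serialized_network(network, config)

with open("resnet50_pytorch.trt", "wb") as f:
    f.write(engine_bytes)
```

Alternatively, the `trtexec` tool shipped with TensorRT can do the ONNX-to-engine conversion and report timings, for example `trtexec --onnx=resnet50_pytorch.onnx --saveEngine=resnet50_pytorch.trt --fp16`.
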
You can replace the Resnet50 model in the notebook code with another PyTorch model, go through the conversion process above, and run the resulting TensorRT engine file with the TensorRT runtime to see the optimized performance. But be aware that due to the Nano GPU memory size, models larger than 100MB are likely to fail to run, with the following error information:

`Error Code 1: Cuda Runtime (all CUDA-capable devices are busy or unavailable)`

You may also see an error when converting a PyTorch model to an ONNX model, which may be fixed by replacing:

`torch.onnx.export(resnet50, dummy_input, "resnet50_pytorch.onnx", verbose=False)`

with:

`torch.onnx.export(model, dummy_input, "deeplabv3_pytorch.onnx", opset_version=11, verbose=False)`

### Using PyTorch
First, to download and install PyTorch 1.9 on Nano, run the following commands (see [here](https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-10-now-available/72048) for more information):

```
wget https://nvidia.box.com/shared/static/p57jwntv436lfrd78inwl7iml6p13fzh.whl -O torch-1.9.0-cp36-cp36m-linux_aarch64.whl
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev
pip3 install Cython
pip3 install numpy torch-1.9.0-cp36-cp36m-linux_aarch64.whl
```

To download and install torchvision 0.10 on Nano, run the commands below:

```
# download the torchvision wheel from the link below (e.g. with gdown, installable via pip3 install gdown)
gdown https://drive.google.com/uc?id=1tU6YlPjrP605j4z8PMnqwCSoP6sSC91Z
pip3 install torchvision-0.10.0a0+300a8a4-cp36-cp36m-linux_aarch64.whl
```

After the steps above, run this to confirm:
```
$ pip3 list|grep torch
torch (1.9.0)
torchvision (0.10.0)
```
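
To confirm that PyTorch can actually see and use the Nano’s GPU, a quick check like the following can be run (the exact version string and device name will vary with your install):

```python
import torch

print(torch.__version__)              # e.g. 1.9.0
print(torch.cuda.is_available())      # should print True on the Nano
print(torch.cuda.get_device_name(0))  # the Nano's integrated Tegra/Maxwell GPU
```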

You can also use the docker image described in the section *Using Jetson Inference* (which also has PyTorch and torchvision installed) to skip the manual steps above.

The official [YOLOv5](https://github.com/ultralytics/yolov5) repo is used to run the PyTorch YOLOv5 model on Jetson Nano. After logging in to Jetson Nano, follow the steps below:

* Get the repo and install what’s required:

```
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt
```

* Run `python3 detect.py`, which by default uses the PyTorch yolov5s.pt model. You should see something like:

```
detect: weights=yolov5s.pt, source=data/images, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False
YOLOv5 🚀 v5.0-499-g48b00db torch 1.9.0 CUDA:0 (NVIDIA Tegra X1, 3956.1015625MB)

Fusing layers...
Model Summary: 224 layers, 7266973 parameters, 0 gradients
image 1/5 /home/jeff/repos/yolov5-new/yolov5/data/images/bus.jpg: 640x480 4 persons, 1 bus, 1 fire hydrant, Done. (0.142s)
...
```

**The inference time on Jetson Nano GPU is about 140ms, more than twice as fast as the inference time on iOS or Android (about 330ms).**
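
If you prefer to call the model from your own script rather than through `detect.py`, a rough timing check can be done with `torch.hub`. This is a minimal sketch, not part of the YOLOv5 repo instructions; it downloads the yolov5s weights on first run, uses one of the repo’s bundled test images, and the measured time will vary with image size and the Nano’s power mode:

```python
import time
import torch

# Load the pretrained YOLOv5s model from the ultralytics repo via torch.hub.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")

img = "data/images/zidane.jpg"  # test image bundled with the yolov5 repo
model(img)  # warm-up run (the first inference includes one-time setup costs)

start = time.time()
results = model(img)
print(f"inference took {(time.time() - start) * 1000:.1f} ms")
results.print()  # prints the detected classes and counts
```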

If you get an error `"ImportError: The _imagingft C module is not installed."` then you need to reinstall pillow:
```
sudo apt-get install libpng-dev
sudo apt-get install libfreetype6-dev
pip3 uninstall pillow
pip3 install --no-cache-dir pillow
```

After successfully completing the `python3 detect.py` run, the object detection results of the test images located in `data/images` will be in the `runs/detect/exp` directory. To test the detection with a live webcam instead of local images, use the `--source 0` parameter when running `python3 detect.py`:

```
~/repos/yolov5$ ls -lt runs/detect/exp10
total 1456
-rw-rw-r-- 1 jeff jeff 254895 Oct 15 16:12 zidane.jpg
-rw-rw-r-- 1 jeff jeff 202674 Oct 15 16:12 test3.png
-rw-rw-r-- 1 jeff jeff 217117 Oct 15 16:12 test2.jpg
-rw-rw-r-- 1 jeff jeff 305826 Oct 15 16:12 test1.png
-rw-rw-r-- 1 jeff jeff 495760 Oct 15 16:12 bus.jpg
```

Using the same test files used in the PyTorch iOS YOLOv5 demo app or Android YOLOv5 demo app, you can compare the results of running the YOLOv5 PyTorch model on mobile devices and on Jetson Nano:

<div style="display: flex">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-using-pytorch-1.png" alt="PyTorch YOLOv5 on Jetson Nano, example with a dog" width="35%">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-using-pytorch-2.jpeg" alt="PyTorch YOLOv5 on Jetson Nano, example with a horse and a rider" width="50%">
</div>
Figure 1. PyTorch YOLOv5 on Jetson Nano.

<div style="display: flex">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-using-pytorch-3.png" alt="PyTorch YOLOv5 on iOS, example with a dog" width="35%">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-using-pytorch-4.png" alt="PyTorch YOLOv5 on iOS, example with a horse and a rider" width="50%">
</div>
Figure 2. PyTorch YOLOv5 on iOS.

<div style="display: flex">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-using-pytorch-5.png" alt="PyTorch YOLOv5 on Android, example with a dog" width="35%">
  <img src="{{ site.baseurl }}/assets/images/blog-2022-3-10-using-pytorch-6.png" alt="PyTorch YOLOv5 on Android, example with a horse and a rider" width="50%">
</div>
Figure 3. PyTorch YOLOv5 on Android.

### Summary
Based on our experience of running different PyTorch models for potential demo apps on Jetson Nano, we see that even Jetson Nano, a lower-end member of the Jetson family of products, provides a powerful GPU and embedded system that can directly and efficiently run some of the latest PyTorch models, pre-trained or transfer-learned.

Building PyTorch demo apps on Jetson Nano can be similar to building PyTorch apps on Linux, but you can also choose to use TensorRT after converting the PyTorch models to the TensorRT engine file format.

If you just need to run some common computer vision models, Nvidia’s Jetson Inference, which supports image recognition, object detection, semantic segmentation, and pose estimation models, is the easiest way.


### References
Torch-TensorRT, a compiler for PyTorch via TensorRT:
[https://github.com/NVIDIA/Torch-TensorRT/](https://github.com/NVIDIA/Torch-TensorRT/)

Jetson Inference docker image details:
[https://github.com/dusty-nv/jetson-inference/blob/master/docs/aux-docker.md](https://github.com/dusty-nv/jetson-inference/blob/master/docs/aux-docker.md)

A guide to using TensorRT on the Nvidia Jetson Nano:
[https://docs.donkeycar.com/guide/robot_sbc/tensorrt_jetson_nano/](https://docs.donkeycar.com/guide/robot_sbc/tensorrt_jetson_nano/)

Other examples of running models on the Jetson platform:

1. Use Jetson as a portable GPU device to run an NN chess engine model:
[https://medium.com/@ezchess/jetson-lc0-running-leela-chess-zero-on-nvidia-jetson-a-portable-gpu-device-a213afc9c018](https://medium.com/@ezchess/jetson-lc0-running-leela-chess-zero-on-nvidia-jetson-a-portable-gpu-device-a213afc9c018)

2. A MaskEraser app using PyTorch and torchvision, installed directly with pip:
[https://github.com/INTEC-ATI/MaskEraser#install-pytorch](https://github.com/INTEC-ATI/MaskEraser#install-pytorch)

A PyTorch to TensorRT converter:
[https://github.com/NVIDIA-AI-IOT/torch2trt](https://github.com/NVIDIA-AI-IOT/torch2trt)