Commit 9fee938

Merge branch 'main' into zzb_fsdp_add_event_sync

2 parents: 6689127 + 355f281

File tree: 5 files changed, +11 -200 lines

beginner_source/onnx/README.txt

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ ONNX

 1. intro_onnx.py
     Introduction to ONNX
-    https://pytorch.org/tutorials/onnx/intro_onnx.html
+    https://pytorch.org/tutorials/beginner/onnx/intro_onnx.html

 2. export_simple_model_to_onnx_tutorial.py
     Exporting a PyTorch model to ONNX
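
For quick reference, the tutorial behind the corrected link covers exporting a model with the TorchDynamo-based ONNX exporter. A minimal sketch in that spirit (the tiny model here is illustrative, not taken from the tutorial; requires the ``onnxscript`` package):

.. code:: python

    import torch

    class TinyModel(torch.nn.Module):
        def forward(self, x):
            return torch.nn.functional.relu(x)

    model = TinyModel().eval()
    example_input = torch.randn(1, 3)

    # Export to ONNX via the dynamo-based exporter and save the program.
    onnx_program = torch.onnx.dynamo_export(model, example_input)
    onnx_program.save("tiny_model.onnx")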

en-wordlist.txt

Lines changed: 2 additions & 0 deletions
@@ -392,6 +392,8 @@ FlexAttention
 fp
 frontend
 functionalized
+functionalizes
+functionalization
 functorch
 fuser
 geomean

intermediate_source/inductor_debug_cpu.py

Lines changed: 3 additions & 3 deletions
@@ -19,8 +19,8 @@
 #
 # Meanwhile, you may also find related tutorials about ``torch.compile``
 # around `basic usage <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`_,
-# comprehensive `troubleshooting <https://pytorch.org/docs/stable/dynamo/troubleshooting.html>`_
-# and GPU-specific knowledge like `GPU performance profiling <https://github.com/pytorch/pytorch/blob/main/docs/source/compile/profiling_torch_compile.rst>`_.
+# comprehensive `troubleshooting <https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html>`_
+# and GPU-specific knowledge like `GPU performance profiling <https://pytorch.org/docs/stable/torch.compiler_inductor_profiling.html>`_.
 #
 # We will start debugging with a motivating example that triggers compilation issues and accuracy problems
 # by demonstrating the process of debugging to pinpoint the problems.

@@ -343,7 +343,7 @@ def forward2(self, arg0_1):
     return (neg,)

 ######################################################################
-# For more usage details about Minifier, please refer to `Troubleshooting <https://pytorch.org/docs/stable/dynamo/troubleshooting.html>`_.
+# For more usage details about Minifier, please refer to `Troubleshooting <https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html>`_.


 ######################################################################
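
Both hunks repoint links to the ``torch.compiler`` troubleshooting page. As that page describes, the Minifier mentioned in the second hunk is typically armed through an environment variable before running the failing program; a minimal sketch (the toy function is illustrative):

.. code:: python

    import os

    # Arm the minifier before compiling: "aot" minifies failures after
    # AOTAutograd, "dynamo" minifies failures in TorchDynamo itself.
    os.environ["TORCHDYNAMO_REPRO_AFTER"] = "aot"

    import torch

    def toy(x):
        return torch.sin(x) + torch.cos(x)

    compiled = torch.compile(toy)
    print(compiled(torch.randn(8)))  # minifier only kicks in if compilation fails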
Lines changed: 4 additions & 195 deletions
@@ -1,201 +1,10 @@
 (prototype) Tracing-based Selective Build Mobile Interpreter in Android and iOS
 ===============================================================================

+This tutorial has been replaced with a newer tutorial on this topic: https://pytorch.org/executorch/stable/kernel-library-selective-build.html

-*Author*: Chen Lai <https://github.com/cccclai>, Dhruv Matani <https://github.com/dhruvbird>
+Redirecting in 3 seconds...

-.. warning::
-    Tracing-based selective build is a prototype feature to minimize library size. Since the traced result relies on the model input and the tracing environment, if the tracer runs in a different environment than the mobile interpreter, the traced operator list might differ from the operators actually used, and a missing-operators error might be raised.
+.. raw:: html

-Introduction
-------------
-
-This tutorial introduces a new way to custom build the mobile interpreter to further optimize its size. It restricts the set of operators included in the compiled binary to only the operators actually needed by target models, a technique that reduces the binary size of PyTorch for mobile deployments. Tracing-based selective build runs a model with specific representative inputs and records which operators were called; the build then includes just those operators.
-
-The following are the steps to build a custom mobile interpreter with the tracing-based selective approach.
-
-1. *Prepare model with bundled input*
-
-.. code:: python
-
-    import numpy as np
-    import torch
-    import torch.jit
-    import torch.utils
-    import torch.utils.bundled_inputs
-    from PIL import Image
-    from torchvision import transforms
-
-    # Step 1. Get the model
-    model = torch.hub.load('pytorch/vision:v0.7.0', 'deeplabv3_resnet50', pretrained=True)
-    model.eval()
-
-    scripted_module = torch.jit.script(model)
-    # Export the full JIT version of the model (not compatible with the lite interpreter); kept here for comparison
-    scripted_module.save("deeplabv3_scripted.pt")
-    # Export the lite interpreter version of the model (compatible with the lite interpreter)
-    # path = "<base directory where models are stored>"
-    scripted_module._save_for_lite_interpreter(f"{path}/deeplabv3_scripted.ptl")
-
-    model_file = f"{path}/deeplabv3_scripted.ptl"
-
-    # Step 2. Prepare inputs for the model
-    input_image_1 = Image.open(f"{path}/dog.jpg")
-    preprocess = transforms.Compose([
-        transforms.ToTensor(),
-        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
-    ])
-
-    input_tensor_1 = preprocess(input_image_1)
-    input_batch_1 = input_tensor_1.unsqueeze(0)  # create a mini-batch as expected by the model
-
-    scripted_module = torch.jit.load(model_file)
-    scripted_module.forward(input_batch_1)  # optional, to validate that the model can run with input_batch_1
-
-    input_image_2 = Image.open(f"{path}/deeplab.jpg")
-    input_tensor_2 = preprocess(input_image_2)
-    input_batch_2 = input_tensor_2.unsqueeze(0)  # create a mini-batch as expected by the model
-
-    scripted_module = torch.jit.load(model_file)
-    scripted_module.forward(input_batch_2)  # optional, to validate that the model can run with input_batch_2
-
-    # Step 3. Bundle the model with the inputs prepared in step 2. As many inputs as needed can be bundled.
-    bundled_model_input = [
-        (torch.utils.bundled_inputs.bundle_large_tensor(input_batch_1), ),
-        (torch.utils.bundled_inputs.bundle_large_tensor(input_batch_2), )]
-    bundled_model = torch.utils.bundled_inputs.bundle_inputs(scripted_module, bundled_model_input)
-    bundled_model._save_for_lite_interpreter(f"{path}/deeplabv3_scripted_with_bundled_input.ptl")
-
-2. *Build the tracer*
-
-.. code:: shell
-
-    MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ MAX_JOBS=16 TRACING_BASED=1 python setup.py develop
-
-3. *Run the tracer with the model with bundled input*
-
-.. code:: shell
-
-    ./build/bin/model_tracer --model_input_path ${path}/deeplabv3_scripted_with_bundled_input.ptl --build_yaml_path ${path}/deeplabv3_scripted.yaml
-
-Android
--------
-
-Get the Image Segmentation demo app for Android: https://github.com/pytorch/android-demo-app/tree/master/ImageSegmentation
-
-1. **Tracing-based build of libtorch lite for Android**: Build libtorch for Android for all 4 Android ABIs (``armeabi-v7a``, ``arm64-v8a``, ``x86``, ``x86_64``) by running
-
-.. code-block:: bash
-
-    SELECTED_OP_LIST=${path}/deeplabv3_scripted.yaml TRACING_BASED=1 ./scripts/build_pytorch_android.sh
-
-If it will be tested on a Pixel 4 emulator with ``x86``, specify the ABI to save build time:
-
-.. code-block:: bash
-
-    SELECTED_OP_LIST=${path}/deeplabv3_scripted.yaml TRACING_BASED=1 ./scripts/build_pytorch_android.sh x86
-
-After the build finishes, it will show the library path:
-
-.. code-block:: bash
-
-    BUILD SUCCESSFUL in 55s
-    134 actionable tasks: 22 executed, 112 up-to-date
-    + find /Users/chenlai/pytorch/android -type f -name '*aar'
-    + xargs ls -lah
-    -rw-r--r--  1 chenlai  staff  13M Feb 11 11:48 /Users/chenlai/pytorch/android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
-    -rw-r--r--  1 chenlai  staff  36K Feb  9 16:45 /Users/chenlai/pytorch/android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
-
-2. **Use the PyTorch Android libraries built from source in the ImageSegmentation app**: Create a folder ``libs`` in the path; the path from the repository root will be ``ImageSegmentation/app/libs``. Copy ``pytorch_android-release`` to the path ``ImageSegmentation/app/libs/pytorch_android-release.aar``. Copy ``pytorch_android_torchvision`` (downloaded from `PyTorch Android Torchvision Nightly <https://oss.sonatype.org/#nexus-search;quick~torchvision_android/>`_) to the path ``ImageSegmentation/app/libs/pytorch_android_torchvision.aar``. Update the ``dependencies`` part of ``ImageSegmentation/app/build.gradle`` to
-
-.. code:: gradle
-
-    dependencies {
-        implementation 'androidx.appcompat:appcompat:1.2.0'
-        implementation 'androidx.constraintlayout:constraintlayout:2.0.2'
-        testImplementation 'junit:junit:4.12'
-        androidTestImplementation 'androidx.test.ext:junit:1.1.2'
-        androidTestImplementation 'androidx.test.espresso:espresso-core:3.3.0'
-
-        implementation(name:'pytorch_android-release', ext:'aar')
-        implementation(name:'pytorch_android_torchvision', ext:'aar')
-
-        implementation 'com.android.support:appcompat-v7:28.0.0'
-        implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
-    }
-
-Update the ``allprojects`` part of ``ImageSegmentation/build.gradle`` to
-
-.. code:: gradle
-
-    allprojects {
-        repositories {
-            google()
-            jcenter()
-            flatDir {
-                dirs 'libs'
-            }
-        }
-    }
-
-3. **Test app**: Build and run the ImageSegmentation app in Android Studio.
-
-iOS
----
-
-Get the ImageSegmentation demo app for iOS: https://github.com/pytorch/ios-demo-app/tree/master/ImageSegmentation
-
-1. **Build libtorch lite for iOS**:
-
-.. code-block:: bash
-
-    SELECTED_OP_LIST=${path}/deeplabv3_scripted.yaml TRACING_BASED=1 IOS_PLATFORM=SIMULATOR ./scripts/build_ios.sh
-
-2. **Remove CocoaPods from the project** (this step is only needed if you ran ``pod install``):
-
-.. code-block:: bash
-
-    pod deintegrate
-
-3. **Link the ImageSegmentation demo app with the custom-built library**:
-
-Open your project in Xcode, go to your project Target's **Build Phases - Link Binaries With Libraries**, click the **+** sign, and add all the library files located in ``build_ios/install/lib``. Navigate to the project **Build Settings**, set **Header Search Paths** to ``build_ios/install/include`` and **Library Search Paths** to ``build_ios/install/lib``.
-In the build settings, search for **other linker flags** and add the custom linker flag ``-all_load``.
-Finally, disable bitcode for your target by selecting the Build Settings, searching for **Enable Bitcode**, and setting the value to **No**.
-
-4. **Build and test the app in Xcode.**
-
-Conclusion
-----------
-
-In this tutorial, we demonstrated a new way to custom build PyTorch's efficient mobile interpreter, tracing-based selective build, in an Android and iOS app.
-
-We walked through an Image Segmentation example to show how to bundle inputs to a model, generate an operator list by tracing the model with bundled inputs, and build a custom torch library from source with the operator list produced by tracing.
-
-The custom build is still under development, and we will continue improving its size in the future. Note, however, that the APIs are subject to change in future versions.
-
-Thanks for reading! As always, we welcome any feedback, so please create an issue `here <https://github.com/pytorch/pytorch/issues>`_.
-
-Learn More
-----------
-
-- To learn more about PyTorch Mobile, please refer to the `PyTorch Mobile Home Page <https://pytorch.org/mobile/home/>`_.
-- To learn more about Image Segmentation, please refer to the `Image Segmentation DeepLabV3 on Android Recipe <https://pytorch.org/tutorials/beginner/deeplabv3_on_android.html>`_.

+   <meta http-equiv="Refresh" content="3; url='https://pytorch.org/executorch/stable/kernel-library-selective-build.html'" />

recipes_source/distributed_device_mesh.rst

Lines changed: 1 addition & 1 deletion
@@ -164,7 +164,7 @@ DeviceMesh allows users to slice child mesh from the parent mesh and re-use the

     # Users can access the underlying process group thru `get_group` API.
     replicate_group = hsdp_mesh["replicate"].get_group()
-    shard_group = hsdp_mesh["Shard"].get_group()
+    shard_group = hsdp_mesh["shard"].get_group()
     tp_group = tp_mesh.get_group()

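The corrected key matches the lower-case ``mesh_dim_names`` under which the recipe constructs the mesh. A minimal sketch of how those names are bound (the 2x4 shape is illustrative and must equal the world size, e.g. under ``torchrun --nproc_per_node=8``):

.. code:: python

    from torch.distributed.device_mesh import init_device_mesh

    # The names given here become the keys for slicing child meshes, so the
    # lookups below must use the same lower-case spelling; "Shard" would fail.
    hsdp_mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))

    replicate_group = hsdp_mesh["replicate"].get_group()
    shard_group = hsdp_mesh["shard"].get_group()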
