
Commit bfd4506

Svetlana Karslioglu authored
Merge branch 'main' into spellcheck-intermediate-python
2 parents 1ceb5e6 + 8c1d408

2 files changed, +68 -1 lines

intermediate_source/torchserve_with_ipex.rst

Lines changed: 1 addition & 1 deletion

@@ -204,7 +204,7 @@ We'll compare the following three configurations:

 (2) `torch.set_num_threads <https://pytorch.org/docs/stable/generated/torch.set_num_threads.html>`_ = ``number of physical cores / number of workers`` (no core pinning)

-(3) core pinning via the launch script
+(3) core pinning via the launch script (requires TorchServe >= 0.6.1)

 After this exercise, we'll have verified that we prefer avoiding logical cores and prefer local memory access via core pinning with a real TorchServe use case.
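
To make configuration (2) concrete, here is a minimal sketch of how a worker could size its thread pool. It assumes ``psutil`` for the physical-core count, and ``num_workers`` is a placeholder value for illustration, not something taken from the tutorial:

::

    import psutil
    import torch

    # Configuration (2): split physical cores evenly across workers, no core pinning.
    # `num_workers` is a placeholder for this sketch; match it to your TorchServe worker count.
    num_workers = 4
    physical_cores = psutil.cpu_count(logical=False)
    torch.set_num_threads(physical_cores // num_workers)

Core pinning for configuration (3) is handled by the launch script rather than in Python, which is why the added note calls out the TorchServe >= 0.6.1 requirement.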

recipes_source/mobile_perf.rst

Lines changed: 67 additions & 0 deletions

@@ -199,6 +199,73 @@ You can check how it looks in code in `pytorch android application example <http

Member fields ``mModule``, ``mInputTensorBuffer`` and ``mInputTensor`` are initialized only once
and buffer is refilled using ``org.pytorch.torchvision.TensorImageUtils.imageYUV420CenterCropToFloatBuffer``.

6. Load time optimization
^^^^^^^^^^^^^^^^^^^^^^^^^

**Available since PyTorch 1.13**

PyTorch Mobile also supports a FlatBuffer-based file format that is faster
to load. Both FlatBuffer and pickle-based model files can be loaded with the
same ``_load_for_lite_interpreter`` (Python) or ``_load_for_mobile`` (C++) API.

To use the FlatBuffer format, instead of creating the model file with
``model._save_for_lite_interpreter('path/to/file.ptl')``, save it with the
extra ``_use_flatbuffer`` argument:

::

    model._save_for_lite_interpreter('path/to/file.ptl', _use_flatbuffer=True)

The extra argument ``_use_flatbuffer`` creates a FlatBuffer file instead of a
zip file. The created file will be faster to load.

For example, using a DeepLabV3 model with a ResNet-50 backbone and running the following script:

::

    import timeit

    import torch
    from torch.jit import mobile

    # Script a pretrained DeepLabV3-ResNet50 model for the lite interpreter
    model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet50', pretrained=True)
    model.eval()
    jit_model = torch.jit.script(model)

    # Save the same scripted model in both formats
    jit_model._save_for_lite_interpreter('/tmp/jit_model.ptl')
    jit_model._save_for_lite_interpreter('/tmp/jit_model.ff', _use_flatbuffer=True)

    # Time 20 loads of each file
    print('Load ptl file:')
    print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ptl")',
                        number=20))
    print('Load flatbuffer file:')
    print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ff")',
                        number=20))

you would get a result similar to the following:

::

    Load ptl file:
    0.5387594579999999
    Load flatbuffer file:
    0.038842832999999466

While speedups on actual mobile devices will be smaller, you can still expect
a 3x - 6x reduction in load time.
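
Once loaded, a FlatBuffer model is used exactly like a pickle-based one. As a minimal sketch (assuming the ``/tmp/jit_model.ff`` file produced by the script above), loading and running the model from Python could look like:

::

    import torch
    from torch.jit import mobile

    # The same loader handles both .ptl and FlatBuffer files
    lite_module = mobile._load_for_lite_interpreter('/tmp/jit_model.ff')

    # Run a forward pass on a dummy image batch (1 x 3 x 224 x 224)
    example_input = torch.rand(1, 3, 224, 224)
    with torch.no_grad():
        output = lite_module(example_input)

    # DeepLabV3 returns a dict of tensors; 'out' holds the segmentation logits
    print(output['out'].shape)

On-device code (``_load_for_mobile`` in C++, for example) is likewise unchanged by the file format.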

**Reasons to avoid using a FlatBuffer-based mobile model**

However, the FlatBuffer format also has some limitations that you might want to consider:

* It is only available in PyTorch 1.13 or later. Client devices built with
  earlier PyTorch versions might not be able to load it.
* The FlatBuffer library imposes a 4GB limit on file size, so it is not suitable
  for large models.
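
Given the first limitation, one practical pattern is to export both formats from the same scripted model and let the application bundle whichever its runtime supports. The helper below is a hypothetical sketch (the function name and paths are illustrative, not a PyTorch API):

::

    def export_for_mobile(jit_model, output_dir='/tmp'):
        # Pickle-based .ptl: loadable by older PyTorch Mobile runtimes
        ptl_path = f'{output_dir}/model.ptl'
        jit_model._save_for_lite_interpreter(ptl_path)

        # FlatBuffer: faster to load, but requires a PyTorch 1.13+ runtime
        ff_path = f'{output_dir}/model.ff'
        jit_model._save_for_lite_interpreter(ff_path, _use_flatbuffer=True)

        return ptl_path, ff_path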

Benchmarking
------------