Commit a1f703e

Add a section to advertise use of flatbuffer format for mobile models.
1 parent 87fa403 commit a1f703e

File tree

1 file changed: +54 -0 lines changed


recipes_source/mobile_perf.rst

Lines changed: 54 additions & 0 deletions
@@ -199,6 +199,60 @@ You can check how it looks in code in `pytorch android application example <http
Member fields ``mModule``, ``mInputTensorBuffer`` and ``mInputTensor`` are initialized only once
and buffer is refilled using ``org.pytorch.torchvision.TensorImageUtils.imageYUV420CenterCropToFloatBuffer``.

6. Load time optimization
^^^^^^^^^^^^^^^^^^^^^^^^^

**Available since PyTorch 1.13**
PyTorch Mobile also supports a flatbuffer-based file format that is faster to
load. Both flatbuffer- and pickle-based model files can be loaded with the
same ``_load_for_lite_interpreter`` (Python) or ``_load_for_mobile`` (C++) API.
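For instance, a model file saved in either format can be loaded back through the identical Python call (a minimal sketch; the path is a placeholder):

::

    from torch.jit import mobile

    # The same loader call is used regardless of which of the two formats
    # the file on disk was saved in.
    m = mobile._load_for_lite_interpreter('path/to/model_file')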
To use the flatbuffer format, instead of creating the model file with

::

    model._save_for_lite_interpreter('path/to/file.ptl')

one can save it using

::

    model._save_for_lite_interpreter('path/to/file.ptl', _use_flatbuffer=True)

The extra kwarg ``_use_flatbuffer`` creates a flatbuffer file instead of a
zip file. The created file will be faster to load.
For example, using ``deeplabv3_resnet50`` (a resnet-50 based model), running the following script:

::

    import torch
    from torch.jit import mobile
    import timeit

    model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet50', pretrained=True)
    model.eval()
    jit_model = torch.jit.script(model)

    # Save the same scripted model in the default (pickle/zip) format
    # and in the flatbuffer format.
    jit_model._save_for_lite_interpreter('/tmp/jit_model.ptl')
    jit_model._save_for_lite_interpreter('/tmp/jit_model.ff', _use_flatbuffer=True)

    # Time 20 loads of each file.
    print('Load ptl file:')
    print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ptl")',
                        number=20))
    print('Load flatbuffer file:')
    print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ff")',
                        number=20))
yields

::

    Load ptl file:
    0.5387594579999999
    Load flatbuffer file:
    0.038842832999999466

Speedups on actual mobile devices will be smaller. One can still expect
3x - 6x load time reductions.
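As a quick sanity check that both formats hold the same program, one can save a module in both formats, load each file back with the same API, and compare outputs. A minimal sketch, assuming a small stand-in module rather than the full model above:

::

    import torch
    from torch.jit import mobile

    # Small stand-in module; any scriptable model works the same way.
    model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU()).eval()
    jit_model = torch.jit.script(model)

    # Save the identical program in both formats.
    jit_model._save_for_lite_interpreter('/tmp/check.ptl')
    jit_model._save_for_lite_interpreter('/tmp/check.ff', _use_flatbuffer=True)

    # The same loading API handles both files.
    m_pickle = mobile._load_for_lite_interpreter('/tmp/check.ptl')
    m_flat = mobile._load_for_lite_interpreter('/tmp/check.ff')

    x = torch.randn(1, 8)
    assert torch.allclose(m_pickle(x), m_flat(x))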
Benchmarking
------------
