@@ -199,6 +199,60 @@ You can check how it looks in code in `pytorch android application example <http
Member fields ``mModule``, ``mInputTensorBuffer`` and ``mInputTensor`` are initialized only once
and the buffer is refilled using ``org.pytorch.torchvision.TensorImageUtils.imageYUV420CenterCropToFloatBuffer``.
+ 6. Load time optimization
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ **Available since PyTorch 1.13**
+
+ PyTorch Mobile also supports a FlatBuffer-based file format that is faster
+ to load. Both FlatBuffer and pickle-based model files can be loaded with the
+ same ``_load_for_lite_interpreter`` (Python) or ``_load_for_mobile`` (C++) API.
+
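+ For example, in Python both file types go through the same call. A minimal
+ sketch (the file paths are placeholders):
+
+ ::
+
+   from torch.jit import mobile
+
+   # The same entry point handles both the pickle-based and the
+   # FlatBuffer-based formats.
+   pickle_module = mobile._load_for_lite_interpreter('path/to/file.ptl')
+   flatbuffer_module = mobile._load_for_lite_interpreter('path/to/file.ff')
+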
+ To use the FlatBuffer format, instead of creating the model file with
+
+ ::
+
+   model._save_for_lite_interpreter('path/to/file.ptl')
+
+ one can save it using
+
+ ::
+
+   model._save_for_lite_interpreter('path/to/file.ptl', _use_flatbuffer=True)
+
+ The extra kwarg ``_use_flatbuffer`` produces a FlatBuffer file instead of a
+ ZIP file; the created file will be faster to load.
+
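+ As a quick sanity check, a file saved this way loads back through the same
+ lite-interpreter API. A minimal sketch, using a hypothetical toy module:
+
+ ::
+
+   import torch
+   from torch.jit import mobile
+
+   # Hypothetical toy module, for illustration only.
+   class AddOne(torch.nn.Module):
+       def forward(self, x):
+           return x + 1
+
+   scripted = torch.jit.script(AddOne())
+   scripted._save_for_lite_interpreter('/tmp/add_one.ff', _use_flatbuffer=True)
+
+   loaded = mobile._load_for_lite_interpreter('/tmp/add_one.ff')
+   print(loaded(torch.zeros(2)))  # tensor([1., 1.])
+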
+ For example, for a DeepLabV3 model with a ResNet-50 backbone, running the
+ following script:
+
+ ::
+
+   import torch
+   from torch.jit import mobile
+
+   model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet50', pretrained=True)
+   model.eval()
+   jit_model = torch.jit.script(model)
+
+   jit_model._save_for_lite_interpreter('/tmp/jit_model.ptl')
+   jit_model._save_for_lite_interpreter('/tmp/jit_model.ff', _use_flatbuffer=True)
+
+   import timeit
+   print('Load ptl file:')
+   print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ptl")',
+                       number=20))
+   print('Load flatbuffer file:')
+   print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ff")',
+                       number=20))
+
+ yields
+
+ ::
+
+   Load ptl file:
+   0.5387594579999999
+   Load flatbuffer file:
+   0.038842832999999466
+
+ Speed-ups on actual mobile devices will be smaller, but one can still expect
+ a 3x - 6x reduction in load time.
+
+
Benchmarking
------------