@@ -199,6 +199,68 @@ You can check how it looks in code in `pytorch android application example <http
Member fields ``mModule``, ``mInputTensorBuffer`` and ``mInputTensor`` are initialized only once,
and the buffer is refilled using ``org.pytorch.torchvision.TensorImageUtils.imageYUV420CenterCropToFloatBuffer``.
+ 6. Load time optimization
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ **Available since PyTorch 1.13**
+
+ PyTorch Mobile also supports a FlatBuffer-based file format that is faster
+ to load. Both FlatBuffer and pickle-based model files can be loaded with the
+ same ``_load_for_lite_interpreter`` (Python) or ``_load_for_mobile`` (C++) API.
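+
+ As a minimal sketch of that format-agnostic loading in Python (the file
+ paths here are hypothetical placeholders, assuming a model was already
+ saved in each format):
+
+ ::
+
+     from torch.jit import mobile
+
+     # The same call loads either file format.
+     pickle_model = mobile._load_for_lite_interpreter('path/to/file.ptl')
+     flatbuffer_model = mobile._load_for_lite_interpreter('path/to/file.ff')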
+
+ To use the FlatBuffer format, instead of creating the model file with
+
+ ::
+
+     model._save_for_lite_interpreter('path/to/file.ptl')
+
+
+ one can save it using
+
+ ::
+
+     model._save_for_lite_interpreter('path/to/file.ptl', _use_flatbuffer=True)
+
+
+ The extra kwarg ``_use_flatbuffer`` creates a FlatBuffer file instead of a
+ ZIP file, and the resulting file will be faster to load.
+
+ For example, with ``deeplabv3_resnet50``, running the following script:
+
+ ::
+
+     import torch
+     from torch.jit import mobile
+     import timeit
+
+     # Script the model and save it in both formats.
+     model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet50', pretrained=True)
+     model.eval()
+     jit_model = torch.jit.script(model)
+
+     jit_model._save_for_lite_interpreter('/tmp/jit_model.ptl')
+     jit_model._save_for_lite_interpreter('/tmp/jit_model.ff', _use_flatbuffer=True)
+
+     # Time 20 loads of each file.
+     print('Load ptl file:')
+     print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ptl")',
+                         number=20))
+     print('Load flatbuffer file:')
+     print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ff")',
+                         number=20))
+
+ yields
+
+ ::
+
+     Load ptl file:
+     0.5387594579999999
+     Load flatbuffer file:
+     0.038842832999999466
+
+ Speedups on actual mobile devices will be smaller, but one can still expect
+ a 3x-6x reduction in load time.
+
+
Benchmarking
------------