You can check how it looks in code in `pytorch android application example <http
Member fields ``mModule``, ``mInputTensorBuffer`` and ``mInputTensor`` are initialized only once
and the buffer is refilled using ``org.pytorch.torchvision.TensorImageUtils.imageYUV420CenterCropToFloatBuffer``.

6. Load time optimization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Available since PyTorch 1.13**

PyTorch Mobile also supports a FlatBuffer-based file format that is faster
to load. Both FlatBuffer and pickle-based model files can be loaded with the
same ``_load_for_lite_interpreter`` (Python) or ``_load_for_mobile`` (C++) API.

To use the FlatBuffer format, instead of creating the model file with
``model._save_for_lite_interpreter('path/to/file.ptl')``, save the model with:

::

  model._save_for_lite_interpreter('path/to/file.ptl', _use_flatbuffer=True)

The extra argument ``_use_flatbuffer`` makes a FlatBuffer file instead of a
zip file. The created file will be faster to load.

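The loading call itself does not change: the file created with ``_use_flatbuffer=True``
is read by the same ``_load_for_lite_interpreter`` API mentioned above. Below is a
minimal sketch; the input shape is an assumption for a typical vision model and is
only meant to show that downstream code stays the same:

::

  import torch
  from torch.jit import mobile

  # The same loader handles both the default (zip/pickle) and FlatBuffer formats.
  lite_module = mobile._load_for_lite_interpreter('path/to/file.ptl')

  # Illustrative forward pass (input shape assumed for an ImageNet-style model).
  example_input = torch.rand(1, 3, 224, 224)
  output = lite_module(example_input)
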
For example, using DeepLabV3 with a ResNet-50 backbone and running the following script:

::

  import torch
  from torch.jit import mobile
  model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet50', pretrained=True)
  model.eval()
  jit_model = torch.jit.script(model)

  jit_model._save_for_lite_interpreter('/tmp/jit_model.ptl')
  jit_model._save_for_lite_interpreter('/tmp/jit_model.ff', _use_flatbuffer=True)

  import timeit
  print('Load ptl file:')
  print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ptl")',
                      number=20))
  print('Load flatbuffer file:')
  print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ff")',
                      number=20))

You would get the following result:

::

  Load ptl file:
  0.5387594579999999
  Load flatbuffer file:
  0.038842832999999466

While the speedup on actual mobile devices will be smaller, you can still expect
a 3x to 6x reduction in load time.

Reasons to avoid using a FlatBuffer-based mobile model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

However, the FlatBuffer format also has some limitations that you might want to consider:

* It is only available in PyTorch 1.13 or later. Therefore, client apps built
  with earlier versions of PyTorch might not be able to load it.
* The FlatBuffer library imposes a 4GB limit on file size, so it is not suitable
  for large models.

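If some of your deployment targets still run a pre-1.13 PyTorch Mobile runtime, you
may want to keep producing the default format for them. Below is a minimal sketch of
that idea; the ``export_model`` helper and its ``target_supports_flatbuffer`` flag are
hypothetical and would be driven by your own release matrix:

::

  import torch

  def export_model(model, path, target_supports_flatbuffer):
      # Hypothetical helper: pick the on-disk format based on whether the
      # deployed runtime (PyTorch Mobile >= 1.13) can load FlatBuffer files.
      scripted = torch.jit.script(model)
      if target_supports_flatbuffer:
          scripted._save_for_lite_interpreter(path, _use_flatbuffer=True)
      else:
          # Default zip/pickle format, loadable by older runtimes.
          scripted._save_for_lite_interpreter(path)
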
Benchmarking
------------