
Commit bfd4506

Svetlana Karslioglu authored
Merge branch 'main' into spellcheck-intermediate-python
2 parents 1ceb5e6 + 8c1d408

2 files changed, +68 -1 lines

intermediate_source/torchserve_with_ipex.rst

Lines changed: 1 addition & 1 deletion

@@ -204,7 +204,7 @@ We'll compare the following three configurations:

 (2) `torch.set_num_threads <https://pytorch.org/docs/stable/generated/torch.set_num_threads.html>`_ = ``number of physical cores / number of workers`` (no core pinning)

-(3) core pinning via the launch script
+(3) core pinning via the launch script (requires TorchServe >= 0.6.1)

 After this exercise, we'll have verified that we prefer avoiding logical cores and prefer local memory access via core pinning with a real TorchServe use case.
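
To make configuration (2) concrete, here is a minimal sketch of how a worker could size its thread pool. It assumes ``psutil`` for the physical-core count, and ``num_workers`` is a placeholder value for illustration, not something taken from the tutorial:

::

    import psutil
    import torch

    # Configuration (2): split physical cores evenly across workers, no core pinning.
    # `num_workers` is a placeholder for this sketch; match it to your TorchServe worker count.
    num_workers = 4
    physical_cores = psutil.cpu_count(logical=False)
    torch.set_num_threads(physical_cores // num_workers)

Core pinning for configuration (3) is handled by the launch script rather than in Python, which is why the added note calls out the TorchServe >= 0.6.1 requirement.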

recipes_source/mobile_perf.rst

Lines changed: 67 additions & 0 deletions

@@ -199,6 +199,73 @@ You can check how it looks in code in `pytorch android application example <http

Member fields ``mModule``, ``mInputTensorBuffer`` and ``mInputTensor`` are initialized only once
and buffer is refilled using ``org.pytorch.torchvision.TensorImageUtils.imageYUV420CenterCropToFloatBuffer``.

6. Load time optimization
^^^^^^^^^^^^^^^^^^^^^^^^^

**Available since PyTorch 1.13**

PyTorch Mobile also supports a FlatBuffer-based file format that is faster
to load. Both FlatBuffer and pickle-based model files can be loaded with the
same ``_load_for_lite_interpreter`` (Python) or ``_load_for_mobile`` (C++) API.

To use the FlatBuffer format, instead of creating the model file with
``model._save_for_lite_interpreter('path/to/file.ptl')``, save it with the
extra ``_use_flatbuffer`` argument:

::

    model._save_for_lite_interpreter('path/to/file.ptl', _use_flatbuffer=True)

The extra argument ``_use_flatbuffer`` creates a FlatBuffer file instead of a
zip file. The created file will be faster to load.

For example, using a DeepLabV3 model with a ResNet-50 backbone and running the following script:

::

    import timeit

    import torch
    from torch.jit import mobile

    # Script a pretrained DeepLabV3-ResNet50 model for the lite interpreter
    model = torch.hub.load('pytorch/vision:v0.10.0', 'deeplabv3_resnet50', pretrained=True)
    model.eval()
    jit_model = torch.jit.script(model)

    # Save the same scripted model in both formats
    jit_model._save_for_lite_interpreter('/tmp/jit_model.ptl')
    jit_model._save_for_lite_interpreter('/tmp/jit_model.ff', _use_flatbuffer=True)

    # Time 20 loads of each file
    print('Load ptl file:')
    print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ptl")',
                        number=20))
    print('Load flatbuffer file:')
    print(timeit.timeit('from torch.jit import mobile; mobile._load_for_lite_interpreter("/tmp/jit_model.ff")',
                        number=20))

you would get a result similar to the following:

::

    Load ptl file:
    0.5387594579999999
    Load flatbuffer file:
    0.038842832999999466

While speedups on actual mobile devices will be smaller, you can still expect
a 3x - 6x reduction in load time.
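
Once loaded, a FlatBuffer model is used exactly like a pickle-based one. As a minimal sketch (assuming the ``/tmp/jit_model.ff`` file produced by the script above), loading and running the model from Python could look like:

::

    import torch
    from torch.jit import mobile

    # The same loader handles both .ptl and FlatBuffer files
    lite_module = mobile._load_for_lite_interpreter('/tmp/jit_model.ff')

    # Run a forward pass on a dummy image batch (1 x 3 x 224 x 224)
    example_input = torch.rand(1, 3, 224, 224)
    with torch.no_grad():
        output = lite_module(example_input)

    # DeepLabV3 returns a dict of tensors; 'out' holds the segmentation logits
    print(output['out'].shape)

On-device code (``_load_for_mobile`` in C++, for example) is likewise unchanged by the file format.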

**Reasons to avoid using a FlatBuffer-based mobile model**

However, the FlatBuffer format also has some limitations that you might want to consider:

* It is only available in PyTorch 1.13 or later. Client devices built with
  earlier PyTorch versions might not be able to load it.
* The FlatBuffer library imposes a 4GB limit on file size, so it is not suitable
  for large models.
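
Given the first limitation, one practical pattern is to export both formats from the same scripted model and let the application bundle whichever its runtime supports. The helper below is a hypothetical sketch (the function name and paths are illustrative, not a PyTorch API):

::

    def export_for_mobile(jit_model, output_dir='/tmp'):
        # Pickle-based .ptl: loadable by older PyTorch Mobile runtimes
        ptl_path = f'{output_dir}/model.ptl'
        jit_model._save_for_lite_interpreter(ptl_path)

        # FlatBuffer: faster to load, but requires a PyTorch 1.13+ runtime
        ff_path = f'{output_dir}/model.ff'
        jit_model._save_for_lite_interpreter(ff_path, _use_flatbuffer=True)

        return ptl_path, ff_path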

Benchmarking
------------