recipes_source/xeon_run_cpu.rst (138 additions & 2 deletions)
@@ -11,8 +11,10 @@ helping the users try out an optimal coordination of resource utilization for th
 What You Will Learn
 -------------------

-* How to utilize tools like ``numactl``, ``taskset``, Intel(R) OpenMP Runtime Library and optimized memory allocators such as ``TCMalloc`` and ``JeMalloc`` for enhanced performance.
-* How to configure CPU resources and memory management to maximize PyTorch inference performance on Intel(R) Xeon(R) processors.
+* How to utilize tools like ``numactl``, ``taskset``, Intel(R) OpenMP Runtime Library and
+  optimized memory allocators such as ``TCMalloc`` and ``JeMalloc`` for enhanced performance.
+* How to configure CPU resources and memory management to maximize PyTorch
+  inference performance on Intel(R) Xeon(R) processors.

 Introduction of the Optimizations
 ---------------------------------
@@ -196,6 +198,16 @@ The command above has the following positional arguments:
 | ``program_args`` | All the arguments for the program/script to be launched.|
@@ -229,6 +269,30 @@ Knobs for applying or disabling optimizations are:
 | ``--disable-iomp`` | bool | False | By default, Intel(R) OpenMP lib will be used if installed. Setting this flag would disable the usage of Intel(R) OpenMP. |
+     - Use default memory allocator. Neither ``TCMalloc`` nor ``JeMalloc`` would be used.
+   * - ``--disable-iomp``
+     - bool
+     - False
+     - By default, Intel(R) OpenMP lib will be used if installed. Setting this flag would disable the usage of Intel(R) OpenMP.
+
 .. note::

    Memory allocator influences performance. If users do not specify a desired memory allocator, the ``run_cpu`` script will search if any of them is installed in the order of TCMalloc > JeMalloc > PyTorch default memory allocator, and takes the first matched one.
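
The search order described in the note can be illustrated with a minimal sketch. This is not the ``run_cpu`` implementation: the helper name ``pick_allocator``, its parameters, and the idea of scanning a list of library directories are assumptions made for illustration. Only the library file names and the TCMalloc > JeMalloc > default order come from the text.

```python
import os

# Library names in the priority order described by the note:
# TCMalloc first, then JeMalloc.
SEARCH_ORDER = ("libtcmalloc.so", "libjemalloc.so")


def pick_allocator(lib_dirs, exists=os.path.exists):
    """Hypothetical helper: return the first allocator library found.

    `lib_dirs` is an assumed list of directories to scan; `exists` is
    injectable so the logic can be exercised without touching the disk.
    Returns None when neither library is found, meaning the PyTorch
    default memory allocator would be used.
    """
    for lib_name in SEARCH_ORDER:
        for lib_dir in lib_dirs:
            candidate = os.path.join(lib_dir, lib_name)
            if exists(candidate):
                return candidate
    return None
```

With both libraries present in a directory, the sketch returns the TCMalloc path; with only JeMalloc present, it returns the JeMalloc path; with neither, it returns ``None``.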
@@ -263,6 +327,62 @@ Knobs for controlling instance number and compute resource allocation are:
 | ``--disable-taskset`` | bool | False | Disable the usage of ``taskset`` command. |
+     - Depending on knobs you set, <lib>/libiomp5.so, <lib>/libjemalloc.so, <lib>/libtcmalloc.so might be appended to LD_PRELOAD.
+   * - KMP_AFFINITY
+     - If libiomp5.so is preloaded, KMP_AFFINITY could be set to ``"granularity=fine,compact,1,0"``.
+   * - KMP_BLOCKTIME
+     - If libiomp5.so is preloaded, KMP_BLOCKTIME is set to "1".
+   * - OMP_NUM_THREADS
+     - Value of ``ncores_per_instance``
+   * - MALLOC_CONF
+     - If libjemalloc.so is preloaded, MALLOC_CONF will be set to ``"oversize_threshold:1,background_thread:true,metadata_thp:auto"``.
+
 Please note that the script respects environment variables set preliminarily. For example, if you have set the environment variables mentioned above before running the script, the values of the variables will not be overwritten by the script.
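
The "set only if not already set" behavior can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the function ``apply_defaults`` and its parameters are hypothetical, and the default values are the ones listed in the table above.

```python
def apply_defaults(env, ncores_per_instance, iomp_preloaded, jemalloc_preloaded):
    """Hypothetical helper: fill in defaults without overwriting user values.

    `env` is any mutable mapping of environment variables (for example a
    copy of os.environ). The two boolean flags are assumptions standing in
    for "libiomp5.so is preloaded" and "libjemalloc.so is preloaded".
    """
    defaults = {"OMP_NUM_THREADS": str(ncores_per_instance)}
    if iomp_preloaded:
        defaults["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
        defaults["KMP_BLOCKTIME"] = "1"
    if jemalloc_preloaded:
        defaults["MALLOC_CONF"] = (
            "oversize_threshold:1,background_thread:true,metadata_thp:auto"
        )
    for name, value in defaults.items():
        # setdefault keeps any value the user exported beforehand.
        env.setdefault(name, value)
    return env
```

For example, calling ``apply_defaults({"OMP_NUM_THREADS": "8"}, 4, True, False)`` leaves ``OMP_NUM_THREADS`` at the user's value of ``"8"`` while adding the two ``KMP_*`` defaults.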