@@ -190,14 +190,6 @@ The argument list and usage guidance can be shown with the following command:
 
 The command above has the following positional arguments:
 
-+------------------+---------------------------------------------------------+
-| knob             | help                                                    |
-+==================+=========================================================+
-| ``program``      | The full path of the program/script to be launched.     |
-+------------------+---------------------------------------------------------+
-| ``program_args`` | All the arguments for the program/script to be launched.|
-+------------------+---------------------------------------------------------+
-
 .. list-table::
    :widths: 25 50
    :header-rows: 1
@@ -213,20 +205,6 @@ Explanation of the options
 
 The generic option settings (knobs) include the following:
 
-+----------------------+------+---------------+-------------------------------------------------------------------------------------------------------------------------+
-| knob                 | type | default value | help                                                                                                                    |
-+======================+======+===============+=========================================================================================================================+
-| ``-h``, ``--help``   |      |               | Show the help message and exit.                                                                                         |
-+----------------------+------+---------------+-------------------------------------------------------------------------------------------------------------------------+
-| ``-m``, ``--module`` |      |               | Changes each process to interpret the launch script as a python module, executing with the same behavior as "python -m".|
-+----------------------+------+---------------+-------------------------------------------------------------------------------------------------------------------------+
-| ``--no-python``      | bool | False         | Do not prepend the program with "python" - just exec it directly. Useful when the script is not a Python script.        |
-+----------------------+------+---------------+-------------------------------------------------------------------------------------------------------------------------+
-| ``--log-path``       | str  | ''            | The log file directory. Default path is ``''``, which means disable logging to files.                                   |
-+----------------------+------+---------------+-------------------------------------------------------------------------------------------------------------------------+
-| ``--log-file-prefix``| str  | 'run'         | log file name prefix.                                                                                                   |
-+----------------------+------+---------------+-------------------------------------------------------------------------------------------------------------------------+
-
 .. list-table::
    :widths: 25 10 15 50
    :header-rows: 1
@@ -257,18 +235,6 @@ The generic option settings (knobs) include the following:
 
 Knobs for applying or disabling optimizations are:
 
-+-----------------------------+------+---------------+--------------------------------------------------------------------------------------------------------------------------+
-| knob                        | type | default value | help                                                                                                                     |
-+=============================+======+===============+==========================================================================================================================+
-| ``--enable-tcmalloc``       | bool | False         | Enable ``TCMalloc`` memory allocator.                                                                                    |
-+-----------------------------+------+---------------+--------------------------------------------------------------------------------------------------------------------------+
-| ``--enable-jemalloc``       | bool | False         | Enable ``JeMalloc`` memory allocator.                                                                                    |
-+-----------------------------+------+---------------+--------------------------------------------------------------------------------------------------------------------------+
-| ``--use-default-allocator`` | bool | False         | Use default memory allocator. Neither ``TCMalloc`` nor ``JeMalloc`` would be used.                                       |
-+-----------------------------+------+---------------+--------------------------------------------------------------------------------------------------------------------------+
-| ``--disable-iomp``          | bool | False         | By default, Intel(R) OpenMP lib will be used if installed. Setting this flag would disable the usage of Intel(R) OpenMP. |
-+-----------------------------+------+---------------+--------------------------------------------------------------------------------------------------------------------------+
-
 .. list-table::
    :widths: 25 10 15 50
    :header-rows: 1
@@ -299,34 +265,6 @@ Knobs for applying or disabling optimizations are:
 
 Knobs for controlling instance number and compute resource allocation are:
 
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| knob                        | type | default value | help                                                                                                                                         |
-+=============================+======+===============+==============================================================================================================================================+
-| ``--ninstances``            | int  | 0             | Number of instances.                                                                                                                         |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| ``--ncores-per-instance``   | int  | 0             | Number of cores used by every instance.                                                                                                      |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| ``--node-id``               | int  | -1            | Node id for multi-instance, by default all nodes will be used.                                                                               |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| ``--core-list``             | str  | ''            | Specify the core list as "core_id, core_id, ...." or core range as "core_id-core_id". By dafault all the cores will be used.                 |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| ``--use-logical-core``      | bool | False         | By default only physical cores are used. Specify this flag to use logical cores.                                                             |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| ``--skip-cross-node-cores`` | bool | False         | Prevent the workload to be executed on cores across NUMA nodes.                                                                              |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| ``--rank``                  | int  | -1            | Specify instance index to assign ncores_per_instance for rank; otherwise ncores_per_instance will be assigned sequentially to the instances. |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| ``--multi-instance``        | bool | False         | A quick set to invoke multiple instances of the workload on multi-socket CPU servers.                                                        |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| ``--latency-mode``          | bool | False         | A quick set to invoke benchmarking with latency mode, in which all physical cores are used and 4 cores per instance.                         |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| ``--throughput-mode``       | bool | False         | A quick set to invoke benchmarking with throughput mode, in which all physical cores are used and 1 numa node per instance.                  |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| ``--disable-numactl``       | bool | False         | By default ``numactl`` command is used to control NUMA access. Setting this flag will disable it.                                            |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| ``--disable-taskset``       | bool | False         | Disable the usage of ``taskset`` command.                                                                                                    |
-+-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-
 .. list-table::
    :widths: 25 10 15 50
    :header-rows: 1
@@ -387,26 +325,10 @@ Knobs for controlling instance number and compute resource allocation are:
 
 Environment variables that will be set by this script include the following:
 
-+------------------+-------------------------------------------------------------------------------------------------+
-| Environ Variable | Value                                                                                           |
-+==================+=================================================================================================+
-| LD_PRELOAD       | Depending on knobs you set, <lib>/libiomp5.so, <lib>/libjemalloc.so, <lib>/libtcmalloc.so might |
-|                  | be appended to LD_PRELOAD.                                                                      |
-+------------------+-------------------------------------------------------------------------------------------------+
-| KMP_AFFINITY     | If libiomp5.so is preloaded, KMP_AFFINITY could be set to "granularity=fine,compact,1,0".       |
-+------------------+-------------------------------------------------------------------------------------------------+
-| KMP_BLOCKTIME    | If libiomp5.so is preloaded, KMP_BLOCKTIME is set to "1".                                       |
-+------------------+-------------------------------------------------------------------------------------------------+
-| OMP_NUM_THREADS  | value of ncores_per_instance                                                                    |
-+------------------+-------------------------------------------------------------------------------------------------+
-| MALLOC_CONF      | If libjemalloc.so is preloaded, MALLOC_CONF will be set to                                      |
-|                  | "oversize_threshold:1,background_thread:true,metadata_thp:auto".                                |
-+------------------+-------------------------------------------------------------------------------------------------+
-
 .. list-table::
    :widths: 25 50
    :header-rows: 1
-   * - Environ Variable
+   * - Environment Variable
      - Value
    * - LD_PRELOAD
      - Depending on knobs you set, <lib>/libiomp5.so, <lib>/libjemalloc.so, <lib>/libtcmalloc.so might be appended to LD_PRELOAD.
@@ -435,4 +357,4 @@ See also:
 
 * `PyTorch Performance Tuning Guide <https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html#cpu-specific-optimizations>`__
 * `PyTorch Multiprocessing Best Practices <https://pytorch.org/docs/stable/notes/multiprocessing.html#cpu-in-multiprocessing>`__
-* Grokking PyTorch Intel CPU performance: `Part 1 <https://pytorch.org/tutorials/intermediate/torchserve_with_ipex>`__ `Part 2 <https://pytorch.org/tutorials/intermediate/torchserve_with_ipex_2>`__
+* Grokking PyTorch Intel CPU performance: `Part 1 <https://pytorch.org/tutorials/intermediate/torchserve_with_ipex>`__ `Part 2 <https://pytorch.org/tutorials/intermediate/torchserve_with_ipex_2>`__