Commit 517e66f

[trial] change table format
1 parent 73fe42c commit 517e66f

1 file changed (+138, -2 lines changed)

recipes_source/xeon_run_cpu.rst

Lines changed: 138 additions & 2 deletions
@@ -11,8 +11,10 @@ helping the users try out an optimal coordination of resource utilization for th
 What You Will Learn
 -------------------
 
-* How to utilize tools like ``numactl``, ``taskset``, Intel(R) OpenMP Runtime Library and optimized memory allocators such as ``TCMalloc`` and ``JeMalloc`` for enhanced performance.
-* How to configure CPU resources and memory management to maximize PyTorch inference performance on Intel(R) Xeon(R) processors.
+* How to utilize tools like ``numactl``, ``taskset``, Intel(R) OpenMP Runtime Library and
+  optimized memory allocators such as ``TCMalloc`` and ``JeMalloc`` for enhanced performance.
+* How to configure CPU resources and memory management to maximize PyTorch
+  inference performance on Intel(R) Xeon(R) processors.
 
 Introduction of the Optimizations
 ---------------------------------
@@ -196,6 +198,16 @@ The command above has the following positional arguments:
 | ``program_args`` | All the arguments for the program/script to be launched.|
 +------------------+---------------------------------------------------------+
 
+.. list-table::
+   :widths: 25 50
+   :header-rows: 1
+   * - knob
+     - help
+   * - ``program``
+     - The full path of the program/script to be launched.
+   * - ``program_args``
+     - The input arguments for the program/script to be launched.
+
 Explanation of the options
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
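The positional arguments in the table above split the command line in two: the first non-option argument is the program, and everything after it belongs to that program. A minimal sketch of that split with Python's ``argparse``, assuming a ``run_cpu``-style command line; the parser name, the single ``--ninstances`` knob and the ``resnet50.py`` script are illustrative, and this is not the actual ``run_cpu`` argument parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the positional arguments in the table;
    # it is NOT the real run_cpu argument parser.
    parser = argparse.ArgumentParser(prog="run_cpu_sketch")
    # One illustrative knob, so the example has an option before the program.
    parser.add_argument("--ninstances", type=int, default=0,
                        help="Number of instances.")
    # The full path of the program/script to be launched.
    parser.add_argument("program", type=str)
    # Everything after the program path, flags included, is collected
    # verbatim so it can be forwarded to the launched program.
    parser.add_argument("program_args", nargs=argparse.REMAINDER)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--ninstances", "2", "resnet50.py", "--batch-size", "64"]
    )
    print(args.ninstances, args.program, args.program_args)
```

``argparse.REMAINDER`` is what keeps a flag such as ``--batch-size`` out of the launcher's own option parsing once the program path has been seen.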
@@ -215,6 +227,34 @@ The generic option settings (knobs) include the following:
 | ``--log-file-prefix``| str | 'run' | log file name prefix. |
 +----------------------+------+---------------+-------------------------------------------------------------------------------------------------------------------------+
 
+.. list-table::
+   :widths: 25 10 15 50
+   :header-rows: 1
+   * - knob
+     - type
+     - default value
+     - help
+   * - ``-h``, ``--help``
+     -
+     -
+     - Show the help message and exit.
+   * - ``-m``, ``--module``
+     -
+     -
+     - Changes each process to interpret the launch script as a python module, executing with the same behavior as "python -m".
+   * - ``--no-python``
+     - bool
+     - False
+     - Do not prepend the program with "python" - just exec it directly. Useful when the script is not a Python script.
+   * - ``--log-path``
+     - str
+     - ``''``
+     - The log file directory. Default path is ``''``, which means disable logging to files.
+   * - ``--log-file-prefix``
+     - str
+     - 'run'
+     - log file name prefix.
+
 Knobs for applying or disabling optimizations are:
 
 +-----------------------------+------+---------------+--------------------------------------------------------------------------------------------------------------------------+
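The ``list-table`` blocks added in this commit all follow one pattern: a ``:widths:`` option with one value per column, ``:header-rows: 1``, then one ``* -`` item per row whose remaining cells are ``-`` items aligned beneath it. A sketch of a generator for that pattern, assuming a hypothetical ``render_list_table`` helper rather than any tool used for this commit; it separates the options from the rows with a blank line, as the examples in the docutils ``list-table`` documentation do.

```python
from typing import List, Sequence


def render_list_table(header: Sequence[str],
                      rows: Sequence[Sequence[str]],
                      widths: Sequence[int],
                      indent: str = "   ") -> str:
    """Render a header row plus data rows as an RST list-table (hypothetical helper)."""
    ncols = len(header)
    if len(widths) != ncols or any(len(row) != ncols for row in rows):
        raise ValueError("header, widths and every row need the same number of cells")
    lines: List[str] = [
        ".. list-table::",
        f"{indent}:widths: {' '.join(str(w) for w in widths)}",
        f"{indent}:header-rows: 1",
        "",  # blank line between the directive options and the table body
    ]
    for row in [list(header), *[list(r) for r in rows]]:
        first, *rest = row
        # The first cell opens the row with "* -"; later cells are "-" items
        # indented two extra spaces so they line up under the first cell.
        lines.append(f"{indent}* - {first}".rstrip())
        for cell in rest:
            lines.append(f"{indent}  - {cell}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_list_table(
        ["knob", "type", "default value", "help"],
        [["``--log-file-prefix``", "str", "'run'", "log file name prefix."]],
        [25, 10, 15, 50],
    ))
```

Empty cells, such as the type and default of ``--help``, come out as a bare ``-`` item.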
@@ -229,6 +269,30 @@ Knobs for applying or disabling optimizations are:
 | ``--disable-iomp`` | bool | False | By default, Intel(R) OpenMP lib will be used if installed. Setting this flag would disable the usage of Intel(R) OpenMP. |
 +-----------------------------+------+---------------+--------------------------------------------------------------------------------------------------------------------------+
 
+.. list-table::
+   :widths: 25 10 15 50
+   :header-rows: 1
+   * - knob
+     - type
+     - default value
+     - help
+   * - ``--enable-tcmalloc``
+     - bool
+     - False
+     - Enable ``TCMalloc`` memory allocator.
+   * - ``--enable-jemalloc``
+     - bool
+     - False
+     - Enable ``JeMalloc`` memory allocator.
+   * - ``--use-default-allocator``
+     - bool
+     - False
+     - Use default memory allocator. Neither ``TCMalloc`` nor ``JeMalloc`` would be used.
+   * - ``--disable-iomp``
+     - bool
+     - False
+     - By default, Intel(R) OpenMP lib will be used if installed. Setting this flag would disable the usage of Intel(R) OpenMP.
+
 .. note::
 
    Memory allocator influences performance. If users do not specify a desired memory allocator, the ``run_cpu`` script will search if any of them is installed in the order of TCMalloc > JeMalloc > PyTorch default memory allocator, and takes the first matched one.
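The allocator search order in the note above is a first-match loop. A minimal sketch, assuming hypothetical helpers ``detect_installed`` and ``pick_allocator``: library discovery here uses ``ctypes.util.find_library`` purely as an illustrative stand-in, not necessarily how ``run_cpu`` locates ``libtcmalloc.so`` or ``libjemalloc.so``, and an explicitly requested allocator is simply returned as-is.

```python
import ctypes.util
from typing import Iterable, Optional, Set

# Search order stated in the note: TCMalloc > JeMalloc > PyTorch default allocator.
SEARCH_ORDER = ("tcmalloc", "jemalloc")


def detect_installed() -> Set[str]:
    """Names from SEARCH_ORDER whose shared library ctypes can locate.

    Illustrative stand-in only; run_cpu may look for the libraries differently.
    """
    return {name for name in SEARCH_ORDER if ctypes.util.find_library(name)}


def pick_allocator(installed: Iterable[str], requested: Optional[str] = None) -> str:
    """Return the allocator to use (hypothetical sketch of the note's rule).

    ``requested`` stands in for --enable-tcmalloc ("tcmalloc"),
    --enable-jemalloc ("jemalloc") or --use-default-allocator ("default").
    """
    if requested is not None:
        return requested  # an explicit knob skips the search
    available = set(installed)
    for name in SEARCH_ORDER:
        if name in available:
            return name  # first match wins
    return "default"


if __name__ == "__main__":
    print(pick_allocator(detect_installed()))
```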
@@ -263,6 +327,62 @@ Knobs for controlling instance number and compute resource allocation are:
 | ``--disable-taskset`` | bool | False | Disable the usage of ``taskset`` command. |
 +-----------------------------+------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
 
+.. list-table::
+   :widths: 25 10 15 50
+   :header-rows: 1
+   * - knob
+     - type
+     - default value
+     - help
+   * - ``--ninstances``
+     - int
+     - 0
+     - Number of instances.
+   * - ``--ncores-per-instance``
+     - int
+     - 0
+     - Number of cores used by every instance.
+   * - ``--node-id``
+     - int
+     - -1
+     - Node id for multi-instance, by default all nodes will be used.
+   * - ``--core-list``
+     - str
+     - ``''``
+     - Specify the core list as ``'core_id, core_id, ....'`` or core range as ``'core_id-core_id'``. By dafault all the cores will be used.
+   * - ``--use-logical-core``
+     - bool
+     - False
+     - By default only physical cores are used. Specify this flag to use logical cores.
+   * - ``--skip-cross-node-cores``
+     - bool
+     - False
+     - Prevent the workload to be executed on cores across NUMA nodes.
+   * - ``--rank``
+     - int
+     - -1
+     - Specify instance index to assign ncores_per_instance for rank; otherwise ncores_per_instance will be assigned sequentially to the instances.
+   * - ``--multi-instance``
+     - bool
+     - False
+     - A quick set to invoke multiple instances of the workload on multi-socket CPU servers.
+   * - ``--latency-mode``
+     - bool
+     - False
+     - A quick set to invoke benchmarking with latency mode, in which all physical cores are used and 4 cores per instance.
+   * - ``--throughput-mode``
+     - bool
+     - False
+     - A quick set to invoke benchmarking with throughput mode, in which all physical cores are used and 1 numa node per instance.
+   * - ``--disable-numactl``
+     - bool
+     - False
+     - By default ``numactl`` command is used to control NUMA access. Setting this flag will disable it.
+   * - ``--disable-taskset``
+     - bool
+     - False
+     - Disable the usage of ``taskset`` command.
+
 .. note::
 
    Environment variables that will be set by this script include the following:
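The ``--core-list`` row in the knob table above accepts either a comma-separated list of core ids or a single ``start-end`` range, and ``--ncores-per-instance`` (4 in latency mode) decides how many of those cores each instance receives. A hypothetical sketch of those two steps, assuming helper names ``parse_core_list`` and ``split_into_instances``; it ignores physical versus logical cores and NUMA boundaries, and leaving unused the cores that cannot fill a whole instance is an assumption of the sketch, not documented ``run_cpu`` behavior.

```python
from typing import List


def parse_core_list(spec: str) -> List[int]:
    """Parse "0,2,4" or "0-3" into core ids (hypothetical helper)."""
    spec = spec.strip()
    if not spec:
        raise ValueError("empty core list")
    if "-" in spec and "," not in spec:
        # Range form: both ends are inclusive.
        start, end = (int(part) for part in spec.split("-"))
        if start > end:
            raise ValueError(f"invalid core range {spec!r}")
        return list(range(start, end + 1))
    # List form; int() tolerates the spaces in "0, 2, 4".
    return [int(part) for part in spec.split(",")]


def split_into_instances(cores: List[int], ncores_per_instance: int) -> List[List[int]]:
    """Give each instance a contiguous block of ncores_per_instance cores.

    Assumption of this sketch: cores that cannot fill a whole instance are left unused.
    """
    if ncores_per_instance <= 0:
        raise ValueError("ncores_per_instance must be positive")
    usable = len(cores) - len(cores) % ncores_per_instance
    return [cores[i:i + ncores_per_instance]
            for i in range(0, usable, ncores_per_instance)]


if __name__ == "__main__":
    # Latency-mode style split: 4 cores per instance.
    print(split_into_instances(parse_core_list("0-11"), 4))
```

With ``"0-11"`` and 4 cores per instance the sketch yields three instances, ``[0, 1, 2, 3]``, ``[4, 5, 6, 7]`` and ``[8, 9, 10, 11]``.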
@@ -283,6 +403,22 @@ Knobs for controlling instance number and compute resource allocation are:
    | | "oversize_threshold:1,background_thread:true,metadata_thp:auto". |
    +------------------+-------------------------------------------------------------------------------------------------+
 
+   .. list-table::
+      :widths: 25 50
+      :header-rows: 1
+      * - Environ Variable
+        - Value
+      * - LD_PRELOAD
+        - Depending on knobs you set, <lib>/libiomp5.so, <lib>/libjemalloc.so, <lib>/libtcmalloc.so might be appended to LD_PRELOAD.
+      * - KMP_AFFINITY
+        - If libiomp5.so is preloaded, KMP_AFFINITY could be set to ``"granularity=fine,compact,1,0"``.
+      * - KMP_BLOCKTIME
+        - If libiomp5.so is preloaded, KMP_BLOCKTIME is set to "1".
+      * - OMP_NUM_THREADS
+        - Value of ``ncores_per_instance``
+      * - MALLOC_CONF
+        - If libjemalloc.so is preloaded, MALLOC_CONF will be set to ``"oversize_threshold:1,background_thread:true,metadata_thp:auto"``.
+
    Please note that the script respects environment variables set preliminarily. For example, if you have set the environment variables mentioned above before running the script, the values of the variables will not be overwritten by the script.
 
 Conclusion
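The environment-variable table in the final hunk, together with the note that preset values are respected, can be sketched as one routine. A minimal sketch, assuming a hypothetical ``prepare_env`` helper that edits a plain dictionary (for example a copy of ``os.environ`` later passed to ``subprocess``) and hypothetical ``/opt/lib`` library paths: ``LD_PRELOAD`` is appended to, as the table describes, while every other variable is written only when it is not already set.

```python
import os
from typing import List, MutableMapping


def set_if_unset(env: MutableMapping[str, str], key: str, value: str) -> None:
    """Write key only when it is absent, so values set beforehand are kept."""
    if key not in env:
        env[key] = value


def prepare_env(env: MutableMapping[str, str],
                preload_libs: List[str],
                ncores_per_instance: int) -> None:
    """Hypothetical sketch of the variables listed in the table."""
    # LD_PRELOAD is extended rather than replaced: existing entries come first.
    # Assumes a colon-separated LD_PRELOAD (ld.so also accepts spaces).
    preload = [p for p in env.get("LD_PRELOAD", "").split(":") if p]
    for lib in preload_libs:
        if lib not in preload:
            preload.append(lib)
    if preload:
        env["LD_PRELOAD"] = ":".join(preload)

    names = {os.path.basename(lib) for lib in preload_libs}
    if "libiomp5.so" in names:
        set_if_unset(env, "KMP_AFFINITY", "granularity=fine,compact,1,0")
        set_if_unset(env, "KMP_BLOCKTIME", "1")
    if "libjemalloc.so" in names:
        set_if_unset(env, "MALLOC_CONF",
                     "oversize_threshold:1,background_thread:true,metadata_thp:auto")
    set_if_unset(env, "OMP_NUM_THREADS", str(ncores_per_instance))


if __name__ == "__main__":
    child_env = dict(os.environ)  # copy, so this process is not modified
    prepare_env(child_env, ["/opt/lib/libiomp5.so"], 4)
    print(child_env["LD_PRELOAD"], child_env["OMP_NUM_THREADS"])
```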