
Commit 9879170

committed: correct trademark sign
1 parent 1bdfb89

File tree

1 file changed, +13 −13 lines changed


recipes_source/xeon_run_cpu.rst

Lines changed: 13 additions & 13 deletions
@@ -1,20 +1,20 @@
-Optimizing PyTorch Inference with Intel(R) Xeon(R) Scalable Processors
+Optimizing PyTorch Inference with Intel® Xeon® Scalable Processors
 ======================================================================
 
-There are several configuration options that can impact the performance of PyTorch inference when executed on Intel(R) Xeon(R) Scalable Processors.
+There are several configuration options that can impact the performance of PyTorch inference when executed on Intel® Xeon® Scalable Processors.
 To get peak performance, the ``torch.backends.xeon.run_cpu`` script is provided, which optimizes the configuration of thread and memory management.
-For thread management, the script configures thread affinity and the preload of Intel(R) OMP library.
+For thread management, the script configures thread affinity and the preload of Intel® OMP library.
 For memory management, it configures NUMA binding and preloads optimized memory allocation libraries, such as TCMalloc and JeMalloc.
 In addition, the script provides tunable parameters for compute resource allocation in both single instance and multiple instance scenarios,
 helping users find an optimal coordination of resource utilization for their specific workloads.
 
 What You Will Learn
 -------------------
 
-* How to utilize tools like ``numactl``, ``taskset``, Intel(R) OpenMP Runtime Library and
+* How to utilize tools like ``numactl``, ``taskset``, Intel® OpenMP Runtime Library and
   optimized memory allocators such as ``TCMalloc`` and ``JeMalloc`` for enhanced performance.
 * How to configure CPU resources and memory management to maximize PyTorch
-  inference performance on Intel(R) Xeon(R) processors.
+  inference performance on Intel® Xeon® processors.
 
 Introduction of the Optimizations
 ---------------------------------
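The tunable resource-allocation behavior mentioned above (splitting cores among one or more inference instances) can be sketched in a few lines. This is an illustrative sketch only, not the internals of ``run_cpu``; the function name ``partition_cores`` is made up, and it mirrors what the script's instance/core knobs control.

```python
# Hypothetical sketch of dividing physical cores among inference instances,
# analogous to what run_cpu's instance-count and cores-per-instance knobs do.
def partition_cores(total_cores: int, ninstances: int) -> list[list[int]]:
    """Split core IDs 0..total_cores-1 evenly across ninstances instances."""
    per_instance = total_cores // ninstances
    return [
        list(range(i * per_instance, (i + 1) * per_instance))
        for i in range(ninstances)
    ]

# e.g. 8 cores split between 2 instances: cores 0-3 and cores 4-7
print(partition_cores(8, 2))
```

Each sub-list would then be turned into a CPU affinity mask for one instance, keeping every instance on a contiguous block of cores.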
@@ -31,7 +31,7 @@ Local memory access is much faster than remote memory access.
 
 Users can get CPU information with the ``lscpu`` command on Linux to learn how many cores and sockets the machine has.
 Additionally, this command provides NUMA information, such as the distribution of CPU cores.
-Below is an example of executing ``lscpu`` on a machine equipped with an Intel(R) Xeon(R) CPU Max 9480:
+Below is an example of executing ``lscpu`` on a machine equipped with an Intel® Xeon® CPU Max 9480:
 
 .. code-block:: console
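The NUMA information that ``lscpu`` reports can be extracted programmatically, which is handy when scripting core bindings. A minimal sketch, assuming captured ``lscpu`` output; the sample text below is a made-up two-node layout, not real Xeon CPU Max 9480 output.

```python
import re

# Illustrative sample of lscpu output (fabricated two-node layout).
sample = """\
CPU(s):              16
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):   8-15
"""

# Pull out the "NUMA nodeN CPU(s)" lines: node ID -> CPU range string.
nodes = dict(re.findall(r"NUMA node(\d+) CPU\(s\):\s+(\S+)", sample))
print(nodes)  # {'0': '0-7', '1': '8-15'}
```

In practice the sample string would come from ``subprocess.run(["lscpu"], capture_output=True)``, and the resulting ranges feed directly into ``numactl --cpunodebind``/``--membind`` arguments.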
@@ -88,13 +88,13 @@ on CentOS you can run the following command:
 
    $ yum install util-linux
 
-Using Intel(R) OpenMP Runtime Library
+Using Intel® OpenMP Runtime Library
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 OpenMP is an implementation of multithreading, a method of parallelizing where a primary thread (a series of instructions executed consecutively) forks a specified number of sub-threads and the system divides a task among them. The threads then run concurrently, with the runtime environment allocating threads to different processors.
-Users can control OpenMP behavior with environment variable settings to fit their workloads; the settings are read and executed by the OpenMP libraries. By default, PyTorch uses the GNU OpenMP Library (GNU libgomp) for parallel computation. On Intel(R) platforms, the Intel(R) OpenMP Runtime Library (libiomp) provides OpenMP API specification support. It usually brings more performance benefits compared to libgomp.
+Users can control OpenMP behavior with environment variable settings to fit their workloads; the settings are read and executed by the OpenMP libraries. By default, PyTorch uses the GNU OpenMP Library (GNU libgomp) for parallel computation. On Intel® platforms, the Intel® OpenMP Runtime Library (libiomp) provides OpenMP API specification support. It usually brings more performance benefits compared to libgomp.
 
-The Intel(R) OpenMP Runtime Library can be installed using one of these commands:
+The Intel® OpenMP Runtime Library can be installed using one of these commands:
 
 .. code-block:: console
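Once libiomp is installed, switching a PyTorch process from libgomp to it is an environment-variable exercise, which is roughly what ``run_cpu`` automates. A minimal sketch of launching a script under such an environment; the library path is an assumption (it depends on how libiomp was installed), and ``KMP_AFFINITY``/``KMP_BLOCKTIME`` are standard Intel OpenMP settings.

```python
import os

# Build a child-process environment that preloads libiomp and pins threads.
env = dict(os.environ)
env["LD_PRELOAD"] = "/opt/intel/lib/libiomp5.so"     # hypothetical install path
env["KMP_AFFINITY"] = "granularity=fine,compact,1,0"  # pin threads to cores
env["KMP_BLOCKTIME"] = "1"  # idle worker threads sleep after 1 ms

# A real launch would then be, e.g.:
#   subprocess.run(["python", "inference.py"], env=env)
print(env["KMP_AFFINITY"])
```

LD_PRELOAD must be set before the Python process starts, which is why this is done for a child process (or by a launcher like ``run_cpu``) rather than inside the workload script itself.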
@@ -260,7 +260,7 @@ Knobs for applying or disabling optimizations are:
 * - ``--disable-iomp``
   - bool
   - False
-  - By default, Intel(R) OpenMP lib will be used if installed. Setting this flag would disable the usage of Intel(R) OpenMP.
+  - By default, Intel® OpenMP lib will be used if installed. Setting this flag would disable the usage of Intel® OpenMP.
 
 .. note::
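The ``--disable-iomp`` knob's "use it only if installed" default reduces to a small decision: preload libiomp when the flag is unset and the library can actually be located. A sketch under that reading, with a hypothetical helper (the real script's internals may differ):

```python
from ctypes.util import find_library


def should_preload_iomp(disable_iomp: bool, libiomp_path) -> bool:
    """Preload Intel OpenMP only if not disabled and the library was found."""
    return (not disable_iomp) and libiomp_path is not None


# find_library returns the library name if present on this system, else None.
libiomp = find_library("iomp5")
print("preload libiomp" if should_preload_iomp(False, libiomp)
      else "fall back to libgomp")
```

The same pattern covers the allocator knobs: detect the library, and only preload it when detection succeeds and the user has not opted out.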
@@ -351,12 +351,12 @@ Knobs for controlling instance number and compute resource allocation are:
 Conclusion
 ----------
 
-In this tutorial, we explored a variety of advanced configurations and tools designed to optimize PyTorch inference performance on Intel(R) Xeon(R) Scalable Processors.
+In this tutorial, we explored a variety of advanced configurations and tools designed to optimize PyTorch inference performance on Intel® Xeon® Scalable Processors.
 By leveraging the ``torch.backends.xeon.run_cpu`` script, we demonstrated how to fine-tune thread and memory management to achieve peak performance.
-We covered essential concepts such as NUMA access control, optimized memory allocators like ``TCMalloc`` and ``JeMalloc``, and the use of Intel(R) OpenMP for efficient multithreading.
+We covered essential concepts such as NUMA access control, optimized memory allocators like ``TCMalloc`` and ``JeMalloc``, and the use of Intel® OpenMP for efficient multithreading.
 
 Additionally, we provided practical command-line examples to guide you through setting up single and multiple instance scenarios, ensuring optimal resource utilization tailored to specific workloads.
-By understanding and applying these techniques, users can significantly enhance the efficiency and speed of their PyTorch applications on Intel(R) Xeon(R) platforms.
+By understanding and applying these techniques, users can significantly enhance the efficiency and speed of their PyTorch applications on Intel® Xeon® platforms.
 
 See also: