
Commit 6ea7c10

committed
corrections
1 parent 280fcdc commit 6ea7c10

File tree

1 file changed: +10 −10 lines


recipes_source/xeon_run_cpu.rst

Lines changed: 10 additions & 10 deletions
@@ -8,14 +8,14 @@ For memory management, it configures NUMA binding and preloads optimized memory
 In addition, the script provides tunable parameters for compute resource allocation in both single instance and multiple instance scenarios,
 helping the users try out an optimal coordination of resource utilization for the specific workloads.
 
-What you will learn
+What You Will Learn
 -------------------
 
-* How to utilize tools like ``numactl``, ``taskset``, Intel(R) OpenMP Runtime Library and optimized memory allocators such as TCMalloc and JeMalloc for enhanced performance.
-* How to configure CPU cores and memory management to maximize PyTorch inference performance on Intel(R) Xeon(R) processors.
+* How to utilize tools like ``numactl``, ``taskset``, Intel(R) OpenMP Runtime Library and optimized memory allocators such as ``TCMalloc`` and ``JeMalloc`` for enhanced performance.
+* How to configure CPU resources and memory management to maximize PyTorch inference performance on Intel(R) Xeon(R) processors.
 
-Introduction for Optimizations
-------------------------------
+Introduction of the Optimizations
+---------------------------------
 
 Applying NUMA Access Control
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
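For context on the ``numactl`` and ``taskset`` tools named in the bullets above, a minimal sketch of the manual binding that the launcher script automates is shown below; the NUMA node index, core range, and the script name ``inference.py`` are placeholder assumptions, not part of the tutorial's examples.

.. code-block:: console

   # Assumption: a multi-socket machine; bind both compute and memory to NUMA node 0.
   $ numactl --cpunodebind=0 --membind=0 python inference.py

   # Alternative: pin the process to an explicit core range with taskset.
   $ taskset -c 0-15 python inference.py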
@@ -107,9 +107,9 @@ or
 Choosing an Optimized Memory Allocator
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Memory allocator plays an important role from performance perspective as well. A more efficient memory usage reduces overhead on unnecessary memory allocations or destructions, and thus results in a faster execution. From practical experiences, for deep learning workloads, JeMalloc or TCMalloc can get better performance by reusing memory as much as possible than default malloc function.
+Memory allocator plays an important role from performance perspective as well. A more efficient memory usage reduces overhead on unnecessary memory allocations or destructions, and thus results in a faster execution. From practical experiences, for deep learning workloads, ``TCMalloc`` or ``JeMalloc`` can get better performance by reusing memory as much as possible than default malloc operations.
 
-You can install TCMalloc by running the following command on Ubuntu:
+You can install ``TCMalloc`` by running the following command on Ubuntu:
 
 .. code-block:: console
 
@@ -127,7 +127,7 @@ In a conda environment, it can also be installed by running:
 
    $ conda install conda-forge::gperftools
 
-On Ubuntu JeMalloc can be installed by this command:
+On Ubuntu ``JeMalloc`` can be installed by this command:
 
 .. code-block:: console
 
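Once one of these allocators is installed, it takes effect when preloaded into the Python process, which is what the ``run_cpu`` script does on your behalf. A rough manual sketch follows; the library paths reflect a typical Ubuntu layout and are assumptions, and the exact file name (for example ``libtcmalloc.so`` versus ``libtcmalloc.so.4``) varies by package and version.

.. code-block:: console

   # Assumption: paths as installed by the Ubuntu packages; adjust to your system.
   $ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so python inference.py

   # Or preload JeMalloc instead (preload only one allocator at a time).
   $ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so python inference.py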
@@ -289,8 +289,8 @@ Conclusion
 ----------
 
 In this tutorial, we explored a variety of advanced configurations and tools designed to optimize PyTorch inference performance on Intel(R) Xeon(R) Scalable Processors.
-By leveraging the ``torch.backends.xeon.run_cpu script``, we demonstrated how to fine-tune thread and memory management to achieve peak performance.
-We covered essential concepts such as NUMA access control, optimized memory allocators like TCMalloc and JeMalloc, and the use of Intel(R) OpenMP for efficient multithreading.
+By leveraging the ``torch.backends.xeon.run_cpu`` script, we demonstrated how to fine-tune thread and memory management to achieve peak performance.
+We covered essential concepts such as NUMA access control, optimized memory allocators like ``TCMalloc`` and ``JeMalloc``, and the use of Intel(R) OpenMP for efficient multithreading.
 
 Additionally, we provided practical command-line examples to guide you through setting up single and multiple instance scenarios, ensuring optimal resource utilization tailored to specific workloads.
 By understanding and applying these techniques, users can significantly enhance the efficiency and speed of their PyTorch applications on Intel(R) Xeon(R) platforms.
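For reference, the launcher discussed throughout is invoked as a Python module. The sketch below shows a hypothetical multi-instance run; the instance count, cores per instance, and script name are illustrative assumptions, and flag spellings can differ across PyTorch versions.

.. code-block:: console

   # Assumption: 2 instances with 8 cores each; verify exact option names for your version.
   $ python -m torch.backends.xeon.run_cpu --ninstances 2 --ncores-per-instance 8 inference.py

Running ``python -m torch.backends.xeon.run_cpu --help`` lists the options actually available in the installed release.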
