_posts/2024-05-21-perfboost-windows-cpu.md (6 additions, 6 deletions)
@@ -8,21 +8,21 @@ The challenge of PyTorch's lower CPU performance on Windows compared to Linux ha
In version 2.0, PyTorch on Windows CPU used the default Windows malloc directly. Compared with the malloc used by PyTorch 2.0 on Linux, it spends significantly more time on memory allocation, which lowers performance. Intel engineer Xu Han took the initiative to replace the default Windows malloc, which PyTorch calls automatically, with mimalloc, a well-known malloc library developed by Microsoft. This replacement was released in PyTorch v2.1 and can significantly improve PyTorch's performance on Windows CPUs (see the following graph).
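
-![Windows PC PyTorch CPU perf from 2.0 to 2.1](/assets/images/2024-05-21-perfboost-windows-cpu/windows_compare.png)
-***Image 1: Relative throughput improvement achieved by upgrading from Windows PyTorch version 2.0 to 2.1 (higher is better)***.
+![Windows PC PyTorch CPU perf from 2.0 to 2.1](/assets/images/2024-05-21-perfboost-windows-cpu/windows_compare.png){:style="width:100%;"}
+_Figure 1: Relative throughput improvement achieved by upgrading from Windows PyTorch version 2.0 to 2.1 (higher is better)._
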
**Note**: Performance was measured on a 13th Gen Intel Core i7-13700H with 32 GB of memory.
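
Because the allocator change ships with PyTorch v2.1, a quick way to tell whether a given Windows install should already include it is to compare the version string. The following is a minimal sketch, not part of the original post: the helper names `parse_major_minor` and `includes_allocator_change` are hypothetical, and it assumes version strings of the usual form such as `2.1.0+cpu`.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.1.0+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def includes_allocator_change(version: str) -> bool:
    """Return True for 2.1 or later, the release the post says carries the change.

    Hypothetical helper: it only compares version numbers and does not
    inspect which allocator the installed build actually uses.
    """
    return parse_major_minor(version) >= (2, 1)
```

With PyTorch installed, `includes_allocator_change(torch.__version__)` can be called directly, since `torch.__version__` behaves like a string. The result is only meaningful for Windows CPU builds, which are the ones this change targets.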
The graph shows that PyTorch on Windows CPU improves significantly. The size of the improvement varies across workloads mainly because each model has a different mix of operations, which changes the frequency of memory access operations. BERT shows a comparatively small improvement, while ResNet50 and MobileNetV3 Large improve more substantially.
On a high-performance CPU, memory allocation becomes a performance bottleneck, which is why addressing it has led to such significant improvements.
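
To make the bottleneck argument concrete, the sketch below contrasts a loop that allocates a fresh buffer on every iteration with one that reuses a single buffer. It is an illustration only and assumes nothing about PyTorch internals or mimalloc: it uses plain Python `bytearray` objects, all function names are hypothetical, and the fresh-buffer timing also includes zero-filling, so it overstates pure allocator cost.

```python
import time


def fill_with_fresh_buffers(iterations: int, size: int) -> int:
    """Allocate a new buffer on every iteration, as an operator returning a fresh result would."""
    checksum = 0
    for i in range(iterations):
        buf = bytearray(size)  # new allocation (and zero-fill) each time
        buf[0] = i % 256
        checksum += buf[0]
    return checksum


def fill_with_reused_buffer(iterations: int, size: int) -> int:
    """Do the same work while reusing one buffer allocated up front."""
    buf = bytearray(size)  # single allocation, reused by every iteration
    checksum = 0
    for i in range(iterations):
        buf[0] = i % 256
        checksum += buf[0]
    return checksum


def time_call(fn, *args) -> tuple[float, int]:
    """Return (elapsed seconds, result) for one call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - start, result
```

Calling `time_call(fill_with_fresh_buffers, 10_000, 1 << 20)` and `time_call(fill_with_reused_buffer, 10_000, 1 << 20)` gives two timings whose difference approximates the per-iteration allocation overhead on the machine at hand. The faster the rest of the loop runs, the larger the share of time that overhead takes, which is the effect described above.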
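
-![Windows vs Linux with 2.0 gap](/assets/images/2024-05-21-perfboost-windows-cpu/windows_vs_linux_1.png)
-***Image 2.1: Relative performance of Windows vs Linux with Pytorch version 2.0 (higher is better)***.
+![Windows vs Linux with 2.0 gap](/assets/images/2024-05-21-perfboost-windows-cpu/windows_vs_linux_1.png){:style="width:100%;"}
+_Figure 2.1: Relative performance of Windows vs Linux with Pytorch version 2.0 (higher is better)._
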
**Note**: Performance was measured on a 13th Gen Intel Core i7-13700H with 32 GB of memory.
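
-![Windows vs Linux with 2.1 gap](/assets/images/2024-05-21-perfboost-windows-cpu/windows_vs_linux_2.png)
-***Image 2.2: Relative performance of Windows vs Linux with Pytorch version 2.1 (higher is better)***.
+![Windows vs Linux with 2.1 gap](/assets/images/2024-05-21-perfboost-windows-cpu/windows_vs_linux_2.png){:style="width:100%;"}
+_Figure 2.2: Relative performance of Windows vs Linux with Pytorch version 2.1 (higher is better)._
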
**Note**: Performance was measured on a 13th Gen Intel Core i7-13700H with 32 GB of memory.
As the graphs show, PyTorch's performance on Windows CPUs can be significantly improved. However, a noticeable gap remains compared with Linux. Several factors contribute, including the fact that malloc on Windows has not yet fully reached the performance level of malloc on Linux. Intel engineers will continue to investigate this issue in collaboration with Meta engineers to reduce the PyTorch performance gap between Windows and Linux.