.. recipes_source/amx.rst
Introduction
============

Advanced Matrix Extensions (AMX), also known as Intel® Advanced Matrix Extensions (Intel® AMX), is an x86 extension
that introduces two new components: a 2-dimensional register file called 'tiles' and an accelerator for Tile Matrix Multiplication (TMUL) that is able to operate on those tiles.
AMX is designed to work on matrices to accelerate deep-learning training and inference on the CPU and is ideal for workloads like natural-language processing, recommendation systems and image recognition.
Intel advances AI capabilities with 4th Gen Intel® Xeon® Scalable processors and Intel® AMX, delivering 3x to 10x higher inference and training performance versus the previous generation; see `Accelerate AI Workloads with Intel® AMX`_.
Using ``torch.cpu.amp`` or ``torch.autocast("cpu")`` would utilize AMX acceleration for supported operators.

Note: Use the channels last memory format to get better performance.
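For instance, CPU autocast can be enabled as below. This is a minimal sketch; the ``torch.nn.Linear`` model and the random input are illustrative placeholders, not part of the original recipe:

```python
import torch

# Illustrative model and input; any model with AMX-supported ops works.
model = torch.nn.Linear(64, 64)
x = torch.randn(8, 64)

# Run inference under CPU autocast: eligible ops such as linear/matmul
# execute in BFloat16, and on AMX-capable CPUs oneDNN can dispatch
# them to AMX kernels.
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = model(x)

print(out.dtype)  # torch.bfloat16
```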

- Quantization:

  Applying quantization would utilize AMX acceleration for supported operators.
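As an illustration, dynamic quantization of ``Linear`` modules is one way to produce quantized operators. This is a minimal sketch with a toy placeholder model; whether the resulting int8 kernels actually dispatch to AMX is decided by the quantization backend:

```python
import torch

# Toy placeholder model with a Linear layer (an AMX-supported op).
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

# Dynamic quantization converts Linear weights to int8 on the fly;
# AMX usage for the quantized kernels is up to the backend's policy.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(8, 64))
print(out.shape)
```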
- torch.compile:

  When the generated graph model runs into oneDNN implementations of the supported operators, AMX acceleration will be activated.
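A minimal sketch of compiling a matmul with ``torch.compile``; the function and tensor shapes are illustrative, and AMX dispatch still depends on oneDNN selecting an AMX implementation at runtime:

```python
import torch

def matmul(a, b):
    # torch.mm is one of the CPU operators that can leverage AMX.
    return torch.mm(a, b)

# Compile the function; the generated code may call into oneDNN
# matmul implementations, which can use AMX on supported CPUs.
compiled = torch.compile(matmul)

a = torch.randn(32, 32)
b = torch.randn(32, 32)
out = compiled(a, b)
print(out.shape)
```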
Note: When using PyTorch on CPUs that support AMX, the framework will automatically enable AMX usage by default.
This means that PyTorch will attempt to leverage the AMX feature whenever possible to speed up matrix multiplication operations.
However, the decision to dispatch to the AMX kernel ultimately depends on
the internal optimization strategy of the oneDNN library and the quantization backend, which PyTorch relies on for performance enhancements.
The specific details of how AMX utilization is handled internally by PyTorch and the oneDNN library may be subject to change with updates and improvements to the framework.
CPU operators that can leverage AMX:
------------------------------------
``conv_transpose3d``,
``linear``

Confirm AMX is being utilized
-----------------------------

Set environment variable ``export ONEDNN_VERBOSE=1``, or use ``torch.backends.mkldnn.verbose`` to enable oneDNN to dump verbose messages.

::

    with torch.backends.mkldnn.verbose(torch.backends.mkldnn.VERBOSE_ON):
        with torch.cpu.amp.autocast():
            model(input)
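For example, running a BFloat16 matmul under the verbose context prints one ``onednn_verbose`` line per executed primitive to stdout; on an AMX-capable CPU, the implementation field of those lines should mention ``amx``. A minimal sketch (shapes are arbitrary):

```python
import torch

# Arbitrary BFloat16 operands; bf16 matmul is AMX-eligible via oneDNN.
a = torch.randn(64, 64, dtype=torch.bfloat16)
b = torch.randn(64, 64, dtype=torch.bfloat16)

# The verbose context makes oneDNN print, for each executed primitive,
# a line that includes the chosen implementation (look for "amx").
with torch.backends.mkldnn.verbose(torch.backends.mkldnn.VERBOSE_ON):
    c = torch.mm(a, b)

print(c.shape)
```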