Advanced Matrix Extensions (AMX), also known as Intel® Advanced Matrix Extensions (Intel® AMX), is an extension to the x86 instruction set architecture (ISA).

Intel advances AI capabilities with 4th Gen Intel® Xeon® Scalable processors and Intel® AMX, delivering 3x to 10x higher inference and training performance versus the previous generation; see `Accelerate AI Workloads with Intel® AMX`_.

AMX supports two data types, INT8 and BFloat16. Compared to AVX-512 FP32, it can achieve up to 32x and 16x acceleration, respectively; see Figure 6 of `Accelerate AI Workloads with Intel® AMX`_.

For more detailed information about AMX, see `Intel® AMX Overview`_.

AMX in PyTorch
==============

PyTorch leverages AMX for compute-intensive operators with BFloat16 and for quantization with INT8 through its backend, oneDNN, to get higher performance out of the box on x86 CPUs with AMX support.
For more detailed information about oneDNN, see `oneDNN`_.

The operation is fully handled by oneDNN according to the generated execution code path. That is, when a supported operation runs through the oneDNN implementation on a hardware platform with AMX support, AMX instructions are invoked automatically inside oneDNN.
No manual operations are required to enable this feature.
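
As a quick sanity check before benchmarking, one can verify that the CPU exposes the AMX feature flags. The sketch below is a minimal, Linux-only illustration that reads ``/proc/cpuinfo``; it is not a PyTorch API.

::

    # Minimal Linux-only sketch: check /proc/cpuinfo for the AMX feature
    # flags (amx_tile, amx_bf16, amx_int8) reported by the kernel.
    def cpu_supports_amx() -> bool:
        try:
            with open("/proc/cpuinfo") as cpuinfo:
                content = cpuinfo.read()
        except OSError:
            return False
        return all(flag in content for flag in ("amx_tile", "amx_bf16", "amx_int8"))

    print(cpu_supports_amx())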

- BF16 CPU ops that can leverage AMX:

``conv1d``,
``conv2d``,
``conv3d``,
``conv_transpose1d``,
``conv_transpose2d``,
``conv_transpose3d``,
``bmm``,
``mm``,
``baddbmm``,
``addmm``,
``addbmm``,
``linear``,
``matmul``,
``_convolution``

- Quantization CPU ops that can leverage AMX:

``conv1d``,
``conv2d``,
``conv3d``,
``conv_transpose1d``,
``conv_transpose2d``,
``conv_transpose3d``,
``linear``

Guidelines for leveraging AMX with workloads
---------------------------------------------

- BFloat16:

Using ``torch.cpu.amp`` or ``torch.autocast("cpu")`` would utilize AMX acceleration.

::

    model = model.to(memory_format=torch.channels_last)
    with torch.cpu.amp.autocast():
        output = model(input)

Note: Use the channels-last memory format to get better performance.

- quantization:

Applying quantization would utilize AMX acceleration.
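
As an illustration, below is a minimal eager-mode static quantization sketch; the toy model, the input shapes, and the choice of the ``onednn`` quantized engine are assumptions made for this example rather than requirements.

::

    import torch

    # Selecting the "onednn" quantized engine routes the INT8 kernels through
    # oneDNN, which can invoke AMX instructions on supported CPUs.
    torch.backends.quantized.engine = "onednn"

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.quant = torch.ao.quantization.QuantStub()
            self.conv = torch.nn.Conv2d(3, 16, 3)
            self.dequant = torch.ao.quantization.DeQuantStub()

        def forward(self, x):
            return self.dequant(self.conv(self.quant(x)))

    model = M().eval()
    model.qconfig = torch.ao.quantization.get_default_qconfig("onednn")
    prepared = torch.ao.quantization.prepare(model)
    prepared(torch.randn(1, 3, 32, 32))   # calibration pass with sample data
    quantized = torch.ao.quantization.convert(prepared)
    output = quantized(torch.randn(1, 3, 32, 32))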

- torch.compile:

When the generated graph model runs into oneDNN implementations with the supported operators mentioned in the lists above, AMX acceleration will be activated.
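
For example, the following sketch (the toy model and shapes are made up for illustration) combines ``torch.compile`` with BFloat16 autocast on CPU:

::

    import torch

    # Compile the model, then run it under CPU autocast so that supported
    # operators execute in BFloat16 through oneDNN.
    model = torch.nn.Sequential(
        torch.nn.Linear(64, 64),
        torch.nn.ReLU(),
    ).eval()
    compiled_model = torch.compile(model)

    x = torch.randn(8, 64)
    with torch.no_grad(), torch.autocast("cpu"):
        output = compiled_model(x)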

Confirm AMX is being utilized
------------------------------

Set the environment variable ``ONEDNN_VERBOSE=1`` (for example, ``export ONEDNN_VERBOSE=1`` in the shell) to print oneDNN verbose messages at runtime.
If the verbose output contains ``avx512_core_amx_bf16`` for BFloat16 or ``avx512_core_amx_int8`` for quantization with INT8, it indicates that AMX is activated.
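
For example, the following sketch (sizes chosen arbitrarily) runs one BFloat16 ``matmul`` with verbose output enabled through the ``torch.backends.mkldnn.verbose`` context manager, which is assumed to be available in recent PyTorch builds:

::

    import torch

    # Run one BFloat16 matmul with oneDNN verbose output enabled, then look
    # for kernel names such as avx512_core_amx_bf16 in the printed lines.
    a = torch.randn(64, 64, dtype=torch.bfloat16)
    b = torch.randn(64, 64, dtype=torch.bfloat16)
    with torch.backends.mkldnn.verbose(torch.backends.mkldnn.VERBOSE_ON):
        torch.matmul(a, b)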

.. _Accelerate AI Workloads with Intel® AMX: https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/advanced-matrix-extensions/ai-solution-brief.html
.. _Intel® AMX Overview: https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/advanced-matrix-extensions/overview.html
.. _oneDNN: https://oneapi-src.github.io/oneDNN/index.html