
Commit 47ae25d

Update document
1 parent 7baf4b5 commit 47ae25d


recipes_source/amx.rst

Lines changed: 10 additions & 6 deletions
@@ -6,7 +6,7 @@ Introduction
 ============
 
 Advanced Matrix Extensions (AMX), also known as Intel® Advanced Matrix Extensions (Intel® AMX), is an x86 extension,
-which introduce two new components: a 2-dimensional register file called 'tiles' and an accelerator of Tile Matrix Multiplication (TMUL) that are able to operate on those tiles.
+which introduces two new components: a 2-dimensional register file called 'tiles' and an accelerator of Tile Matrix Multiplication (TMUL) that is able to operate on those tiles.
 AMX is designed to work on matrices to accelerate deep-learning training and inference on the CPU and is ideal for workloads like natural-language processing, recommendation systems and image recognition.
 
 Intel advances AI capabilities with 4th Gen Intel® Xeon® Scalable processors and Intel® AMX, delivering 3x to 10x higher inference and training performance versus the previous generation, see `Accelerate AI Workloads with Intel® AMX`_.
@@ -40,14 +40,20 @@ Using ``torch.cpu.amp`` or ``torch.autocast("cpu")`` would utilize AMX accelerat
 
 Note: Use channels last format to get better performance.
 
-- quantization:
+- Quantization:
 
   Applying quantization would utilize AMX acceleration for supported operators.
 
 - torch.compile:
 
   When the generated graph model runs into oneDNN implementations with the supported operators, AMX accelerations will be activated.
 
+Note: When using PyTorch on CPUs that support AMX, the framework will automatically enable AMX usage by default.
+This means that PyTorch will attempt to leverage the AMX feature whenever possible to speed up matrix multiplication operations.
+However, the decision to dispatch to the AMX kernel ultimately depends on
+the internal optimization strategy of the oneDNN library and the quantization backend, which PyTorch relies on for performance enhancements.
+The specific details of how AMX utilization is handled internally by PyTorch and the oneDNN library may be subject to change with updates and improvements to the framework.
+
 
 CPU operators that can leverage AMX:
 ------------------------------------
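
For reference, the three usage paths the updated section describes can be exercised with a minimal sketch like the following (the toy model, shapes, and variable names are illustrative stand-ins, not taken from the recipe)::

    import torch

    # Hypothetical stand-in: any float32 model dominated by matmuls/convs.
    model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
    x = torch.randn(1, 64)

    # 1. BFloat16 autocast: on AMX-capable CPUs, oneDNN may dispatch the
    #    bf16 matmul/convolution primitives to AMX kernels.
    with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
        model(x)

    # 2. Quantization: dynamic INT8 quantization of Linear modules; whether
    #    the quantized kernels use AMX is up to the quantization backend.
    qmodel = torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    qmodel(x)

    # 3. torch.compile: compiled graphs that reach oneDNN implementations
    #    of supported operators can also activate AMX.
    compiled = torch.compile(model)
    with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
        compiled(x)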
@@ -78,9 +84,6 @@ CPU operators that can leverage AMX:
 ``conv_transpose3d``,
 ``linear``
 
-Note: For quantized linear, whether to leverage AMX depends on the policy of the quantization backend.
-
-
 
 
 Confirm AMX is being utilized
@@ -91,7 +94,8 @@ Set environment variable ``export ONEDNN_VERBOSE=1``, or use ``torch.backends.mk
 ::
 
     with torch.backends.mkldnn.verbose(torch.backends.mkldnn.VERBOSE_ON):
-        model(input)
+        with torch.cpu.amp.autocast():
+            model(input)
 
 For example, get oneDNN verbose:
 
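
Putting that verification snippet into a self-contained, runnable form (the ``Linear`` toy model and tensor shape are placeholders; the exact verbose line format varies across oneDNN versions)::

    import torch

    model = torch.nn.Linear(64, 64).eval()
    x = torch.randn(4, 64)

    # Print a oneDNN verbose line for each primitive executed in the block;
    # equivalent to running the script with ONEDNN_VERBOSE=1 set.
    with torch.backends.mkldnn.verbose(torch.backends.mkldnn.VERBOSE_ON):
        with torch.cpu.amp.autocast():
            model(x)

On an AMX-capable machine, the printed primitive lines should name an AMX implementation (for example ``avx512_core_amx_bf16``) when the dispatch succeeded; if they show plain AVX-512 kernels instead, oneDNN chose not to use AMX for that shape and dtype.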
