PyTorch leverages AMX for compute-intensive operators with BFloat16 and quantized data types to get higher performance out of the box on x86 CPUs with AMX support.
For more detailed information about oneDNN, see `oneDNN`_.

The operation is fully handled by oneDNN according to the execution code path generated. For example, when a supported operation is executed by the oneDNN implementation on a hardware platform with AMX support, AMX instructions will be invoked automatically inside oneDNN.
Since oneDNN is the default acceleration library for PyTorch CPU, no manual action is required to enable AMX support.

Guidelines for leveraging AMX with workloads
--------------------------------------------

This section provides guidelines on how to leverage AMX with various workloads.

- BFloat16 data type:

  - Using ``torch.cpu.amp`` or ``torch.autocast("cpu")`` would utilize AMX acceleration for supported operators.

    ::

      # Channels-last memory format usually performs better with oneDNN kernels.
      model = model.to(memory_format=torch.channels_last)
      # Autocast runs supported ops in BFloat16, which can dispatch to AMX kernels.
      with torch.cpu.amp.autocast():
          output = model(input)

.. note:: Use the ``torch.channels_last`` memory format to get better performance.

- Quantization:

  - Applying quantization would utilize AMX acceleration for supported operators.
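
    A minimal sketch, assuming post-training dynamic quantization with the standard ``torch.ao.quantization.quantize_dynamic`` API (``model`` and ``input`` are placeholders as above; whether AMX is actually used still depends on the quantization backend and oneDNN's kernel selection):

    ::

      import torch

      # Select the oneDNN quantization backend (one of several available engines).
      torch.backends.quantized.engine = "onednn"

      # Dynamically quantize Linear modules to INT8; on AMX-capable CPUs the
      # quantized kernels may dispatch to AMX instructions through oneDNN.
      quantized_model = torch.ao.quantization.quantize_dynamic(
          model, {torch.nn.Linear}, dtype=torch.qint8
      )
      output = quantized_model(input)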

- torch.compile:

  - When the generated graph runs into oneDNN implementations of the supported operators, AMX acceleration will be activated.
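
    A minimal sketch using the standard ``torch.compile`` API (``model`` and ``input`` are placeholders as above):

    ::

      import torch

      # Compile the model; operators that lower to oneDNN implementations
      # can be dispatched to AMX kernels on supported CPUs.
      compiled_model = torch.compile(model)
      with torch.cpu.amp.autocast():
          output = compiled_model(input)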

.. note:: When using PyTorch on CPUs that support AMX, the framework will automatically enable AMX usage by default. This means that PyTorch will attempt to leverage the AMX feature whenever possible to speed up matrix multiplication operations. However, the decision to dispatch to the AMX kernel ultimately depends on the internal optimization strategy of the oneDNN library and the quantization backend, which PyTorch relies on for performance enhancements. The specific details of how AMX utilization is handled internally by PyTorch and the oneDNN library may be subject to change with updates and improvements to the framework.

CPU operators that can leverage AMX
-----------------------------------

BF16 CPU ops that can leverage AMX:

- ``conv1d``
- ``conv2d``
- ``conv3d``
- ``conv_transpose1d``
- ``conv_transpose2d``
- ``conv_transpose3d``
- ``bmm``
- ``mm``
- ``baddbmm``
- ``addmm``
- ``addbmm``
- ``linear``
- ``matmul``

Quantization CPU ops that can leverage AMX:

- ``conv1d``
- ``conv2d``
- ``conv3d``
- ``conv_transpose1d``
- ``conv_transpose2d``
- ``conv_transpose3d``
- ``linear``

Confirm AMX is being utilized
-----------------------------

Set the environment variable ``ONEDNN_VERBOSE=1`` (for example, ``export ONEDNN_VERBOSE=1`` in your shell), or use the ``torch.backends.mkldnn.verbose`` context manager, to enable oneDNN to dump verbose messages.
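
A minimal sketch of the context-manager approach (``torch.backends.mkldnn.verbose`` and ``VERBOSE_ON`` are the standard PyTorch API; ``model`` and ``input`` are placeholders as above):

::

  import torch

  # Enable oneDNN verbose output for the duration of this block.
  with torch.backends.mkldnn.verbose(torch.backends.mkldnn.VERBOSE_ON):
      with torch.cpu.amp.autocast():
          model(input)

If AMX kernels are dispatched, the verbose output should contain lines referencing an AMX ISA, for example ``avx512_core_amx_bf16``; the exact message format depends on the oneDNN version.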