change installation guide to rst
update compile_bundle.sh and ipex version number gen
change installation guide to index
doc: review edits to examples documentation (#3016)
Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
Update examples.md typo (#3017)
Migrate cheat sheet from IDZ to github (#3024)
* migrate cheat sheet
* Update index.rst
add footer for cache and privacy policy
update cheat sheet
Add more supported optimizers
add scripts for access metrics collection
DDP doc refinement
add installation guide files back
Update known_issues.md
Update getting_started.md
Emphasize IPEX import order
* Correct Conda command
---------
Co-authored-by: Ye Ting <ting.ye@intel.com>
**docs/tutorials/blogs_publications.md** (2 additions, 0 deletions)
Blogs & Publications
====================

* [Accelerate Llama 2 with Intel AI Hardware and Software Optimizations, Jul 2023](https://www.intel.com/content/www/us/en/developer/articles/news/llama2.html)
* [Accelerate PyTorch\* Training and Inference Performance using Intel® AMX, Jul 2023](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-training-inference-on-amx.html)
* [Intel® Deep Learning Boost (Intel® DL Boost) - Improve Inference Performance of Hugging Face BERT Base Model in Google Cloud Platform (GCP) Technology Guide, Apr 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-intel-dl-boost-improve-inference-performance-of-hugging-face-bert-base-model-in-google-cloud-platform-gcp-technology-guide)
* [Get Started with Intel® Extension for PyTorch\* on GPU | Intel Software, Mar 2023](https://www.youtube.com/watch?v=Id-rE2Q7xZ0&t=1s)
* [Accelerate PyTorch\* INT8 Inference with New “X86” Quantization Backend on X86 CPUs, Mar 2023](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-int8-inf-with-new-x86-backend.html)

| Description | Code |
| :--- | :--- |
| Import Intel® Extension for PyTorch\* | `import intel_extension_for_pytorch as ipex` |
| Capture a Verbose Log (Command Prompt) | `export ONEDNN_VERBOSE=1` |
| Optimization During Training | `model = ...`<br>`optimizer = ...`<br>`model.train()`<br>`model, optimizer = ipex.optimize(model, optimizer=optimizer)` |
| Optimization During Inference | `model = ...`<br>`model.eval()`<br>`model = ipex.optimize(model)` |
| Optimization Using the Low-Precision Data Type bfloat16 <br>During Training (Default FP32) | `model = ...`<br>`optimizer = ...`<br>`model.train()`<br/><br/>`model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)`<br/><br/>`with torch.cpu.amp.autocast():`<br>`  model(data)` |
| Optimization Using the Low-Precision Data Type bfloat16 <br>During Inference (Default FP32) | `model = ...`<br>`model.eval()`<br/><br/>`model = ipex.optimize(model, dtype=torch.bfloat16)`<br/><br/>`with torch.cpu.amp.autocast():`<br>`  model(data)` |
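
For reference, here is a minimal runnable sketch that combines the import and bfloat16-inference rows above. torchvision's ResNet-50 and a random tensor are used purely as stand-ins; any `torch.nn.Module` and input would do.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

model = models.resnet50()          # stand-in model for illustration
data = torch.rand(1, 3, 224, 224)  # dummy input

model.eval()
# Apply IPEX optimizations with bfloat16 as the low-precision data type
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast():
    model(data)
```
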
These examples will help you get started using Intel® Extension for PyTorch\*
with Intel GPUs.

**Note:** For examples on Intel CPUs, check these [CPU examples](../../../cpu/latest/tutorials/examples.html).

**Note:** You need to install torchvision and transformers to run the examples.

## Python

### Training

#### Single-Instance Training

##### Code Changes Highlight

You'll only need to change a few lines of code to use Intel® Extension for PyTorch\* on training, as shown:

1. Use the `ipex.optimize` function, which applies optimizations against the model object, as well as an optimizer object.
2. Use Auto Mixed Precision (AMP) with BFloat16 data type.
3. Convert input tensors, loss criterion and model to XPU.
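
A minimal sketch that puts these three changes together follows. The torchvision ResNet-50 model, loss criterion, optimizer, and random tensors are stand-ins for illustration only; the complete, verified examples appear in the sections below.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

# Stand-in model, criterion, optimizer, and data for illustration only
model = models.resnet50()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = torch.rand(16, 3, 224, 224)
target = torch.randint(0, 1000, (16,))

model.train()
# 3. Convert input tensors, loss criterion and model to XPU
model = model.to("xpu")
criterion = criterion.to("xpu")
data, target = data.to("xpu"), target.to("xpu")

# 1. Apply ipex.optimize against the model object and the optimizer object
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

# 2. Run the forward pass under Auto Mixed Precision (AMP) with BFloat16
optimizer.zero_grad()
with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    output = model(data)
    loss = criterion(output, target)
loss.backward()
optimizer.step()
```
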
Complete examples for Float32 and BFloat16 single-instance training are illustrated in the sections below.
##### Complete - Float32 Example

[//]: #(marker_train_single_fp32_complete)

##### Complete - BFloat16 Example

[//]: #(marker_train_single_bf16_complete)

### Inference

Get additional performance boosts for your computer vision and NLP workloads by
applying the Intel® Extension for PyTorch\* `optimize` function against your
model object.
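
Before the per-model examples, here is a minimal sketch of imperative-mode Float32 inference on an XPU device. torchvision's ResNet-50 is assumed purely as a stand-in model.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

model = models.resnet50()          # stand-in model for illustration
data = torch.rand(1, 3, 224, 224)

model.eval()
model = model.to("xpu")
data = data.to("xpu")
# Apply the optimize function against the model object
model = ipex.optimize(model)

with torch.no_grad():
    model(data)
```
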

#### Float32

##### Imperative Mode

###### Resnet50

[//]: #(marker_inf_rn50_imp_fp32)

###### BERT

[//]: #(marker_inf_bert_imp_fp32)

##### TorchScript Mode

We recommend using Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.
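
A rough sketch of that flow is shown below, again with a stand-in ResNet-50; the traced-and-frozen module is what you would deploy.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

model = models.resnet50().eval().to("xpu")  # stand-in model for illustration
data = torch.rand(1, 3, 224, 224).to("xpu")

model = ipex.optimize(model)

with torch.no_grad():
    # Trace and freeze the model so TorchScript graph optimizations can apply
    traced_model = torch.jit.trace(model, data)
    traced_model = torch.jit.freeze(traced_model)
    traced_model(data)
```
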

###### Resnet50

[//]: #(marker_inf_rn50_ts_fp32)

###### BERT

[//]: #(marker_inf_bert_ts_fp32)

#### BFloat16

The `optimize` function works for both Float32 and BFloat16 data types. For the BFloat16 data type, set the `dtype` parameter to `torch.bfloat16`.
We recommend using Auto Mixed Precision (AMP) with BFloat16 data type.
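
A minimal BFloat16 sketch follows (stand-in ResNet-50); only the `dtype` argument and the AMP context differ from the Float32 case.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

model = models.resnet50().eval().to("xpu")  # stand-in model for illustration
data = torch.rand(1, 3, 224, 224).to("xpu")

# Set dtype=torch.bfloat16 and run under XPU AMP
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    model(data)
```
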

##### Imperative Mode

###### Resnet50

[//]: #(marker_inf_rn50_imp_bf16)

###### BERT

[//]: #(marker_inf_bert_imp_bf16)

##### TorchScript Mode

We recommend using Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.

###### Resnet50

[//]: #(marker_inf_rn50_ts_bf16)

###### BERT

[//]: #(marker_inf_bert_ts_bf16)

#### Float16

The `optimize` function works for both Float32 and Float16 data types. For the Float16 data type, set the `dtype` parameter to `torch.float16`.
We recommend using Auto Mixed Precision (AMP) with Float16 data type.
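
The Float16 sketch below mirrors the BFloat16 one above; only the `dtype` changes (again assuming a stand-in ResNet-50).

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

model = models.resnet50().eval().to("xpu")  # stand-in model for illustration
data = torch.rand(1, 3, 224, 224).to("xpu")

model = ipex.optimize(model, dtype=torch.float16)

with torch.no_grad(), torch.xpu.amp.autocast(enabled=True, dtype=torch.float16):
    model(data)
```
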

##### Imperative Mode

###### Resnet50

[//]: #(marker_inf_rn50_imp_fp16)

###### BERT

[//]: #(marker_inf_bert_imp_fp16)

##### TorchScript Mode

We recommend using Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.

###### Resnet50

[//]: #(marker_inf_rn50_ts_fp16)

###### BERT

[//]: #(marker_inf_bert_ts_fp16)

#### INT8

We recommend using TorchScript for INT8 models because it has wider model support. TorchScript mode also auto-enables our optimizations. For a TorchScript INT8 model, inserting observers and quantizing the model are done separately through `prepare_jit` and `convert_jit`. A calibration process is required to collect statistics from real data. After conversion, optimizations such as operator fusion are auto-enabled.
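
A heavily simplified sketch of that flow is shown below. The import path for `prepare_jit`/`convert_jit` (PyTorch's legacy graph-mode quantization entry points) and the default qconfig are assumptions; the packaged INT8 example below is the authoritative version.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch
# Assumption: prepare_jit/convert_jit from PyTorch's legacy graph-mode quantization
from torch.quantization.quantize_jit import prepare_jit, convert_jit

model = models.resnet50().eval().to("xpu")  # stand-in model for illustration
data = torch.rand(1, 3, 224, 224).to("xpu")
qconfig = torch.quantization.default_qconfig  # assumption: the XPU example may use a different qconfig

with torch.no_grad():
    traced_model = torch.jit.trace(model, data)
    # Insert observers into the TorchScript graph
    prepared_model = prepare_jit(traced_model, {"": qconfig}, True)
    # Calibration: run representative data to collect statistics
    for _ in range(8):
        prepared_model(data)
    # Convert to an INT8 model; fusions such as operator fusion follow automatically
    quantized_model = convert_jit(prepared_model, True)
    quantized_model(data)
```
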

[//]: #(marker_int8_static)

#### torch.xpu.optimize

The `torch.xpu.optimize` function is an alternative to `ipex.optimize` in Intel® Extension for PyTorch\*, and provides identical usage for XPU devices only. The motivation for adding this alias is to unify the coding style in user scripts based on the `torch.xpu` module. Refer to the example below for usage.
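
For instance, a minimal sketch of the alias in use (stand-in ResNet-50 for illustration):

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # importing IPEX makes torch.xpu available

model = models.resnet50().eval().to("xpu")  # stand-in model for illustration
data = torch.rand(1, 3, 224, 224).to("xpu")

# torch.xpu.optimize is an alias of ipex.optimize for XPU devices
model = torch.xpu.optimize(model)

with torch.no_grad():
    model(data)
```
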

[//]: #(marker_inf_rn50_imp_fp32_alt)

## C++

To work with libtorch, the PyTorch C++ library, Intel® Extension for PyTorch\* provides its own C++ dynamic library. The C++ library only handles inference workloads, such as service deployment. For regular development, use the Python interface. Unlike using libtorch, no specific code changes are required. Compilation follows the recommended methodology with CMake. Detailed instructions can be found in the [PyTorch tutorial](https://pytorch.org/tutorials/advanced/cpp_export.html#depending-on-libtorch-and-building-the-application).

During compilation, Intel optimizations will be activated automatically after the C++ dynamic library of Intel® Extension for PyTorch\* is linked.
Using SYCL code in a C++ application is also possible. The example below shows how to invoke SYCL code; you need to explicitly pass `-fsycl` into `CMAKE_CXX_FLAGS`.

**example-usm.cpp**

[//]: #(marker_cppsdk_sample_usm)

**CMakeLists.txt**

[//]: #(marker_cppsdk_cmake_usm)

### Customize DPC++ kernels

Intel® Extension for PyTorch\* provides its C++ dynamic library to allow users to implement custom DPC++ kernels to run on the XPU device. Refer to the [DPC++ extension](./features/DPC++_Extension.md) for details.

## Model Zoo

Use cases that have already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/v2.12.0). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/v2.12.0#use-cases). Models verified on Intel GPUs are marked in the `Model Documentation` column. You can get performance benefits out of the box by simply running the scripts in the Model Zoo.