change installation guide to rst
update compile_bundle.sh and ipex version number gen
change installation guide to index
doc: review edits to examples documentation (#3016)
Signed-off-by: David B. Kinder <david.b.kinder@intel.com>
Update examples.md typo (#3017)
Migrate cheat sheet from IDZ to github (#3024)
* migrate cheat sheet
* Update index.rst
add footer for cache and privacy policy
update cheat sheet
Add more supported optimizers
add scripts for access metrics collection
DDP doc refinement
add installation guide files back
Update known_issues.md
Update getting_started.md
Emphasize IPEX import order
* Correct Conda command
---------
Co-authored-by: Ye Ting <ting.ye@intel.com>
**docs/tutorials/blogs_publications.md** (2 additions, 0 deletions)
Blogs & Publications
====================

* [Accelerate Llama 2 with Intel AI Hardware and Software Optimizations, Jul 2023](https://www.intel.com/content/www/us/en/developer/articles/news/llama2.html)
* [Accelerate PyTorch\* Training and Inference Performance using Intel® AMX, Jul 2023](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-training-inference-on-amx.html)
* [Intel® Deep Learning Boost (Intel® DL Boost) - Improve Inference Performance of Hugging Face BERT Base Model in Google Cloud Platform (GCP) Technology Guide, Apr 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-intel-dl-boost-improve-inference-performance-of-hugging-face-bert-base-model-in-google-cloud-platform-gcp-technology-guide)
* [Get Started with Intel® Extension for PyTorch\* on GPU | Intel Software, Mar 2023](https://www.youtube.com/watch?v=Id-rE2Q7xZ0&t=1s)
* [Accelerate PyTorch\* INT8 Inference with New “X86” Quantization Backend on X86 CPUs, Mar 2023](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-int8-inf-with-new-x86-backend.html)

| Description | Code |
| :--- | :--- |
| Import Intel® Extension for PyTorch\* | `import intel_extension_for_pytorch as ipex` |
| Capture a Verbose Log (Command Prompt) | `export ONEDNN_VERBOSE=1` |
| Optimization During Training | `model = ...`<br>`optimizer = ...`<br>`model.train()`<br>`model, optimizer = ipex.optimize(model, optimizer=optimizer)` |
| Optimization During Inference | `model = ...`<br>`model.eval()`<br>`model = ipex.optimize(model)` |
| Optimization Using the Low-Precision Data Type bfloat16 <br>During Training (Default FP32) | `model = ...`<br>`optimizer = ...`<br>`model.train()`<br/><br/>`model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)`<br/><br/>`with torch.cpu.amp.autocast():`<br>`  model(data)` |
| Optimization Using the Low-Precision Data Type bfloat16 <br>During Inference (Default FP32) | `model = ...`<br>`model.eval()`<br/><br/>`model = ipex.optimize(model, dtype=torch.bfloat16)`<br/><br/>`with torch.cpu.amp.autocast():`<br>`  model(data)` |
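
For reference, here is a minimal runnable sketch that combines the import and bfloat16-inference rows above. torchvision's ResNet-50 and a random tensor are used purely as stand-ins; any `torch.nn.Module` and input would do.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

model = models.resnet50()          # stand-in model for illustration
data = torch.rand(1, 3, 224, 224)  # dummy input

model.eval()
# Apply IPEX optimizations with bfloat16 as the low-precision data type
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast():
    model(data)
```
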
These examples will help you get started using Intel® Extension for PyTorch\*
with Intel GPUs.

**Note:** For examples on Intel CPUs, check these [CPU examples](../../../cpu/latest/tutorials/examples.html).

**Note:** You need to install torchvision and transformers to run the examples.

## Python

### Training

#### Single-Instance Training

##### Code Changes Highlight

You'll only need to change a few lines of code to use Intel® Extension for PyTorch\* on training, as shown:

1. Use the `ipex.optimize` function, which applies optimizations against the model object, as well as an optimizer object.
2. Use Auto Mixed Precision (AMP) with BFloat16 data type.
3. Convert input tensors, loss criterion and model to XPU.
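
A minimal sketch that puts these three changes together follows. The torchvision ResNet-50 model, loss criterion, optimizer, and random tensors are stand-ins for illustration only; the complete, verified examples appear in the sections below.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

# Stand-in model, criterion, optimizer, and data for illustration only
model = models.resnet50()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = torch.rand(16, 3, 224, 224)
target = torch.randint(0, 1000, (16,))

model.train()
# 3. Convert input tensors, loss criterion and model to XPU
model = model.to("xpu")
criterion = criterion.to("xpu")
data, target = data.to("xpu"), target.to("xpu")

# 1. Apply ipex.optimize against the model object and the optimizer object
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

# 2. Run the forward pass under Auto Mixed Precision (AMP) with BFloat16
optimizer.zero_grad()
with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    output = model(data)
    loss = criterion(output, target)
loss.backward()
optimizer.step()
```
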
Complete examples for Float32 and BFloat16 single-instance training are illustrated in the sections below.
##### Complete - Float32 Example

[//]: #(marker_train_single_fp32_complete)

##### Complete - BFloat16 Example

[//]: #(marker_train_single_bf16_complete)

### Inference

Get additional performance boosts for your computer vision and NLP workloads by
applying the Intel® Extension for PyTorch\* `optimize` function against your
model object.
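
Before the per-model examples, here is a minimal sketch of imperative-mode Float32 inference on an XPU device. torchvision's ResNet-50 is assumed purely as a stand-in model.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

model = models.resnet50()          # stand-in model for illustration
data = torch.rand(1, 3, 224, 224)

model.eval()
model = model.to("xpu")
data = data.to("xpu")
# Apply the optimize function against the model object
model = ipex.optimize(model)

with torch.no_grad():
    model(data)
```
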

#### Float32

##### Imperative Mode

###### Resnet50

[//]: #(marker_inf_rn50_imp_fp32)

###### BERT

[//]: #(marker_inf_bert_imp_fp32)

##### TorchScript Mode

We recommend using Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.
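
A rough sketch of that flow is shown below, again with a stand-in ResNet-50; the traced-and-frozen module is what you would deploy.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

model = models.resnet50().eval().to("xpu")  # stand-in model for illustration
data = torch.rand(1, 3, 224, 224).to("xpu")

model = ipex.optimize(model)

with torch.no_grad():
    # Trace and freeze the model so TorchScript graph optimizations can apply
    traced_model = torch.jit.trace(model, data)
    traced_model = torch.jit.freeze(traced_model)
    traced_model(data)
```
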

###### Resnet50

[//]: #(marker_inf_rn50_ts_fp32)

###### BERT

[//]: #(marker_inf_bert_ts_fp32)

#### BFloat16

The `optimize` function works for both Float32 and BFloat16 data types. For the BFloat16 data type, set the `dtype` parameter to `torch.bfloat16`.
We recommend using Auto Mixed Precision (AMP) with BFloat16 data type.
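
A minimal BFloat16 sketch follows (stand-in ResNet-50); only the `dtype` argument and the AMP context differ from the Float32 case.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

model = models.resnet50().eval().to("xpu")  # stand-in model for illustration
data = torch.rand(1, 3, 224, 224).to("xpu")

# Set dtype=torch.bfloat16 and run under XPU AMP
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    model(data)
```
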

##### Imperative Mode

###### Resnet50

[//]: #(marker_inf_rn50_imp_bf16)

###### BERT

[//]: #(marker_inf_bert_imp_bf16)

##### TorchScript Mode

We recommend using Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.

###### Resnet50

[//]: #(marker_inf_rn50_ts_bf16)

###### BERT

[//]: #(marker_inf_bert_ts_bf16)

#### Float16

The `optimize` function works for both Float32 and Float16 data types. For the Float16 data type, set the `dtype` parameter to `torch.float16`.
We recommend using Auto Mixed Precision (AMP) with Float16 data type.
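
The Float16 sketch below mirrors the BFloat16 one above; only the `dtype` changes (again assuming a stand-in ResNet-50).

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch

model = models.resnet50().eval().to("xpu")  # stand-in model for illustration
data = torch.rand(1, 3, 224, 224).to("xpu")

model = ipex.optimize(model, dtype=torch.float16)

with torch.no_grad(), torch.xpu.amp.autocast(enabled=True, dtype=torch.float16):
    model(data)
```
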

##### Imperative Mode

###### Resnet50

[//]: #(marker_inf_rn50_imp_fp16)

###### BERT

[//]: #(marker_inf_bert_imp_fp16)

##### TorchScript Mode

We recommend using Intel® Extension for PyTorch\* with [TorchScript](https://pytorch.org/docs/stable/jit.html) for further optimizations.

###### Resnet50

[//]: #(marker_inf_rn50_ts_fp16)

###### BERT

[//]: #(marker_inf_bert_ts_fp16)

#### INT8

We recommend using TorchScript for INT8 models because it has wider model support. TorchScript mode also auto-enables our optimizations. For a TorchScript INT8 model, inserting observers and quantizing the model are done separately through `prepare_jit` and `convert_jit`. A calibration process is required to collect statistics from real data. After conversion, optimizations such as operator fusion are auto-enabled.
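
A heavily simplified sketch of that flow is shown below. The import path for `prepare_jit`/`convert_jit` (PyTorch's legacy graph-mode quantization entry points) and the default qconfig are assumptions; the packaged INT8 example below is the authoritative version.

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # import after torch
# Assumption: prepare_jit/convert_jit from PyTorch's legacy graph-mode quantization
from torch.quantization.quantize_jit import prepare_jit, convert_jit

model = models.resnet50().eval().to("xpu")  # stand-in model for illustration
data = torch.rand(1, 3, 224, 224).to("xpu")
qconfig = torch.quantization.default_qconfig  # assumption: the XPU example may use a different qconfig

with torch.no_grad():
    traced_model = torch.jit.trace(model, data)
    # Insert observers into the TorchScript graph
    prepared_model = prepare_jit(traced_model, {"": qconfig}, True)
    # Calibration: run representative data to collect statistics
    for _ in range(8):
        prepared_model(data)
    # Convert to an INT8 model; fusions such as operator fusion follow automatically
    quantized_model = convert_jit(prepared_model, True)
    quantized_model(data)
```
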

[//]: #(marker_int8_static)

#### torch.xpu.optimize

The `torch.xpu.optimize` function is an alternative to `ipex.optimize` in Intel® Extension for PyTorch\*, and provides identical usage for XPU devices only. The motivation for adding this alias is to unify the coding style in user scripts based on the `torch.xpu` module. Refer to the example below for usage.
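
For instance, a minimal sketch of the alias in use (stand-in ResNet-50 for illustration):

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex  # importing IPEX makes torch.xpu available

model = models.resnet50().eval().to("xpu")  # stand-in model for illustration
data = torch.rand(1, 3, 224, 224).to("xpu")

# torch.xpu.optimize is an alias of ipex.optimize for XPU devices
model = torch.xpu.optimize(model)

with torch.no_grad():
    model(data)
```
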

[//]: #(marker_inf_rn50_imp_fp32_alt)

## C++

To work with libtorch, the PyTorch C++ library, Intel® Extension for PyTorch\* provides its own C++ dynamic library. The C++ library only handles inference workloads, such as service deployment. For regular development, use the Python interface. Unlike using libtorch, no specific code changes are required. Compilation follows the recommended methodology with CMake. Detailed instructions can be found in the [PyTorch tutorial](https://pytorch.org/tutorials/advanced/cpp_export.html#depending-on-libtorch-and-building-the-application).

During compilation, Intel optimizations will be activated automatically after the C++ dynamic library of Intel® Extension for PyTorch\* is linked.
Using SYCL code in a C++ application is also possible. The example below shows how to invoke SYCL code; you need to explicitly pass `-fsycl` into `CMAKE_CXX_FLAGS`.

**example-usm.cpp**

[//]: #(marker_cppsdk_sample_usm)

**CMakeLists.txt**

[//]: #(marker_cppsdk_cmake_usm)

### Customize DPC++ kernels

Intel® Extension for PyTorch\* provides its C++ dynamic library to allow users to implement custom DPC++ kernels to run on the XPU device. Refer to the [DPC++ extension](./features/DPC++_Extension.md) for details.

## Model Zoo

Use cases that have already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/v2.12.0). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/v2.12.0#use-cases). Models verified on Intel GPUs are marked in the `Model Documentation` column. You can get performance benefits out of the box by simply running the scripts in the Model Zoo.