Commit adda5fe: Add ITT recipe (#2072)

* Add Profiling PyTorch workloads with the Instrumentation and Tracing Technology (ITT) API recipe

1 parent 0863302

5 files changed: +168 -0 lines changed
recipes_source/profile_with_itt.rst

Lines changed: 160 additions & 0 deletions
Profiling PyTorch workloads with the Instrumentation and Tracing Technology (ITT) API
=====================================================================================

In this recipe, you will learn:

* What Intel® VTune™ Profiler is
* What the Instrumentation and Tracing Technology (ITT) API is
* How to visualize PyTorch model hierarchy in Intel® VTune™ Profiler
* A short sample code showcasing how to use PyTorch ITT APIs

Requirements
------------

* PyTorch 1.13 or later
* Intel® VTune™ Profiler

The instructions for installing PyTorch are available at `pytorch.org <https://pytorch.org/get-started/locally/>`__.

What is Intel® VTune™ Profiler
------------------------------

Intel® VTune™ Profiler is a performance analysis tool for serial and multithreaded applications. For those familiar with Intel architecture, Intel® VTune™ Profiler provides a rich set of metrics that help users understand how an application executes on Intel platforms, and thus where the performance bottleneck is.

More detailed information, including a Getting Started guide, is available `on the Intel website <https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html>`__.

What is the Instrumentation and Tracing Technology (ITT) API
------------------------------------------------------------

`The Instrumentation and Tracing Technology API (ITT API) <https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/api-support/instrumentation-and-tracing-technology-apis.html>`_ provided by Intel® VTune™ Profiler enables a target application to generate and control the collection of trace data during its execution.

The advantage of the ITT feature is that it labels the time spans of individual PyTorch operators, as well as customized regions, in the Intel® VTune™ Profiler GUI. When users find anything abnormal, this makes it much easier to locate which operator behaved unexpectedly.

.. note::

   The ITT API has been integrated into PyTorch since 1.13. Users don't need to invoke the original ITT C/C++ APIs; they only need to invoke the Python APIs in PyTorch. More detailed information can be found in the `PyTorch Docs <https://pytorch.org/docs/stable/profiler.html#intel-instrumentation-and-tracing-technology-apis>`__.

How to visualize PyTorch model hierarchy in Intel® VTune™ Profiler
------------------------------------------------------------------

Two types of usage are provided in PyTorch:

1. Implicit invocation: By default, all operators that are registered through the PyTorch operator registration mechanism are labeled by the ITT feature automatically when the feature is enabled.

2. Explicit invocation: If customized labeling is needed, users can explicitly use the APIs mentioned in the `PyTorch Docs <https://pytorch.org/docs/stable/profiler.html#intel-instrumentation-and-tracing-technology-apis>`__ to label a desired range.

To enable explicit invocation, code that is expected to be labeled should be invoked under a `torch.autograd.profiler.emit_itt()` scope. For example:

.. code:: python3

   with torch.autograd.profiler.emit_itt():
       <code-to-be-profiled...>

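
As an illustrative sketch (not part of the original recipe), the two explicit labeling styles can be combined under one `emit_itt()` scope. The model and input below are placeholders chosen for this example; the ITT calls are harmless no-ops when no VTune collector is attached to the process:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 4)   # placeholder model, for illustration only
x = torch.rand(2, 8)      # placeholder input

with torch.autograd.profiler.emit_itt():
    # Style 1: explicit push/pop around a region
    torch.profiler.itt.range_push('warmup')
    model(x)
    torch.profiler.itt.range_pop()

    # Style 2: context-manager scope
    with torch.profiler.itt.range('inference'):
        y = model(x)

print(y.shape)
```

Both styles produce equivalent labels in the VTune timeline; the scope form is harder to misuse because the pop happens automatically on exit.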
Launch Intel® VTune™ Profiler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To verify the functionality, you need to start an Intel® VTune™ Profiler instance. Please check the `Intel® VTune™ Profiler User Guide <https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/launch.html>`__ for steps to launch Intel® VTune™ Profiler.

Once the Intel® VTune™ Profiler GUI is launched, you should see a user interface as below:

.. figure:: /_static/img/itt_tutorial/vtune_start.png
   :width: 100%
   :align: center

Three sample results are available in the left-side navigation bar under the `sample (matrix)` project. If you do not want profiling results to appear in this default sample project, you can create a new project via the `New Project...` button under the blue `Configure Analysis...` button. To start a new profiling, click the blue `Configure Analysis...` button to begin configuring the profiling run.

Configure Profiling
~~~~~~~~~~~~~~~~~~~

Once you click the `Configure Analysis...` button, you should see the screen below:

.. figure:: /_static/img/itt_tutorial/vtune_config.png
   :width: 100%
   :align: center

The right side of the window is split into three parts: `WHERE` (top left), `WHAT` (bottom left), and `HOW` (right). With `WHERE`, you can assign the machine on which you want to run the profiling. With `WHAT`, you can set the path of the application that you want to profile. To profile a PyTorch script, it is recommended to wrap all manual steps, including activating a Python environment and setting required environment variables, into a bash script, and then profile this bash script. In the screenshot above, we wrapped all steps into the `launch.sh` bash script and profiled `bash` with the parameter `<path_of_launch.sh>`. On the right side, `HOW`, you can choose whichever analysis type you would like to run. Intel® VTune™ Profiler provides a number of profiling types to choose from. Details can be found in the `Intel® VTune™ Profiler User Guide <https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance.html>`__.

Read Profiling Result
~~~~~~~~~~~~~~~~~~~~~

After a successful profiling run with ITT, you can open the `Platform` tab of the profiling result to see labels in the Intel® VTune™ Profiler timeline.

.. figure:: /_static/img/itt_tutorial/vtune_timeline.png
   :width: 100%
   :align: center

The timeline shows the main thread as a `python` thread at the top, with individual OpenMP threads below. Labeled PyTorch operators and customized regions are shown in the main thread row. All operators starting with `aten::` are labeled implicitly by the ITT feature in PyTorch. The labels `iteration_N` are labeled explicitly with the specific APIs `torch.profiler.itt.range_push()` and `torch.profiler.itt.range_pop()`, or with the `torch.profiler.itt.range()` scope. Please check the sample code in the next section for details.

.. note::

   Red boxes marked with `convolution` and `reorder` are labeled from Intel® oneAPI Deep Neural Network Library (oneDNN).

As illustrated in the right-side navigation bar, brown portions in the timeline rows show CPU usage of individual threads. The percentage of a thread row's height that the brown portion occupies at a timestamp corresponds to the CPU usage of that thread at that timestamp. Thus, it is intuitive from this timeline to understand the following:

1. How well CPU cores are utilized on each thread.
2. How balanced CPU core utilization is across all threads. Do all threads have good CPU usage?
3. How well OpenMP threads are synchronized. Is there jitter when OpenMP threads start or finish?

Of course, Intel® VTune™ Profiler provides a much richer set of profiling features to help you understand a performance issue. Once you understand the root cause of a performance issue, you can fix it. More detailed usage instructions are available in the `Intel® VTune™ Profiler User Guide <https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance.html>`__.

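
The number of OpenMP rows you see in the timeline follows PyTorch's intra-op thread pool. As a hedged aside beyond the original recipe, you can query or cap this thread count from Python when experimenting with thread balance, which directly changes how many OpenMP rows appear in the timeline:

```python
import torch

# Query the current intra-op (OpenMP) thread count
n = torch.get_num_threads()
print(f"intra-op threads: {n}")

# Optionally pin it to a smaller value before running the workload;
# fewer intra-op threads means fewer OpenMP rows in the VTune timeline
torch.set_num_threads(max(1, n // 2))
print(f"intra-op threads now: {torch.get_num_threads()}")
```

Note that `torch.set_num_threads` should be called before the first parallel operation runs; changing it mid-workload may not take effect on all backends.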
A short sample code showcasing how to use PyTorch ITT APIs
----------------------------------------------------------

The sample code below is the script that was used for profiling in the screenshots above.

The topology is formed by two operators, `Conv2d` and `Linear`. Three iterations of inference were performed, and each iteration was labeled by the PyTorch ITT APIs with the text string `iteration_N`. Either the pair `torch.profiler.itt.range_push` and `torch.profiler.itt.range_pop`, or the `torch.profiler.itt.range` scope, performs the customized labeling.

.. code:: python3

   # sample.py

   import torch
   import torch.nn as nn

   class ITTSample(nn.Module):
       def __init__(self):
           super(ITTSample, self).__init__()
           self.conv = nn.Conv2d(3, 5, 3)
           self.linear = nn.Linear(292820, 1000)

       def forward(self, x):
           x = self.conv(x)
           x = x.view(x.shape[0], -1)
           x = self.linear(x)
           return x

   def main():
       m = ITTSample()
       x = torch.rand(10, 3, 244, 244)
       with torch.autograd.profiler.emit_itt():
           for i in range(3):
               # Labeling a region with a pair of range_push and range_pop
               #torch.profiler.itt.range_push(f'iteration_{i}')
               #m(x)
               #torch.profiler.itt.range_pop()

               # Labeling a region with the range scope
               with torch.profiler.itt.range(f'iteration_{i}'):
                   m(x)

   if __name__ == '__main__':
       main()

The `launch.sh` bash script, mentioned in the Intel® VTune™ Profiler GUI screenshot, that wraps all manual steps is shown below.

.. code:: bash

   #!/bin/bash
   # launch.sh

   # Retrieve the directory containing both sample.py and launch.sh,
   # so that this bash script can be invoked from any directory
   BASEFOLDER=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
   <Activate a Python environment>
   cd ${BASEFOLDER}
   python sample.py
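
As a defensive sketch beyond the original script (assuming `torch.profiler.itt.is_available()`, which reports whether ITT bindings are present in the PyTorch build), the labeling in `sample.py` could be guarded so it degrades gracefully on builds without ITT support. The `labeled_range` helper below is hypothetical, introduced only for this example:

```python
import contextlib

import torch

def labeled_range(name):
    # Hypothetical helper: fall back to a no-op context manager when the
    # PyTorch build lacks ITT support, so the workload still runs unlabeled.
    if torch.profiler.itt.is_available():
        return torch.profiler.itt.range(name)
    return contextlib.nullcontext()

with torch.autograd.profiler.emit_itt():
    with labeled_range('iteration_0'):
        y = torch.rand(2, 3) @ torch.rand(3, 4)
print(y.shape)
```

This keeps the same script usable both under VTune and in plain runs without any code changes.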

recipes_source/recipes_index.rst

Lines changed: 8 additions & 0 deletions

@@ -116,6 +116,13 @@ Recipes are bite-sized, actionable examples of how to use specific PyTorch featu
    :link: ../recipes/recipes/profiler_recipe.html
    :tags: Basics

+.. customcarditem::
+   :header: PyTorch Profiler with Instrumentation and Tracing Technology API (ITT API) support
+   :card_description: Learn how to use PyTorch's profiler with Instrumentation and Tracing Technology API (ITT API) to visualize operator labeling in the Intel® VTune™ Profiler GUI
+   :image: ../_static/img/thumbnails/cropped/profiler.png
+   :link: ../recipes/recipes/profile_with_itt.html
+   :tags: Basics
+
 .. Interpretability

 .. customcarditem::

@@ -308,6 +315,7 @@ Recipes are bite-sized, actionable examples of how to use specific PyTorch featu
    /recipes/recipes/save_load_across_devices
    /recipes/recipes/zeroing_out_gradients
    /recipes/recipes/profiler_recipe
+   /recipes/recipes/profile_with_itt
    /recipes/recipes/Captum_Recipe
    /recipes/recipes/tensorboard_with_pytorch
    /recipes/recipes/dynamic_quantization
