"""
PyTorch TensorBoard Profiler
====================================
This recipe demonstrates how to use PyTorch Profiler
to detect performance bottlenecks of the model.

.. note::
    PyTorch 1.8 introduces the new API that will replace the older profiler API
    in the future releases. Check the new API at `this page <https://pytorch.org/docs/master/profiler.html>`__.

Introduction
------------
PyTorch 1.8 includes an updated profiler API capable of
recording the CPU side operations as well as the CUDA kernel launches on the GPU side.
The profiler can visualize this information
in the TensorBoard Plugin and provide analysis of the performance bottlenecks.

In this recipe, we will use a simple Resnet model to demonstrate how to
use profiler to analyze model performance.
"""
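Before the full Resnet recipe below, the basic profiling flow can be sketched on CPU alone; this is a minimal illustration, and the tensor sizes in it are made up rather than part of the recipe:

```python
import torch
import torch.profiler

# Profile a single matrix multiplication on CPU (no GPU required).
with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU]) as prof:
    x = torch.randn(128, 128)
    y = torch.matmul(x, x)

# key_averages() aggregates recorded events by operator name;
# table() renders a text summary sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

The same ``profile`` context manager drives the TensorBoard workflow in this recipe; only the schedule and output handler differ.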

######################################################################
# 1. Prepare the data and model
# 2. Use profiler to record execution events
# 3. Run the profiler
# 4. Use TensorBoard to view results and analyze performance
# 5. Improve performance with the help of profiler
#
# 1. Prepare the data and model
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# First, import all necessary libraries:
#

import torch
import torch.nn
import torch.optim
import torch.profiler
import torch.utils.data
import torchvision.datasets
import torchvision.models
import torchvision.transforms as T

######################################################################
# Then prepare the input data. For this tutorial, we use the CIFAR10 dataset.
# Transform it to the desired format and use DataLoader to load each batch.

transform = T.Compose(
    [T.Resize(224),
     T.ToTensor(),
     T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
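If the profiler later flags data loading as the bottleneck, the usual first experiment is the DataLoader's ``num_workers`` parameter. A minimal sketch on a synthetic stand-in dataset (the shapes and worker count here are illustrative, not part of the recipe):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR10: 64 fake RGB images with class labels.
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))
dataset = TensorDataset(images, labels)

# num_workers > 0 prepares batches in background worker processes,
# overlapping data loading with training computation.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=2)

first_batch = next(iter(loader))
print(first_batch[0].shape)  # torch.Size([32, 3, 32, 32])
```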

######################################################################
# Next, create Resnet model, loss function, and optimizer objects.
# To run on GPU, move model and loss to GPU device.

device = torch.device("cuda:0")
model = torchvision.models.resnet18(pretrained=True).cuda(device)
criterion = torch.nn.CrossEntropyLoss().cuda(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()


######################################################################
# Define the training step for each batch of input data.

def train(data):
    inputs, labels = data[0].to(device=device), data[1].to(device=device)
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
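The same training-step pattern can be exercised in isolation on synthetic CPU data; the tiny model and batch shapes here are illustrative stand-ins for the Resnet setup, not part of the recipe:

```python
import torch

# Tiny stand-in for the model/criterion/optimizer trio (CPU only).
tiny_model = torch.nn.Linear(8, 4)
tiny_criterion = torch.nn.CrossEntropyLoss()
tiny_optimizer = torch.optim.SGD(tiny_model.parameters(), lr=0.01)

def tiny_train(data):
    # Same shape as the recipe's train(): forward, loss, backward, update.
    inputs, targets = data
    outputs = tiny_model(inputs)
    loss = tiny_criterion(outputs, targets)
    tiny_optimizer.zero_grad()
    loss.backward()
    tiny_optimizer.step()
    return loss.item()

batch = (torch.randn(16, 8), torch.randint(0, 4, (16,)))
loss_value = tiny_train(batch)
print(loss_value)  # a non-negative scalar cross-entropy loss
```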

######################################################################
# 2. Use profiler to record execution events
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# The profiler is enabled through the context manager and accepts several parameters,
# some of the most useful are:
#
# - ``schedule`` - callable that takes step (int) as a single parameter
#   and returns the profiler action to perform at that step.
#   The schedule cycles through ``wait``, ``warmup``, and ``active`` phases.
#   During ``active`` steps, the profiler works and records events.
# - ``on_trace_ready`` - callable that is called at the end of each cycle;
#   In this example we use ``torch.profiler.tensorboard_trace_handler`` to generate result files for TensorBoard.
#   After profiling, result files will be saved into the ``./log/resnet18`` directory.
#   Specify this directory as a ``logdir`` parameter to analyze the profile in TensorBoard.
# - ``record_shapes`` - whether to record shapes of the operator inputs.

with torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        if step >= 5:  # wait(1) + warmup(1) + active(3) steps
            break
        train(batch_data)
        prof.step()  # notify the profiler that a step boundary was reached
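The ``schedule`` parameter can also be inspected on its own: ``torch.profiler.schedule`` returns a callable mapping a step index to a profiler action. A small sketch (the phase lengths here are illustrative):

```python
import torch
import torch.profiler

# wait=1: skip step 0; warmup=1: warm up on step 1;
# active=2: record steps 2-3 (the last active step also saves the trace).
sched = torch.profiler.schedule(wait=1, warmup=1, active=2)

actions = [sched(step) for step in range(4)]
for step, action in enumerate(actions):
    print(step, action)
```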


######################################################################
# 4. Use TensorBoard to view results and analyze performance
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Install the PyTorch Profiler TensorBoard Plugin.
#
# ::
#
#     pip install torch_tb_profiler
#

######################################################################
# Launch TensorBoard.
#
# ::
#
#     tensorboard --logdir=./log
#

######################################################################
# Open the TensorBoard profile URL in the browser.
#
# ::
#
#     http://localhost:6006/#pytorch_profiler
#

######################################################################
# You should see the Profiler plugin page as shown below.
#
# .. image:: ../../_static/img/profiler_overview1.png
#    :scale: 25 %
#
# The overview shows a high-level summary of model performance.
#
# The "Step Time Breakdown" shows the distribution of time spent in each step over different categories of execution.
# In this example, you can see the ``DataLoader`` overhead is significant.
#
# The bottom "Performance Recommendation" uses the profiling data
# to automatically highlight likely bottlenecks,
# and gives you actionable optimization suggestions.
#
# The GPU kernel view shows all kernels’ time spent on GPU.
#
# The trace view shows the timeline of profiled operators and GPU kernels.
# You can select it to see details as below.
#
# .. image:: ../../_static/img/profiler_trace_view1.png
#    :scale: 25 %
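The trace data behind this view can also be exported manually in Chrome trace format; a minimal CPU-only sketch (the workload and file path here are illustrative):

```python
import os
import tempfile
import torch
import torch.profiler

# Record a trivial workload, then dump it as a Chrome trace file
# (loadable in chrome://tracing or the TensorBoard trace view).
with torch.profiler.profile(
        activities=[torch.profiler.ProfilerActivity.CPU]) as prof:
    torch.randn(64, 64).mm(torch.randn(64, 64))

trace_path = os.path.join(tempfile.mkdtemp(), "trace.json")
prof.export_chrome_trace(trace_path)
print(os.path.getsize(trace_path) > 0)  # True
```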