Commit af8d8b4

Merge branch 'deprecating-tutorials' of github.com:pytorch/tutorials into deprecating-tutorials

2 parents: 45baad8 + 3c3be62

16 files changed: +134 -235 lines

.lycheeignore

Lines changed: 6 additions & 0 deletions
@@ -1,2 +1,8 @@
 # Used for links to be ignored during the link check.
 # Add link to file along with comment as to why it should be ignored
+
+#Example link in some of the tutorials that should be ignored
+file:///f:/libtmp/some_file
+
+#Ignore links with "file:///" to catch any other example links
+file:\/\/\/.*
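Since the new entry is a regular expression, a quick sanity check (a minimal sketch in Python; lychee applies its own regex engine to each line of `.lycheeignore`, so this only approximates its matching) confirms the pattern covers the example link it is meant to exclude:

    import re

    # The pattern added above, copied verbatim from .lycheeignore.
    # The escaped slashes are redundant in Python's regex syntax but harmless.
    pattern = re.compile(r"file:\/\/\/.*")

    # Matches the example link embedded in some tutorials...
    assert pattern.match("file:///f:/libtmp/some_file")
    # ...and any other file:/// example link.
    assert pattern.match("file:///tmp/another_example")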
(binary image file, 93.3 KB)

_static/img/trace_xpu_img.png

(binary image file, 88.3 KB)

advanced_source/cpp_export.rst

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 Loading a TorchScript Model in C++
 =====================================

+.. note:: TorchScript is no longer in active development.
+
 As its name suggests, the primary interface to PyTorch is the Python
 programming language. While Python is a suitable and preferred language for
 many scenarios requiring dynamism and ease of iteration, there are equally many

advanced_source/torch-script-parallelism.rst

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 Dynamic Parallelism in TorchScript
 ==================================

+.. note:: TorchScript is no longer in active development.
+
 In this tutorial, we introduce the syntax for doing *dynamic inter-op parallelism*
 in TorchScript. This parallelism has the following properties:

advanced_source/torch_script_custom_classes.rst

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 Extending TorchScript with Custom C++ Classes
 ===============================================

+.. note:: TorchScript is no longer in active development.
+
 This tutorial is a follow-on to the
 :doc:`custom operator <torch_script_custom_ops>`
 tutorial, and introduces the API we've built for binding C++ classes into TorchScript

beginner_source/Intro_to_TorchScript_tutorial.py

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,8 @@

 **Authors:** James Reed (jamesreed@fb.com), Michael Suo (suo@fb.com), rev2

+.. note:: TorchScript is no longer in active development.
+
 This tutorial is an introduction to TorchScript, an intermediate
 representation of a PyTorch model (subclass of ``nn.Module``) that
 can then be run in a high-performance environment such as C++.

beginner_source/deploy_seq2seq_hybrid_frontend_tutorial.py

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@
 Deploying a Seq2Seq Model with TorchScript
 ==================================================
 **Author:** `Matthew Inkawhich <https://github.com/MatthewInkawhich>`_
+
+.. note:: TorchScript is no longer in active development.
 """


en-wordlist.txt

Lines changed: 9 additions & 1 deletion
@@ -68,6 +68,7 @@ DyNet
 EOS
 EPS
 Ecker
+ExecuTorch
 ExportDB
 FC
 FGSM
@@ -647,4 +648,11 @@ url
 colab
 sharders
 Criteo
-torchrec
+torchrec
+_batch_norm_impl_index
+convolution_overrideable
+aten
+XPU
+XPUs
+impl
+overrideable

prototype_source/torchscript_freezing.py

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@
 Model Freezing in TorchScript
 =============================

+.. note:: TorchScript is no longer in active development.
+
 In this tutorial, we introduce the syntax for *model freezing* in TorchScript.
 Freezing is the process of inlining Pytorch module parameters and attributes
 values into the TorchScript internal representation. Parameter and attribute

recipes_source/distributed_optim_torchscript.rst

Lines changed: 1 addition & 2 deletions
@@ -1,8 +1,7 @@
 Distributed Optimizer with TorchScript support
 ==============================================================

-.. note:: Distributed Optimizer with TorchScript support is introduced in PyTorch 1.8
-   as a beta feature. This API is subject to change.
+.. note:: TorchScript is no longer in active development.

 In this recipe, you will learn:

recipes_source/profile_with_itt.rst

Lines changed: 33 additions & 3 deletions
@@ -58,6 +58,10 @@ Launch Intel® VTune™ Profiler

 To verify the functionality, you need to start an Intel® VTune™ Profiler instance. Please check the `Intel® VTune™ Profiler User Guide <https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/launch.html>`__ for steps to launch Intel® VTune™ Profiler.

+.. note::
+   Users can also use the web server UI by following the `Intel® VTune™ Profiler Web Server UI Guide <https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2024-1/web-server-ui.html>`__,
+   e.g., ``vtune-backend --web-port=8080 --allow-remote-access --enable-server-profiling``
+
 Once you get the Intel® VTune™ Profiler GUI launched, you should see a user interface as below:

 .. figure:: /_static/img/itt_tutorial/vtune_start.png
@@ -66,8 +70,8 @@

 Three sample results are available on the left side navigation bar under the `sample (matrix)` project. If you do not want profiling results to appear in this default sample project, you can create a new project via the button `New Project...` under the blue `Configure Analysis...` button. To start a new profiling, click the blue `Configure Analysis...` button to initiate configuration of the profiling.

-Configure Profiling
-~~~~~~~~~~~~~~~~~~~
+Configure Profiling for CPU
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Once you click the `Configure Analysis...` button, you should see the screen below:

@@ -77,6 +81,16 @@

 The right side of the window is split into 3 parts: `WHERE` (top left), `WHAT` (bottom left), and `HOW` (right). With `WHERE`, you can assign a machine where you want to run the profiling. With `WHAT`, you can set the path of the application that you want to profile. To profile a PyTorch script, it is recommended to wrap all manual steps, including activating a Python environment and setting required environment variables, into a bash script, then profile this bash script. In the screenshot above, we wrapped all steps into the `launch.sh` bash script and profile `bash` with the parameter `<path_of_launch.sh>`. On the right side `HOW`, you can choose whichever type you would like to profile. Intel® VTune™ Profiler provides a number of profiling types that you can choose from. Details can be found in the `Intel® VTune™ Profiler User Guide <https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance.html>`__.

+
+Configure Profiling for XPU
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Pick the `GPU Offload` profiling type instead of `Hotspots`, and follow the same instructions as for CPU to launch the application.
+
+.. figure:: /_static/img/itt_tutorial/vtune_xpu_config.png
+   :width: 100%
+   :align: center
+
+
 Read Profiling Result
 ~~~~~~~~~~~~~~~~~~~~~

@@ -101,6 +115,18 @@

 Of course there is a much richer set of profiling features that Intel® VTune™ Profiler provides to help you understand a performance issue. When you understand the root cause of a performance issue, you can get it fixed. More detailed usage instructions are available in the `Intel® VTune™ Profiler User Guide <https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/analyze-performance.html>`__.

+Read XPU Profiling Result
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+After a successful profiling run with ITT, you can open the `Platform` tab of the profiling result to see labels in the Intel® VTune™ Profiler timeline.
+
+.. figure:: /_static/img/itt_tutorial/vtune_xpu_timeline.png
+   :width: 100%
+   :align: center
+
+
+The timeline shows the main thread as a `python` thread at the top. Labeled PyTorch operators and customized regions are shown in the main thread row. All operators starting with `aten::` are operators labeled implicitly by the ITT feature in PyTorch. The timeline also shows the GPU Computing Queue, and users can see the different XPU kernels dispatched into it.
+
 A short sample code showcasing how to use PyTorch ITT APIs
 ----------------------------------------------------------

@@ -128,8 +154,12 @@ The topology is formed by two operators, `Conv2d` and `Linear`. Three iterations
         return x

 def main():
-    m = ITTSample()
+    m = ITTSample()
+    # uncomment the line below for XPU
+    # m = m.to("xpu")
     x = torch.rand(10, 3, 244, 244)
+    # uncomment the line below for XPU
+    # x = x.to("xpu")
     with torch.autograd.profiler.emit_itt():
         for i in range(3):
             # Labeling a region with pair of range_push and range_pop
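For reference, here is a self-contained sketch of the sample as it reads after this change, with the XPU lines left commented out. The ``ITTSample`` module body and layer sizes are reconstructed from the tutorial's description (two operators, `Conv2d` and `Linear`, three iterations) and are illustrative assumptions:

    import torch
    import torch.nn as nn

    class ITTSample(nn.Module):
        def __init__(self):
            super().__init__()
            # Two operators, Conv2d and Linear; sizes are illustrative.
            self.conv = nn.Conv2d(3, 5, 3)
            self.linear = nn.Linear(292820, 1000)  # 5 * 242 * 242 = 292820

        def forward(self, x):
            x = self.conv(x)
            x = x.view(x.shape[0], -1)
            x = self.linear(x)
            return x

    def main():
        m = ITTSample()
        # m = m.to("xpu")  # uncomment for XPU
        x = torch.rand(10, 3, 244, 244)
        # x = x.to("xpu")  # uncomment for XPU
        with torch.autograd.profiler.emit_itt():
            for i in range(3):
                # Label each iteration with a range_push/range_pop pair
                torch.profiler.itt.range_push(f"iteration_{i}")
                m(x)
                torch.profiler.itt.range_pop()

    if __name__ == "__main__":
        main()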

recipes_source/recipes/profiler_recipe.py

Lines changed: 67 additions & 18 deletions
@@ -70,6 +70,7 @@
 # - ``ProfilerActivity.CPU`` - PyTorch operators, TorchScript functions and
 #   user-defined code labels (see ``record_function`` below);
 # - ``ProfilerActivity.CUDA`` - on-device CUDA kernels;
+# - ``ProfilerActivity.XPU`` - on-device XPU kernels;
 # - ``record_shapes`` - whether to record shapes of the operator inputs;
 # - ``profile_memory`` - whether to report amount of memory consumed by
 #   model's Tensors;
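A minimal sketch of these options in use, mirroring the recipe's own ResNet setup (CPU-only here so it runs anywhere; the model choice is illustrative):

    import torch
    import torchvision.models as models
    from torch.profiler import profile, record_function, ProfilerActivity

    model = models.resnet18()
    inputs = torch.randn(5, 3, 224, 224)

    with profile(
        activities=[ProfilerActivity.CPU],  # add CUDA/XPU on accelerator machines
        record_shapes=True,                 # record operator input shapes
        profile_memory=True,                # report memory consumed by model tensors
    ) as prof:
        with record_function("model_inference"):  # user-defined code label
            model(inputs)

    # Group by input shape and sort by memory usage to see both options at work.
    print(prof.key_averages(group_by_input_shape=True)
              .table(sort_by="self_cpu_memory_usage", row_limit=10))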
@@ -160,17 +161,28 @@
 # Note the occurrence of ``aten::convolution`` twice with different input shapes.

 ######################################################################
-# Profiler can also be used to analyze performance of models executed on GPUs:
-
-model = models.resnet18().cuda()
-inputs = torch.randn(5, 3, 224, 224).cuda()
-
-with profile(activities=[
-        ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True) as prof:
+# Profiler can also be used to analyze performance of models executed on GPUs and XPUs.
+# Users can switch between cpu, cuda and xpu:
+if torch.cuda.is_available():
+    device = 'cuda'
+elif torch.xpu.is_available():
+    device = 'xpu'
+else:
+    print('Neither CUDA nor XPU devices are available to demonstrate profiling on acceleration devices')
+    import sys
+    sys.exit(0)
+
+activities = [ProfilerActivity.CPU, ProfilerActivity.CUDA, ProfilerActivity.XPU]
+sort_by_keyword = device + "_time_total"
+
+model = models.resnet18().to(device)
+inputs = torch.randn(5, 3, 224, 224).to(device)
+
+with profile(activities=activities, record_shapes=True) as prof:
     with record_function("model_inference"):
         model(inputs)

-print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
+print(prof.key_averages().table(sort_by=sort_by_keyword, row_limit=10))

 ######################################################################
 # (Note: the first use of CUDA profiling may bring an extra overhead.)
@@ -197,6 +209,36 @@
 # Self CPU time total: 23.015ms
 # Self CUDA time total: 11.666ms
 #
+######################################################################
+# (Note: the first use of XPU profiling may bring an extra overhead.)
+
+######################################################################
+# The resulting table output (omitting some columns):
+#
+# .. code-block:: sh
+#
+#  -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------
+#                                                     Name      Self XPU    Self XPU %     XPU total  XPU time avg    # of Calls
+#  -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------
+#                                          model_inference       0.000us         0.00%       2.567ms       2.567ms             1
+#                                             aten::conv2d       0.000us         0.00%       1.871ms      93.560us            20
+#                                        aten::convolution       0.000us         0.00%       1.871ms      93.560us            20
+#                                       aten::_convolution       0.000us         0.00%       1.871ms      93.560us            20
+#                           aten::convolution_overrideable       1.871ms        72.89%       1.871ms      93.560us            20
+#                                                 gen_conv       1.484ms        57.82%       1.484ms      74.216us            20
+#                                         aten::batch_norm       0.000us         0.00%     432.640us      21.632us            20
+#                             aten::_batch_norm_impl_index       0.000us         0.00%     432.640us      21.632us            20
+#                                  aten::native_batch_norm     432.640us        16.85%     432.640us      21.632us            20
+#                                             conv_reorder     386.880us        15.07%     386.880us       6.448us            60
+#  -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------
+# Self CPU time total: 712.486ms
+# Self XPU time total: 2.567ms
+#

 ######################################################################
 # Note the occurrence of on-device kernels in the output (e.g. ``sgemm_32x32x32_NN``).
@@ -266,17 +308,22 @@
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
 # Profiling results can be outputted as a ``.json`` trace file:
+# Tracing CUDA or XPU kernels
+# Users can switch between cpu, cuda and xpu:
+device = 'cuda'
+
+activities = [ProfilerActivity.CPU, ProfilerActivity.CUDA, ProfilerActivity.XPU]

-model = models.resnet18().cuda()
-inputs = torch.randn(5, 3, 224, 224).cuda()
+model = models.resnet18().to(device)
+inputs = torch.randn(5, 3, 224, 224).to(device)

-with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
+with profile(activities=activities) as prof:
     model(inputs)

 prof.export_chrome_trace("trace.json")

 ######################################################################
-# You can examine the sequence of profiled operators and CUDA kernels
+# You can examine the sequence of profiled operators and CUDA/XPU kernels
 # in Chrome trace viewer (``chrome://tracing``):
 #
 # .. image:: ../../_static/img/trace_img.png
@@ -287,15 +334,16 @@
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
 # Profiler can be used to analyze Python and TorchScript stack traces:
+sort_by_keyword = "self_" + device + "_time_total"

 with profile(
-    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
+    activities=activities,
     with_stack=True,
 ) as prof:
     model(inputs)

 # Print aggregated stats
-print(prof.key_averages(group_by_stack_n=5).table(sort_by="self_cuda_time_total", row_limit=2))
+print(prof.key_averages(group_by_stack_n=5).table(sort_by=sort_by_keyword, row_limit=2))

 #################################################################################
 # The output might look like this (omitting some columns):
@@ -384,15 +432,17 @@
 # To send the signal to the profiler that the next step has started, call ``prof.step()`` function.
 # The current profiler step is stored in ``prof.step_num``.
 #
-# The following example shows how to use all of the concepts above:
+# The following example shows how to use all of the concepts above for CUDA and XPU kernels:
+
+sort_by_keyword = "self_" + device + "_time_total"

 def trace_handler(p):
-    output = p.key_averages().table(sort_by="self_cuda_time_total", row_limit=10)
+    output = p.key_averages().table(sort_by=sort_by_keyword, row_limit=10)
     print(output)
     p.export_chrome_trace("/tmp/trace_" + str(p.step_num) + ".json")

 with profile(
-    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
+    activities=activities,
     schedule=torch.profiler.schedule(
         wait=1,
         warmup=1,
@@ -403,7 +453,6 @@ def trace_handler(p):
         model(inputs)
         p.step()

-
 ######################################################################
 # Learn More
 # ----------
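Pieced together, the scheduling example reads roughly as follows. This is a sketch reusing ``device``, ``activities``, ``model``, and ``inputs`` from the hunks above; the ``active`` value and the iteration count are illustrative, since the hunk cuts off inside the ``schedule(...)`` call:

    sort_by_keyword = "self_" + device + "_time_total"

    def trace_handler(p):
        output = p.key_averages().table(sort_by=sort_by_keyword, row_limit=10)
        print(output)
        p.export_chrome_trace("/tmp/trace_" + str(p.step_num) + ".json")

    with profile(
        activities=activities,
        schedule=torch.profiler.schedule(
            wait=1,     # skip the first step
            warmup=1,   # warm up on the next step; its results are discarded
            active=2),  # record the following two steps (illustrative value)
        on_trace_ready=trace_handler
    ) as p:
        for idx in range(8):
            model(inputs)
            p.step()    # signal the profiler that the next step has started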
