
Commit 0dfcdba

Merge branch 'main' into main
2 parents: d3d9681 + 3b97695

File tree: 8 files changed, +157 −19 lines

_static/img/cat_resized.jpg

39.2 KB

advanced_source/super_resolution_with_onnxruntime.py

Lines changed: 39 additions & 3 deletions
@@ -107,7 +107,7 @@ def _initialize_weights(self):
 
 # Load pretrained model weights
 model_url = 'https://s3.amazonaws.com/pytorch/test_data/export/superres_epoch100-44c6958e.pth'
-batch_size = 1 # just a random number
+batch_size = 64 # just a random number
 
 # Initialize model with the pretrained weights
 map_location = lambda storage, loc: storage
@@ -218,6 +218,32 @@ def to_numpy(tensor):
 # ONNX exporter, so please contact us in that case.
 #
 
+######################################################################
+# Timing Comparison Between Models
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+
+######################################################################
+# Since ONNX models are optimized for inference speed, running the same
+# data through an ONNX model instead of a native PyTorch model should yield an
+# improvement of up to 2x. The improvement is more pronounced at higher batch sizes.
+
+
+import time
+
+x = torch.randn(batch_size, 1, 224, 224, requires_grad=True)
+
+start = time.time()
+torch_out = torch_model(x)
+end = time.time()
+print(f"Inference of PyTorch model used {end - start} seconds")
+
+ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
+start = time.time()
+ort_outs = ort_session.run(None, ort_inputs)
+end = time.time()
+print(f"Inference of ONNX model used {end - start} seconds")
+
 
 ######################################################################
 # Running the model on an image using ONNX Runtime
@@ -301,10 +327,20 @@ def to_numpy(tensor):
 # Save the image, we will compare this with the output image from mobile device
 final_img.save("./_static/img/cat_superres_with_ort.jpg")
 
+# Save resized original image (without super-resolution)
+img = transforms.Resize([img_out_y.size[0], img_out_y.size[1]])(img)
+img.save("cat_resized.jpg")
 
 ######################################################################
+# Here is the comparison between the two images:
+#
+# .. figure:: /_static/img/cat_resized.jpg
+#
+#    Low-resolution image
+#
 # .. figure:: /_static/img/cat_superres_with_ort.jpg
-#    :alt: output\_cat
+#
+#    Image after super-resolution
 #
 #
 # ONNX Runtime being a cross platform engine, you can run it across
@@ -313,7 +349,7 @@ def to_numpy(tensor):
 # ONNX Runtime can also be deployed to the cloud for model inferencing
 # using Azure Machine Learning Services. More information `here <https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-onnx>`__.
 #
-# More information about ONNX Runtime's performance `here <https://github.com/microsoft/onnxruntime#high-performance>`__.
+# More information about ONNX Runtime's performance `here <https://onnxruntime.ai/docs/performance>`__.
 #
 #
 # For more information about ONNX Runtime `here <https://github.com/microsoft/onnxruntime>`__.
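
Note that the timing added in this commit measures a single run of each model, which is noisy. For steadier numbers, a warmup pass plus averaging over several runs is the usual approach; here is a minimal sketch, assuming `torch_model`, `ort_session`, `to_numpy`, and `batch_size` are defined as in the tutorial:

```python
import time

import torch

def benchmark(fn, n_warmup=3, n_runs=10):
    # Warm up first so one-time costs (allocations, lazy init) don't skew the average
    for _ in range(n_warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

x = torch.randn(batch_size, 1, 224, 224)

with torch.no_grad():  # gradients are not needed for inference timing
    torch_avg = benchmark(lambda: torch_model(x))

ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_avg = benchmark(lambda: ort_session.run(None, ort_inputs))

print(f"PyTorch: {torch_avg:.4f} s/run, ONNX Runtime: {ort_avg:.4f} s/run")
```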

beginner_source/introyt/tensors_deeper_tutorial.py

Lines changed: 1 addition & 12 deletions
@@ -228,18 +228,7 @@
 # integer with the ``.to()`` method. Note that ``c`` contains all the same
 # values as ``b``, but truncated to integers.
 #
-# Available data types include:
-#
-# - ``torch.bool``
-# - ``torch.int8``
-# - ``torch.uint8``
-# - ``torch.int16``
-# - ``torch.int32``
-# - ``torch.int64``
-# - ``torch.half``
-# - ``torch.float``
-# - ``torch.double``
-# - ``torch.bfloat``
+# For more information, see the `data types documentation <https://pytorch.org/docs/stable/tensor_attributes.html#torch.dtype>`__.
 #
 # Math & Logic with PyTorch Tensors
 # ---------------------------------
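
For reference, the ``.to()`` conversion this passage describes looks like the following; a quick sketch (the names `b` and `c` mirror the tutorial's, the values are arbitrary):

```python
import torch

b = torch.rand(2, 2) * 20.0   # random floats in [0, 20)
c = b.to(torch.int32)         # same values, truncated to integers

print(b)        # e.g. tensor([[ 7.31, 12.08], [ 3.96, 18.50]])
print(c)        # the same values with the fractional part dropped
print(c.dtype)  # torch.int32
```
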
Lines changed: 3 additions & 2 deletions
@@ -1,7 +1,8 @@
 Language Modeling with ``nn.Transformer`` and torchtext
-===========================================================
+=======================================================
 
 The content is deprecated.
 
 .. raw:: html
-   <meta http-equiv="refresh" content="0; url=https://pytorch.org/tutorials/">
+
+   <meta http-equiv="Refresh" content="1; url='https://pytorch.org/tutorials/'" />

docathon-leaderboard.md

Lines changed: 36 additions & 0 deletions
@@ -1,3 +1,39 @@
+# 🎉 Docathon H1 2024 Leaderboard 🎉
+
+This is the list of the docathon contributors who participated in and contributed to the PyTorch H1 2024 docathon.
+A big shout out to everyone who participated! We have awarded points for each merged PR.
+For the **easy** label, we have awarded 2 points. For the **medium** label, we have awarded 5 points.
+For the **advanced** label, we have awarded 10 points. In some cases, we have awarded credit for PRs that
+were not merged or for issues that were closed without a merged PR.
+
+| Author | Points | PR |
+|--- | --- | ---|
+| ahoblitz | 34 | https://github.com/pytorch/pytorch/pull/128566, https://github.com/pytorch/pytorch/pull/128408, https://github.com/pytorch/pytorch/pull/128171, https://github.com/pytorch/pytorch/pull/128083, https://github.com/pytorch/pytorch/pull/128082, https://github.com/pytorch/pytorch/pull/127983, https://github.com/pytorch/xla/pull/7214 |
+| afrittoli | 25 | https://github.com/pytorch/pytorch/pull/128139, https://github.com/pytorch/pytorch/pull/128133, https://github.com/pytorch/pytorch/pull/128132, https://github.com/pytorch/pytorch/pull/128129, https://github.com/pytorch/pytorch/pull/128127 |
+| kiszk | 20 | https://github.com/pytorch/pytorch/pull/128337, https://github.com/pytorch/pytorch/pull/128123, https://github.com/pytorch/pytorch/pull/128022, https://github.com/pytorch/pytorch/pull/128312 |
+| loganthomas | 19 | https://github.com/pytorch/pytorch/pull/128676, https://github.com/pytorch/pytorch/pull/128192, https://github.com/pytorch/pytorch/pull/128189, https://github.com/pytorch/tutorials/pull/2922, https://github.com/pytorch/tutorials/pull/2910, https://github.com/pytorch/xla/pull/7195 |
+| ignaciobartol | 17 | https://github.com/pytorch/pytorch/pull/128741, https://github.com/pytorch/pytorch/pull/128135, https://github.com/pytorch/pytorch/pull/127938, https://github.com/pytorch/tutorials/pull/2936 |
+| arunppsg | 17 | https://github.com/pytorch/pytorch/pull/128391, https://github.com/pytorch/pytorch/pull/128021, https://github.com/pytorch/pytorch/pull/128018, https://github.com/pytorch-labs/torchfix/pull/59 |
+| alperenunlu | 17 | https://github.com/pytorch/tutorials/pull/2934, https://github.com/pytorch/tutorials/pull/2909, https://github.com/pytorch/pytorch/pull/104043 |
+| anandptl84 | 10 | https://github.com/pytorch/pytorch/pull/128196, https://github.com/pytorch/pytorch/pull/128098 |
+| GdoongMathew | 10 | https://github.com/pytorch/pytorch/pull/128136, https://github.com/pytorch/pytorch/pull/128051 |
+| ZhaoqiongZ | 10 | https://github.com/pytorch/pytorch/pull/127872 |
+| ZailiWang | 10 | https://github.com/pytorch/tutorials/pull/2931 |
+| jingxu10 | 8 | https://github.com/pytorch/pytorch/pull/127280, https://github.com/pytorch/pytorch/pull/127279, https://github.com/pytorch/pytorch/pull/127278, https://github.com/pytorch/tutorials/pull/2919 |
+| sitamgithub-MSIT | 7 | https://github.com/pytorch/tutorials/pull/2900, https://github.com/pytorch/xla/pull/7208 |
+| spzala | 5 | https://github.com/pytorch/pytorch/pull/128679, https://github.com/pytorch/pytorch/pull/128657 |
+| TharinduRusira | 5 | https://github.com/pytorch/pytorch/pull/128197 |
+| zabboud | 5 | https://github.com/pytorch/pytorch/pull/128055 |
+| orion160 | 5 | https://github.com/pytorch/tutorials/pull/2912 |
+| Ricktho1 | 5 | https://github.com/pytorch/xla/pull/7273 |
+| IvanLauLinTiong | 4 | https://github.com/pytorch/pytorch/pull/128526, https://github.com/pytorch/tutorials/pull/2849 |
+| sshkhr | 2 | https://github.com/pytorch/pytorch/pull/128155 |
+| rk7697 | 2 | https://github.com/pytorch/pytorch/pull/127993 |
+| hippocookie | 2 | https://github.com/pytorch/tutorials/pull/2937 |
+| diningeachox | 2 | https://github.com/pytorch/tutorials/pull/2935 |
+| akhil-maker | 2 | https://github.com/pytorch/tutorials/pull/2899 |
+| saurabhkthakur | 2 | https://github.com/pytorch/tutorials/pull/2896 |
+
 # 🎉 Docathon H2 2023 Leaderboard 🎉
 
 This is the list of the docathon contributors that have participated and contributed to the H2 2023 PyTorch docathon.

intermediate_source/reinforcement_q_learning.py

Lines changed: 8 additions & 2 deletions
@@ -9,6 +9,8 @@
 This tutorial shows how to use PyTorch to train a Deep Q Learning (DQN) agent
 on the CartPole-v1 task from `Gymnasium <https://gymnasium.farama.org>`__.
 
+You might find it helpful to read the original `Deep Q Learning (DQN) <https://arxiv.org/abs/1312.5602>`__ paper.
+
 **Task**
 
 The agent has to decide between two actions - moving the cart left or
@@ -83,7 +85,11 @@
 plt.ion()
 
 # if GPU is to be used
-device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+device = torch.device(
+    "cuda" if torch.cuda.is_available() else
+    "mps" if torch.backends.mps.is_available() else
+    "cpu"
+)
 
 
 ######################################################################
@@ -397,7 +403,7 @@ def optimize_model():
 # can produce better results if convergence is not observed.
 #
 
-if torch.cuda.is_available():
+if torch.cuda.is_available() or torch.backends.mps.is_available():
     num_episodes = 600
 else:
     num_episodes = 50
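
Since this commit touches device selection in two places, a standalone sketch of the same cuda → mps → cpu fallback chain may help; the tiny linear model here is a hypothetical stand-in for the tutorial's DQN:

```python
import torch
import torch.nn as nn

# Same fallback chain as the diff: prefer CUDA, then Apple-silicon MPS, then CPU
device = torch.device(
    "cuda" if torch.cuda.is_available() else
    "mps" if torch.backends.mps.is_available() else
    "cpu"
)

model = nn.Linear(4, 2).to(device)       # hypothetical stand-in for the DQN
obs = torch.randn(1, 4, device=device)   # inputs must live on the same device
print(device, model(obs).shape)
```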

recipes_source/recipes_index.rst

Lines changed: 9 additions & 0 deletions
@@ -317,6 +317,15 @@ Recipes are bite-sized, actionable examples of how to use specific PyTorch featu
    :link: ../recipes/torch_compile_user_defined_triton_kernel_tutorial.html
    :tags: Model-Optimization
 
+.. Compile Time Caching in ``torch.compile``
+
+.. customcarditem::
+   :header: Compile Time Caching in ``torch.compile``
+   :card_description: Learn how to configure compile time caching in ``torch.compile``
+   :image: ../_static/img/thumbnails/cropped/generic-pytorch-logo.png
+   :link: ../recipes/torch_compile_caching_tutorial.html
+   :tags: Model-Optimization
+
 .. Intel(R) Extension for PyTorch*
 
 .. customcarditem::
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+Compile Time Caching in ``torch.compile``
+=========================================================
+**Authors:** `Oguz Ulgen <https://github.com/oulgen>`_ and `Sam Larsen <https://github.com/masnesral>`_
+
+Introduction
+------------------
+
+PyTorch Inductor implements several caches to reduce compilation latency.
+This recipe demonstrates how you can configure the various parts of caching in ``torch.compile``.
+
+Prerequisites
+-------------------
+
+Before starting this recipe, make sure that you have the following:
+
+* Basic understanding of ``torch.compile``. See:
+
+  * `torch.compiler API documentation <https://pytorch.org/docs/stable/torch.compiler.html#torch-compiler>`__
+  * `Introduction to torch.compile <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`__
+
+* PyTorch 2.4 or later
+
+Inductor Cache Settings
+----------------------------
+
+Most of these caches are in-memory, used only within the same process, and are transparent to the user. An exception is the FX graph cache, which stores compiled FX graphs. This cache allows Inductor to avoid recompilation across process boundaries when it encounters the same graph with the same tensor input shapes (and the same configuration). The default implementation stores compiled artifacts in the system temp directory. An optional feature also supports sharing those artifacts within a cluster by storing them in a Redis database.
+
+A few settings are relevant to caching, and to FX graph caching in particular.
+They are accessible via the environment variables listed below, or can be hard-coded in Inductor's config file.
+
+TORCHINDUCTOR_FX_GRAPH_CACHE
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This setting enables the local FX graph cache, which stores artifacts in the host's temp directory. ``1`` enables it, and any other value disables it. By default, the disk location is per username, but users can enable sharing across usernames by specifying ``TORCHINDUCTOR_CACHE_DIR`` (below).
+
+TORCHINDUCTOR_CACHE_DIR
+~~~~~~~~~~~~~~~~~~~~~~~~
+This setting specifies the location of all on-disk caches. By default, the location is in the system temp directory under ``torchinductor_<username>``, for example, ``/tmp/torchinductor_myusername``.
+
+Note that if ``TRITON_CACHE_DIR`` is not set in the environment, Inductor sets the Triton cache directory to this same temp location, under the Triton subdirectory.
+
+TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This setting enables the remote FX graph cache. The current implementation uses Redis. ``1`` enables caching, and any other value disables it. The following environment variables configure the host and port of the Redis server:
+
+* ``TORCHINDUCTOR_REDIS_HOST`` (defaults to ``localhost``)
+* ``TORCHINDUCTOR_REDIS_PORT`` (defaults to ``6379``)
+
+Note that if Inductor locates a remote cache entry, it also stores the compiled artifact in the local on-disk cache; that local artifact is served on subsequent runs on the same machine.
+
+TORCHINDUCTOR_AUTOTUNE_REMOTE_CACHE
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This setting enables a remote cache for Inductor's autotuner. As with the remote FX graph cache, the current implementation uses Redis. ``1`` enables caching, and any other value disables it. The same host/port environment variables listed above apply to this cache.
+
+TORCHINDUCTOR_FORCE_DISABLE_CACHES
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Set this value to ``1`` to disable all Inductor caching. This is useful for tasks such as measuring cold-start compile times or forcing recompilation for debugging.
+
+Conclusion
+-------------
+In this recipe, we learned that PyTorch Inductor's caching mechanisms reduce compilation latency by using both local and remote caches, which operate seamlessly in the background without requiring user intervention.
+We also explored the settings and environment variables that let users configure and optimize these caching features for their specific needs.
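
To make the new recipe's settings concrete, here is a minimal sketch of configuring these caches from Python. The cache directory path and the compiled toy function are hypothetical, the environment variables are the ones documented in the diff above, and the remote settings only take effect if a Redis server is actually reachable:

```python
import os

# Set these before the first compilation (safest: before importing torch)
os.environ["TORCHINDUCTOR_FX_GRAPH_CACHE"] = "1"                # enable local FX graph cache
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/tmp/shared_inductor"  # hypothetical shared on-disk location
# os.environ["TORCHINDUCTOR_FX_GRAPH_REMOTE_CACHE"] = "1"       # needs a reachable Redis server
# os.environ["TORCHINDUCTOR_REDIS_HOST"] = "redis.example.com"  # hypothetical host
# os.environ["TORCHINDUCTOR_REDIS_PORT"] = "6379"

import torch

@torch.compile
def toy(x):
    return torch.sin(x) + torch.cos(x)

# The first call compiles and populates the cache; a fresh process pointed at
# the same cache dir should skip recompilation for the same graph and shapes.
print(toy(torch.randn(8)))
```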
