Commit a899070

make some updates to torch.compile mode and explain why the 2nd run is slower
1 parent 8b1ed83

File tree

3 files changed, +39 -2 lines


intermediate_source/ipex_test.py

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+import torch
+import torchvision.models as models
+model = models.resnet50(weights='ResNet50_Weights.DEFAULT')
+model.eval()
+data = torch.rand(1, 3, 224, 224)
+#################### code changes ####################
+import intel_extension_for_pytorch as ipex
+# Optionally invoke the following API to apply frontend optimizations
+model = ipex.optimize(model, weights_prepack=False)
+compile_model = torch.compile(model, backend="ipex")
+######################################################
+with torch.no_grad():
+    print(compile_model(data))
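For context, one way to check what the ``ipex`` backend buys you is to time the eager model against the compiled one. The sketch below is illustrative rather than part of the test file; it assumes ``intel_extension_for_pytorch`` is installed and reuses the calls shown above.

import time

import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(weights='ResNet50_Weights.DEFAULT')
model.eval()
data = torch.rand(1, 3, 224, 224)

# Same optimize/compile calls as in ipex_test.py.
optimized_model = ipex.optimize(model, weights_prepack=False)
compiled_model = torch.compile(optimized_model, backend="ipex")

def avg_seconds(m, x, iters=10):
    # Run once first so compilation is excluded from the measurement.
    with torch.no_grad():
        m(x)
        start = time.time()
        for _ in range(iters):
            m(x)
    return (time.time() - start) / iters

print("eager:   ", avg_seconds(model, data))
print("compiled:", avg_seconds(compiled_model, data))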

intermediate_source/local_test.py

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+import torch
+from torch.export import dynamic_dim, export
+
+def fn(x, y):
+    z = x.clone()
+    z.copy_(y)
+    return z
+
+inp1 = torch.randn(10, 10)
+inp2 = torch.randn(1, 10)
+constraints = (
+    [dynamic_dim(inp1, i) for i in range(inp1.dim())] +
+    [dynamic_dim(inp2, i) for i in range(inp2.dim())]
+)
+exp1 = export(fn, (inp1, inp2))
+# exp1 = export(fn, (inp1, inp2), constraints=constraints)
+exp1.graph_module.print_readable()
+# exp1(torch.randn(10, 10), torch.randn(10, 10))
+exp2 = export(fn, (torch.randn(10, 10), torch.randn(10, 10)))
+exp2.graph_module.print_readable()
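A note on what this test exercises: without the ``constraints`` argument, ``export`` specializes the graph on the example input shapes, which is why ``exp2`` is re-exported for the ``(10, 10)`` inputs. The sketch below is not from the commit; it shows the resulting shape guard, assuming the same torch 2.1-era ``torch.export`` API that this file imports (``dynamic_dim`` was later replaced by ``torch.export.Dim``).

import torch
from torch.export import export

def fn(x, y):
    z = x.clone()
    z.copy_(y)
    return z

# Exported with example inputs of shapes (10, 10) and (1, 10); the
# resulting program guards on those shapes.
exp = export(fn, (torch.randn(10, 10), torch.randn(1, 10)))

try:
    # A different first-dimension size violates the shape guard.
    exp(torch.randn(8, 10), torch.randn(1, 10))
except Exception as err:
    print("guard failure, as expected:", err)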

intermediate_source/torch_compile_tutorial.py

Lines changed: 6 additions & 2 deletions
@@ -195,11 +195,15 @@ def init_model():
 # GPU compute and the observed speedup may be less significant.
 #
 # You may also see different speedup results depending on the chosen ``mode``
-# argument. Since our model and data are small, we want to reduce overhead as
-# much as possible, and so we chose ``"reduce-overhead"``. For your own models,
+# argument. The ``"reduce-overhead"`` mode uses CUDA graphs to further reduce
+# the overhead of Python. For your own models,
 # you may need to experiment with different modes to maximize speedup. You can
 # read more about modes `here <https://pytorch.org/get-started/pytorch-2.0/#user-experience>`__.
 #
+# You might also notice that the second time we run our model with ``torch.compile`` is significantly
+# slower than the other runs, although it is much faster than the first run. This is because the ``"reduce-overhead"``
+# mode runs a few warm-up iterations for CUDA graphs.
+#
 # For general PyTorch benchmarking, you can try using ``torch.utils.benchmark`` instead of the ``timed``
 # function we defined above. We wrote our own timing function in this tutorial to show
 # ``torch.compile``'s compilation latency.
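To make the new warm-up note concrete: with ``mode="reduce-overhead"``, the first call pays for compilation and the second still pays for CUDA graph recording, so per-run timings typically drop in two steps. The following is a rough sketch (not part of the tutorial) that makes the pattern visible on a CUDA machine.

import time
import torch

def timed(fn):
    # Wall-clock timing; synchronize so queued GPU work is counted.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.time()
    result = fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return result, time.time() - start

model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8)
)
x = torch.randn(16, 64)
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()

compiled = torch.compile(model, mode="reduce-overhead")
with torch.no_grad():
    for i in range(5):
        _, t = timed(lambda: compiled(x))
        # Expect run 0 to be slowest (compilation), run 1 still slower
        # than the rest (CUDA graph warm-up), and runs 2+ to be fastest.
        print(f"run {i}: {t:.6f}s")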
