3 | 3 | """
4 | 4 | .. meta::
5 | 5 |     :description: An end-to-end example of how to use AOTInductor for Python runtime.
6 | | -    :keywords: torch.export, AOTInductor, torch._inductor.aot_compile, torch._export.aot_load
| 6 | +    :keywords: torch.export, AOTInductor, torch._inductor.aoti_compile_and_package, aot_compile, torch._export.aoti_load_package
7 | 7 |
8 | 8 | ``torch.export`` AOTInductor Tutorial for Python runtime (Beta)
9 | 9 | ===============================================================

14 | 14 | #
15 | 15 | # .. warning::
16 | 16 | #
17 | | -#     ``torch._inductor.aot_compile`` and ``torch._export.aot_load`` are in Beta status and are subject to backwards compatibility
18 | | -#     breaking changes. This tutorial provides an example of how to use these APIs for model deployment using Python runtime.
| 17 | +#     ``torch._inductor.aoti_compile_and_package`` and
| 18 | +#     ``torch._inductor.aoti_load_package`` are in Beta status and are subject
| 19 | +#     to backwards compatibility breaking changes. This tutorial provides an
| 20 | +#     example of how to use these APIs for model deployment using Python
| 21 | +#     runtime.
19 | 22 | #
20 | | -# It has been shown `previously <https://pytorch.org/docs/stable/torch.compiler_aot_inductor.html#>`__ how AOTInductor can be used
21 | | -# to do Ahead-of-Time compilation of PyTorch exported models by creating
22 | | -# a shared library that can be run in a non-Python environment.
23 | | -#
24 | | -#
25 | | -# In this tutorial, you will learn an end-to-end example of how to use AOTInductor for Python runtime.
26 | | -# We will look at how to use :func:`torch._inductor.aot_compile` along with :func:`torch.export.export` to generate a
27 | | -# shared library. Additionally, we will examine how to execute the shared library in Python runtime using :func:`torch._export.aot_load`.
28 | | -# You will learn about the speed up seen in the first inference time using AOTInductor, especially when using
29 | | -# ``max-autotune`` mode which can take some time to execute.
| 23 | +# It has been shown `previously
| 24 | +# <https://pytorch.org/docs/stable/torch.compiler_aot_inductor.html#>`__ how
| 25 | +# AOTInductor can be used to do Ahead-of-Time compilation of PyTorch exported
| 26 | +# models by creating an artifact that can be run in a non-Python environment.
| 27 | +# In this tutorial, you will learn an end-to-end example of how to use
| 28 | +# AOTInductor for Python runtime.
30 | 29 | #
31 | 30 | # **Contents**
32 | 31 | #

36 | 35 | ######################################################################
37 | 36 | # Prerequisites
38 | 37 | # -------------
39 | | -# * PyTorch 2.4 or later
| 38 | +# * PyTorch 2.6 or later
40 | 39 | # * Basic understanding of ``torch.export`` and AOTInductor
41 | 40 | # * Complete the `AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models <https://pytorch.org/docs/stable/torch.compiler_aot_inductor.html#>`_ tutorial
42 | 41 |
43 | 42 | ######################################################################
44 | 43 | # What you will learn
45 | 44 | # ----------------------
46 | | -# * How to use AOTInductor for python runtime.
47 | | -# * How to use :func:`torch._inductor.aot_compile` along with :func:`torch.export.export` to generate a shared library
48 | | -# * How to run a shared library in Python runtime using :func:`torch._export.aot_load`.
49 | | -# * When do you use AOTInductor for python runtime
| 45 | +# * How to use AOTInductor for Python runtime.
| 46 | +# * How to use :func:`torch._inductor.aoti_compile_and_package` along with :func:`torch.export.export` to generate a compiled artifact.
| 47 | +# * How to load and run the artifact in a Python runtime using :func:`torch._inductor.aoti_load_package`.
| 48 | +# * When to use AOTInductor with a Python runtime.
50 | 49 |
51 | 50 | ######################################################################
52 | 51 | # Model Compilation
53 | 52 | # -----------------
54 | 53 | #
55 | | -# We will use the TorchVision pretrained `ResNet18` model and TorchInductor on the
56 | | -# exported PyTorch program using :func:`torch._inductor.aot_compile`.
| 54 | +# We will use the TorchVision pretrained ``ResNet18`` model as an example.
57 | 55 | #
58 | | -# .. note::
| 56 | +# The first step is to export the model to a graph representation using
| 57 | +# :func:`torch.export.export`. To learn more about using this function, you can
| 58 | +# check out the `docs <https://pytorch.org/docs/main/export.html>`_ or the
| 59 | +# `tutorial <https://pytorch.org/tutorials/intermediate/torch_export_tutorial.html>`_.
59 | 60 | #
60 | | -# This API also supports :func:`torch.compile` options like ``mode``
61 | | -# This means that if used on a CUDA enabled device, you can, for example, set ``"max_autotune": True``
62 | | -# which leverages Triton based matrix multiplications & convolutions, and enables CUDA graphs by default.
| 61 | +# Once we have exported the PyTorch model and obtained an ``ExportedProgram``,
| 62 | +# we can use :func:`torch._inductor.aoti_compile_and_package` to compile the
| 63 | +# program with AOTInductor for a specified device, and save the generated
| 64 | +# contents into a ``.pt2`` artifact.
63 | 65 | #
|
64 |
| -# We also specify ``dynamic_shapes`` for the batch dimension. In this example, ``min=2`` is not a bug and is |
65 |
| -# explained in `The 0/1 Specialization Problem <https://docs.google.com/document/d/16VPOa3d-Liikf48teAOmxLc92rgvJdfosIy-yoT38Io/edit?fbclid=IwAR3HNwmmexcitV0pbZm_x1a4ykdXZ9th_eJWK-3hBtVgKnrkmemz6Pm5jRQ#heading=h.ez923tomjvyk>`__ |
66 |
| - |
| 66 | +# .. note:: |
| 67 | +# |
| 68 | +# This API supports the same available options that :func:`torch.compile` |
| 69 | +# has, such as ``mode`` and ``max_autotune`` (for those who want to enable |
| 70 | +# CUDA graphs and leverage Triton based matrix multiplications and |
| 71 | +# convolutions) |
67 | 72 |
|
68 | 73 | import os
69 | 74 | import torch
| 75 | +import torch._inductor
70 | 76 | from torchvision.models import ResNet18_Weights, resnet18
71 | 77 |
72 | 78 | model = resnet18(weights=ResNet18_Weights.DEFAULT)
73 | 79 | model.eval()
74 | 80 |
75 | 81 | with torch.inference_mode():
| 82 | +    inductor_configs = {}
76 | 83 |
77 | | -    # Specify the generated shared library path
78 | | -    aot_compile_options = {
79 | | -        "aot_inductor.output_path": os.path.join(os.getcwd(), "resnet18_pt2.so"),
80 | | -    }
81 | 84 |     if torch.cuda.is_available():
82 | 85 |         device = "cuda"
83 | | -        aot_compile_options.update({"max_autotune": True})
| 86 | +        inductor_configs["max_autotune"] = True
84 | 87 |     else:
85 | 88 |         device = "cpu"
86 | 89 |
87 | 90 |     model = model.to(device=device)
88 | 91 |     example_inputs = (torch.randn(2, 3, 224, 224, device=device),)
89 | 92 |
90 | | -    # min=2 is not a bug and is explained in the 0/1 Specialization Problem
91 | | -    batch_dim = torch.export.Dim("batch", min=2, max=32)
92 | 93 |     exported_program = torch.export.export(
93 | 94 |         model,
94 | 95 |         example_inputs,
95 | | -        # Specify the first dimension of the input x as dynamic
96 | | -        dynamic_shapes={"x": {0: batch_dim}},
97 | 96 |     )
98 | | -    so_path = torch._inductor.aot_compile(
99 | | -        exported_program.module(),
100 | | -        example_inputs,
101 | | -        # Specify the generated shared library path
102 | | -        options=aot_compile_options
| 97 | +    path = torch._inductor.aoti_compile_and_package(
| 98 | +        exported_program,
| 99 | +        package_path=os.path.join(os.getcwd(), "resnet18.pt2"),
| 100 | +        inductor_configs=inductor_configs
103 | 101 |     )
104 | 102 |
| 103 | +######################################################################
| 104 | +# The result of :func:`aoti_compile_and_package` is an artifact "resnet18.pt2"
| 105 | +# which can be loaded and executed in Python and C++.
| 106 | +#
| 107 | +# The artifact itself contains the AOTInductor-generated code, such as a
| 108 | +# generated C++ runner file, a shared library compiled from that C++ file, and
| 109 | +# CUDA binary files, also known as cubin files, if optimizing for CUDA.
| 110 | +#
| 111 | +# Structurally, the artifact is a ``.zip`` file with the following
| 112 | +# layout:
| 113 | +#
| 114 | +# .. code::
| 115 | +#
| 116 | +#    ├── archive_format
| 117 | +#    ├── version
| 118 | +#    ├── data
| 119 | +#    │   ├── aotinductor
| 120 | +#    │   │   └── model
| 121 | +#    │   │       ├── xxx.cpp              # AOTInductor generated cpp file
| 122 | +#    │   │       ├── xxx.so               # AOTInductor generated shared library
| 123 | +#    │   │       ├── xxx.cubin            # Cubin files (if running on CUDA)
| 124 | +#    │   │       └── xxx_metadata.json    # Additional metadata to save
| 125 | +#    │   ├── weights
| 126 | +#    │   │   └── TBD
| 127 | +#    │   └── constants
| 128 | +#    │       └── TBD
| 129 | +#    └── extra
| 130 | +#        └── metadata.json
| 131 | +#
| 132 | +# We can use the following command to inspect the artifact contents:
| 133 | +#
| 134 | +# .. code:: bash
| 135 | +#
| 136 | +#    $ unzip -l resnet18.pt2
| 137 | +#
| 138 | +# .. code::
| 139 | +#
| 140 | +#    Archive:  resnet18.pt2
| 141 | +#      Length      Date    Time    Name
| 142 | +#    ---------  ---------- -----   ----
| 143 | +#            1  01-08-2025 16:40   version
| 144 | +#            3  01-08-2025 16:40   archive_format
| 145 | +#        10088  01-08-2025 16:40   data/aotinductor/model/cagzt6akdaczvxwtbvqe34otfe5jlorktbqlojbzqjqvbfsjlge4.cubin
| 146 | +#        17160  01-08-2025 16:40   data/aotinductor/model/c6oytfjmt5w4c7onvtm6fray7clirxt7q5xjbwx3hdydclmwoujz.cubin
| 147 | +#        16616  01-08-2025 16:40   data/aotinductor/model/c7ydp7nocyz323hij4tmlf2kcedmwlyg6r57gaqzcsy3huneamu6.cubin
| 148 | +#        17776  01-08-2025 16:40   data/aotinductor/model/cyqdf46ordevqhiddvpdpp3uzwatfbzdpl3auj2nx23uxvplnne2.cubin
| 149 | +#        10856  01-08-2025 16:40   data/aotinductor/model/cpzfebfgrusqslui7fxsuoo4tvwulmrxirc5tmrpa4mvrbdno7kn.cubin
| 150 | +#        14608  01-08-2025 16:40   data/aotinductor/model/c5ukeoz5wmaszd7vczdz2qhtt6n7tdbl3b6wuy4rb2se24fjwfoy.cubin
| 151 | +#        11376  01-08-2025 16:40   data/aotinductor/model/csu3nstcp56tsjfycygaqsewpu64l5s6zavvz7537cm4s4cv2k3r.cubin
| 152 | +#        10984  01-08-2025 16:40   data/aotinductor/model/cp76lez4glmgq7gedf2u25zvvv6rksv5lav4q22dibd2zicbgwj3.cubin
| 153 | +#        14736  01-08-2025 16:40   data/aotinductor/model/c2bb5p6tnwz4elgujqelsrp3unvkgsyiv7xqxmpvuxcm4jfl7pc2.cubin
| 154 | +#        11376  01-08-2025 16:40   data/aotinductor/model/c6eopmb2b4ngodwsayae4r5q6ni3jlfogfbdk3ypg56tgpzhubfy.cubin
| 155 | +#        11624  01-08-2025 16:40   data/aotinductor/model/chmwe6lvoekzfowdbiizitm3haiiuad5kdm6sd2m6mv6dkn2zk32.cubin
| 156 | +#        15632  01-08-2025 16:40   data/aotinductor/model/c3jop5g344hj3ztsu4qm6ibxyaaerlhkzh2e6emak23rxfje6jam.cubin
| 157 | +#        25472  01-08-2025 16:40   data/aotinductor/model/chaiixybeiuuitm2nmqnxzijzwgnn2n7uuss4qmsupgblfh3h5hk.cubin
| 158 | +#       139389  01-08-2025 16:40   data/aotinductor/model/cvk6qzuybruhwxtfblzxiov3rlrziv5fkqc4mdhbmantfu3lmd6t.cpp
| 159 | +#           27  01-08-2025 16:40   data/aotinductor/model/cvk6qzuybruhwxtfblzxiov3rlrziv5fkqc4mdhbmantfu3lmd6t_metadata.json
| 160 | +#     47195424  01-08-2025 16:40   data/aotinductor/model/cvk6qzuybruhwxtfblzxiov3rlrziv5fkqc4mdhbmantfu3lmd6t.so
| 161 | +#    ---------                     -------
| 162 | +#     47523148                     18 files
| 163 | +
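######################################################################
# Since the ".pt2" artifact is just a zip archive, it can also be inspected
# programmatically. The snippet below is a minimal sketch, not part of the
# AOTInductor API, that lists the archive members with Python's standard
# ``zipfile`` module; it assumes the "resnet18.pt2" file generated above
# exists in the current working directory.

import os
import zipfile

with zipfile.ZipFile(os.path.join(os.getcwd(), "resnet18.pt2")) as archive:
    for member in archive.infolist():
        # Print each member's uncompressed size (in bytes) and its path inside the archive.
        print(f"{member.file_size:>12}  {member.filename}")
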
105 | 164 |
106 | 165 | ######################################################################
107 | 166 | # Model Inference in Python
108 | 167 | # -------------------------
109 | 168 | #
110 | | -# Typically, the shared object generated above is used in a non-Python environment. In PyTorch 2.3,
111 | | -# we added a new API called :func:`torch._export.aot_load` to load the shared library in the Python runtime.
112 | | -# The API follows a structure similar to the :func:`torch.jit.load` API . You need to specify the path
113 | | -# of the shared library and the device where it should be loaded.
| 169 | +# To load and run the artifact in Python, we can use :func:`torch._inductor.aoti_load_package`.
114 | 170 | #
115 | | -# .. note::
116 | | -#     In the example above, we specified ``batch_size=1`` for inference and it still functions correctly even though we specified ``min=2`` in
117 | | -#     :func:`torch.export.export`.
118 | | -
119 | 171 |
120 | 172 | import os
121 | 173 | import torch
| 174 | +import torch._inductor
122 | 175 |
123 | | -device = "cuda" if torch.cuda.is_available() else "cpu"
124 | | -model_so_path = os.path.join(os.getcwd(), "resnet18_pt2.so")
| 176 | +model_path = os.path.join(os.getcwd(), "resnet18.pt2")
125 | 177 |
126 | | -model = torch._export.aot_load(model_so_path, device)
127 | | -example_inputs = (torch.randn(1, 3, 224, 224, device=device),)
| 178 | +compiled_model = torch._inductor.aoti_load_package(model_path)
| 179 | +example_inputs = (torch.randn(2, 3, 224, 224, device=device),)
128 | 180 |
129 | 181 | with torch.inference_mode():
130 | | -    output = model(example_inputs)
| 182 | +    output = compiled_model(example_inputs)
| 183 | +
131 | 184 |
132 | 185 | ######################################################################
133 | | -# When to use AOTInductor for Python Runtime
134 | | -# ------------------------------------------
| 186 | +# When to use AOTInductor with a Python Runtime
| 187 | +# ---------------------------------------------
135 | 188 | #
136 | | -# One of the requirements for using AOTInductor is that the model shouldn't have any graph breaks.
137 | | -# Once this requirement is met, the primary use case for using AOTInductor Python Runtime is for
138 | | -# model deployment using Python.
139 | | -# There are mainly two reasons why you would use AOTInductor Python Runtime:
| 189 | +# There are mainly two reasons why one would use AOTInductor with a Python runtime:
140 | 190 | #
141 | | -# - ``torch._inductor.aot_compile`` generates a shared library. This is useful for model
142 | | -#   versioning for deployments and tracking model performance over time.
| 191 | +# - ``torch._inductor.aoti_compile_and_package`` generates a singular
| 192 | +#   serialized artifact. This is useful for model versioning for deployments
| 193 | +#   and tracking model performance over time.
143 | 194 | # - With :func:`torch.compile` being a JIT compiler, there is a warmup
144 | | -#   cost associated with the first compilation. Your deployment needs to account for the
145 | | -#   compilation time taken for the first inference. With AOTInductor, the compilation is
146 | | -#   done offline using ``torch.export.export`` & ``torch._indutor.aot_compile``. The deployment
147 | | -#   would only load the shared library using ``torch._export.aot_load`` and run inference.
| 195 | +#   cost associated with the first compilation. Your deployment needs to
| 196 | +#   account for the compilation time taken for the first inference. With
| 197 | +#   AOTInductor, the compilation is done ahead of time using
| 198 | +#   ``torch.export.export`` and ``torch._inductor.aoti_compile_and_package``.
| 199 | +#   At deployment time, after loading the model, running inference does not
| 200 | +#   incur any additional compilation cost; a minimal deployment-side sketch follows below.
148 | 201 | #
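######################################################################
# The following is a minimal sketch of what the deployment side could look
# like, assuming the "resnet18.pt2" artifact produced above has been shipped
# alongside the application. The artifact is loaded once, for example at
# process startup, and every call afterwards runs the ahead-of-time compiled
# kernels, so there is no JIT warmup on the first request.

import os
import torch
import torch._inductor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the precompiled artifact once.
compiled_model = torch._inductor.aoti_load_package(
    os.path.join(os.getcwd(), "resnet18.pt2")
)

with torch.inference_mode():
    # Reuse the loaded model for every incoming batch; no compilation happens here.
    for _ in range(3):
        batch = torch.randn(2, 3, 224, 224, device=device)
        output = compiled_model(batch)
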
149 | 202 | #
150 | 203 | # The section below shows the speedup achieved with AOTInductor for first inference

@@ -185,7 +238,7 @@ def timed(fn):
185 | 238 |
186 | 239 | torch._dynamo.reset()
187 | 240 |
188 | | -model = torch._export.aot_load(model_so_path, device)
| 241 | +model = torch._inductor.aoti_load_package(model_path)
189 | 242 | example_inputs = (torch.randn(1, 3, 224, 224, device=device),)
190 | 243 |
191 | 244 | with torch.inference_mode():

@@ -217,8 +270,7 @@ def timed(fn):
217 | 270 | # ----------
218 | 271 | #
219 | 272 | # In this recipe, we have learned how to effectively use the AOTInductor for Python runtime by
220 | | -# compiling and loading a pretrained ``ResNet18`` model using the ``torch._inductor.aot_compile``
221 | | -# and ``torch._export.aot_load`` APIs. This process demonstrates the practical application of
222 | | -# generating a shared library and running it within a Python environment, even with dynamic shape
223 | | -# considerations and device-specific optimizations. We also looked at the advantage of using
| 273 | +# compiling and loading a pretrained ``ResNet18`` model. This process
| 274 | +# demonstrates the practical application of generating a compiled artifact and
| 275 | +# running it within a Python environment. We also looked at the advantage of using
224 | 276 | # AOTInductor in model deployments, with regards to speed up in first inference time.