From 53aed842e507368c353ae420b4354b687bcd83c5 Mon Sep 17 00:00:00 2001
From: Jane Xu
Date: Fri, 27 Dec 2024 14:38:39 -0800
Subject: [PATCH 1/3] Mention Python agnosticism in custom ops tutorial

---
 advanced_source/cpp_custom_ops.rst | 50 ++++++++++++++++++++++++++----
 index.rst                          |  4 +--
 2 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/advanced_source/cpp_custom_ops.rst b/advanced_source/cpp_custom_ops.rst
index ffabd6eff77..d22d3364610 100644
--- a/advanced_source/cpp_custom_ops.rst
+++ b/advanced_source/cpp_custom_ops.rst
@@ -63,9 +63,42 @@ Using ``cpp_extension`` is as simple as writing the following ``setup.py``:
 
 If you need to compile CUDA code (for example, ``.cu`` files), then instead use
 `torch.utils.cpp_extension.CUDAExtension `_.
-Please see how
-`extension-cpp `_ for an example for
-how this is set up.
+Please see `extension-cpp `_ for an
+example for how this is set up.
+
+In PyTorch 2.6 and later, if your custom library adheres to the `CPython stable
+Limited API `_ or avoids CPython
+entirely, you can build one python agnostic wheel against a minimum supported
+CPython version through setuptools' ``py_limited_api`` flag, like so:
+
+.. code-block:: python
+
+    from setuptools import setup, Extension
+    from torch.utils import cpp_extension
+
+    setup(name="extension_cpp",
+          ext_modules=[
+              cpp_extension.CppExtension("extension_cpp", ["muladd.cpp"], py_limited_api=True)],
+          cmdclass={'build_ext': cpp_extension.BuildExtension},
+          options={"bdist_wheel": {"py_limited_api": "cp39"}}
+    )
+
+Note that you must specify ``py_limited_api=True`` both within ``setup``:
+and also as an option to the ``"bdist_wheel"`` command with the minimal supported
+Python version (in this case, 3.9). This ``setup`` would build one wheel that could
+be installed across multiple Python versions ``python>=3.9``. Please see
+`torchao `_ for an example.
+
+.. note::
+
+   You must verify independently that the built wheel is truly Python agnostic.
+   Specifying ``py_limited_api`` does not check for any guarantees, so it is possible
+   to build a wheel that looks Python agnostic but will crash, or worse, be silently
+   incorrect, in another Python environment. Take care to avoid using unstable CPython
+   APIs, for example APIs from libtorch_python (in particular pytorch/python bindings)
+   and to only use APIs from libtorch (aten objects, operators and the dispatcher).
+   For example, to give access to custom ops from python, the library should register
+   the ops through the dispatcher (covered below!).
 
 Defining the custom op and adding backend implementations
 ---------------------------------------------------------
@@ -177,7 +210,7 @@ operator specifies how to compute the metadata of output tensors given the metad
 The FakeTensor kernel should return dummy Tensors of your choice with the correct
 Tensor metadata (shape/strides/``dtype``/device).
 
-We recommend that this be done from Python via the `torch.library.register_fake` API,
+We recommend that this be done from Python via the ``torch.library.register_fake`` API,
 though it is possible to do this from C++ as well (see
 `The Custom Operators Manual `_
 for more details).
@@ -188,7 +221,9 @@ for more details).
     # before calling ``torch.library`` APIs that add registrations for the
     # C++ custom operator(s). The following import loads our
     # C++ custom operator definitions.
-    # See the next section for more details.
+    # Note that if you are striving for Python agnosticism, you should use
+    # the ``load_library(...)`` API call instead. See the next section for
+    # more details.
     from . import _C
 
     @torch.library.register_fake("extension_cpp::mymuladd")
@@ -214,7 +249,10 @@ of two ways:
 
 1. If you're following this tutorial, importing the Python C extension module
    we created will load the C++ custom operator definitions.
 2. If your C++ custom operator is located in a shared library object, you can
-   also use ``torch.ops.load_library("/path/to/library.so")`` to load it.
+   also use ``torch.ops.load_library("/path/to/library.so")`` to load it. This
+   is the blessed path for Python agnosticism, as you will not have a Python C
+   extension module to import. See `torchao __init__.py `
+   for an example.
 
 Adding training (autograd) support for an operator
diff --git a/index.rst b/index.rst
index 385e589de3b..133230611dd 100644
--- a/index.rst
+++ b/index.rst
@@ -426,14 +426,14 @@ Welcome to PyTorch Tutorials
 
 .. customcarditem::
    :header: Custom C++ and CUDA Extensions
-   :card_description: Create a neural network layer with no parameters using numpy. Then use scipy to create a neural network layer that has learnable weights. 
+   :card_description: Create a neural network layer with no parameters using numpy. Then use scipy to create a neural network layer that has learnable weights.
    :image: _static/img/thumbnails/cropped/Custom-Cpp-and-CUDA-Extensions.png
    :link: advanced/cpp_extension.html
    :tags: Extending-PyTorch,Frontend-APIs,C++,CUDA
 
 .. customcarditem::
    :header: Extending TorchScript with Custom C++ Operators
-   :card_description: Implement a custom TorchScript operator in C++, how to build it into a shared library, how to use it in Python to define TorchScript models and lastly how to load it into a C++ application for inference workloads. 
+   :card_description: Implement a custom TorchScript operator in C++, how to build it into a shared library, how to use it in Python to define TorchScript models and lastly how to load it into a C++ application for inference workloads.
   :image: _static/img/thumbnails/cropped/Extending-TorchScript-with-Custom-Cpp-Operators.png
    :link: advanced/torch_script_custom_ops.html
    :tags: Extending-PyTorch,Frontend-APIs,TorchScript,C++

From d0b7505754a66a18104275b81ae02795b2f71ed2 Mon Sep 17 00:00:00 2001
From: Jane Xu
Date: Fri, 27 Dec 2024 17:47:13 -0800
Subject: [PATCH 2/3] light dusting on grammar

---
 advanced_source/cpp_custom_ops.rst | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/advanced_source/cpp_custom_ops.rst b/advanced_source/cpp_custom_ops.rst
index d22d3364610..a328a895434 100644
--- a/advanced_source/cpp_custom_ops.rst
+++ b/advanced_source/cpp_custom_ops.rst
@@ -78,12 +78,15 @@ CPython version through setuptools' ``py_limited_api`` flag, like so:
 
     setup(name="extension_cpp",
           ext_modules=[
-              cpp_extension.CppExtension("extension_cpp", ["muladd.cpp"], py_limited_api=True)],
+              cpp_extension.CppExtension(
+                  "extension_cpp",
+                  ["python_agnostic_code.cpp"],
+                  py_limited_api=True)],
           cmdclass={'build_ext': cpp_extension.BuildExtension},
           options={"bdist_wheel": {"py_limited_api": "cp39"}}
     )
 
-Note that you must specify ``py_limited_api=True`` both within ``setup``:
+Note that you must specify ``py_limited_api=True`` both within ``setup``
 and also as an option to the ``"bdist_wheel"`` command with the minimal supported
 Python version (in this case, 3.9). This ``setup`` would build one wheel that could
 be installed across multiple Python versions ``python>=3.9``. Please see
@@ -95,9 +98,9 @@ be installed across multiple Python versions ``python>=3.9``. Please see
    Specifying ``py_limited_api`` does not check for any guarantees, so it is possible
    to build a wheel that looks Python agnostic but will crash, or worse, be silently
    incorrect, in another Python environment. Take care to avoid using unstable CPython
-   APIs, for example APIs from libtorch_python (in particular pytorch/python bindings)
+   APIs, for example APIs from libtorch_python (in particular pytorch/python bindings),
    and to only use APIs from libtorch (aten objects, operators and the dispatcher).
-   For example, to give access to custom ops from python, the library should register
+   For example, to give access to custom ops from Python, the library should register
    the ops through the dispatcher (covered below!).
 
 Defining the custom op and adding backend implementations
@@ -251,7 +254,7 @@ of two ways:
 2. If your C++ custom operator is located in a shared library object, you can
    also use ``torch.ops.load_library("/path/to/library.so")`` to load it. This
    is the blessed path for Python agnosticism, as you will not have a Python C
-   extension module to import. See `torchao __init__.py `
+   extension module to import. See `torchao __init__.py `_
    for an example.
 
 
From 1d07c79dcb9e2787ef9fc6692a3de3c41787095a Mon Sep 17 00:00:00 2001
From: Jane Xu
Date: Mon, 30 Dec 2024 08:18:09 -0800
Subject: [PATCH 3/3] be explicit

---
 advanced_source/cpp_custom_ops.rst | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/advanced_source/cpp_custom_ops.rst b/advanced_source/cpp_custom_ops.rst
index a328a895434..2b7aa2a1a9c 100644
--- a/advanced_source/cpp_custom_ops.rst
+++ b/advanced_source/cpp_custom_ops.rst
@@ -66,10 +66,12 @@ If you need to compile CUDA code (for example, ``.cu`` files), then instead use
 Please see `extension-cpp `_ for an
 example for how this is set up.
 
-In PyTorch 2.6 and later, if your custom library adheres to the `CPython stable
-Limited API `_ or avoids CPython
-entirely, you can build one python agnostic wheel against a minimum supported
-CPython version through setuptools' ``py_limited_api`` flag, like so:
+Starting with PyTorch 2.6, you can now build a single wheel for multiple CPython
+versions (similar to what you would do for pure Python packages). In particular,
+if your custom library adheres to the `CPython Stable Limited API
+`_ or avoids CPython entirely, you
+can build one Python agnostic wheel against a minimum supported CPython version
+through setuptools' ``py_limited_api`` flag, like so:
 
 .. code-block:: python
 
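The ``py_limited_api`` flag the patches describe amounts to a promise encoded in the wheel's tag: building with ``options={"bdist_wheel": {"py_limited_api": "cp39"}}`` yields one ``cp39-abi3`` wheel that installers accept on every CPython from 3.9 up. A minimal sketch of that tag arithmetic follows; the helper names and the ``linux_x86_64`` platform are illustrative assumptions, not a real setuptools or pip API:

```python
# Hypothetical helpers (not part of the patch series) illustrating the
# wheel-tag logic behind ``py_limited_api``.

def limited_api_wheel_tag(min_cpython: str, platform: str = "linux_x86_64") -> str:
    """Return the python/abi/platform tag triple of a Limited API wheel.

    Building with {"py_limited_api": "cp39"} tags the wheel cp39-abi3-<platform>.
    """
    return f"{min_cpython}-abi3-{platform}"


def is_installable(tag: str, interpreter: tuple) -> bool:
    """Check whether a cp3X-abi3 wheel can be installed on a given CPython.

    Assumes a CPython 3.x python tag such as "cp39" or "cp310".
    """
    python_tag = tag.split("-", 1)[0]                     # e.g. "cp39"
    minimum = (int(python_tag[2]), int(python_tag[3:]))   # "cp39" -> (3, 9)
    # abi3 wheels are forward compatible: any interpreter at or above the
    # minimum version can load the extension.
    return interpreter >= minimum
```

This is why a single build suffices for ``python>=3.9`` in the example, whereas a wheel built without the flag is tagged for exactly one interpreter version (for example ``cp39-cp39``).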