From 5b1c991d7a0e882865e08061c722eca665b36a3b Mon Sep 17 00:00:00 2001 From: Jane Xu Date: Wed, 22 Jan 2025 11:59:53 -0800 Subject: [PATCH 1/7] Recommend python agnosticism in cpp custom op tutorial --- advanced_source/cpp_custom_ops.rst | 81 ++++++++++++++++++------------ 1 file changed, 50 insertions(+), 31 deletions(-) diff --git a/advanced_source/cpp_custom_ops.rst b/advanced_source/cpp_custom_ops.rst index 9dc06daa6f4..afe3f950660 100644 --- a/advanced_source/cpp_custom_ops.rst +++ b/advanced_source/cpp_custom_ops.rst @@ -62,41 +62,30 @@ Using ``cpp_extension`` is as simple as writing the following ``setup.py``: setup(name="extension_cpp", ext_modules=[ - cpp_extension.CppExtension("extension_cpp", ["muladd.cpp"])], - cmdclass={'build_ext': cpp_extension.BuildExtension}) + cpp_extension.CppExtension( + "extension_cpp", + ["muladd.cpp"] + py_limited_api=True)], + cmdclass={'build_ext': cpp_extension.BuildExtension}, + options={"bdist_wheel": {"py_limited_api": "cp39"}} + ) If you need to compile CUDA code (for example, ``.cu`` files), then instead use `torch.utils.cpp_extension.CUDAExtension `_. Please see `extension-cpp `_ for an example for how this is set up. -Starting with PyTorch 2.6, you can now build a single wheel for multiple CPython -versions (similar to what you would do for pure python packages). In particular, +Note that you can build a single wheel for multiple CPython versions (similar to +what you would do for pure python packages) starting with PyTorch 2.6. Specifically, if your custom library adheres to the `CPython Stable Limited API `_ or avoids CPython entirely, you can build one Python agnostic wheel against a minimum supported CPython version -through setuptools' ``py_limited_api`` flag, like so: - -.. 
code-block:: python - - from setuptools import setup, Extension - from torch.utils import cpp_extension - - setup(name="extension_cpp", - ext_modules=[ - cpp_extension.CppExtension( - "extension_cpp", - ["python_agnostic_code.cpp"], - py_limited_api=True)], - cmdclass={'build_ext': cpp_extension.BuildExtension}, - options={"bdist_wheel": {"py_limited_api": "cp39"}} - ) +through setuptools' ``py_limited_api`` flag. -Note that you must specify ``py_limited_api=True`` both within ``setup`` +It is necessary to specify ``py_limited_api=True`` both within ``setup`` and also as an option to the ``"bdist_wheel"`` command with the minimal supported Python version (in this case, 3.9). This ``setup`` would build one wheel that could -be installed across multiple Python versions ``python>=3.9``. Please see -`torchao `_ for an example. +be installed across multiple Python versions ``python>=3.9``. .. note:: @@ -105,7 +94,7 @@ be installed across multiple Python versions ``python>=3.9``. Please see to build a wheel that looks Python agnostic but will crash, or worse, be silently incorrect, in another Python environment. Take care to avoid using unstable CPython APIs, for example APIs from libtorch_python (in particular pytorch/python bindings,) - and to only use APIs from libtorch (aten objects, operators and the dispatcher). + and to only use APIs from libtorch (ATen objects, operators and the dispatcher). For example, to give access to custom ops from Python, the library should register the ops through the dispatcher (covered below!). @@ -255,13 +244,43 @@ first load the C++ library that holds the custom operator definition and then call the ``torch.library`` registration APIs. This can happen in one of two ways: -1. If you're following this tutorial, importing the Python C extension module - we created will load the C++ custom operator definitions. -2. 
If your C++ custom operator is located in a shared library object, you can - also use ``torch.ops.load_library("/path/to/library.so")`` to load it. This - is the blessed path for Python agnosticism, as you will not have a Python C - extension module to import. See `torchao __init__.py `_ - for an example. + +1. In this tutorial, our C++ custom operator is located in a shared library object, + and we use ``torch.ops.load_library("/path/to/library.so")`` to load it. This + is the blessed path for Python agnosticism, and you will not have a Python C + extension module to import. See our `extension_cpp/__init__.py `_ + for an example: + +.. code-block:: python + + import torch + from pathlib import Path + + so_files = list(Path(__file__).parent.glob("_C*.so")) + assert ( + len(so_files) == 1 + ), f"Expected one _C*.so file, found {len(so_files)}" + torch.ops.load_library(so_files[0]) + + from . import ops + + +2. You may also see other custom extensions importing the Python C extension module. + The module would be created in C++ and then imported in Python, like the code below. + This code is not guaranteed to use the stable limited CPython API and would block + your extension from building a Python-agnostic wheel! AVOID the following: + +.. code-block:: cpp + + // in, say, not_agnostic/csrc/extension_BAD.cpp + PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {} + + and later imported in Python like so: + +.. code-block:: python + + # in, say, extension_BAD/__init__.py + from . 
import _C Adding training (autograd) support for an operator From db526160ca909d54eb163151b0bd34d95f9b2b63 Mon Sep 17 00:00:00 2001 From: Jane Xu Date: Wed, 22 Jan 2025 12:07:55 -0800 Subject: [PATCH 2/7] forgot to delete a line --- advanced_source/cpp_custom_ops.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/advanced_source/cpp_custom_ops.rst b/advanced_source/cpp_custom_ops.rst index afe3f950660..ddf18eefdc6 100644 --- a/advanced_source/cpp_custom_ops.rst +++ b/advanced_source/cpp_custom_ops.rst @@ -275,8 +275,6 @@ of two ways: // in, say, not_agnostic/csrc/extension_BAD.cpp PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {} - and later imported in Python like so: - .. code-block:: python # in, say, extension_BAD/__init__.py From d9e7cf8322fa90f0a3770754d71e63afc15891a2 Mon Sep 17 00:00:00 2001 From: Jane Xu Date: Fri, 24 Jan 2025 08:50:59 -0800 Subject: [PATCH 3/7] Fixed tutorial to be clearer and to recommend other path --- advanced_source/cpp_custom_ops.rst | 170 +++++++++++++++++++++-------- 1 file changed, 125 insertions(+), 45 deletions(-) diff --git a/advanced_source/cpp_custom_ops.rst b/advanced_source/cpp_custom_ops.rst index ddf18eefdc6..ea1d8073bc8 100644 --- a/advanced_source/cpp_custom_ops.rst +++ b/advanced_source/cpp_custom_ops.rst @@ -64,10 +64,13 @@ Using ``cpp_extension`` is as simple as writing the following ``setup.py``: ext_modules=[ cpp_extension.CppExtension( "extension_cpp", - ["muladd.cpp"] - py_limited_api=True)], + ["muladd.cpp"], + # define Py_LIMITED_API with min version 3.9 to expose only the stable + # limited API subset from Python.h + extra_compile_args={"cxx": ["-DPy_LIMITED_API=0x03090000"]}, + py_limited_api=True)], # Build 1 wheel across multiple Python versions cmdclass={'build_ext': cpp_extension.BuildExtension}, - options={"bdist_wheel": {"py_limited_api": "cp39"}} + options={"bdist_wheel": {"py_limited_api": "cp39"}} # 3.9 is minimum supported Python version ) If you need to compile CUDA code (for example, ``.cu`` files), 
then instead use @@ -75,28 +78,62 @@ If you need to compile CUDA code (for example, ``.cu`` files), then instead use Please see `extension-cpp `_ for an example for how this is set up. -Note that you can build a single wheel for multiple CPython versions (similar to -what you would do for pure python packages) starting with PyTorch 2.6. Specifically, -if your custom library adheres to the `CPython Stable Limited API -`_ or avoids CPython entirely, you -can build one Python agnostic wheel against a minimum supported CPython version -through setuptools' ``py_limited_api`` flag. +The above example represents what we refer to as a CPython agnostic wheel, meaning +we are building a single wheel that can be run across multiple CPython versions (similar +to pure Python packages). CPython agnosticism is desirable in minimizing the number of wheels your +custom library needs to support and release. To achieve this, there are three key lines to note. -It is necessary to specify ``py_limited_api=True`` both within ``setup`` -and also as an option to the ``"bdist_wheel"`` command with the minimal supported -Python version (in this case, 3.9). This ``setup`` would build one wheel that could -be installed across multiple Python versions ``python>=3.9``. +The first is the specification of ``Py_LIMITED_API`` in ``extra_compile_args`` to the +minimum CPython version you would like to support: -.. note:: +.. code-block:: python + extra_compile_args={"cxx": ["-DPy_LIMITED_API=0x03090000"]}, + +Defining the ``Py_LIMITED_API`` flag helps guarantee that the extension is in fact +only using the `CPython Stable Limited API `_, +which is a requirement for the building a CPython agnostic wheel. If this requirement +is not met, it is possible to build a wheel that looks CPython agnostic but will crash, +or worse, be silently incorrect, in another CPython environment. 
Take care to avoid +using unstable CPython APIs, for example APIs from libtorch_python (in particular +pytorch/python bindings,) and to only use APIs from libtorch (ATen objects, operators +and the dispatcher). We strongly recommend defining the ``Py_LIMITED_API`` flag to +ensure the extension is compliant and safe as a CPython agnostic wheel. + +The second and third lines inform setuptools that you intend to build a CPython agnostic +wheel and will influence the naming of the wheel accordingly. It is necessary to specify +``py_limited_api=True`` as an argument to CppExtension/CUDAExtension and also as an option +to the ``"bdist_wheel"`` command with the minimal supported CPython version (in this case, +3.9): + +.. code-block:: python + setup(name="extension_cpp", + ext_modules=[ + cpp_extension.CppExtension( + ..., + py_limited_api=True)], # Build 1 wheel across multiple Python versions + ..., + options={"bdist_wheel": {"py_limited_api": "cp39"}} # 3.9 is minimum supported Python version + ) + +This ``setup`` would build one wheel that could be installed across multiple CPython +versions ``>=3.9``. + +If your extension uses CPython APIs outside the stable limited set, then you should build +a wheel per CPython version instead, like so: + +.. code-block:: python + + from setuptools import setup, Extension + from torch.utils import cpp_extension + + setup(name="extension_cpp", + ext_modules=[ + cpp_extension.CppExtension( + "extension_cpp", + ["muladd.cpp"])], + cmdclass={'build_ext': cpp_extension.BuildExtension}, + ) - You must verify independently that the built wheel is truly Python agnostic. - Specifying ``py_limited_api`` does not check for any guarantees, so it is possible - to build a wheel that looks Python agnostic but will crash, or worse, be silently - incorrect, in another Python environment. 
Take care to avoid using unstable CPython - APIs, for example APIs from libtorch_python (in particular pytorch/python bindings,) - and to only use APIs from libtorch (ATen objects, operators and the dispatcher). - For example, to give access to custom ops from Python, the library should register - the ops through the dispatcher (covered below!). Defining the custom op and adding backend implementations --------------------------------------------------------- @@ -241,15 +278,74 @@ matters (importing in the wrong order will lead to an error). To use the custom operator with hybrid Python/C++ registrations, we must first load the C++ library that holds the custom operator definition -and then call the ``torch.library`` registration APIs. This can happen in one -of two ways: +and then call the ``torch.library`` registration APIs. This can happen in +three ways: + + +1. The first way to load the C++ library that holds the custom operator definition + is to define a dummy Python module for _C. Then, in Python, when you import the + module with ``import _C``, the ``.so``s corresponding to the extension will be + loaded and the ``TORCH_LIBRARY`` and ``TORCH_LIBRARY_IMPL`` static initializers + will run. One can create a dummy Python module with ``PYBIND11_MODULE`` like below, + but you will notice that this does not compile with ``Py_LIMITED_API``, because + ``pybind11`` does not promise to only use the stable limited CPython API! With + the below code, you sadly cannot build a CPython agnostic wheel for your extension! + (Foreshadowing: I wonder what the second way is ;)). + +.. code-block:: cpp + // in, say, not_agnostic/csrc/extension_BAD.cpp + #include + PYBIND11_MODULE("_C", m) {} -1. In this tutorial, our C++ custom operator is located in a shared library object, - and we use ``torch.ops.load_library("/path/to/library.so")`` to load it. This - is the blessed path for Python agnosticism, and you will not have a Python C - extension module to import. 
See our `extension_cpp/__init__.py `_ - for an example: +.. code-block:: python + + # in, say, extension/__init__.py + from . import _C + +2. In this tutorial, because we value being able to build a single wheel across multiple + CPython versions, we will replace the unstable ``PYBIND11`` call with stable API calls. + The below code compiles with ``-DPy_LIMITED_API=0x03090000`` and successfully creates + a dummy Python module for our ``_C`` extension so that it can be imported from Python. + See `extension_cpp/__init__.py `_ + and `extension_cpp/csrc/muladd.cpp `_ + for more details: + +.. code-block:: cpp + #include + + extern "C" { + /* Creates a dummy empty _C module that can be imported from Python. + The import from Python will load the .so consisting of this file + in this extension, so that the TORCH_LIBRARY static initializers + below are run. */ + PyObject* PyInit__C(void) + { + static struct PyModuleDef module_def = { + PyModuleDef_HEAD_INIT, + "_C", /* name of module */ + NULL, /* module documentation, may be NULL */ + -1, /* size of per-interpreter state of the module, + or -1 if the module keeps state in global variables. */ + NULL, /* methods */ + }; + return PyModule_Create(&module_def); + } + } + +.. code-block:: python + + # in, say, extension/__init__.py + from . import _C + +3. If you want to avoid ``Python.h`` entirely in your C++ custom operator, you may + use ``torch.ops.load_library("/path/to/library.so")`` in Python to load the ``.so`` + file(s) compiled from the extension. Note that, with this method, there is no ``_C`` + Python module created for the extension so you cannot call ``import _C`` from Python. + Instead of relying on the import statement to trigger the custom operators to be + registered, ``torch.ops.load_library("/path/to/library.so")`` will do the trick. + The challenge then is shifted towards understanding where the ``.so`` files are + located so that you can load them, which is not always trivial: .. 
code-block:: python @@ -265,22 +361,6 @@ of two ways: from . import ops -2. You may also see other custom extensions importing the Python C extension module. - The module would be created in C++ and then imported in Python, like the code below. - This code is not guaranteed to use the stable limited CPython API and would block - your extension from building a Python-agnostic wheel! AVOID the following: - -.. code-block:: cpp - - // in, say, not_agnostic/csrc/extension_BAD.cpp - PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {} - -.. code-block:: python - - # in, say, extension_BAD/__init__.py - from . import _C - - Adding training (autograd) support for an operator ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Use ``torch.library.register_autograd`` to add training support for an operator. Prefer From 22dac7eae9b2b6a00d4617c41cc2993d586a3132 Mon Sep 17 00:00:00 2001 From: Jane Xu Date: Fri, 24 Jan 2025 09:03:35 -0800 Subject: [PATCH 4/7] Switch to commits instead of master --- advanced_source/cpp_custom_ops.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/advanced_source/cpp_custom_ops.rst b/advanced_source/cpp_custom_ops.rst index ea1d8073bc8..5e4f7e6753f 100644 --- a/advanced_source/cpp_custom_ops.rst +++ b/advanced_source/cpp_custom_ops.rst @@ -307,8 +307,8 @@ three ways: CPython versions, we will replace the unstable ``PYBIND11`` call with stable API calls. The below code compiles with ``-DPy_LIMITED_API=0x03090000`` and successfully creates a dummy Python module for our ``_C`` extension so that it can be imported from Python. - See `extension_cpp/__init__.py `_ - and `extension_cpp/csrc/muladd.cpp `_ + See `extension_cpp/__init__.py `_ + and `extension_cpp/csrc/muladd.cpp `_ for more details: .. 
code-block:: cpp From c21f57651591c1127b718b26b94b4c7ad85fa9cc Mon Sep 17 00:00:00 2001 From: Jane Xu Date: Fri, 24 Jan 2025 10:23:12 -0800 Subject: [PATCH 5/7] Formatting code blocks --- advanced_source/cpp_custom_ops.rst | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/advanced_source/cpp_custom_ops.rst b/advanced_source/cpp_custom_ops.rst index 5e4f7e6753f..feb60db7b5a 100644 --- a/advanced_source/cpp_custom_ops.rst +++ b/advanced_source/cpp_custom_ops.rst @@ -284,15 +284,16 @@ three ways: 1. The first way to load the C++ library that holds the custom operator definition is to define a dummy Python module for _C. Then, in Python, when you import the - module with ``import _C``, the ``.so``s corresponding to the extension will be - loaded and the ``TORCH_LIBRARY`` and ``TORCH_LIBRARY_IMPL`` static initializers + module with ``import _C``, the ``.so`` files corresponding to the extension will + be loaded and the ``TORCH_LIBRARY`` and ``TORCH_LIBRARY_IMPL`` static initializers will run. One can create a dummy Python module with ``PYBIND11_MODULE`` like below, but you will notice that this does not compile with ``Py_LIMITED_API``, because ``pybind11`` does not promise to only use the stable limited CPython API! With the below code, you sadly cannot build a CPython agnostic wheel for your extension! - (Foreshadowing: I wonder what the second way is ;)). + (Foreshadowing: I wonder what the second way is ;) ). .. code-block:: cpp + // in, say, not_agnostic/csrc/extension_BAD.cpp #include @@ -312,6 +313,7 @@ three ways: for more details: .. 
code-block:: cpp + #include extern "C" { From 137255305008a2461843afae4f8101d83cf96577 Mon Sep 17 00:00:00 2001 From: Jane Xu Date: Fri, 24 Jan 2025 11:27:59 -0800 Subject: [PATCH 6/7] polish + i missed some code blocks earlier --- advanced_source/cpp_custom_ops.rst | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/advanced_source/cpp_custom_ops.rst b/advanced_source/cpp_custom_ops.rst index feb60db7b5a..2d2265bf9c8 100644 --- a/advanced_source/cpp_custom_ops.rst +++ b/advanced_source/cpp_custom_ops.rst @@ -87,6 +87,7 @@ The first is the specification of ``Py_LIMITED_API`` in ``extra_compile_args`` t minimum CPython version you would like to support: .. code-block:: python + extra_compile_args={"cxx": ["-DPy_LIMITED_API=0x03090000"]}, Defining the ``Py_LIMITED_API`` flag helps guarantee that the extension is in fact @@ -99,13 +100,11 @@ pytorch/python bindings,) and to only use APIs from libtorch (ATen objects, oper and the dispatcher). We strongly recommend defining the ``Py_LIMITED_API`` flag to ensure the extension is compliant and safe as a CPython agnostic wheel. -The second and third lines inform setuptools that you intend to build a CPython agnostic -wheel and will influence the naming of the wheel accordingly. It is necessary to specify -``py_limited_api=True`` as an argument to CppExtension/CUDAExtension and also as an option -to the ``"bdist_wheel"`` command with the minimal supported CPython version (in this case, -3.9): +The second and third lines specifying ``py_limited_api`` inform setuptools that you intend +to build a CPython agnostic wheel and will influence the naming of the wheel accordingly: .. 
code-block:: python + setup(name="extension_cpp", ext_modules=[ cpp_extension.CppExtension( @@ -115,11 +114,15 @@ to the ``"bdist_wheel"`` command with the minimal supported CPython version (in options={"bdist_wheel": {"py_limited_api": "cp39"}} # 3.9 is minimum supported Python version ) -This ``setup`` would build one wheel that could be installed across multiple CPython -versions ``>=3.9``. +It is necessary to specify ``py_limited_api=True`` as an argument to CppExtension/ +CUDAExtension and also as an option to the ``"bdist_wheel"`` command with the minimal +supported CPython version (in this case, 3.9, as it is the oldest supported version +currently). Consequently, the ``setup`` in our tutorial would build one wheel that could +be installed across multiple CPython versions ``>=3.9``. -If your extension uses CPython APIs outside the stable limited set, then you should build -a wheel per CPython version instead, like so: +If your extension uses CPython APIs outside the stable limited set, then you cannot +build a CPython agnostic wheel! You should build one wheel per CPython version instead, +like so: .. code-block:: python From 9a96b753244e6608a5680ed3015e779222ca3b36 Mon Sep 17 00:00:00 2001 From: Jane Xu Date: Mon, 27 Jan 2025 15:03:56 -0800 Subject: [PATCH 7/7] Adjust advice based on Sam Gross's knowledge --- advanced_source/cpp_custom_ops.rst | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/advanced_source/cpp_custom_ops.rst b/advanced_source/cpp_custom_ops.rst index 2d2265bf9c8..512c39b2a68 100644 --- a/advanced_source/cpp_custom_ops.rst +++ b/advanced_source/cpp_custom_ops.rst @@ -78,10 +78,16 @@ If you need to compile CUDA code (for example, ``.cu`` files), then instead use Please see `extension-cpp `_ for an example for how this is set up. 
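[Editor's aside, not part of the patch series: the ``0x03090000`` value used with ``-DPy_LIMITED_API`` above follows CPython's version-hex layout, ``major << 24 | minor << 16``, and the matching ``bdist_wheel`` option is simply ``cp{major}{minor}``. A minimal sketch deriving both from one minimum version; the helper names here are ours, not a setuptools or PyTorch API:]

```python
# Sketch: derive the -DPy_LIMITED_API macro value and the bdist_wheel
# py_limited_api tag from a single minimum CPython version, so the two
# settings in setup.py cannot drift apart.
# CPython version-hex layout: major << 24 | minor << 16 (lower bytes zero).

def limited_api_hex(major: int, minor: int) -> str:
    """Hex value to pass as -DPy_LIMITED_API for a minimum CPython version."""
    return f"0x{(major << 24) | (minor << 16):08X}"

def bdist_wheel_tag(major: int, minor: int) -> str:
    """Matching value for options={'bdist_wheel': {'py_limited_api': ...}}."""
    return f"cp{major}{minor}"

if __name__ == "__main__":
    print(limited_api_hex(3, 9), bdist_wheel_tag(3, 9))  # 0x03090000 cp39
```

Computing the pair once and interpolating it into both ``extra_compile_args`` and the ``bdist_wheel`` options keeps the macro and the wheel tag consistent if the minimum supported version is later bumped.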
-The above example represents what we refer to as a CPython agnostic wheel, meaning -we are building a single wheel that can be run across multiple CPython versions (similar -to pure Python packages). CPython agnosticism is desirable in minimizing the number of wheels your -custom library needs to support and release. To achieve this, there are three key lines to note. +The above example represents what we refer to as a CPython agnostic wheel, meaning we are +building a single wheel that can be run across multiple CPython versions (similar to pure +Python packages). CPython agnosticism is desirable in minimizing the number of wheels your +custom library needs to support and release. The minimum version we'd like to support is +3.9, since it is the oldest supported version currently, so we use the corresponding hexcode +and specifier throughout the setup code. We suggest building the extension in the same +environment as the minimum CPython version you'd like to support to minimize unknown behavior, +so, here, we build the extension in a CPython 3.9 environment. When built, this single wheel +will be runnable in any CPython environment 3.9+. To achieve this, there are three key lines +to note. The first is the specification of ``Py_LIMITED_API`` in ``extra_compile_args`` to the minimum CPython version you would like to support: @@ -90,7 +96,7 @@ minimum CPython version you would like to support: extra_compile_args={"cxx": ["-DPy_LIMITED_API=0x03090000"]}, -Defining the ``Py_LIMITED_API`` flag helps guarantee that the extension is in fact +Defining the ``Py_LIMITED_API`` flag helps verify that the extension is in fact only using the `CPython Stable Limited API `_, which is a requirement for the building a CPython agnostic wheel. If this requirement is not met, it is possible to build a wheel that looks CPython agnostic but will crash, @@ -98,7 +104,12 @@ or worse, be silently incorrect, in another CPython environment. 
Take care to av using unstable CPython APIs, for example APIs from libtorch_python (in particular pytorch/python bindings,) and to only use APIs from libtorch (ATen objects, operators and the dispatcher). We strongly recommend defining the ``Py_LIMITED_API`` flag to -ensure the extension is compliant and safe as a CPython agnostic wheel. +help ascertain the extension is compliant and safe as a CPython agnostic wheel. Note that +defining this flag is not a full guarantee that the built wheel is CPython agnostic, but +it is better than the wild wild west. There are several caveats mentioned in the +`Python docs `_, +and you should test and verify yourself that the wheel is truly agnostic for the relevant +CPython versions. The second and third lines specifying ``py_limited_api`` inform setuptools that you intend to build a CPython agnostic wheel and will influence the naming of the wheel accordingly: @@ -116,9 +127,9 @@ to build a CPython agnostic wheel and will influence the naming of the wheel acc It is necessary to specify ``py_limited_api=True`` as an argument to CppExtension/ CUDAExtension and also as an option to the ``"bdist_wheel"`` command with the minimal -supported CPython version (in this case, 3.9, as it is the oldest supported version -currently). Consequently, the ``setup`` in our tutorial would build one wheel that could -be installed across multiple CPython versions ``>=3.9``. +supported CPython version (in this case, 3.9). Consequently, the ``setup`` in our +tutorial would build one properly named wheel that could be installed across multiple +CPython versions ``>=3.9``. If your extension uses CPython APIs outside the stable limited set, then you cannot build a CPython agnostic wheel! You should build one wheel per CPython version instead,
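[Editor's aside, not part of the patch series: the final patch stresses testing and verifying yourself that the built wheel is truly CPython agnostic. One cheap first check is the wheel filename itself, which per the wheel spec (PEP 427) is ``{name}-{version}(-{build})?-{python_tag}-{abi_tag}-{platform_tag}.whl``; a limited-API build from the ``setup`` above should carry the ``cp39``/``abi3`` tag pair rather than a version-specific ABI tag. A minimal sketch, with a hypothetical wheel filename:]

```python
# Sketch: parse a wheel filename's tags to confirm it is a limited-API
# (abi3) wheel. This only checks the name setuptools produced; a real
# verification should also import and exercise the extension under each
# supported CPython version.

def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) parsed from a wheel name."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag

if __name__ == "__main__":
    # Hypothetical output wheel of the tutorial's setup.py:
    py_tag, abi_tag, _ = wheel_tags("extension_cpp-0.0.1-cp39-abi3-linux_x86_64.whl")
    assert (py_tag, abi_tag) == ("cp39", "abi3"), "not a CPython agnostic wheel"
```

A wheel tagged e.g. ``cp310-cp310`` instead of ``cp39-abi3`` indicates the ``py_limited_api`` options were not picked up and the build is bound to one interpreter version.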