diff --git a/advanced_source/cpp_custom_ops.rst b/advanced_source/cpp_custom_ops.rst
index 9dc06daa6f4..512c39b2a68 100644
--- a/advanced_source/cpp_custom_ops.rst
+++ b/advanced_source/cpp_custom_ops.rst
@@ -62,20 +62,78 @@ Using ``cpp_extension`` is as simple as writing the following ``setup.py``:
 
     setup(name="extension_cpp",
           ext_modules=[
-              cpp_extension.CppExtension("extension_cpp", ["muladd.cpp"])],
-          cmdclass={'build_ext': cpp_extension.BuildExtension})
+              cpp_extension.CppExtension(
+                  "extension_cpp",
+                  ["muladd.cpp"],
+                  # define Py_LIMITED_API with min version 3.9 to expose only the stable
+                  # limited API subset from Python.h
+                  extra_compile_args={"cxx": ["-DPy_LIMITED_API=0x03090000"]},
+                  py_limited_api=True)],  # Build 1 wheel across multiple Python versions
+          cmdclass={'build_ext': cpp_extension.BuildExtension},
+          options={"bdist_wheel": {"py_limited_api": "cp39"}}  # 3.9 is minimum supported Python version
+    )
 
 If you need to compile CUDA code (for example, ``.cu`` files), then instead use
 `torch.utils.cpp_extension.CUDAExtension <https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.CUDAExtension>`_.
 Please see `extension-cpp <https://github.com/pytorch/extension-cpp>`_ for an
 example for how this is set up.
 
-Starting with PyTorch 2.6, you can now build a single wheel for multiple CPython
-versions (similar to what you would do for pure python packages). In particular,
-if your custom library adheres to the `CPython Stable Limited API
-<https://docs.python.org/3/c-api/stable.html>`_ or avoids CPython entirely, you
-can build one Python agnostic wheel against a minimum supported CPython version
-through setuptools' ``py_limited_api`` flag, like so:
+The above example represents what we refer to as a CPython agnostic wheel, meaning we are
+building a single wheel that can be run across multiple CPython versions (similar to pure
+Python packages). CPython agnosticism is desirable because it minimizes the number of
+wheels your custom library needs to build and release. The minimum version we'd like to
+support is 3.9, since it is the oldest CPython version currently supported, so we use the
+corresponding hexcode and specifier throughout the setup code. We suggest building the
+extension in the same environment as the minimum CPython version you'd like to support to
+minimize unknown behavior, so, here, we build the extension in a CPython 3.9 environment.
+When built, this single wheel will be runnable in any CPython environment 3.9+. To achieve
+this, there are three key lines to note.
+
+The first is the specification of ``Py_LIMITED_API`` in ``extra_compile_args`` to the
+minimum CPython version you would like to support:
+
+.. code-block:: python
+
+   extra_compile_args={"cxx": ["-DPy_LIMITED_API=0x03090000"]},
+
+Defining the ``Py_LIMITED_API`` flag helps verify that the extension is in fact only using
+the `CPython Stable Limited API <https://docs.python.org/3/c-api/stable.html>`_, which is
+a requirement for building a CPython agnostic wheel. If this requirement is not met, it is
+possible to build a wheel that looks CPython agnostic but will crash, or worse, be
+silently incorrect, in another CPython environment. Take care to avoid using unstable
+CPython APIs, for example APIs from libtorch_python (in particular the pytorch/python
+bindings), and to use only APIs from libtorch (ATen objects, operators, and the
+dispatcher). We strongly recommend defining the ``Py_LIMITED_API`` flag to help ascertain
+that the extension is compliant and safe as a CPython agnostic wheel. Note that defining
+this flag is not a full guarantee that the built wheel is CPython agnostic, but it is
+better than the wild wild west. There are several caveats mentioned in the `Python docs
+<https://docs.python.org/3/c-api/stable.html>`_, and you should test and verify yourself
+that the wheel is truly agnostic for the relevant CPython versions.
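+
+As an aside, the hexcode ``0x03090000`` follows the layout of CPython's
+``PY_VERSION_HEX``: two hex digits each for the major, minor, and micro versions,
+followed by one digit each for the release level and serial. The small helper below is
+purely illustrative (it is not part of setuptools or PyTorch) and sketches how the value
+for a given minimum version is derived:
+
+.. code-block:: python
+
+   def limited_api_hexcode(major: int, minor: int) -> str:
+       # Two hex digits per field, with micro/release level/serial left at zero,
+       # matching the PY_VERSION_HEX layout that Py_LIMITED_API expects.
+       return f"0x{major:02X}{minor:02X}0000"
+
+   assert limited_api_hexcode(3, 9) == "0x03090000"  # minimum supported version 3.9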
+
+The second and third lines specifying ``py_limited_api`` inform setuptools that you intend
+to build a CPython agnostic wheel, and will influence the naming of the wheel accordingly:
+
+.. code-block:: python
+
+   setup(name="extension_cpp",
+         ext_modules=[
+             cpp_extension.CppExtension(
+                 ...,
+                 py_limited_api=True)],  # Build 1 wheel across multiple Python versions
+         ...,
+         options={"bdist_wheel": {"py_limited_api": "cp39"}}  # 3.9 is minimum supported Python version
+   )
+
+It is necessary to specify ``py_limited_api=True`` as an argument to ``CppExtension``/
+``CUDAExtension`` and also as an option to the ``"bdist_wheel"`` command with the minimum
+supported CPython version (in this case, 3.9). Consequently, the ``setup`` in our tutorial
+would build one properly named wheel that could be installed across multiple CPython
+versions ``>=3.9``.
+
+If your extension uses CPython APIs outside the stable limited set, then you cannot
+build a CPython agnostic wheel! You should build one wheel per CPython version instead,
+like so:
 
 .. code-block:: python
@@ -86,28 +144,10 @@ through setuptools' ``py_limited_api`` flag, like so:
           ext_modules=[
               cpp_extension.CppExtension(
                   "extension_cpp",
-                  ["python_agnostic_code.cpp"],
-                  py_limited_api=True)],
+                  ["muladd.cpp"])],
           cmdclass={'build_ext': cpp_extension.BuildExtension},
-          options={"bdist_wheel": {"py_limited_api": "cp39"}}
     )
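+
+Because each such wheel is tied to a single CPython version, you would run the build once
+per interpreter you intend to support. As a rough sketch (the driver script and the
+interpreter list below are illustrative assumptions, requiring each version to be
+installed and on ``PATH``), the per-version builds could be driven like so:
+
+.. code-block:: python
+
+   # build_wheels.py: hypothetical driver script, not part of this tutorial's code
+   import subprocess
+
+   for exe in ("python3.9", "python3.10", "python3.11", "python3.12"):
+       # Each interpreter produces a wheel tagged for its own CPython version,
+       # e.g. cp39-cp39, cp310-cp310, and so on.
+       subprocess.run(
+           [exe, "-m", "pip", "wheel", ".", "--no-deps", "-w", "dist/"],
+           check=True,
+       )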
 
-Note that you must specify ``py_limited_api=True`` both within ``setup``
-and also as an option to the ``"bdist_wheel"`` command with the minimal supported
-Python version (in this case, 3.9). This ``setup`` would build one wheel that could
-be installed across multiple Python versions ``python>=3.9``. Please see
-`torchao <https://github.com/pytorch/ao>`_ for an example.
-
-.. note::
-
-   You must verify independently that the built wheel is truly Python agnostic.
-   Specifying ``py_limited_api`` does not check for any guarantees, so it is possible
-   to build a wheel that looks Python agnostic but will crash, or worse, be silently
-   incorrect, in another Python environment. Take care to avoid using unstable CPython
-   APIs, for example APIs from libtorch_python (in particular pytorch/python bindings,)
-   and to only use APIs from libtorch (aten objects, operators and the dispatcher).
-   For example, to give access to custom ops from Python, the library should register
-   the ops through the dispatcher (covered below!).
 
 Defining the custom op and adding backend implementations
 ---------------------------------------------------------
@@ -252,16 +292,89 @@ matters (importing in the wrong order will lead to an error).
 
 To use the custom operator with hybrid Python/C++ registrations, we must
 first load the C++ library that holds the custom operator definition
-and then call the ``torch.library`` registration APIs. This can happen in one
-of two ways:
-
-1. If you're following this tutorial, importing the Python C extension module
-   we created will load the C++ custom operator definitions.
-2. If your C++ custom operator is located in a shared library object, you can
-   also use ``torch.ops.load_library("/path/to/library.so")`` to load it. This
-   is the blessed path for Python agnosticism, as you will not have a Python C
-   extension module to import. See `torchao __init__.py
-   <https://github.com/pytorch/ao/blob/main/torchao/__init__.py>`_
-   for an example.
+and then call the ``torch.library`` registration APIs. This can happen in
+three ways:
+
+1. The first way to load the C++ library that holds the custom operator definition
+   is to define a dummy Python module for ``_C``. Then, in Python, when you import
+   the module with ``import _C``, the ``.so`` files corresponding to the extension
+   will be loaded and the ``TORCH_LIBRARY`` and ``TORCH_LIBRARY_IMPL`` static
+   initializers will run. One can create a dummy Python module with
+   ``PYBIND11_MODULE`` like below, but you will notice that this does not compile
+   with ``Py_LIMITED_API``, because ``pybind11`` does not promise to use only the
+   stable limited CPython API! With the below code, you sadly cannot build a
+   CPython agnostic wheel for your extension! (Foreshadowing: I wonder what the
+   second way is ;) ).
+
+   .. code-block:: cpp
+
+      // in, say, not_agnostic/csrc/extension_BAD.cpp
+      #include <torch/extension.h>
+
+      PYBIND11_MODULE(_C, m) {}
+
+   .. code-block:: python
+
+      # in, say, extension/__init__.py
+      from . import _C
+
+2. In this tutorial, because we value being able to build a single wheel across
+   multiple CPython versions, we will replace the unstable ``PYBIND11_MODULE`` call
+   with stable API calls. The below code compiles with
+   ``-DPy_LIMITED_API=0x03090000`` and successfully creates a dummy Python module
+   for our ``_C`` extension so that it can be imported from Python. See
+   ``extension_cpp/__init__.py`` and ``extension_cpp/csrc/muladd.cpp`` in
+   `extension-cpp <https://github.com/pytorch/extension-cpp>`_ for more details:
+
+   .. code-block:: cpp
+
+      #include <Python.h>
+
+      extern "C" {
+        /* Creates a dummy empty _C module that can be imported from Python.
+           The import from Python will load the .so built from this file in
+           this extension, so that the TORCH_LIBRARY static initializers
+           below are run. */
+        PyObject* PyInit__C(void)
+        {
+            static struct PyModuleDef module_def = {
+                PyModuleDef_HEAD_INIT,
+                "_C",   /* name of module */
+                NULL,   /* module documentation, may be NULL */
+                -1,     /* size of per-interpreter state of the module,
+                           or -1 if the module keeps state in global variables. */
+                NULL,   /* methods */
+            };
+            return PyModule_Create(&module_def);
+        }
+      }
+
+   .. code-block:: python
+
+      # in, say, extension/__init__.py
+      from . import _C
+
+3. If you want to avoid ``Python.h`` entirely in your C++ custom operator, you may
+   use ``torch.ops.load_library("/path/to/library.so")`` in Python to load the
+   ``.so`` file(s) compiled from the extension. Note that, with this method, there
+   is no ``_C`` Python module created for the extension, so you cannot call
+   ``import _C`` from Python. Instead of relying on the import statement to trigger
+   registration of the custom operators,
+   ``torch.ops.load_library("/path/to/library.so")`` will do the trick. The
+   challenge then shifts to understanding where the ``.so`` files are located so
+   that you can load them, which is not always trivial:
+
+   .. code-block:: python
+
+      import torch
+      from pathlib import Path
+
+      so_files = list(Path(__file__).parent.glob("_C*.so"))
+      assert (
+          len(so_files) == 1
+      ), f"Expected one _C*.so file, found {len(so_files)}"
+      torch.ops.load_library(so_files[0])
+
+      from . import ops
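+
+Whichever way you choose, once the ``.so`` has been loaded and the ``TORCH_LIBRARY``
+static initializers have run, the registered operators become available under the
+``torch.ops`` namespace. As a sketch, the call below assumes a
+``mymuladd(Tensor a, Tensor b, float c) -> Tensor`` operator registered under the
+``extension_cpp`` namespace, as is done elsewhere in this tutorial:
+
+.. code-block:: python
+
+   import torch
+   import extension_cpp  # triggers loading of the .so via one of the ways above
+
+   a, b = torch.randn(3), torch.randn(3)
+   # Ops registered through TORCH_LIBRARY are exposed as torch.ops.<namespace>.<op>
+   out = torch.ops.extension_cpp.mymuladd(a, b, 1.0)
 
 Adding training (autograd) support for an operator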