Merge branch 'main' into mlazos/compile-opt

malfet · web-flow · commit 86f884f92f3f · 2024-01-15T20:00:48.000-08:00
diff --git a/prototype_source/pt2e_quant_ptq_x86_inductor.rst b/prototype_source/pt2e_quant_ptq_x86_inductor.rst
@@ -8,6 +8,7 @@ Prerequisites
 
 -  `PyTorch 2 Export Post Training Quantization <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html>`_
 -  `TorchInductor and torch.compile concepts in PyTorch <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`_
+-  `Inductor C++ Wrapper concepts <https://pytorch.org/tutorials/prototype/inductor_cpp_wrapper_tutorial.html>`_
 
 Introduction
 ^^^^^^^^^^^^^^
@@ -161,7 +162,18 @@ After these steps, we finished running the quantization flow and we will get the
 3. Lower into Inductor
 ------------------------
 
-After we get the quantized model, we will further lower it to the inductor backend.
+After we get the quantized model, we will further lower it to the Inductor backend. The default Inductor wrapper
+generates Python code to invoke both generated kernels and external kernels. Additionally, Inductor supports a
+C++ wrapper that generates pure C++ code. This allows seamless integration of the generated and external kernels,
+effectively reducing Python overhead. In the future, the C++ wrapper could be extended to support
+pure C++ deployment. For more details about the C++ wrapper in general, please refer to the
+`Inductor C++ Wrapper Tutorial <https://pytorch.org/tutorials/prototype/inductor_cpp_wrapper_tutorial.html>`_.
+
+::
+
+    # Optional: use the C++ wrapper instead of the default Python wrapper
+    import torch._inductor.config as config
+    config.cpp_wrapper = True
 
 ::