From 5a071770ea4f1ca301fc2b2daa350e91bc4c1db7 Mon Sep 17 00:00:00 2001
From: leslie-fang-intel
Date: Tue, 19 Dec 2023 09:59:27 +0800
Subject: [PATCH 1/2] Add how to use C++ wrapper with X86InductorQuantizer

---
 prototype_source/pt2e_quant_ptq_x86_inductor.rst | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/prototype_source/pt2e_quant_ptq_x86_inductor.rst b/prototype_source/pt2e_quant_ptq_x86_inductor.rst
index f2cabe88949..97e19e87f57 100644
--- a/prototype_source/pt2e_quant_ptq_x86_inductor.rst
+++ b/prototype_source/pt2e_quant_ptq_x86_inductor.rst
@@ -8,6 +8,7 @@ Prerequisites
 
 - `PyTorch 2 Export Post Training Quantization `_
 - `TorchInductor and torch.compile concepts in PyTorch `_
+- `Inductor C++ Wrapper concepts `_
 
 Introduction
 ^^^^^^^^^^^^^^
@@ -161,7 +162,15 @@ After these steps, we finished running the quantization flow and we will get the
 3. Lower into Inductor
 ------------------------
 
-After we get the quantized model, we will further lower it to the inductor backend.
+After we get the quantized model, we will further lower it to the inductor backend. The default Inductor wrapper
+generates Python code to invoke both generated kernels and external kernels. Additionally, Inductor supports a C++ wrapper
+that generates pure C++ code, seamlessly combining the generated and external kernels.
+
+::
+
+    # Optional: using the C++ wrapper instead of default Python wrapper
+    import torch._inductor.config as config
+    config.cpp_wrapper = True
 
 ::
 

From 53c4fa865a0f29853d4da8dc9fe6851e17663353 Mon Sep 17 00:00:00 2001
From: leslie-fang-intel
Date: Wed, 20 Dec 2023 10:31:22 +0800
Subject: [PATCH 2/2] add benefits of C++ wrapper

---
 prototype_source/pt2e_quant_ptq_x86_inductor.rst | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/prototype_source/pt2e_quant_ptq_x86_inductor.rst b/prototype_source/pt2e_quant_ptq_x86_inductor.rst
index 97e19e87f57..60bd5ffa5a4 100644
--- a/prototype_source/pt2e_quant_ptq_x86_inductor.rst
+++ b/prototype_source/pt2e_quant_ptq_x86_inductor.rst
@@ -163,8 +163,11 @@ After these steps, we finished running the quantization flow and we will get the
 ------------------------
 
 After we get the quantized model, we will further lower it to the inductor backend. The default Inductor wrapper
-generates Python code to invoke both generated kernels and external kernels. Additionally, Inductor supports a C++ wrapper
-that generates pure C++ code, seamlessly combining the generated and external kernels.
+generates Python code to invoke both generated kernels and external kernels. Additionally, Inductor supports
+C++ wrapper that generates pure C++ code. This allows seamless integration of the generated and external kernels,
+effectively reducing Python overhead. In the future, leveraging the C++ wrapper, we can extend the capability
+to achieve pure C++ deployment. For more comprehensive details about C++ Wrapper in general, please refer to the
+dedicated tutorial on `Inductor C++ Wrapper Tutorial `_.
 
 ::
 
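
Reviewer note: a minimal sketch of how the optional setting added by this series might combine with the lowering step that follows it in the tutorial. The small `torch.nn.Linear` model and `example_inputs` are placeholders chosen for illustration; they are not part of the patch, and the tutorial would compile its converted quantized model instead. The sketch assumes a CPU build of PyTorch 2.x and a working C++ compiler on the machine.

```python
import torch
import torch._inductor.config as config

# Optional: use the C++ wrapper instead of the default Python wrapper.
# Set the flag before the first compiled call, since that call is what
# triggers Inductor code generation.
config.cpp_wrapper = True

# Placeholder model and inputs (assumptions for illustration only).
model = torch.nn.Linear(8, 4).eval()
example_inputs = (torch.randn(2, 8),)

with torch.no_grad():
    optimized_model = torch.compile(model)
    # The first call compiles the model and runs it once.
    output = optimized_model(*example_inputs)

print(output.shape)
```

Leaving `config.cpp_wrapper` at its default keeps the Python wrapper, so the rest of the flow is unchanged for readers who skip the optional block.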