
Enable Linear+ReLU fuse by OneDNNL #20


Merged: 2 commits merged into intel:master from the LRfuse branch on May 28, 2020

Conversation

zhuhaozhe (Contributor):

No description provided.

grad_input, grad_weight, grad_bias = core.linear_backward(input, grad_output, weight, output_mask)
return (grad_input, grad_weight, grad_bias)

class DNNLLRFuse(nn.Module):
Reviewer:

How about DNNLLinearFuseReLU? "LR" sounds vague to me.

zhuhaozhe (Contributor Author):

Done

import math
import _torch_ipex as core

class DNNLFC(Function):
Reviewer:

Are we going to move this into C++?

zhuhaozhe (Contributor Author):

Xiaobing is trying this, since the Python approach does not work with JIT.
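
For reference, a minimal sketch of what the Python-side autograd Function could look like. This is an assumed shape, not the PR's exact code: core.linear_fuse_relu is a hypothetical name for the fused forward binding (only core.linear_backward appears in this diff), and the ReLU gradient is applied explicitly here before calling the linear backward.

import torch
from torch.autograd import Function
import _torch_ipex as core  # IPEX C++ bindings, as imported in the diff above

class LinearFuseReluFC(Function):
    @staticmethod
    def forward(ctx, input, weight, bias):
        # Hypothetical fused binding: relu(input @ weight.t() + bias) in one
        # oneDNN inner-product call with a ReLU post-op.
        output = core.linear_fuse_relu(input, weight, bias)
        ctx.save_for_backward(input, weight, output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, output = ctx.saved_tensors
        # ReLU backward first: zero the gradient wherever the fused output
        # was clamped to zero.
        grad_output = grad_output * (output > 0).to(grad_output.dtype)
        output_mask = ctx.needs_input_grad[:3]
        grad_input, grad_weight, grad_bias = core.linear_backward(
            input, grad_output, weight, output_mask)
        return grad_input, grad_weight, grad_bias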

if (bias.has_value()) {
at::Tensor bias_vec = bias.value();
const dil::tensor b = dbl::comm::try_gen_dil_tensor(bias_vec);
dil::inner_product_forward::compute(x, w, b, y, true);
Reviewer:

Does true here mean enabling ReLU as a post-op? Can you add something like /* name_of_arg = */ true here for clarity?

zhuhaozhe (Contributor Author):

Done

zhuhaozhe requested a review from EikanWang on May 25, 2020 05:51
import math
import _torch_ipex as core

class dilLinearFuseReluFC(Function):
Reviewer (Contributor):

You can use LinearFuseRelu directly and not expose dil to the front end.

zhuhaozhe (Contributor Author):

Done

grad_input, grad_weight, grad_bias = core.linear_backward(input, grad_output, weight, output_mask)
return (grad_input, grad_weight, grad_bias)

class dilLinearFuseRelu(nn.Module):
Reviewer (Contributor):

Same as the Function above.

zhuhaozhe (Contributor Author):

Done
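
To illustrate the naming point, here is a minimal sketch of a front-end wrapper with dil kept out of the public API. The forward below uses plain F.linear + F.relu as stand-in reference semantics for a dispatch to the fused autograd Function; the nn.Linear-style parameter initialization is an assumption, not the PR's exact code.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearFuseRelu(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super(LinearFuseRelu, self).__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.empty(out_features))
        else:
            self.register_parameter('bias', None)
        # Assumed: same init scheme as nn.Linear.
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            bound = 1 / math.sqrt(in_features)
            nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, input):
        # Reference semantics only; the PR would dispatch to the fused
        # Function here instead of calling two separate ATen ops.
        return F.relu(F.linear(input, self.weight, self.bias))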

if (bias.has_value()) {
at::Tensor bias_vec = bias.value();
const dil::tensor b = dbl::comm::try_gen_dil_tensor(bias_vec);
dil::inner_product_forward::compute(x, w, b, y, /*fuse_relu=*/true);
Reviewer (Contributor):

Please reuse the attr parameter and remove this new parameter.

zhuhaozhe (Contributor Author):

Done

hongzhen1 (Contributor):

@zhuhaozhe could you add a UT to cover this integration?

EikanWang (Contributor) left a comment:

As per hongzhen's comments, please add unit test cases.

zhuhaozhe (Contributor Author) commented on May 26, 2020:

@hongzhen1 @EikanWang I have already added a unit test following test_mlp.py: 804110a
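
For context, a sketch of what such a unit test might check: the fused path should match an unfused linear + relu reference in both forward and backward. The fused_linear_relu placeholder below stands in for the real IPEX entry point, whose exact import path is not shown in this thread.

import unittest
import torch
import torch.nn.functional as F

def fused_linear_relu(x, weight, bias):
    # Placeholder for the fused kernel under test; the real UT would call
    # the IPEX fused path here (assumed, not the PR's exact code).
    return F.relu(F.linear(x, weight, bias))

class TestLinearFuseRelu(unittest.TestCase):
    def test_forward_backward_match_reference(self):
        torch.manual_seed(0)
        x = torch.randn(4, 16)
        weight = torch.randn(8, 16, requires_grad=True)
        bias = torch.randn(8, requires_grad=True)
        x_fused = x.clone().requires_grad_()
        x_ref = x.clone().requires_grad_()

        # Forward results must agree.
        y_fused = fused_linear_relu(x_fused, weight, bias)
        y_ref = F.relu(F.linear(x_ref, weight, bias))
        self.assertTrue(torch.allclose(y_fused, y_ref))

        # Input gradients must agree as well.
        y_fused.sum().backward()
        y_ref.sum().backward()
        self.assertTrue(torch.allclose(x_fused.grad, x_ref.grad))

if __name__ == '__main__':
    unittest.main()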

zhuhaozhe (Contributor Author):

@EikanWang OK, but this patch crashes on the new commit; I am trying to fix it now.

rename some FNs

hide dil from frontend and reuse attr args instead of fuse_relu

remove unused header files since the relu function body was moved to DevOPs.cpp

add unit test for linear fuse relu

move ut to test_lazy_reorder
@@ -12,7 +12,7 @@
 import sys
 import torch
 import _torch_ipex as ipex
-import intel_pytorch_extension
+import intel_pytorch_extension_py
Reviewer (Contributor):

intel_pytorch_extension_py => intel_pytorch_extension

EikanWang merged commit fc686f6 into intel:master on May 28, 2020
EikanWang added a commit to EikanWang/intel-extension-for-pytorch that referenced this pull request May 28, 2020
EikanWang added a commit that referenced this pull request May 28, 2020
zhuhaozhe deleted the LRfuse branch on August 18, 2020 09:27
EikanWang added a commit that referenced this pull request Oct 27, 2021
* Add AVX512 macro in CMake to enable AVX512

* Cannot use the input dil tensor to check is_public_format or not because it is out of scope

* Fix build issue of PR #20

* Increase precision tolerance for ut

* Update for new 'oneDNN' GitHub URL (#146)

* Update default IPEX version to 1.2.0

* fall back to CPU for LSTM training with dropout

* Parse PyTorch 1.8 RegistrationDeclarations.h to generate dense and sparse operator code

* git commit -m

* 1. Replace TensorList with c10::List
2. Replace tensor size and stride with SizesAndStrides
TODO:
Need to work around the RegXXX.h whose function signatures conflict with NativeFunctions.h

* remove autocast from master

* Pass build for PyTorch 1.8
TODO:
Add comments for gen-dense-cpu-ops.py
There might be potential issues with grad copying

* Enhance embedding bag last offset memory copy by using parallelized move_ker

* add UT for int8 LSTM

* add asymmetric quantization

* enable int8 for LSTM

* Port utils for ut from PyTorch 1.8

* Fix the issue that tensor lists wrapped in c10::List cannot fall back

* Enable upsample_bilinear2d to support a vector scale factor

* Update README to clarify the IPEX version and PyTorch
Update the IPEX version in setup.py to 1.2.0

* enable bf16 layernorm

* Enable native layer norm signature matching

* Pass all the test cases of the committed test file except layer_norm, because IPEX cannot capture layer_norm

* Capture layernorm on python side

* Replace ATen/Tensor.h with ATen/ATen.h to avoid undefined symbols

Conflicts:
	torch_ipex/csrc/utils.h

* Gen sparse operators

* Reorder to public format for slice in case of throwing an exception

* 1. Support NHWC
2. Remove recorder tensors to reduce pytorch profiler overhead

* 1. dependencies installation; 2. torch wheel file query and packaging; 3. doesn't require git anymore when compiling

* Added tutorial Performance Tuning.md in directory tutorials

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* update test_torch.py and align with common_utils.py

* bug fix in dockerfile (#164)

* Update Dockerfile to include pybind11-dev (#157)

As a fix for issue - #155. As suggested by @jingxu10, adding pybind11-dev allows for a successful build of the Docker container.

* fix pt-1.8's UT

* - Installation for IPEX 1.8: remove the recompilation of PT and add the installation of dependency packages.
- Add the supported customized ops & fusion patterns.

* tmp commit

* pass most UT

* modified _C.cpython.xxxx.so's rpath

* fix unexpected keyword argument 'prec' in test_torch.py

* Keep intel_pytorch_extension to ensure backward-compatibility

* fix test_int8.py's regression

* update the version to 1.8.0

* fix runtime undefined reference error caused by libstdc++ Dual ABI

* Updated README.md for v1.8.0

* Updated torch-ccl to fix libfabric.so not found issue

* setup.py: 1. fix include_paths and library_paths missing issue if torch is installed via setup.py. 2. solved libstdc++ dual ABI issue. 3. removed duplicated package imports. torch-ccl: 1. fixed oneCCL library path patching not taking effect issue

* Update README.md

* clean ipex installation folder structure

* clean ipex installation folder structure

* clean ipex installation folder structure

* Add a warning message of deprecation of intel_pytorch_extension

* fix rpath issue to libtorch_ccl.so after hierarchy adjustment

* 1. removed execute bit of libtorch_ipex.so permission 2. upgraded torch-ccl to make libtorch_ccl.so installed to torch_ccl folder

* Pass build for pytorch 1.9.0

* Enable batch_norm operator

* update ipex Dockerfile to use no-patch version (#170)

* update ipex Dockerfile to use no-patch version

* explicit pytorch version

* Exclude the operators that do not run into autograd

* Pass all test cases except test_torch

* Fix the issues
1. LSTM indents error
2. Check batch_normalization

* Fix the issue that the grad of nll_loss input is None

* update build version from 1.8.0.1 to 1.9.0 (along with pytorch version)

* fix dil_cat bug when concatenating empty tensors with customized shape

* 1. moved python codes out from libtorch_ipex.so to _C.so
2. removed pybind11 as a dependency library from the third_party folder
3. changed "import intel_pytorch_extension" to "import torch_ipex" in tests folder, Readme.md, torch_ipex/ops/embeddingbag.py and torch_ipex/launch.py
4. commented "core.enable_torch_ccl()" out in torch_ipex/__init__.py, to avoid the following error when "import torch_ipex"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/jingxu1/dl/pytorch/srcs/venv_test_py38/lib/python3.8/site-packages/torch_ipex/__init__.py", line 14, in <module>
    core.enable_torch_ccl()
RuntimeError: arg(): could not convert default argument into a Python object (type not registered yet?). Compile in debug mode for more information.

* 1. removed torch-ccl 2. added debug info into version.py 3. removed pytorch wheel file binding in debug mode

* updated dockerfile to 1.9.0

* removed core.enable_torch_ccl()

* updated README.md for 1.9.0

* updated README.md for 1.9.0

* updated .gitignore to delete torch_ipex/version.py when performing clean

* V1.8.0 whl release (#171)

* Added wheel file release info to README.md

* Added wheel file release info to README.md

* Exclude flatten.using_ints and cross_entropy_loss because the two operators do not generate backward functions

* Does not capture batch_norm and _batch_norm_impl_index

* Exclude reshape and where

* Exclude nll_loss2d

* added denormal numbers section to performance_tuning.md

* Add installation guide for 1.9.0

* Add installation guide for 1.9.0

* Update README.md

The default IPEX and PyTorch versions are v1.9.0

* added avx512 note

* updated launch.py

* added launcher doc

* added launcher doc

* Add python interface c++ source file

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update LICENSE.txt

* Update README.md

* Remove useless files

* Fix format issue

Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>
Co-authored-by: chunyuan-w <chunyuan.wu@intel.com>
Co-authored-by: leslie-fang-intel <leslie.fang@intel.com>
Co-authored-by: Chen, Jian Ping <jian.ping.chen@intel.com>
Co-authored-by: jiayisun <jiayi.sun@intel.com>
Co-authored-by: Jing Xu <jing.xu@intel.com>
Co-authored-by: Zhu, Jewel <jewel.zhu@intel.com>
Co-authored-by: tangleintel <lei1.tang@intel.com>
Co-authored-by: Chaitanya Hazarey <C24IO@users.noreply.github.com>
Co-authored-by: Ashok Emani <ashok.emani@intel.com>
Co-authored-by: Wang, Eikan <root@JF5300-B11A316T.jf.intel.com>
Co-authored-by: jianangu <jianan.gu@intel.com>